Deep Atlantic Storage: Reading File Upload in Web Workers

I was bored over the 4th of July holiday, so I made a wacky webpage: Deep Atlantic Storage. It is described as a free file storage service, where you can upload any file to be stored deep in the Atlantic Ocean, with no size limit or content restriction whatsoever. How does it work, and how can I afford to provide it?

This article is the second of a 3-part series that reveals the secrets behind Deep Atlantic Storage. The previous part introduced the algorithm I use to sort all the bits in a Uint8Array. Now I'll continue from there and explain how the webpage accepts and processes file uploads.

File Upload

File upload has been part of the HTML standard for as long as I can remember:

<form action="upload.php" method="POST" enctype="multipart/form-data">
  <input type="file" name="file">
  <input type="submit" value="upload">
</form>
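Beyond the classic form submission, the webpage reads the file in a Web Worker before "uploading" it. Below is a minimal sketch under assumptions not in the original (the worker file name sort-worker.js is hypothetical): the main thread posts the selected File to the worker, and the worker reads it into a Uint8Array using the standard Blob.arrayBuffer() method.

```javascript
// main thread (sketch): post the selected File to a Web Worker.
// "sort-worker.js" is a hypothetical file name.
//   const worker = new Worker("sort-worker.js");
//   document.querySelector('input[type="file"]')
//     .addEventListener("change", (evt) => {
//       worker.postMessage(evt.target.files[0]); // File is structured-cloneable
//     });

// worker side: read the File into a Uint8Array without blocking the page.
// Blob.arrayBuffer() is available in both window and worker contexts.
async function readAsBytes(blob) {
  return new Uint8Array(await blob.arrayBuffer());
}
```

Because a File is structured-cloneable, passing it through postMessage is cheap: the worker receives a handle to the same underlying data, not a copy of the bytes.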

The Worst Server-Side Rendering Pipeline

My server-side rendering pipeline: I use nginx to invoke PHP to invoke Node.js to invoke Puppeteer to invoke Chromium. The client receives a screenshot of the webpage. They can never steal my super secret HTML and JavaScript code again.

How to have hyperlinks?

If the whole webpage is a screenshot, how can it have hyperlinks, you ask?

<A HREF=browse.php>
  <IMG SRC=page.bmp WIDTH=640 HEIGHT=480 ISMAP>
</A>

The ISMAP attribute creates a server-side image map: when you click anywhere on the image, the browser appends the click coordinates to the link URL, so the server knows exactly where you clicked.
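For an ISMAP image inside `<A HREF=browse.php>`, the browser requests a URL like browse.php?152,304, where the two numbers are the click's x and y offsets within the image. The article's server is PHP, but the parsing logic can be sketched in JavaScript:

```javascript
// Parse an ISMAP click URL such as "browse.php?152,304".
// The query string is simply "x,y" — no key=value pairs.
function parseIsmapQuery(url) {
  const query = url.split("?")[1] ?? "";
  const [x, y] = query.split(",").map(Number);
  return { x, y };
}
```

The server then maps the coordinates onto the regions of the rendered screenshot to decide which "link" was followed.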

Getting Started with NDNts Web Application using webpack

This article shows how to get started with NDNts, Named Data Networking (NDN) libraries for the modern web. In particular, it demonstrates how to write a consumer-only web application that connects to the NDN testbed, transmits a few Interests, and receives responses. The application uses the JavaScript programming language and the webpack module bundler.

Prepare the System

To use NDNts, you must have Node.js. As of this writing, NDNts works best with Node.js 14.x, and you should install that version. The easiest way to install Node.js is through Node Version Manager (nvm), or nvm-windows on Windows.

On Ubuntu 18.04, you can install nvm and Node.js with the following commands:

$ wget -qO- https://raw.githubusercontent.com/nvm-sh/nvm/v0.36.0/install.sh | bash
$ nvm install 14
Now using node v14.14.0 (npm v6.14.8)
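With Node.js in place, the other prerequisite named above is webpack. The following is a minimal webpack.config.js sketch; the entry and output paths are assumptions for illustration, not taken from the NDNts documentation.

```javascript
// webpack.config.js — minimal sketch; entry/output paths are assumptions
const path = require("path");

module.exports = {
  mode: "development",            // switch to "production" for deployment
  entry: "./src/main.js",         // application entry point (assumed path)
  output: {
    path: path.resolve(__dirname, "dist"),
    filename: "bundle.js",
  },
  devtool: "source-map",          // readable stack traces while debugging
};
```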

yoursunny.com is in Git and Completely Rebuilt

I have been making websites since 2001. This website, yoursunny.com, started in 2006. In the past 11 years, I've rebuilt the site several times, switched from ASP to PHP, and moved from Windows dedicated hosting to shared hosting and eventually to a Linux VPS. So far, every time I want to make a major edit to the website, I copy the original versions of the affected files to a backup folder on my computer, and then go ahead with the edit. After testing the modification locally, I upload the changed files to the server via FTP or SFTP. One constant worry hanging over my head is: what if I lose all the files on my computer, and my hosting provider vanishes so I can't get them back? Another headache is that sometimes I make an edit incorrectly but can't revert it, because I didn't copy the original files to the backup folder, having decided the change wasn't "major" enough to warrant a backup.

During my studies at the University of Arizona, I learned a useful tool called git. Git is a source control system: it lets developers create a repository, put source code into it, and automatically keeps track of all the edits applied to each file. By putting the website source code into a git repository, I can find out what modifications I've made to each file over time, whether "major" or "minor". Additionally, I can synchronize the git repository with a remote git server, so that the server keeps a copy of my website, including its edit history. This solves both the worry of losing files and the headache of not having an earlier version to revert to after an incorrect modification.

After delaying this project several times, I finally committed to moving yoursunny.com into git repositories in Apr 2017. At the same time, to keep the website source code as clean as possible, I decided to try out two new technologies: static site generators and Composer. That is, I would rebuild yoursunny.com, page by page, into a new website stored in git repositories.

I spent about 2 months on this rebuild/move, and I'm happy to announce that yoursunny.com is now under source control.

Main Site with Composer

The Websites I've Built Over the Years

In 2001, I made my first personal homepage. Ten years have passed, so let me look back at the websites I've built over the decade.

The list below contains 34 websites; it includes only sites that are still online or for which screenshots or source code can be found, and excludes certain non-public projects. In total, I've built around 60 websites. Some screenshots come from the Internet Archive Wayback Machine.

2001 Personal Homepage

  • Content: self-introduction, my first digital photo
  • Technology: FrontPage
  • Publishing: emailed to readers
Personal Homepage screenshot

2002 FredSoft

  • Content: my VB programming works
  • Technology: hand-written HTML4
  • Publishing: school-provided web space
FredSoft screenshot

2002 Personal Folder

  • Content: links to subfolders
  • Technology: hand-written HTML4, CSS, JavaScript
  • Publishing: personal use on my local computer, via Windows 98's folder.htt feature
Personal Folder screenshot

Add a Chat Room to Your Website in 10 Minutes

Update 2016-01-05: Windows Live Messenger Connect was shut down in 2013, so this technique no longer works.

If you run a news, fiction, or video website where large numbers of users visit the same page at any given moment, you can add a "chat room" to the page, letting users who are viewing the same page talk to each other.

Following the method in this article, you can add a chat room to your website in 10 minutes, without consuming any server resources.

Step 1: Register the Application (3 minutes)

The "chat room" described in this article is provided by Windows Live Messenger Connect and is managed with a Windows Live ID. If your website is not a personal site, I recommend registering a new Windows Live ID as the developer account, so that other site administrators can share the account when necessary. Note that the password must be sufficiently complex, or the account cannot be used as a developer account.

Protecting Email Addresses with Two-Layer Dynamic Images

Email is one of the most important communication tools. Spammers exploit every possible communication channel to deliver their ads to you, and because sending email is extremely cheap, email is a favorite of spammers. The first step of spamming is collecting enough valid email addresses, and the two main collection methods are querying directories and crawling web pages.

Collecting email addresses from directories mainly means querying electronic yellow pages, website registration records, domain WHOIS information, and so on. For example, WHOIS yoursunny.com reveals the registrant's email address (some WHOIS result pages protect the email address in various ways, but most do not). Regulations require domain WHOIS records to contain a valid email address, so the only way to evade this kind of collection is to use a Private Domain Registration service, which keeps the email address in the WHOIS record changing.

Collecting email addresses with web crawlers means spammers write programs that fetch pages across the Internet and extract email-like strings from the page text. For example, if a page contains the address someone@example.com, a crawler can find it with a regular expression when it fetches the page. This article focuses on defending against this collection method.
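The regular-expression step is simple enough to sketch. The pattern below is a simplified example for illustration, not a claim about any particular crawler:

```javascript
// Extract email-like strings from page text, the way a harvesting
// crawler might (simplified pattern for illustration).
function findEmails(text) {
  return text.match(/[\w.+-]+@[\w-]+(?:\.[\w-]+)+/g) ?? [];
}
```

Because matching is this easy, any address written in plain text on a public page should be considered harvested; the defenses discussed below all work by making the address invisible to such patterns.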

Traditional Defenses Against Crawlers Harvesting Email Addresses

Using variant email addresses

The Core Technical Architecture of SNS Platforms

SNS (Social Networking Sites) are websites where users can publish their own content, interact with others, and share personal or professional interests. Well-known SNS sites include Facebook, MySpace, Renren (formerly Xiaonei), and Kaixin001.

Running an SNS is not easy: it takes user-experience researchers, visual designers, front-end developers, back-end architects, and systems engineers, plus non-technical departments such as business development, public relations, and customer service, all working together. I am an engineer, so I will only talk about technology, and this article covers just a small part of SNS technology: the core platform architecture. By "core platform architecture" I mean the most fundamental, most central part of an SNS site. By analogy with an operating system, the core platform architecture of an SNS is like the OS kernel. In my view, it consists of two pieces: application integration and message distribution.

Application Integration: Giving Users Something to Do

A complete, stable application-integration platform gives users something to do on the SNS. "Applications" here include both third-party apps (games such as Happy Farm, 荣光医院, and 跑火车) and the site's own built-in apps (tools such as blogs, photos, and albums). Built-in plus third-party applications account for nearly every page on an SNS site other than the home page, profile pages, and settings pages.

Features the SNS Platform Provides to Integrated Applications

A Review Checklist for Web Applications

Some say you can build a website and start doing business in just 24 hours. Building a web site has become as easy as stacking building blocks: pick one (or several) of the common site builders such as WordPress, Discuz, UCenter Home, or ShopEx, and you can have a working site in 3 minutes; then you can spend the remaining 23 hours and 57 minutes installing plugins, tweaking templates, and publishing content to add richer features.

However, these "blocks" may not fully serve your purpose. In many cases you need to develop (or hire someone to develop) a new "block", that is, write your own site code. At the very least, you will need to modify and extend some of the "blocks" to meet your site's special requirements.

When your website is not built entirely from off-the-shelf site builders, the quality of the web site becomes a question you must pay attention to.

What Does Web Site Quality Include?

In my opinion, the quality concerns you must pay attention to when building a website include at least the following:

My Journey of Learning Web Development

Recently someone asked me how to get started with web development. Here, I'd like to look back on my own journey of learning web development.

Web Page Making vs Web Development

In fact, the question many people ask is not "how do I get started with web development", but rather:

  • How do I make a website?
  • Methods for making web pages
  • The process of building a web page
  • I know PHP and Dreamweaver; what else should I learn?
  • Looking for an expert in web page programming

Nowadays, I insist on the term web development rather than "web page making" or "website building". In my view, a "website" is only one part of web development; learning so many development skills only to make "websites" or "web pages" would be a huge waste.