yoursunny.com is in Git and Completely Rebuilt

I have been making websites since 2001. This website, yoursunny.com, started in 2006. In the past 11 years, I've rebuilt the site several times, switched from ASP to PHP, and moved from Windows dedicated hosting to shared hosting and eventually to a Linux VPS. Until now, every time I wanted to make a major edit to the website, I copied the original versions of the affected files to a backup folder on my computer and then went ahead with the edit. After testing the modification locally, I uploaded the changed files to the server via FTP or SFTP. One constant worry hanging over my head was: what if I lose all the files on my computer, and my hosting provider vanishes so I can't get them back? Another headache was that I would sometimes make an incorrect edit but could not revert it, because I had decided the change wasn't "major" enough to warrant a backup and had not copied the original files to the backup folder.

During my studies at the University of Arizona, I learned a useful tool called git. Git is a source control system: it lets developers create a repository, put source code into it, and record every committed edit to each file. By putting the website source code into a git repository, I can find out what modifications I've made to each file over time, regardless of whether they are "major" or "minor". Additionally, I can synchronize the git repository with a remote git server, so that the server holds a copy of my website together with its edit history. This solves both the worry of losing files and the headache of not having an earlier version to revert to after an incorrect modification.

After delaying this project several times, I finally resolved in April 2017 to move yoursunny.com into git repositories. At the same time, to keep the website source code as clean as possible, I decided to try out two new technologies: static site generators and Composer. That is, I would rebuild yoursunny.com, copying it page by page into a new website stored in git repositories.

I spent about 2 months on this rebuild and move, and I'm happy to announce that yoursunny.com is now under source control.

Main Site with Composer

The main site is still written in PHP. I proudly choose PHP because it can run straight from source code without a compilation step.

Instead of manually downloading third-party libraries into website folders, I adopted Composer, which automatically downloads dependencies according to a configuration file. Composer really simplifies my workflow, because it recursively downloads dependencies of dependencies of dependencies, and generates a helpful "autoload" file which I can require_once instead of require-ing each library separately. My concern about Composer is that if Composer's catalog server one day disappears from the Internet, I will no longer be able to retrieve those packages. This is especially worrying because I want my website to last as long as my life. I don't have a good solution for this yet.

Other than that, the Twig templates I wrote in 2014 are still useful, although they needed minor adaptations to upgrade from Twig 1.x to 2.x. I redesigned the front page to include a cute photograph of myself, and moved the social feed, which probably only my family reads, onto a subpage.

/study on Jekyll

The /study sub site, a collection of courseware from when I was an Information Security major at Shanghai Jiao Tong University, is now generated by Jekyll. Jekyll is a popular static site generator: I write the website in Markdown, and Jekyll compiles it into static HTML files. I chose to make /study a static site because I'm no longer attending Shanghai Jiao Tong University and I'm not authorized to re-publish course materials from my current school, so the site will remain static forever.

The /study sub site consists entirely of "pages" and does not use Jekyll's blogging capability. I managed to make a template that matches the style of my website, instead of adopting Jekyll's default template.

Query Strings in Jekyll Site

There was one difficulty during the conversion.

In 2008 I decided it was cleaner to pack multiple small web pages into a single HTML file with separator lines (which look like <!--#topic HDD-->), and then use a PHP script to display a chosen section (which can be requested like /study/EI209/?topic=HDD). I made this decision back then so that I didn't have to open each page in a new Notepad2 window when writing the content.
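The packed format is easy to take apart again. Here is a minimal sketch of the idea, written in JavaScript for illustration rather than in the PHP or bash actually used. It assumes each section starts at a line consisting only of a <!--#topic NAME--> marker and runs until the next marker; the function name splitTopics and the sample content are made up.

```javascript
// Split a packed HTML file into sections keyed by topic name.
// Assumption: a marker occupies a whole line, and NAME contains no whitespace.
function splitTopics(packed) {
  const marker = /^<!--#topic (\S+)-->$/;
  const sections = {};
  let current = null;
  for (const line of packed.split(/\r?\n/)) {
    const found = marker.exec(line.trim());
    if (found) {
      // A new marker starts a new section.
      current = found[1];
      sections[current] = [];
    } else if (current !== null) {
      // Lines before the first marker are ignored.
      sections[current].push(line);
    }
  }
  for (const name of Object.keys(sections)) {
    sections[name] = sections[name].join("\n").trim();
  }
  return sections;
}

// Hypothetical sample input with two topics.
console.log(splitTopics("<!--#topic HDD-->\n<p>Hard disk</p>\n<!--#topic CPU-->\n<p>Processor</p>\n"));
```

Each entry of the returned object could then be written to its own file, such as HDD.htm, or looked up by the value of the topic parameter.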

Now, the file can be split into multiple pages with a bash script just fine. The problem is that a query string such as ?topic=HDD is not supported in a static site: the page is now generated at /study/EI209/HDD.htm, but the old URI /study/EI209/?topic=HDD simply loads the index.html file in the /study/EI209/ folder. I could put some JavaScript into that index.html to redirect in the browser, but this is bad practice, because the HTTP standard provides the 301 status code for a page that has moved permanently.

The solution is to make NGINX invoke a PHP script when a query string is present:

if ($args) {
  rewrite ^ /_internal/study-redirect.php last;
}

The PHP script can then recognize the query string, and decide whether to redirect, or to declare the page is not found.
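The decision that script has to make is small. The sketch below mirrors it in JavaScript purely for illustration; the real script is PHP and is not shown here. The function name resolveLegacyUrl, the set of known topics, and the returned object shape are all assumptions.

```javascript
// Decide how to answer an old-style request such as /study/EI209/?topic=HDD.
// folder:      the requested folder, ending in "/" (assumed)
// query:       the raw query string, without the leading "?"
// knownTopics: a Set of topic names that have a generated page (hypothetical)
function resolveLegacyUrl(folder, query, knownTopics) {
  const topic = new URLSearchParams(query).get("topic");
  if (topic !== null && knownTopics.has(topic)) {
    // The page moved permanently to its own static file.
    return { status: 301, location: `${folder}${encodeURIComponent(topic)}.htm` };
  }
  // Unknown topic or unrelated query string: report not found.
  return { status: 404 };
}

console.log(resolveLegacyUrl("/study/EI209/", "topic=HDD", new Set(["HDD"])));
console.log(resolveLegacyUrl("/study/EI209/", "topic=SSD", new Set(["HDD"])));
```

Checking the topic against a list of known pages, rather than redirecting blindly, keeps the redirect from pointing at a file that was never generated.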

Blog on Hexo

I chose a different static site generator, Hexo, for my blog, also known as the /t sub site. Hexo describes itself as a fast, simple, and powerful blog framework, and it is built on Node.js. I decided to use Hexo instead of Jekyll for the blog for the following reasons:

  • I should learn more than one static site generator, and not put all my eggs in one basket.
  • Hexo is written in JavaScript, a language I am somewhat familiar with. I learned JavaScript from a book in 2002, although I'm not at all familiar with recent advancements in the language. Nevertheless, in case I need any customization, I'm more confident working with JavaScript than with Ruby, which I have seen only in Vagrantfiles.
  • A technical reason is that Hexo has a post asset folder feature, which allows me to put the pictures related to each article in a folder named after that article. Jekyll, instead, requires all pictures to be uploaded to a different place unrelated to the article. My existing website already stores pictures along with each article, so Hexo's arrangement looks more natural to me.

The actual conversion was less difficult than that of the /study sub site. I started writing new articles in Markdown in 2014, a format that Hexo (like most other static site generators) recognizes. Older articles are in HTML, and Hexo can load them as well. The front matter, such as titles and dates, was stored either in an XML file or in a fixed "header" in the top few lines of my Markdown/HTML file, so I wrote a bash script to convert it into Hexo's YAML format.

The template format is EJS, different from Twig or Jekyll's Liquid, so I had to make a third version of my website template.

Excerpt

A benefit of adopting Hexo is that I can show the contents of each article on the homepage of my blog, instead of a list of links with just the article titles. However, Hexo goes too far: it shows entire articles rather than just the top part of each. hexo-excerpt is a good solution to this problem: it shows only the first few paragraphs of each article on the homepage, leaving the full article on its individual page.

Linking to Pictures

With great functionality comes great limitation. Hexo 3 is unhappy with me linking to pictures in the post asset folder using plain Markdown syntax such as ![front of top PCB](ESP8266.jpg). The picture displays fine when looking at a single article, but it links to the wrong place on the blog homepage.

Hexo suggests using a "tag plugin" and referencing the picture like {% asset_img ESP8266.jpg front of top PCB %}, but it's probably a bad idea. First, I have so many articles that use the existing, perfectly valid syntax. Second, what if I have to move away from Hexo someday? I don't want to re-edit every article at that time.

The solution? The hexo-asset-image package can convert the relative paths to absolute paths.
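The idea behind that conversion can be sketched in a few lines. This is not the hexo-asset-image source, only an illustration under assumptions: it works on rendered HTML with a regular expression, it expects double-quoted src attributes, and the function name absolutizeImageSrc and the post path /t/2017/example-post are made up.

```javascript
// Rewrite relative <img src="..."> values so they start with the post's own path.
// Assumption: postPath is the absolute URL path of the article's asset folder.
function absolutizeImageSrc(html, postPath) {
  const base = postPath.endsWith("/") ? postPath : postPath + "/";
  return html.replace(/(<img\b[^>]*?\bsrc=")([^"]+)(")/g, (match, before, src, after) => {
    // Leave URLs with a scheme (https:, data:) and root-relative or
    // protocol-relative paths (starting with "/") untouched.
    if (/^([a-z][a-z0-9+.-]*:|\/)/i.test(src)) {
      return match;
    }
    return before + base + src + after;
  });
}

console.log(absolutizeImageSrc('<img src="ESP8266.jpg" alt="front of top PCB">', "/t/2017/example-post"));
```

Because the rewritten path no longer depends on the page that embeds the article, the same HTML works both on the article page and on the blog homepage.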

Scoped Stylesheets

Since this is primarily a technical blog, several articles require custom stylesheets. I used to use <style scoped> elements to limit the effect of a stylesheet to a single article, but I learned recently that <style scoped> is deprecated. It's said that "the necessity of <style scoped> is decreased by nesting available in SASS and LESS". While I'm not ready to move to a new way of generating CSS, this gives me an idea:

  1. <style scoped> can stay in my source Markdown/HTML files.
  2. An after_post_render script extracts the stylesheet of each article, and uses reworkcss to prepend the "id" of the article to each CSS selector.
  3. The template collects the reworked CSS of each article appearing on a generated page, and inserts them at the top of the page.
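The selector rewriting in the second step can be sketched without any dependency. Note the swap: the sketch below replaces the reworkcss parser with a naive regular expression, so it only handles flat rules and would break on @media blocks or on commas inside selectors such as :is(a, b). It also assumes the article "id" is used as an HTML id, hence the # prefix. The function name scopeCss and the id post-esp8266 are made up.

```javascript
// Prefix every selector of a flat stylesheet with "#articleId ".
// Naive stand-in for a real CSS parser: at-rules and nested braces are NOT supported.
function scopeCss(css, articleId) {
  const rules = [];
  for (const [, selectors, body] of css.matchAll(/([^{}]+)\{([^{}]*)\}/g)) {
    const scoped = selectors
      .split(",")
      .map((selector) => selector.trim())
      .filter((selector) => selector !== "")
      .map((selector) => `#${articleId} ${selector}`)
      .join(", ");
    rules.push(`${scoped} { ${body.trim()} }`);
  }
  return rules.join("\n");
}

// Hypothetical article id and stylesheet.
console.log(scopeCss("td, th { border: 1px solid; }\n.note { color: gray; }", "post-esp8266"));
```

With the selectors prefixed this way, the stylesheets of several articles can be concatenated at the top of one page without affecting each other, as long as each article is wrapped in an element carrying its id.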

Is Hexo Fast?

Hexo, like every other static site generator, describes itself as "fast". However, in my experience, Hexo is nowhere near "fast". The /study sub site and the blog both have around 450 files in the generated site, but Jekyll takes less than 3 seconds to generate /study from scratch, while Hexo needs more than 20 seconds even if the site is already compiled.

Two optimizations I've done are:

  • Merging hexo-excerpt, hexo-asset-image, and my <style scoped> handling function into a single script, so that the HTML is parsed into DOM only once instead of three times.
  • Disabling Hexo's built-in code highlighting feature and using highlight.js instead.

This cuts the generation time down to about 10 to 15 seconds. It's still much slower than Jekyll, but better.

Configuration and Scripts in Git

NGINX configuration files are checked into the same repository as the main site. I learned the importance of committing configuration into source control the hard way: a hard drive at school failed, and I discovered that I couldn't resume certain experiments in my dissertation research, because I had the program source code in git and the datasets on backup machines but not the configuration needed to bootstrap those experiments! I spent a whole week rewriting the configuration, so I'm making sure not to repeat the same mistake with my website.

Steps to install the website also need to appear in git. Losing them would not be a big risk, because I should be able to remember or quickly find out what software to install on a server and where to copy the website files, but it is simply easier to copy these steps from a README.md file.

Also included is a bash script to test that the redirects (such as those in the /study sub site) are configured correctly. Yes, I have a "test case" for my personal website.

Finally, there's an upload script, which uses rsync to copy the whole website to my VPS.

Conclusion

So that's my experience in putting yoursunny.com into git, and rebuilding parts of the website using Jekyll and Hexo static site generators. The website source code is definitely cleaner and more modern, and I won't worry about losing it. I plan to continue publishing my website until the end of my life, or the disappearance of the Internet, whichever is earlier.

Tags: Hexo Jekyll PHP git web