Deep Atlantic Storage: Rewriting in Rust

I have been coding in C++, Go, and TypeScript for many years, but recently I started learning the Rust Programming Language.

Why Rust

I chose to study the Rust programming language because:

  • Rust is one of the major system languages that I do not know.
  • Rust, just like Go, is a memory-safe language, and memory safety is apparently now a national security issue.
  • Rust, unlike Go, does not have a garbage collector.
  • Rust, unlike Go, does not have a nil pointer.

The last point is particularly important. In Go, I often find myself writing a method with a pointer receiver:

type Counter struct {
    V int
}

func (cnt *Counter) Increment() {
    cnt.V += 1
}

The pointer receiver is required if the method needs to modify a field on its receiver. However, more often than not, I forget to check whether the receiver itself is a nil pointer, even in production code. If the above method were called like this:

var cnt *Counter
cnt.Increment()

It would panic at runtime:

panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x0 pc=0x46f383]

In contrast, Rust's rigid rules can prevent writing such careless code. A function or method is allowed to modify its arguments, including self, only if the argument is passed as a mutable borrow, signified with the &mut syntax. Moreover, a borrow can only be created from a real value, allocated either on the stack or on the heap. In safe Rust, it is generally impossible to have a nil or NULL pointer, or a borrow that points to invalid memory. This difference, as I perceive it, gives Rust a higher level of memory safety.
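As a minimal sketch of the same counter in Rust (my own illustration, not code from the app): the method takes a mutable borrow of self, and a counter that might be absent has to be spelled out as an Option, which the compiler makes the caller unpack before the method can be called.

```rust
// Illustrative sketch; `Counter` mirrors the Go example above.
#[derive(Debug, Default)]
struct Counter {
    v: i32,
}

impl Counter {
    // `&mut self` is a mutable borrow: it always refers to a real Counter.
    fn increment(&mut self) {
        self.v += 1;
    }
}

fn main() {
    let mut cnt = Counter::default();
    cnt.increment();
    assert_eq!(cnt.v, 1);

    // "Maybe there is no counter" must be written as Option and handled explicitly.
    let mut maybe: Option<Counter> = None;
    if let Some(c) = maybe.as_mut() {
        c.increment();
    }
    assert!(maybe.is_none());
}
```

There is no way to call increment on the None case by accident; the absent counter is a different type until it is unwrapped.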

The Book

The official Learn Rust webpage gives three options:

  • The Rust Programming Language, "the book" that covers the language principles and some small exercises.
  • Rust by Example, a bunch of code with minimal explanation.
  • Rustlings, an interactive tutorial of the Rust syntax that runs in my own environment.

Two decades ago, I learned HTML and CSS from the official W3C specifications. While I would not go as deep as learning from the language and compiler specifications, I chose the deepest option among the three, "the book".

During the 43-day government shutdown, I studied the book every weekday. I set up my local environment, pasted the sample snippets, and made small changes to see what would happen.

I feel that Rust is a difficult language to learn. Since I am not taking a programming class at a university, I don't have a professor or teaching assistants to assign homework, grade my code, and answer my questions. I am mostly on my own, against the Visual Studio Code console, judged by the compiler output when I invoke the cargo run command.

The last chapter of the book builds a multithreaded web server. I immediately noticed that the exercise made me handle network traffic at the TCP level. In contrast, Go has a built-in net/http package, and I used the Win32 HTTP Server API for my hProxyN project in college. I was surprised that Rust, being a modern programming language, lacks an HTTP server in its standard library.

Here I learned a different philosophy: Rust is designed with a minimal standard library that only contains the essentials. Most "batteries" are provided as crates, including random number generation, command line parsing, Async I/O, HTTP server, and many others.

Re-build the Deep Atlantic Storage app

As I finished the book, I wanted to build something real. While I had a few new project ideas, I decided against building a completely new project in a language that I barely know, because it would be too complicated to architect the software, design the protocols, and deal with an unfamiliar language, all at the same time. Instead, I wanted to re-build something existing, but better.

Many learners would choose a tutorial project, such as those often seen under the DEV #clone tag. However, building these projects feels wasteful to me, because I do not see myself ever deploying and operating the resulting program, which means the code would be written once and never used again. Instead, I wanted to build something that I would deploy, but with low stakes.

Looking over my website, now in its 20th year, I found the perfect candidate: Deep Atlantic Storage, a wacky webpage I made on the July 4th holiday in 2021. It has fewer than 300 lines of code, originally written in JavaScript and Go, yet it encompasses many aspects of systems programming: file access, stdio usage, byte slices, bit operators, and HTTP servers. I shall re-build this application in Rust.

The Process and What I Learned

The codebase is hosted on GitHub: yoursunny/summer-host-storage. It took me four days to build the initial version.

On my first day, I established the package structure and enabled GitHub Actions, even before writing any code logic. Then, I wrote these functions along with unit tests:

use std::io;

// Download a file from the Atlantic Ocean.
pub fn download<T: io::Write>(mut w: T, counts: &BitCounts) -> Result<(), io::Error> {}

// Upload a file to the Atlantic Ocean.
pub fn upload<T: io::Read>(r: T) -> Result<BitCounts, io::Error> {}
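Because both functions are generic over io::Read and io::Write, a unit test can hand them in-memory buffers instead of real files. The sketch below only illustrates that testing pattern. This post does not show the fields of BitCounts or the function bodies, so the struct layout and the bit-counting and bit-sorting logic here are placeholder assumptions, included so the example has something to check.

```rust
use std::io::{self, Read, Write};

// Hypothetical stand-in: the real BitCounts fields are not shown in the post.
#[derive(Debug, PartialEq, Eq, Clone, Copy)]
pub struct BitCounts {
    pub zeros: u64,
    pub ones: u64,
}

// Assumed behavior for illustration: count the 0 bits and 1 bits of the input.
pub fn upload<T: Read>(mut r: T) -> Result<BitCounts, io::Error> {
    let mut counts = BitCounts { zeros: 0, ones: 0 };
    let mut buf = [0u8; 4096];
    loop {
        let n = r.read(&mut buf)?;
        if n == 0 {
            return Ok(counts); // end of input
        }
        for byte in &buf[..n] {
            let ones = u64::from(byte.count_ones());
            counts.ones += ones;
            counts.zeros += 8 - ones;
        }
    }
}

// Assumed behavior for illustration: emit all 0 bits first, then all 1 bits.
// Writes one byte at a time for clarity, not speed.
pub fn download<T: Write>(mut w: T, counts: &BitCounts) -> Result<(), io::Error> {
    let total_bytes = (counts.zeros + counts.ones) / 8;
    let mut zeros_left = counts.zeros;
    for _ in 0..total_bytes {
        // This byte starts with `z` zero bits; the remaining bits are ones.
        let z = zeros_left.min(8) as u32;
        zeros_left -= u64::from(z);
        let byte = if z == 8 { 0x00 } else { 0xFFu8 >> z };
        w.write_all(&[byte])?;
    }
    Ok(())
}

fn main() -> Result<(), io::Error> {
    // A byte slice implements Read; a Vec<u8> implements Write.
    let counts = upload(&b"hi"[..])?;
    assert_eq!(counts, BitCounts { zeros: 9, ones: 7 });

    let mut out = Vec::new();
    download(&mut out, &counts)?;
    assert_eq!(out, vec![0x00, 0x7F]);
    Ok(())
}
```

The same signatures accept a std::fs::File or standard input and output, which is why the trait bounds are convenient for a command line tool.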

I tried a few command line parsing crates on my second day and picked clap. I wrote the necessary declarations, to make the download and upload functionality available on the command line.

I spent the third day researching HTTP server crates. I initially used hyper but was very confused by its body concept and had great difficulty writing a test case. Then I switched to axum, which seems to have more natural APIs, similar to what I used in Go, Node.js, and Arduino.

However, I hit a snag when I tried to integrate my download and upload functions: I had written them as synchronous functions, so the HTTP server had to buffer the entire file in memory during each download or upload. The io::Write and io::Read traits feel like streams, but they are not the same kind of streams as in Node.js or Go.

On my fourth day, I transformed the core functions to be asynchronous:

use tokio::io::{self, AsyncRead, AsyncWrite};

// Download a file from the Atlantic Ocean.
pub async fn download<T: AsyncWrite + Unpin>(w: T, counts: &BitCounts) -> Result<(), io::Error> {}

// Upload a file to the Atlantic Ocean.
pub async fn upload<T: AsyncRead + Unpin>(r: T) -> Result<BitCounts, io::Error> {}

This refactoring enabled me to unclog the HTTP pipes and realize request and response streaming, finishing the build.

My main takeaway from this process is that, for network applications, I should use the widely adopted tokio crate and design my APIs as asynchronous from the start, so that the server can stream data instead of buffering entire files as my synchronous std::io version did.

Final Words

I attended the Rust DC meetup and briefly introduced my app. The audience was interested. During the same meetup, I learned about another command line parsing library, bpaf, and decided to make the switch. The code became a few lines shorter, as its syntax is less verbose.

I haven't deployed the app yet, because it still lacks a few important features, such as the gzip compression that is available in the Deep Atlantic Storage app written in Go. I shall further refine my first Rust application and get it ready for deployment.