What we’ll see in this series of posts#

Create our HTTP server
Read requests and display results
Respond concurrently
Transpile Markdown to HTML
Create templates to include CSS and other elements

What is a static site generator?#

Writing blogs directly in HTML and CSS can be time-consuming, especially for complex sites. While this website itself uses very simple HTML (by design), imagine a blog with multiple links, tags, multi-level menus, etc. Things can get complicated quickly.

Static Site Generators solve this by allowing us to write posts in Markdown, which is simpler than HTML. The generator then converts them into HTML and applies styling with CSS. Static pages are delivered to the client without any server-side logic beyond serving content.

Popular Open-Source Options#

Serum
Franklin
Gatsby
Hexo
Eleventy
Pelican
Zola
Hugo (this website)
Jekyll

Mine is a simple project called Personal.

Personal#

This project took about a week and was inspired by boot.dev and ThePrimeagen, who push you to build interesting projects from scratch. I decided to do it with zero dependencies for an added challenge.

By building it myself, I gained an appreciation for open-source libraries that handle these problems efficiently.

HTTP 1.1 Server#

As we’ve said, we’re going to program this from scratch, so the first step is to open sockets. We’ll use Erlang’s :gen_tcp module, which gives us everything we need to set up our server.

I won’t go into much detail about how an HTTP server works, but the first thing we need to do is bind to a port and obtain our “listen socket”:

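  defp accept(port) do
    {:ok, listen_socket} =
      :gen_tcp.listen(port, [
        # Deliver data as binaries instead of charlists.
        :binary,
        # Each recv returns one line, which suits the HTTP request line.
        packet: :line,
        # Passive mode: we read explicitly with :gen_tcp.recv.
        active: false,
        reuseaddr: true,
        nodelay: true,
        # Maximum number of pending connections waiting to be accepted.
        backlog: 1024
      ])

    Logger.info("Listening on port #{port}")

    loop(listen_socket)
  end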

This asks the operating system’s TCP/IP stack for a queue of up to 1024 pending connections (the backlog). Every time a client connects to our port, the connection waits in that queue until we accept it. Reducing or increasing this number can noticeably affect throughput under load. (You can run benchmarks and you’ll see the difference :3)
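To see the backlog at work, here is a minimal sketch (not part of Personal) in which a client connects before the server ever calls accept. The loopback address, the backlog of 16, and port 0 (which lets the operating system pick a free port) are choices made only for this illustration.

```elixir
# Listen on an OS-assigned port so the sketch never clashes with a real service.
{:ok, listen_socket} =
  :gen_tcp.listen(0, [:binary, packet: :line, active: false, reuseaddr: true, backlog: 16])

{:ok, port} = :inet.port(listen_socket)

# The client connects BEFORE the server calls accept. The kernel completes the
# handshake and parks the connection in the backlog queue.
{:ok, client} = :gen_tcp.connect({127, 0, 0, 1}, port, [:binary, active: false])
:ok = :gen_tcp.send(client, "hello\n")

# Only now do we take the pending connection out of the queue.
{:ok, socket} = :gen_tcp.accept(listen_socket, 1_000)
{:ok, line} = :gen_tcp.recv(socket, 0, 1_000)
IO.inspect(line, label: "received")

:gen_tcp.close(socket)
:gen_tcp.close(client)
:gen_tcp.close(listen_socket)
```

The connect call succeeds even though nobody is accepting yet, which is exactly the waiting room the backlog provides.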

As I just explained, pending connections wait to be accepted, so now we need to accept them one by one, which removes each of them from the waiting list. To do this, we run the following code in an infinite loop.

  defp loop(listen_socket) do
    case :gen_tcp.accept(listen_socket) do
      {:ok, socket} ->
        Personal.Worker.work(socket)

      {:error, reason} ->
        Logger.error("Failed to accept connection #{inspect(reason)}")
    end

    loop(listen_socket)
  end

It’s very important to consider concurrency at this point. We have a function running in an infinite loop that calls :gen_tcp.accept/1 and then Personal.Worker.work(socket), so if either of them is slow, accepting requests becomes a massive bottleneck. Ideally, we want to accept connections and serve them concurrently, without one blocking the other.

So at this point, we need to consider the architecture of our server. Something similar to this:

[Figure: process architecture]

In my case, it’s not exactly like this since I don’t create a pool; I simply spawn processes. The important thing to highlight is that we have a process called Acceptor that accepts connections, and the Worker is responsible for creating one process per connection to serve the request.

Here we see the main code of the worker:

  def work(socket) do
    fun = fn ->
      # With packet: :line, this reads a single line: the request line.
      case :gen_tcp.recv(socket, 0) do
        {:ok, data} ->
          {code, body} = handle_request(data)
          # A blank line (\r\n\r\n) separates the headers from the body.
          response = "#{@http_ver} #{code}\r\n#{@server}\r\nContent-Type: text/html\r\n\r\n#{body}\r\n"
          :gen_tcp.send(socket, response)
          :gen_tcp.close(socket)

        {:error, reason} ->
          Logger.error("Failed to read from socket #{inspect(reason)}")
          :gen_tcp.close(socket)
      end
    end

    # Serve the connection in its own process and make it the socket owner.
    pid = spawn(fun)
    :gen_tcp.controlling_process(socket, pid)
  end

Our Acceptor only defines the fun, creates another process with spawn/1 to run it, and uses :gen_tcp.controlling_process(socket, pid) to tell the runtime that the socket we obtained from :gen_tcp.accept now belongs to the process we just created.

So our Acceptor can continue its loop accepting connections, and the new process will continue serving the request.
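The hand-off can be tried in isolation. The sketch below is not Personal's code: HandoffSketch, its hand_off/2 function, and the :socket_ready message are hypothetical names, and making the worker wait for that message before touching the socket is a precaution I am assuming here, not something the project does.

```elixir
defmodule HandoffSketch do
  # Hypothetical helper: spawn a worker and transfer socket ownership to it.
  def hand_off(socket, handler) do
    pid =
      spawn(fn ->
        # Wait until the acceptor confirms that ownership was transferred.
        receive do
          :socket_ready -> handler.(socket)
        after
          5_000 -> :gen_tcp.close(socket)
        end
      end)

    case :gen_tcp.controlling_process(socket, pid) do
      :ok -> send(pid, :socket_ready)
      {:error, _reason} -> :gen_tcp.close(socket)
    end

    pid
  end
end

# Throwaway server and client on the loopback interface.
{:ok, listen_socket} = :gen_tcp.listen(0, [:binary, packet: :line, active: false])
{:ok, port} = :inet.port(listen_socket)
{:ok, client} = :gen_tcp.connect({127, 0, 0, 1}, port, [:binary, packet: :line, active: false])
{:ok, socket} = :gen_tcp.accept(listen_socket, 1_000)

# The worker echoes one line back and closes the connection.
HandoffSketch.hand_off(socket, fn sock ->
  {:ok, line} = :gen_tcp.recv(sock, 0, 1_000)
  :gen_tcp.send(sock, "echo: " <> line)
  :gen_tcp.close(sock)
end)

:ok = :gen_tcp.send(client, "ping\n")
{:ok, reply} = :gen_tcp.recv(client, 0, 1_000)
IO.inspect(reply, label: "reply")
```

The process that called accept returns from hand_off immediately, so it is free to go back to accepting while the worker serves the client.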

As a note, this is not the best way to handle the problem: we spawn processes without any kind of supervision or limit, and the fun is built at runtime on every call, which can cause problems depending on the use case.
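One possible way to add that control, shown here only as a sketch and not as what Personal does, is to start workers under a Task.Supervisor with a max_children limit. The limit of 2 and the sleeping task are stand-ins chosen for illustration.

```elixir
# Hypothetical limit, chosen only so the effect is easy to see.
{:ok, sup} = Task.Supervisor.start_link(max_children: 2)

results =
  for n <- 1..3 do
    Task.Supervisor.start_child(sup, fn ->
      # Stand-in for serving a request.
      Process.sleep(200)
      n
    end)
  end

# The first two tasks start; the third is refused while they are still running.
IO.inspect(results)
```

In a server, the acceptor could react to {:error, :max_children} by closing the socket or answering with an error status instead of spawning yet another process.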

The real world is much more complex, and the reality is that we end up using third-party libraries. In the case of Elixir, these libraries are usually:

Ranch used by
Cowboy used by
plug_cowboy used by
Plug used by
Phoenix.
(I could continue)

Moreover, we now have newer libraries like Bandit, which also has its own dependency tree.

These libraries are examples of how to work in production, but our small project serves to help you understand why these libraries exist and why people maintain them for so many years.

HTTP 1.1 Request#

Great, now we have the client’s socket where we can read and write data. Now we can work in the world of Requests.

HTTP/1.1 requests are quite simple: they arrive as plain text, as one or more lines separated by line breaks (\r\n).

An example of a request without headers would be:

GET /styles/style.css HTTP/1.1\r\n

It’s now our problem to read the line, parse it correctly, and know what to do with it. In this case, we can see it’s a GET request wanting to obtain the stylesheet at the indicated path.
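As a minimal sketch of that parsing step, the request line can be split into method, target, and version. RequestLineSketch and parse/1 are hypothetical names used only for this illustration.

```elixir
defmodule RequestLineSketch do
  # Hypothetical helper: split "METHOD target HTTP/version\r\n" into its parts.
  def parse(line) do
    case line |> String.trim_trailing() |> String.split(" ") do
      [method, target, "HTTP/" <> version] -> {:ok, method, target, version}
      _ -> {:error, :bad_request}
    end
  end
end

IO.inspect(RequestLineSketch.parse("GET /styles/style.css HTTP/1.1\r\n"))
# → {:ok, "GET", "/styles/style.css", "1.1"}

IO.inspect(RequestLineSketch.parse("nonsense\r\n"))
# → {:error, :bad_request}
```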

In our case, I’ve only implemented GET:

  # Matches request lines such as "GET /styles/style.css HTTP/1.1\r\n".
  def handle_request("GET " <> rest) do
    path =
      rest
      |> String.split(" ")
      |> List.first()

    body = FileReader.get_file(path)

    if body == nil do
      {"404 Not Found", ""}
    else
      {"200 OK", body}
    end
  end

  # Any other method is rejected.
  def handle_request(_) do
    {"405 Method Not Allowed", ""}
  end

As we can see, we extract the path, look it up in FileReader, and return an appropriate response. NOTE: how we read the data matters a lot, because a careless implementation can open very serious security holes. In the case of Personal, we’ll see that it pre-caches files in memory.

To send data, we just need to follow the response format, similar to this:

response = "#{@http_ver} #{code}\r\n#{@server}\r\nContent-Type: text/html\r\n\r\n#{body}\r\n"

And finally execute:

:gen_tcp.send(socket, response)
:gen_tcp.close(socket)

These last two lines send the response and close the socket, thus ending the execution of the socket’s controlling process.
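The response above relies on closing the connection to mark the end of the body. A slightly more explicit sketch, assuming a hypothetical ResponseSketch module that is not part of Personal, adds Content-Length and Connection: close headers and builds the response as iodata, which :gen_tcp.send accepts directly.

```elixir
defmodule ResponseSketch do
  # Hypothetical helper: status line, headers, a blank line, then the body.
  def build(status, content_type, body) do
    [
      "HTTP/1.1 ", status, "\r\n",
      "Content-Type: ", content_type, "\r\n",
      # The length is counted in bytes, not characters.
      "Content-Length: ", Integer.to_string(byte_size(body)), "\r\n",
      "Connection: close\r\n",
      "\r\n",
      body
    ]
  end
end

response = ResponseSketch.build("200 OK", "text/html", "<h1>hi</h1>")
IO.puts(IO.iodata_to_binary(response))
```

Passing the content type as an argument also leaves room for serving CSS or images with the right header instead of always using text/html.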

As we can see, even in this simple case there are plenty of things missing, such as header handling, other HTTP methods, etc. There’s a world of specifications to discover! Start with RFC 9110 (HTTP semantics) and RFC 9112 (HTTP/1.1 message syntax), and have fun!

Getting the body!#

The objective of any HTTP server is for our client to obtain any type of data we want to make visible, but this must be done securely. Imagine someone could do something like GET /etc/passwd and our server says, “Sure, no problem, here you go…”.

To avoid this, and knowing this is a small blog, I simply keep GET requests from touching the file system at all. When the server starts up, it reads everything from a specific folder and builds a data structure that serves every request.

In our case, FileReader defines a folder I’ve called static that stores all the final information our blog can offer. At the same time, the folder structure will be the same as the HTTP request structure.

For example, if our structure looks like this:

static/
├─ images/
├─ styles/
│  ├─ style.css
├─ blog/
├─ index.html

To obtain style.css or index.html, the requests should be:

GET /styles/style.css HTTP/1.1\r\n
GET / HTTP/1.1\r\n

FileReader builds a Map in which each folder is a key whose value is another map, and each file is a key (its filename) whose value is the file content.

%{
  "static" => %{
    "images" => %{
      "sample.webp" => <<82, 73, 70, 70, 214, 10, 0, 0, 87, 69, 66, 80, ...>>
    },
    "index.html" => "<html>...</html>",
    "styles" => %{
      "style.css" => "/* css content */"
    }
  }
}

So if in the data variable we have the previous map, to obtain sample.webp we would execute:

data["static"]["images"]["sample.webp"]
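A request path can be mapped onto that structure by splitting it into segments and walking the map. The sketch below is a guess at how such a lookup could work, not FileReader's actual code: LookupSketch, get_file/2, and the rule that "/" means index.html are assumptions for illustration.

```elixir
defmodule LookupSketch do
  # Hypothetical helper: walk the nested map using the path segments as keys.
  def get_file(data, path) do
    segments =
      case String.split(path, "/", trim: true) do
        # Assumption: the root path serves index.html.
        [] -> ["index.html"]
        parts -> parts
      end

    found =
      Enum.reduce_while(["static" | segments], data, fn key, acc ->
        case acc do
          %{^key => value} -> {:cont, value}
          _ -> {:halt, nil}
        end
      end)

    # Only file contents (binaries) are served, never a folder map.
    if is_binary(found), do: found, else: nil
  end
end

data = %{
  "static" => %{
    "index.html" => "<html>...</html>",
    "styles" => %{"style.css" => "/* css content */"}
  }
}

IO.inspect(LookupSketch.get_file(data, "/styles/style.css"))
IO.inspect(LookupSketch.get_file(data, "/"))
IO.inspect(LookupSketch.get_file(data, "/../etc/passwd"))
```

Because the lookup never reaches the file system, a path such as /../etc/passwd simply finds no matching key and yields nil.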

In my opinion, this method is quite simple. Obviously, if the code that builds this map reads resources outside of static, we would have a problem. Apart from that, once the structure is built it stays the same until the server restarts (although nothing prevents replacing it at runtime).
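Building such a map can be sketched with a small recursive function. CacheBuilderSketch and build/1 are hypothetical names, and the temporary folder only exists so the example can run anywhere; the real project may build its structure differently.

```elixir
defmodule CacheBuilderSketch do
  # Hypothetical builder: read `dir` recursively into a nested map of binaries.
  def build(dir) do
    dir
    |> File.ls!()
    |> Map.new(fn name ->
      full = Path.join(dir, name)

      if File.dir?(full) do
        {name, build(full)}
      else
        {name, File.read!(full)}
      end
    end)
  end
end

# Create a throwaway tree that mimics the static folder.
root = Path.join(System.tmp_dir!(), "static_sketch_#{System.unique_integer([:positive])}")
File.mkdir_p!(Path.join(root, "styles"))
File.write!(Path.join(root, "index.html"), "<html>...</html>")
File.write!(Path.join([root, "styles", "style.css"]), "/* css content */")

data = %{"static" => CacheBuilderSketch.build(root)}
IO.inspect(data)

File.rm_rf!(root)
```

Note that File.dir? and File.read! follow symbolic links, so a link inside the folder that points elsewhere would be read too, which is the kind of escape mentioned above.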

In my case, the structure is stored in :persistent_term, which offers the fastest reads in Elixir short of hard-coding the values in the source.
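For reference, storing and reading a term looks roughly like this. The key {:personal_sketch, :files} is a made-up name for this sketch; the key Personal really uses is not shown in the post.

```elixir
# Hypothetical key; a tuple with a project-specific atom helps avoid collisions.
key = {:personal_sketch, :files}

# Writes are expensive, so this is meant to happen once, at startup.
:persistent_term.put(key, %{"static" => %{"index.html" => "<html>...</html>"}})

# Reads are cheap and can be done from any process.
data = :persistent_term.get(key)
IO.inspect(data["static"]["index.html"])

# get/2 returns the given default when the key does not exist.
IO.inspect(:persistent_term.get({:personal_sketch, :missing}, nil))
```

The trade-off is that updating or erasing a persistent term is costly for the whole VM, which fits a cache that is written once when the server boots.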

With this, the HTTP server part ends. In the next post, we’ll see how we build HTML from Markdown and how we do the “bundle” to the “static” folder.

This post was translated into English with Claude from the original Spanish post.

Static Page Generator and HTTP Server from Scratch with 0 Dependencies in Elixir