The last word in static site generators for me
2022-10-19
This article describes how and why I think that my current solution for generating my website will fulfill all my needs.
Motivation
My experience from more than eight years of (irregular) blogging and
maintaining a personal web page has led to some structural patterns
that my sites have in common. For example, most of the time my personal
site contains a navigation bar, some (invariant) pages (i.e. about,
teaching, contact), and a blog page (aka an index of posts). For a long
time, I've been using static site generators according to my preferred
programming language at the time. My latest iteration used hugo (feel
free to guess the language). However, using hugo for such a simple site
structure is kind of overkill, IMO. So, let's try to go simple with
basic Unix tools.
There are numerous motivations from other people for why they have decided to build their sites with their own (homebrewed) solution. The central aspect for me is that you have full control over the workflow from source to deployment. From experience, it doesn't even play a big role what 'project' you want to build. With the same approach you're able to build a thesis, a presentation plus handouts, and a blog. How is that possible, you may ask. These kinds of projects share a common pattern that basically describes a transformation from a given source format (markdown, LaTeX, org, reST) to a target format (PDF, html, reactjs, beamer).
What is left is to select the tools that fit best for the job. For
example, karl.berlin has decided on a shell-based approach, whereas
technomancy is using a make-based approach to cover the overall
orchestration of their tools. For the conversion from one format to
another there are even more possible solutions. I personally throw
pandoc into the mix whenever it comes to the conversion between
different formats. For generating an atom feed or processing images
there is also a vast amount of tools to choose from (m4, imagemagick,
inkscape, tikz). Which brings us to another important point.
You decide how much software bloat is good for you, which includes both
the tools and the content. For example, for some projects even pandoc
is too heavyweight or not supported on your platform (unlikely); then
you might prefer a simpler (but limited) tool like lowdown when
markdown is the single source file type you want to process. Talking
about content, there is no magic boilerplate code buried in some html
partial from an external theme or framework if you don't want to use
it. Of course, you are free to link as many external sources (i.e.
javascript) as you like or structure your html in a "clever" way, no
one will stop you. But there are good reasons, especially for
low-bandwidth regions, to strip down your page while keeping the
relevant information. If you're looking for some advice in that
direction, aptivate has you covered and effectively leads to some 250kb
or textonly websites, for great good.
Even if I'm convinced that everybody should shape their own workflow, I would like to share a combination that works best in my experience (i.e. in my projects done so far). Most of the time, I'm using a combination of the following tools in one way or another:
- make for the overall build orchestration
- pandoc for content conversion (in case I don't use plain html)
- pikchr for generating svg images
- imagemagick for modifying raster images
- various unix tools for processing text files
Within the remainder of the article I'll describe some aspects of my workflow. Be aware that every design decision comes with a trade-off. What follows is, therefore, neither a silver bullet nor a dogmatic approach blindly following some greater ideas of software design (i.e. KISS, DRY, suckless etc.). It is meant to be a (practical) starting point for your own approach.
A Workflow for Generating my Personal Website
The most generic operation my workflow foresees is to copy a file as-is from the source to the target directory. This kind of "catch all" target gets executed whenever there is no more specific rule. As an example, think of a css stylesheet, a favicon, or even images that are not meant for further processing.
build/%: source/%
cp $< $@
A slightly more complex rule is to actually deal with the content of
files. In the case of html output, you want to combine an invariant
part of a html file (e.g. header, footer) with a changing body part.
Assuming you write plain html files, the following example uses cat to
combine header, footer and body into one html output file.
build/%.html: source/%.html
cat header.html $< footer.html > $@
In case you need to process the files before combining them into a
final result, the former rule can be extended by your preferred
processing tool. The following example uses pandoc in standalone mode
to produce a html file.
build/%.html: source/%.md
pandoc --standalone --output $@ $<
So far we've looked at files that are "predefined" in terms of content,
which means the files get used as-is. But what if we want to produce
the content on-the-fly? This is, for example, the case when we need an
index page for all our posts. That is more complex because the
processing includes a dynamic aspect. What works well for me is using a
simple database (actually a tsv file). Each entry consists of a line
with the following fields:
updated directory title summary tags
Each line describes one post by defining when it was updated/published,
where to find the content, the title, a summary, and (optional) tags.
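For illustration, a hypothetical entry could look like this (fields separated by tabs; the date, directory name, title, summary, and tags are all made up):

2022-10-19	ssg-last-word	The last word in static site generators	Why unix tools are enough for my site	meta,unix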
Attentive readers will recognize here some relation to the well-known
yaml metadata blocks in markdown files. Indeed, that file is a
consolidated list of metadata. In the past I used to store the meta
information of each post close to the content itself (i.e. in the same
directory), but that introduced another extraction and consolidation
step to produce such a database for all my posts. An overhead I'm keen
to ditch, even if I now need to maintain the file by hand. That
file-based database has a nice bonus: I can use unix tools for reading
and processing the entries. Now, to create a basic listing of all my
posts in html I'm using awk like:
BEGIN { FS = "\t"; print "<ul>" }
{
	print "<li>" $1 " <a href=\"./" $2 "/index.html\">" $3 "</a></li>"
}
END { print "</ul>" }
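One assumption for the Makefile target below: the awk program above lives in an executable script named ul, e.g. starting with an awk shebang:

#!/usr/bin/awk -f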
That list gets created on-the-fly and is only relevant until the final
index page gets generated. An ideal case for applying an intermediate
target and letting make delete the file after creation.
.INTERMEDIATE: build/posts.html
build/posts.html: posts.tsv
./ul $< > $@
build/blog/index.html: build/posts.html
cat header.html $< footer.html > $@
You could argue that this is an overhead: why are we not writing the posts index page by hand? A valid point, and my answer is that I want to use the same information (i.e. a single point of truth) about my posts at another place in the processing, namely for creating an atom feed.
From my experience (i.e. writing my own blog engine in golang),
creating an rss feed is one of the most vital parts of a blog, for two
reasons. First, RSS is still a way to syndicate information over the
wire and let the reader decide how to read it (e.g. terminal
newsreader, app, web etc.). Second, it does not play nicely with the
previous tools used in the processing of html files, even if the
standards (XML and HTML) are close to each other. Let me elaborate a
little on the latter point with an example. Basically, what you get for
a single post is meta info like title, date, and summary. Wouldn't it
be great if we could reuse this information as often as we need during
our processing while keeping our set of tools constant? For example,
using pandoc to generate html files and an atom feed out of a list of
posts. In several attempts I collected all the meta information from
the sources, polished it and put it together in one big file that could
be "understood" by pandoc to create a (not supported) xml file (i.e. I
"convinced" pandoc that the output is html). So, I ended up with a
hackish approach that utilizes pandoc's metainfo capabilities in a way
they were not designed for. I'm pretty sure that sooner or later I'll
forget about my "clever" trick and maintenance/debugging becomes a
headache.
So my solution tries to build a bridge between maintainability (i.e. I
understand what the thing is doing even if I look at it 5 years from
now) and overhead (i.e. do you really need xsltproc?). We already have
the database with our consolidated meta information for all posts in
place. What is missing so far is a way to translate that information to
XML. I guess you can already name it: I'm using awk for that step as
well, because I already introduced it for the sake of creating an index
page for my posts. Similar to the ul application, feed reads posts.tsv
and generates an atom.xml file. Here I use an (XML) feed template,
rather than HTML like the ul application, and fill in the remaining
information from the database. Which brings us to the following target
for producing a feed.
build/atom.xml: posts.tsv
./feed $< > $@
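The feed script itself is not shown in full here; what follows is a minimal sketch of how such an awk program could look. It inlines the template for brevity and makes a few assumptions: posts.tsv is sorted newest-first, dates are YYYY-MM-DD, and the feed title and base URL (example.org) are placeholders.

#!/usr/bin/awk -f
# Hypothetical feed sketch: wrap each post from posts.tsv in an atom entry.
# Assumes newest-first ordering and YYYY-MM-DD dates; URLs are placeholders.
BEGIN {
	FS = "\t"
	print "<?xml version=\"1.0\" encoding=\"utf-8\"?>"
	print "<feed xmlns=\"http://www.w3.org/2005/Atom\">"
	print "<title>my blog</title>"
	print "<id>https://example.org/</id>"
}
NR == 1 { print "<updated>" $1 "T00:00:00Z</updated>" }
{
	print "<entry>"
	print "<id>https://example.org/" $2 "/</id>"
	print "<title>" $3 "</title>"
	print "<updated>" $1 "T00:00:00Z</updated>"
	print "<link href=\"https://example.org/" $2 "/\"/>"
	print "<summary>" $4 "</summary>"
	print "</entry>"
}
END { print "</feed>" }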
With that approach I'm introducing no additional tool for generating an
atom feed into my processing workflow. Due to the fact that awk is a
standard unix tool, data driven, and lightweight, it is IMO a nice
deal. Furthermore, I can keep awk and exchange the data depending on
the use-case. For example, I'm not limited to HTML or XML output; if I
would like to produce an offline version of my posts (e.g. a low-tech
book) I could use the same information and let awk produce LaTeX
formatted output.
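To make that last point concrete, here is a hypothetical sketch of the same database rendered as a LaTeX list (the formatting choices are made up):

BEGIN { FS = "\t"; print "\\begin{itemize}" }
{
	# one \item per post: date and title from posts.tsv
	print "\\item " $1 " -- " $3
}
END { print "\\end{itemize}" }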
There exist some further targets in my Makefile that I don't want to explain in great detail, because I guess you already get the idea. Further targets roundup:
- deploy copies my static site to my hosting solution
- serve spawns a local http server for testing
- watch triggers a rebuild whenever a file in source changes
- build/%.svg creates an svg image from a pikchr description
- build/%.opt.jpg uses imagemagick to downscale and greyscale images to reduce bandwidth
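For illustration, hedged sketches of a few of these targets. The concrete tools (rsync for deployment, python3's built-in http server, entr for file watching), hosts, and paths are my assumptions, not necessarily what the Makefile actually uses:

# Hypothetical sketches; destination host, port, and image sizes are placeholders.
deploy: build
	rsync -av --delete build/ user@host:/var/www/site/

serve: build
	python3 -m http.server --directory build 8000

watch:
	find source -type f | entr -c make

build/%.opt.jpg: source/%.jpg
	convert $< -resize 800x -colorspace Gray -quality 70 $@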
Summary
It is ironic that whenever I plan to write an article about my static site generator endeavours, I end up procrastinating by re-constructing my static site generator. As you know,
the only constant (in software) is change
so the question is: is that really the last word in static site
generators for me? Yes and no. For my current need of maintaining my
little corner of the internet it is enough, and I can finally move
forward with other topics. But there are so many other interesting
approaches out there that it would be a pity not to try them and see
what their authors had in mind. For example, there is pollen lang, org
publish, karl.berlin's approach with shell heredocs that also inspires
me for my own workflow, chrisman with a kind of m4gic, ssg, and many
others. And as a last word about bloat: this is just a starting point.
See make as an "entry" to your project that can be combined with
continuous integration, replaced by nix, or deployed via docker. The
possibilities are endless, but how much effort do you want to invest in
generating your own static web site?
That's all for now.