§ Blazing fast math rendering on the web
So, I've switched the blog to being built by a static site generator written
by yours truly. The code clocks in at around a thousand lines of C++.
What did I gain?
- My generator is a real compiler, so I get errors on malformed math and markdown.
- I can write math that loads instantly in your browser, with no MathJax, KaTeX, or any other client-side processing, and no images to fetch. It looks like this:
$$h(x) \equiv \begin{cases} \int_{i=0}^{\infty} f(x)\, g(x)\, dx & x > 0 \\ \sum_{i=0}^{\infty} f(x) + g(x) & \text{otherwise} \end{cases}$$
§ Why?
My blog is a single 9000-line markdown file, rendered as a single HTML page,
so I need it to compile fast, render fast, and render beautifully.
Existing tools compromise on at least one of these.
§ No seriously, why a single markdown file?
I need a single file to edit, so I can rapidly jot down new ideas. This is
the essence of why I'm able to log most of what I study: it's seamless.
Far more importantly, it provides spatio-temporal locality. I add things to
the blog in chronological order, as I learn them. If I need to recall
something I studied, I go to its location in the blog based on a sense of
when I learnt it.
When I do get to the location I want, the scrollbar gives me a sense of
where I am in the file. This is important to me, since it helps me reason
spatially about what I know and what I've learnt. It's something I love about
books, and deeply miss when navigating the web. I'm determined to keep this
spatio-temporal locality on my little slice of the internet.
§ Why is this awful?
As elegant as this model is to edit, it's awful for browsers to render. The
file used to take on the order of minutes for all the math to finish
rendering. MathJax (and KaTeX) painfully attempt to render each math block,
and as they do, the page jumps around until everything has settled.
Meanwhile, your CPU throttles, your lap or hand gets warm, and the page is
stuck. Clearly not great UX.
I still want math. What do I do? The solution is easy: approximate the math
rendering using ASCII/UTF-8 characters! There are tools that do this, and
hevea is one of them. Unfortunately, no markdown-based blogging platform
uses it, so I had to write my own.
§ The cure
So, I wrote the tool. The page you're reading is rendered with it. All the
math renders in under a second because it's nothing crazy: it's just text and
tables, which browsers know how to render. No JavaScript necessary. Snappy
performance. Whoo!
§ The details: Writing my own Markdown to HTML transpiler.
The final transpiler clocks in at 1300 lines of C++, which is very small for
a feature-complete markdown-to-HTML converter that's blazing fast, renders
math correctly, and provides error messages.
§ Quirks fixed, features gained.
I got quite a bit "for free" as I wrote this, fixing mild annoyances
and larger pain points around using GitHub + markdown for publishing on
the web:
- I really don't want tables, but I do want the ability to write vertical bars (|) freely in my text. Unfortunately, GitHub insists that those are tables, and completely wrecks the rendering.
- I now get line numbers in code blocks, which GitHub Flavored Markdown did not have.
- I get error messages on incorrectly closed bold/italic/code blocks, using heuristics that prevent them from spanning across too many lines.
- I get error messages on broken LaTeX, since all my LaTeX passes through hevea. This is awesome, since I no longer need to refresh my browser, wait for MathJax to load, go make myself tea (remember that MathJax was slow?), and then come back to see the errors.
- I can get error messages if my internal document links are broken. To be fair, my tool doesn't currently give me these errors, but it can (and soon will).
- In general, I get control, which was something I did not have when rendering directly with GitHub, or when using someone else's tool.
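The unclosed-delimiter check from the list above can be illustrated with a minimal sketch, assuming a simple rule: report a code (`), bold (**) or italic (*) span that stays open for more than `max_span_lines` lines, or that is still open at end of input. The names `check_delimiters` and `DelimError` are hypothetical, and this is an illustration of the heuristic rather than the generator's actual code. It deliberately ignores `_` emphasis and `*` list bullets.

```cpp
#include <cassert>
#include <string>
#include <vector>

// One reported problem: which delimiter, and the line where it was opened.
struct DelimError {
    std::string delim;
    int open_line;
};

// Scan markdown-ish text for `code`, **bold** and *italic* spans.
// Delimiters inside a code span are ignored.
std::vector<DelimError> check_delimiters(const std::string &text, int max_span_lines) {
    std::vector<DelimError> errors;
    int line = 1;
    // Line on which each span was opened; 0 means "not currently open".
    int code_open = 0, bold_open = 0, italic_open = 0;
    // Report (once) a span that has stayed open for too many lines.
    auto expire = [&](int &open, const char *delim) {
        if (open != 0 && line - open > max_span_lines) {
            errors.push_back({delim, open});
            open = 0;
        }
    };
    for (size_t i = 0; i < text.size(); ++i) {
        char c = text[i];
        if (c == '\n') {
            ++line;
            expire(code_open, "`");
            expire(bold_open, "**");
            expire(italic_open, "*");
        } else if (c == '`') {
            code_open = code_open ? 0 : line;    // toggle code span
        } else if (c == '*' && !code_open) {
            if (i + 1 < text.size() && text[i + 1] == '*') {
                bold_open = bold_open ? 0 : line;  // toggle bold span
                ++i;                               // consume the second '*'
            } else {
                italic_open = italic_open ? 0 : line;  // toggle italic span
            }
        }
    }
    // Anything still open at end of input was never closed.
    if (code_open) errors.push_back({"`", code_open});
    if (bold_open) errors.push_back({"**", bold_open});
    if (italic_open) errors.push_back({"*", italic_open});
    return errors;
}
```

For example, `check_delimiters("open **bold\na\nb\nc\n", 2)` reports a single `**` error opened on line 1, instead of letting the bold run across the rest of the file.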
§ Choice of language
I chose to write this in C-style C++, primarily because I wanted the tool
to be fast, and I'd missed writing C++ for a while. I really enjoy how
stupid-simple C-style C++ turns out to be: the C++ papers over some of C's
annoyances (like formatted output for custom types), while still preserving the
KISS feeling of writing C.
Why not Rust? I freely admit that Rust might have been a sane choice as
well. Unfortunately, asking Rust to treat a UTF-8 string as a "ball of bytes" is
hard, when it's stupidly easy in C. Plus, I wanted to use arena-style allocation,
where I make huge allocations in one go and then don't think about memory,
something I don't have control over in Rust. I don't have any segfaults
(yet, perhaps), thanks to UBSan and ASan. I find Rust has more impedance
than C on small applications, and this was indeed small.
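Arena-style allocation, as described above, can be sketched as a bump allocator: one big `malloc` up front, pointers handed out by bumping an offset, and everything released in one go. The names (`Arena`, `arena_new`, `arena_alloc`, `arena_strdup`, `arena_free`) are hypothetical; this is a minimal illustration in the same C-style C++, not the allocator the generator actually uses.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <cstring>

// A bump ("arena") allocator: one large block, handed out in pieces.
// Individual allocations are never freed; the whole arena is freed at once.
struct Arena {
    char *base;
    size_t cap;
    size_t used;
};

Arena arena_new(size_t cap) {
    Arena a;
    a.base = (char *)malloc(cap);
    assert(a.base != nullptr && "arena: out of memory");
    a.cap = cap;
    a.used = 0;
    return a;
}

// Returns `n` bytes aligned to `align` (a power of two), or nullptr if full.
// malloc's result is suitably aligned for any fundamental type, so aligning
// the offset is enough for alignments up to alignof(std::max_align_t).
void *arena_alloc(Arena *a, size_t n, size_t align = alignof(std::max_align_t)) {
    size_t start = (a->used + align - 1) & ~(align - 1);
    if (start > a->cap || n > a->cap - start) return nullptr;
    a->used = start + n;
    return a->base + start;
}

// Copies a NUL-terminated string (e.g. a chunk of UTF-8 bytes) into the arena.
char *arena_strdup(Arena *a, const char *s) {
    size_t n = strlen(s) + 1;
    char *p = (char *)arena_alloc(a, n, 1);
    if (p) memcpy(p, s, n);
    return p;
}

// Releases every allocation made from the arena in one call.
void arena_free(Arena *a) {
    free(a->base);
    a->base = nullptr;
    a->cap = a->used = 0;
}
```

Because everything is released by a single `arena_free`, there is no per-object bookkeeping, which suits a short-lived batch program like a site generator.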
§ Performance
Everything except the LaTeX-to-HTML step is blazing fast. Unfortunately,
calling hevea is slow, so I implemented a caching mechanism to make using
hevea not-slow. hevea does not have an API, so I need to fork and talk to
its process, which is understandably slow. I built a "key-value store"
(read: serialize data into a file) with the stupidly simple approach of writing
an append-only log into a file. hevea is conceptually a pure function, since
given the same LaTeX input it's going to produce the same HTML output, so it's
perfectly safe to cache it:
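```cpp
#include <cassert>
#include <cstdio>
#include <cstdlib>
#include <cstring>
#include <unordered_map>
#include <utility>
using namespace std;

using ll = long long;
// KEEP is an annotation macro from the original source; assumed here to expand to nothing.
#define KEEP

const char DB_PATH[] = "./blogcache.txt";
// In-memory view of the cache: key -> rendered HTML.
unordered_map<ll, const char *> G_DB;

// Replay the append-only log; each record is (key, length, bytes).
void loadDB() {
    G_DB = {};
    FILE *f = fopen(DB_PATH, "rb");
    // ...
    while (!feof(f)) {
        ll k, len;
        fread(&k, sizeof(ll), 1, f); if (feof(f)) break;
        fread(&len, sizeof(ll), 1, f);
        // ...
        // calloc zeroes the buffer, so the value is NUL-terminated after the read.
        char *buf = (char *)calloc(sizeof(char), len + 2);
        fread(buf, sizeof(char), len, f);
        // ...
    }
    fclose(f);
}

const char *lookup_key(ll k) {
    auto it = G_DB.find(k);
    return it == G_DB.end() ? nullptr : it->second;
}

// Cache a new entry in memory, then append the same record to the log on disk.
void store_key_value(const ll k, KEEP const char *v, const ll len) {
    assert(G_DB.count(k) == 0);
    G_DB.insert(make_pair(k, strdup(v)));
    FILE *f = fopen(DB_PATH, "ab");
    assert(f != nullptr && "unable to open DB file");
    fwrite(&k, sizeof(ll), 1, f);
    fwrite(&len, sizeof(ll), 1, f);
    fwrite(v, sizeof(char), len, f);
    fclose(f);
}
```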
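Conceptually, the cache turns an expensive pure function into a memoized one. Here's a self-contained sketch of that idea under a few assumptions: the key is taken to be a 64-bit FNV-1a hash of the LaTeX input (how `k` is computed above isn't shown), the cache lives only in memory rather than in the append-only file, and the renderer is any command that reads stdin and writes stdout. The names `fnv1a64`, `run_filter` and `cached_filter` are hypothetical, and the exact hevea command line is not shown; `cat` stands in for it in the usage checks.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <unordered_map>
#include <sys/wait.h>
#include <unistd.h>

// 64-bit FNV-1a hash of the input text, used as the cache key.
uint64_t fnv1a64(const std::string &s) {
    uint64_t h = 14695981039346656037ull;  // FNV offset basis
    for (unsigned char c : s) {
        h ^= c;
        h *= 1099511628211ull;             // FNV prime
    }
    return h;
}

// Fork/exec `argv`, feed `input` on its stdin, and collect its stdout.
// Simplification: all input is written before any output is read, which can
// deadlock if both input and output exceed the OS pipe buffer. Fine for small
// snippets; a sturdier version would interleave reads and writes.
std::string run_filter(char *const argv[], const std::string &input) {
    int to_child[2], from_child[2];
    if (pipe(to_child) != 0) return "";
    if (pipe(from_child) != 0) { close(to_child[0]); close(to_child[1]); return ""; }
    pid_t pid = fork();
    if (pid < 0) {
        close(to_child[0]); close(to_child[1]);
        close(from_child[0]); close(from_child[1]);
        return "";
    }
    if (pid == 0) {
        // Child: stdin <- to_child, stdout -> from_child, then exec the filter.
        dup2(to_child[0], STDIN_FILENO);
        dup2(from_child[1], STDOUT_FILENO);
        close(to_child[0]); close(to_child[1]);
        close(from_child[0]); close(from_child[1]);
        execvp(argv[0], argv);
        _exit(127);  // only reached if exec failed
    }
    // Parent: keep the write end of to_child and the read end of from_child.
    close(to_child[0]);
    close(from_child[1]);
    size_t off = 0;
    while (off < input.size()) {
        ssize_t n = write(to_child[1], input.data() + off, input.size() - off);
        if (n <= 0) break;
        off += (size_t)n;
    }
    close(to_child[1]);  // child sees EOF on stdin
    std::string out;
    char buf[4096];
    ssize_t n;
    while ((n = read(from_child[0], buf, sizeof buf)) > 0) out.append(buf, (size_t)n);
    close(from_child[0]);
    waitpid(pid, nullptr, 0);
    return out;
}

// In-memory memo table: hash of input -> filter output.
std::unordered_map<uint64_t, std::string> g_cache;
int g_filter_runs = 0;  // number of actual forks, i.e. cache misses

std::string cached_filter(char *const argv[], const std::string &input) {
    uint64_t k = fnv1a64(input);
    auto it = g_cache.find(k);
    if (it != g_cache.end()) return it->second;  // hit: no fork at all
    ++g_filter_runs;
    std::string out = run_filter(argv, input);
    g_cache.emplace(k, out);
    return out;
}
```

A second call with the same input returns the cached string without forking. One design caveat: two different inputs could in principle collide on a 64-bit hash; storing the input text next to the output and comparing on lookup would rule that out.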
§ For the future
I plan to rip out hevea and write my own LaTeX-to-HTML converter for the
subset of LaTeX I actually use. hevea's strength is its downfall: it can
handle all of LaTeX, which means it's really slow. If I can concentrate
on a small subset, I don't need to play caching tricks, and I can likely
optimise the layout further for my use-cases.
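As a rough illustration of how small the easy part of such a subset converter can be, here's a sketch that only maps a few LaTeX control words to the Unicode symbols the rendered math is built from. The function name `latex_symbols_to_unicode` and its symbol table are hypothetical; the hard parts (sub/superscript layout, fractions, environments like `cases`) are deliberately left out.

```cpp
#include <cassert>
#include <cctype>
#include <string>
#include <unordered_map>

// Replace a handful of LaTeX control words with Unicode symbols and leave
// everything else (including unknown macros) untouched.
std::string latex_symbols_to_unicode(const std::string &src) {
    static const std::unordered_map<std::string, std::string> table = {
        {"int", "∫"}, {"sum", "∑"}, {"infty", "∞"}, {"equiv", "≡"},
        {"epsilon", "ϵ"}, {"alpha", "α"}, {"leq", "≤"}, {"geq", "≥"},
    };
    std::string out;
    size_t i = 0;
    while (i < src.size()) {
        if (src[i] == '\\') {
            // A control word is a backslash followed by a run of letters.
            size_t j = i + 1;
            while (j < src.size() && std::isalpha((unsigned char)src[j])) ++j;
            auto it = table.find(src.substr(i + 1, j - i - 1));
            if (it != table.end()) {
                out += it->second;
                i = j;
                continue;
            }
        }
        out += src[i];  // copy bytes through unchanged (UTF-8 safe)
        ++i;
    }
    return out;
}
```

Reading the whole run of letters before looking it up means `\int` and `\infty` don't get confused with each other or with a shorter prefix.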
I also want colored error messages, because who doesn't?
I'll probably gradually improve my static site generator over time. Once it's
at a level of polish where I'm happy with it, I'll spin it out as a separate
project.
§ Conclusions
Am I glad I did it? Yes, purely because my chunk of the internet aligns with
how I want it to be, and that makes me ϵ more happy.
I think of it as an investment into future me, since I can extend the
markdown and the transpiler in whatever way I want.