Using Org-mode to Publish a Web Site
This blog is written with Org-mode from Emacs, and deployed using a git hook. Here’s my setup.
Publishing on git push
Org-mode is an Emacs outliner, markup language, personal organizer,
spreadsheet, and literate programming system.[1] I use it for
my blog because of the rich markup language it provides. Org-mode's
markup language can be compiled to HTML, and its publish system allows
systematically compiling a whole interlinked website.
Since Org-mode is an Emacs package, I use Emacs in batch mode to run
it, instructing it to open the index page index.org, load the list of
tasks publish.el, and run the tasks (org-publish-all).
    emacs -Q --batch \
      -l /tmp/www-in/etc/publish.el \
      /tmp/www-in/index.org \
      --funcall org-publish-all
Note that the index page and the list of tasks are located in www-in.
That directory is a clean checkout of my website's git repository:
    cd /tmp
    # Copy repository and make directories
    [ -d /tmp/www-in ] || git clone /home/git/repositories/www.git /tmp/www-in
    [ -d /tmp/www-out ] || mkdir /tmp/www-out
This script runs as a post-receive hook every time I push to the git
repository. If /tmp/www-in already exists, it fetches any new
changes:
    cd /tmp/www-in
    unset GIT_DIR
    GIT_WORK_TREE=/tmp/www-in/ git fetch
    GIT_WORK_TREE=/tmp/www-in/ git checkout -f origin/master
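Git sets environment variables such as GIT_DIR when it runs a hook,
and git commands read them. I unset GIT_DIR and set GIT_WORK_TREE
explicitly so that these commands operate on the checkout in
/tmp/www-in rather than on the repository that received the push.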
After the hook copies/updates the repository to www-in, it runs the
emacs invocation above and then copies the HTML output from
/tmp/www-out to the web server root.
cp -r /tmp/www-out/* ~www/www
There’s one final trick involved for updating the hook itself. I keep the hook in the same repository as the website, and copy it over on every push:
cp /tmp/www-in/etc/post-receive.hook /home/git/repositories/www.git/hooks/post-receive
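Because the copy happens while the old hook is still running, a new version of the hook only takes effect on the next push. I make sure edits to the hook are in separate commits from edits to anything else, so that everything else is always published by the current version of the hook.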
Compiling Org-mode to HTML
The hook above ultimately centers around calling the Emacs command
org-publish-all. This command executes a list of tasks, each of which
calls a different publishing function. My site has three tasks: one
master task and then two child tasks, one for compiling Org files to
HTML and one for publishing static pages.
    (setq org-publish-project-alist
          `(("www" :components ("www-pages" "www-static"))
            ("www-pages" ...)
            ("www-static" ...)))
The task for static pages is simple: it copies files from www-in to
www-out using the org-publish-attachment function:
    ("www-static"
     :base-directory "/tmp/www-in"
     :base-extension "css\\|js\\|png\\|..."
     :publishing-directory "/tmp/www-out"
     :publishing-function org-publish-attachment
     :recursive t)))
The properties here should be quite self-explanatory; base-extension
indicates which files to copy. I've replaced my long list of file
extensions with a ... there.[2]
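One way to catch a forgotten extension before it turns into a broken link is to compare the extensions present in the checkout against the base-extension pattern. The following Python sketch is not part of the setup described here: the function name unpublished_extensions and the shortened pattern in the usage line are assumptions for illustration, and it treats the pattern as a plain list of literal alternatives separated by \| rather than as a full Emacs regexp.

```python
import os


def unpublished_extensions(base_dir, base_extension, skip_dirs=(".git",)):
    """Return file extensions under base_dir that no publishing task covers.

    base_extension is the alternation from the task definition, for
    example "css\\|js\\|png". The "org" extension is always treated as
    covered, since the pages task handles it.
    """
    covered = set(base_extension.split("\\|")) | {"org"}
    found = set()
    for dirpath, dirnames, filenames in os.walk(base_dir):
        # Prune directories that are never published.
        dirnames[:] = [d for d in dirnames if d not in skip_dirs]
        for name in filenames:
            ext = os.path.splitext(name)[1].lstrip(".")
            if ext:
                found.add(ext)
    return sorted(found - covered)


if __name__ == "__main__":
    # Hypothetical usage with a shortened extension list.
    print(unpublished_extensions("/tmp/www-in", "css\\|js\\|png"))
```

Files that are deliberately not published, such as publish.el or the hook itself, will also show up in the result and can be ignored.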
The task for Org-mode pages is fairly similar, using the
org-html-publish-to-html function instead:
    ("www-pages"
     :base-directory "/tmp/www-in"
     :base-extension "org"
     :recursive t
     :publishing-directory "/tmp/www-out"
     :publishing-function org-html-publish-to-html
     ... )
My ellipsis hides a long list of options that format the output—let's talk about that.
Styling pages in Org-Publish
The HTML Org-mode generates is ugly, but it gets the job done, and it's pretty configurable using options in the task definition.
What to show
The first group of options tells Org-mode what information to include:
    :headline-levels 4
    :section-numbers nil
    :with-toc nil
    :with-author t
    :with-creator nil
Most of these are self-explanatory; headline-levels is how deep in the
heading hierarchy to go, and "drawers" are explained below.
Since I display the author, I need to set it:
    (setf user-full-name "Pavel Panchekha")
    (setf user-mail-address "me@pavpanchekha.com")
Default styles and cruft
I turn off Org-mode's default styles:
    :html-link-home "/"
    :html-head-include-default-style nil
    :html-head-include-scripts nil
    :html-head-extra ,my-head-extra
Instead, I use my-head-extra:
    (setf org-html-coding-system 'utf-8-unix)
    (setf my-head-extra
          (concat
           "<meta name=\"viewport\" content=\"width=device-width, initial-scale=1\">\n"
           "<link rel='stylesheet' href='/etc/main.css' />"))
This way I use CSS of my choosing and turn on mobile-friendly browsing.
I also turn off the various links Org inserts into the header:
    (setf org-html-home/up-format "")
    (setf org-html-link-up "")
    (setf org-html-link-home "")
    (setf org-html-scripts "")
Drawers
I also turn on Org-mode's "drawers" in export:
:with-drawers t
Org-mode publishes drawers as code blocks by default, which is odd.
I override org-html-format-drawer-function to export them as text
blocks instead:
    (defun my-org-export-format-drawer (name content)
      (concat "<div class=\"drawer " (downcase name) "\">\n"
              "<h6>" (capitalize name) "</h6>\n"
              content
              "\n</div>"))
    (setq org-html-format-drawer-function 'my-org-export-format-drawer)
I style these blocks to show up as large colored boxes, like this one:
Example
Hi! I’m a drawer!
The sidebar
Org-mode’s HTML compiler has a notion of a preamble and a postamble. I use the preamble to generate the sidebar that sits on the right-hand side of every page:
    (setf org-html-preamble t)
    (setf org-html-postamble nil)
    (setf org-html-metadata-timestamp-format "%d %B %Y")
    (setf org-export-date-timestamp-format "%d %B %Y")
    (setf org-html-preamble-format
          (list (list "en" (concat ...))))
MathJax
Org-mode uses MathJax to render inline TeX on my blog. The defaults are mostly fine, but I self-host MathJax:
    (setf org-html-mathjax-options
          '((path "/etc/MathJax/MathJax.js?config=TeX-AMS-MML_HTMLorMML")
            (scale "100")
            (align "center")
            (indent "2em")
            (mathml nil)))
    (setf org-html-mathjax-template
          "<script type=\"text/javascript\" src=\"%PATH\"></script>")
Home page tags
The simple tag system on my home page is also implemented using Org-mode.
My home page is an Org-mode file with an outline node for each blog entry. Those blog posts have Org-mode tags:
    * Blog
    ** [[file:blog/major-key.org][Major Key (a puzzle)]] :misc:
    ** Age-aware Data Structures :algs:
    *** [[file:blog/age-aware/array.org][Age-aware Array Search]] (Part 1 of 4)
    *** [[file:blog/age-aware/tree.org][Tree Lookup]] (Part 2 of 4)
    ** [[file:blog/stream-fusion.org][Stream Fuse Carefully]] :plt:
I use an OPTIONS line to make sure that only the top-level headings (like “Blog”) are treated as headings:
#+OPTIONS: H:1 toc:nil num:nil
Finally, I use some JavaScript, which you can find in /etc/blog.js, to make the tag-chooser, and some CSS (found in main.css:216–270) to style it.
RSS
I used to generate a full RSS feed for my site using ox-rss. However,
recent versions of Org-mode seem to have broken ox-rss. Plus, I like
to generate full-text RSS feeds, and that meant ox-rss re-exporting
every blog post to the RSS file. That re-export took forever and also
couldn't be cached. As a result, editing a published post took between
ten and twenty minutes. I can't believe I just lived with that for
years!
I now generate an RSS file by parsing the front page HTML to get a list of posts and then parsing each post's HTML to get its contents. A little Python script does all of this in seconds, and since Org-mode caches its HTML output, publishing takes a few seconds.
My Python code is quick-and-dirty: it neither parses the HTML nor uses a proper XML library to generate the RSS. Instead, everything is string templates. Luckily, I am not dealing with adversarial HTML, so this seems to work.[3]
Getting a list of posts
To gather the list of posts, I look for links to blog/ in the front
page HTML. I'm helped by the fact that Org-mode HTML export nicely
inserts newlines between different things on the page:
    if "\"blog/" in line:
        link = line.split("href=")[1].split('"')[1]
        tags = [t.split('"')[0] for t in line.split('class="tag"')[1].split('class=\"')[1:]]
        posts.append((link, tags))
Here I extract both the link (from the href attribute) and also the
tags for each post from the span.tag that Org-mode generates. The tags
are inserted into the RSS feed, where I guess they probably help
someone somewhere? I'm not an RSS power user.
The blog/ technique is the same one I use for getting a list of
published posts for Magit.
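For concreteness, here is a self-contained sketch of that loop wrapped in a function. The name list_posts, the sample HTML lines, and the guard for links that carry no tags are assumptions for illustration, not a copy of the actual script; the exact markup Org-mode emits may differ from the sample.

```python
def list_posts(lines):
    """Collect (link, tags) pairs from lines of the front page HTML."""
    posts = []
    for line in lines:
        if '"blog/' not in line:
            continue
        # The link is the first quoted string after href=.
        link = line.split("href=")[1].split('"')[1]
        tags = []
        if 'class="tag"' in line:
            # Each tag is a nested <span class="NAME"> inside span.tag.
            inner = line.split('class="tag"')[1]
            tags = [t.split('"')[0] for t in inner.split('class="')[1:]]
        posts.append((link, tags))
    return posts


# Hypothetical lines in the shape of Org-mode's HTML export.
sample = [
    '<li><a href="blog/major-key.html">Major Key (a puzzle)</a> '
    '<span class="tag"><span class="misc">misc</span></span></li>\n',
    '<li><a href="blog/age-aware/array.html">Age-aware Array Search</a></li>\n',
    '<p>Not a post.</p>\n',
]
print(list_posts(sample))
```

The guard matters for entries like the sub-posts of a series, where the tag sits on the parent heading rather than on the line with the link.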
Getting post content
Getting post content is even easier; I just go line by line through
the file until I reach the h1, at which point the post begins:
    date = title = author = None
    for line in file:
        if line.startswith('<h1 class="title">'):
            break
        elif line.startswith("<header>"):
            date = line.split("time>")[1][:-2].strip()
        elif line.startswith("<title>"):
            title = line.split("title>")[1][:-2]
        elif line.startswith('<meta name="author"'):
            author = line.split("content=")[1].split('"')[1]
        else:
            pass
    return (date, title, author, stopAt(file, "</body>\n"))
Note that the last element of the return value is the file object itself, wrapped in a little filter that cuts it off when the document ends. That ensures that I generate valid close tags; RSS is XML after all.
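The stopAt filter itself is not shown above. A minimal sketch of such a helper, assuming it simply passes lines through until it sees the sentinel line, could be a generator like this one (the body is an assumption, not the code from the actual script):

```python
import io


def stopAt(lines, sentinel):
    """Yield lines from an iterable, stopping before the sentinel line."""
    for line in lines:
        if line == sentinel:
            return
        yield line


# Stand-in for an open HTML file positioned just after the h1 line.
page = io.StringIO("<p>Hello</p>\n<p>World</p>\n</body>\n</html>\n")
print("".join(stopAt(page, "</body>\n")), end="")
```

Because a file object remembers its position, a filter like this picks up right after the line where the header loop stopped, so the header is not read twice.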
Generating RSS
With the posts and post contents collected, all that's left is putting the pieces together and dumping it to RSS. I won't replicate the code here, but it's just some nasty string interpolation. The publication date on the RSS file is computed as the last-modified date of any post:
pubDate = email.utils.formatdate(max(os.path.getmtime(file) for file, tags in posts))
I think that helps some feed readers, though I'm not sure.
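To illustrate the string-template approach, here is a minimal sketch of assembling a feed. The templates, the render_feed name, and the tuple layout of posts are assumptions for illustration, not the actual script; the only piece taken from above is the email.utils.formatdate call, which formats a timestamp in the RFC 2822 style that RSS uses for dates.

```python
import email.utils
from html import escape

ITEM = """<item>
<title>{title}</title>
<link>{link}</link>
<pubDate>{date}</pubDate>
<description><![CDATA[{body}]]></description>
</item>
"""

FEED = """<?xml version="1.0" encoding="utf-8"?>
<rss version="2.0"><channel>
<title>{title}</title>
<link>{link}</link>
<description>{title}</description>
<pubDate>{date}</pubDate>
{items}</channel></rss>
"""


def render_feed(title, link, posts):
    """Render a feed from (url, title, mtime, body_html) tuples."""
    items = "".join(
        ITEM.format(title=escape(t), link=escape(u),
                    date=email.utils.formatdate(m), body=b)
        for u, t, m, b in posts)
    # The feed date is the newest modification time of any post.
    newest = max(m for _, _, m, _ in posts)
    return FEED.format(title=escape(title), link=escape(link),
                       date=email.utils.formatdate(newest), items=items)
```

Wrapping the body in CDATA avoids escaping the post HTML, but it breaks if a post contains the sequence `]]>`, which is the kind of corner a quick-and-dirty template leaves open.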
Footnotes:
[1] Yes.
[2] Every once in a while I need to add another extension to the list. I should fix that at some point, because every time I forget I have a broken link on my site.
[3] It is often not valid RSS because I embed scripts, SVGs, or IFRAMEs into the document, but RSS readers seem to handle those oddities without issue, so I'll leave it as is.