Using Org-mode to Publish a Web Site

This blog is written with Org-mode from Emacs, and deployed using a git hook. Here’s my setup.

Publishing on `git push`

Org-mode is an Emacs outliner, markup language, personal organizer, spreadsheet, and literate programming system.¹ I use it for my blog because of the rich markup language it provides. Org-mode's markup language can be compiled to HTML, and its publish system allows systematically compiling a whole interlinked website.

Since Org-mode is an Emacs package, I run it with Emacs in batch mode, instructing Emacs to load the list of tasks in publish.el, open the index page index.org, and run the tasks with org-publish-all:

emacs -Q --batch \
    -l /tmp/www-in/etc/publish.el \
    /tmp/www-in/index.org \
    --funcall org-publish-all
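For anyone who would rather drive the build from Python than from the shell, here is a minimal sketch that assembles the same command line. publish_argv is a hypothetical helper, and the /tmp/www-in path is the checkout described next. Emacs processes these action arguments in the order they appear, which is why -l comes before the file and --funcall comes last.

```python
def publish_argv(src="/tmp/www-in"):
    """Build the batch-mode Emacs command line for publishing the site in src."""
    return [
        "emacs", "-Q", "--batch",         # -Q skips init files; --batch has no UI
        "-l", f"{src}/etc/publish.el",    # 1. load the project definitions
        f"{src}/index.org",               # 2. visit the index page
        "--funcall", "org-publish-all",   # 3. publish every project
    ]
```

Running it is then a matter of subprocess.run(publish_argv(), check=True), which raises CalledProcessError if Emacs exits with a nonzero status.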

Note that the index page and the list of tasks are located in www-in. That directory is a clean checkout of my website's git repository:

cd /tmp

# Copy repository and make directories
[ -d /tmp/www-in  ] || git clone /home/git/repositories/www.git /tmp/www-in 
[ -d /tmp/www-out ] || mkdir /tmp/www-out

This script runs as a post-receive hook every time I push to the git repository. If /tmp/www-in already exists, it fetches any new changes:

cd /tmp/www-in
unset GIT_DIR
GIT_WORK_TREE=/tmp/www-in/ git fetch
GIT_WORK_TREE=/tmp/www-in/ git checkout -f origin/master

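I set GIT_WORK_TREE explicitly and unset GIT_DIR because git exports variables such as GIT_DIR and GIT_WORK_TREE to hooks, so that git commands run by a hook find the repository that received the push. Left as they are, they would keep the commands above from operating on the checkout in /tmp/www-in.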
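The same environment fix, sketched in Python for anyone running git from a script: copy the hook's environment, drop GIT_DIR, and point GIT_WORK_TREE at the checkout. The names checkout_env and UPDATE_COMMANDS are made up for illustration.

```python
import os

WORK_TREE = "/tmp/www-in/"

def checkout_env(env=None):
    """Return a copy of the hook's environment that is safe for git in WORK_TREE."""
    env = dict(os.environ if env is None else env)  # never mutate the caller's dict
    # Inside a hook, GIT_DIR refers to the repository that received the push;
    # left set, git would use it instead of discovering /tmp/www-in/.git.
    env.pop("GIT_DIR", None)
    env["GIT_WORK_TREE"] = WORK_TREE
    return env

# The two update commands from the hook, as argument lists
UPDATE_COMMANDS = [
    ["git", "fetch"],
    ["git", "checkout", "-f", "origin/master"],
]
```

Each command would then run as subprocess.run(cmd, cwd=WORK_TREE, env=checkout_env(), check=True).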

After the hook clones or updates the repository in www-in, it runs the emacs invocation above and then copies the HTML output from /tmp/www-out to the web server root:

cp -r /tmp/www-out/* ~www/www
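A Python version of this copy step, as a sketch: shutil.copytree with dirs_exist_ok=True (Python 3.8 and later) merges the output into an existing web root. One difference from the shell version is that the * glob skips dotfiles at the top level of www-out, while copytree copies everything. The deploy name is hypothetical.

```python
import shutil

def deploy(out_dir, web_root):
    """Copy the published site into the web root, overwriting changed files."""
    # Like `cp -r /tmp/www-out/* ~www/www`, this merges into the existing root:
    # files already in web_root that are missing from out_dir are left alone.
    # Unlike the shell glob, it also copies dotfiles at the top of out_dir.
    shutil.copytree(out_dir, web_root, dirs_exist_ok=True)
```

The hook's paths would be passed as deploy("/tmp/www-out", os.path.expanduser("~www/www")).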

There’s one final trick for updating the hook itself. I keep the hook in the same repository as the website, and copy it over on every push:

cp /tmp/www-in/etc/post-receive.hook /home/git/repositories/www.git/hooks/post-receive

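Because the hook installs its replacement while it is already running, a new version only takes effect on the next push. I therefore keep edits to the hook in separate commits from edits to anything else, so that the rest of the site is always published by the current version of the hook.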

Compiling Org-mode to HTML

The hook above centers on calling the Emacs command org-publish-all. This command executes a list of tasks (Org calls them projects), each of which calls a different publishing function. My site has three tasks: one master task and two child tasks, one for compiling Org files to HTML and one for publishing static pages.

(setq org-publish-project-alist
      `(("www"
         :components ("www-pages" "www-static"))
        ("www-pages" ...)
        ("www-static" ...)))
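To make the master/child relationship concrete, here is a small Python model of how org-publish expands :components: a project with components stands for those projects, expanded recursively, and each leaf project is published once even if it is also listed on its own. This is a simplified model, not Org's code, and the dictionary layout is made up for illustration.

```python
def expand_projects(names, projects):
    """Return the leaf projects that publishing `names` would run, in order.

    `projects` maps a project name to its properties; a project with a
    "components" entry is a meta-project standing for those components.
    """
    leaves = []
    for name in names:
        components = projects[name].get("components")
        subprojects = expand_projects(components, projects) if components else [name]
        for leaf in subprojects:
            if leaf not in leaves:  # publish each leaf project only once
                leaves.append(leaf)
    return leaves

PROJECTS = {
    "www": {"components": ["www-pages", "www-static"]},
    "www-pages": {"publishing-function": "org-html-publish-to-html"},
    "www-static": {"publishing-function": "org-publish-attachment"},
}
```

Publishing everything in PROJECTS therefore runs www-pages and www-static once each, even though both also appear under www.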

The task for static pages is simple: it copies files from www-in to www-out using the org-publish-attachment publishing function:

("www-static"
 :base-directory "/tmp/www-in"
 :base-extension "css\\|js\\|png\\|..."
 :publishing-directory "/tmp/www-out"
 :publishing-function org-publish-attachment
 :recursive t)

The properties here should be self-explanatory; base-extension is a regular expression matching the extensions of the files to copy. In the snippet above, the ... stands in for my list of extensions, which is quite long.²
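Roughly, :base-extension is matched against each file's extension, and :recursive t extends the search into subdirectories. Here is a Python model of that selection, as a sketch; Org's own matching rules may differ in details, and base_files is a hypothetical name. Note that the Emacs Lisp string "css\\|js\\|png" is the regex css\|js\|png, whose alternation Python writes as css|js|png.

```python
import re
from pathlib import Path

def base_files(base_dir, base_extension, recursive=True):
    """List files under base_dir whose extension fully matches base_extension."""
    pattern = re.compile(base_extension)
    root = Path(base_dir)
    candidates = root.rglob("*") if recursive else root.glob("*")
    return sorted(
        p.relative_to(root).as_posix()
        for p in candidates
        # p.suffix is ".css" for "main.css"; drop the dot before matching
        if p.is_file() and pattern.fullmatch(p.suffix[1:])
    )
```

For example, base_files("/tmp/www-in", "css|js|png") would list the stylesheets, scripts, and images anywhere in the checkout.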

The task for Org-mode pages is fairly similar, using the org-html-publish-to-html command instead:

("www-pages"
 :base-directory "/tmp/www-in"

 :base-extension "org"
 :recursive t

 :publishing-directory "/tmp/www-out"
 :publishing-function org-html-publish-to-html
 ...
 )

My ellipsis hides a long list of options that format the output—let's talk about that.

Styling pages in Org-Publish

The HTML Org-mode generates is ugly, but it gets the job done, and it's pretty configurable using options in the task definition.

What to show

The first group of options tells Org-mode what information to include:

:headline-levels 4
:section-numbers nil
:with-toc nil
:with-author t
:with-creator nil

Most of these are self-explanatory; headline-levels sets how deep in the heading hierarchy Org still exports headings as headings (deeper ones become list items). Drawers have an option of their own, explained below.

Since I display the author, I need to set it:

(setf user-full-name "Pavel Panchekha")
(setf user-mail-address "me@pavpanchekha.com")

Default styles and cruft

I turn off Org-mode's default styles:

:html-link-home "/"
:html-head-include-default-style nil
:html-head-include-scripts nil
:html-head-extra ,my-head-extra

Instead, I use my-head-extra:

(setf org-html-coding-system 'utf-8-unix)
(setf my-head-extra
      (concat
       "<meta name=\"viewport\" content=\"width=device-width, initial-scale=1\">\n"
       "<link rel='stylesheet' href='/etc/main.css' />"
       ))

This way pages use CSS of my choosing, and the viewport meta tag makes them mobile-friendly.

I also turn off the various links Org inserts into the header:

(setf org-html-home/up-format "")
(setf org-html-link-up "")
(setf org-html-link-home "")
(setf org-html-scripts "")

Drawers

I also turn on Org-mode's "drawers" in export:

:with-drawers t

Org-mode publishes drawers as code blocks by default, which is odd. I override org-html-format-drawer-function to export them as text blocks instead:

(defun my-org-export-format-drawer (name content)
  (concat "<div class=\"drawer " (downcase name) "\">\n"
          "<h6>" (capitalize name) "</h6>\n"
          content
          "\n</div>"))
(setq org-html-format-drawer-function 'my-org-export-format-drawer)

I style these blocks to show up as large colored boxes, like this one:

Example

Hi! I’m a drawer!
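To see exactly what markup that function emits, here is the same logic ported to Python, as a sketch. format_drawer is my name for the port, and str.title stands in for Emacs's capitalize; the two agree on one-word drawer names like EXAMPLE.

```python
def format_drawer(name, content):
    """Python port of my-org-export-format-drawer, for illustration."""
    return (
        f'<div class="drawer {name.lower()}">\n'
        f"<h6>{name.title()}</h6>\n"  # Emacs: (capitalize name)
        f"{content}"
        "\n</div>"
    )
```

The drawer class plus the drawer's lowercased name give the stylesheet something to target for each kind of box.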

The sidebar

Org-mode’s HTML compiler has a notion of a preamble and a postamble. I use the preamble to generate the sidebar that sits on the right hand side of every page:

(setf org-html-preamble t)
(setf org-html-postamble nil)

(setf org-html-metadata-timestamp-format "%d %B %Y")
(setf org-export-date-timestamp-format "%d %B %Y")
(setf org-html-preamble-format
      (list
       (list
        "en"
        (concat
         ...))))
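Both timestamp formats use format-time-string's strftime-style directives, which Python's strftime shares, so "%d %B %Y" renders a date as zero-padded day, full month name, and year, the form shown under this post's byline. A quick check in Python, assuming the default C locale (where %B gives English month names); format_post_date is a hypothetical name.

```python
from datetime import date

DATE_FORMAT = "%d %B %Y"  # same string as org-html-metadata-timestamp-format

def format_post_date(d):
    """Render a date the way the sidebar shows it, e.g. 14 May 2011."""
    return d.strftime(DATE_FORMAT)
```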

MathJax

Org-mode uses MathJax to render inline TeX on my blog. The defaults are mostly fine, but I self-host MathJax:

(setf org-html-mathjax-options
      '((path "/etc/MathJax/MathJax.js?config=TeX-AMS-MML_HTMLorMML")
        (scale "100") (align "center") (indent "2em") (mathml nil)))
(setf org-html-mathjax-template
      "<script type=\"text/javascript\" src=\"%PATH\"></script>")

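With this template, Org fills in %PATH from the path option, so every page gets a single script tag pointing at the self-hosted copy. A Python sketch of that substitution, illustrating the placeholder rather than Org's implementation (mathjax_tag is a hypothetical name):

```python
MATHJAX_OPTIONS = {
    "path": "/etc/MathJax/MathJax.js?config=TeX-AMS-MML_HTMLorMML",
}
MATHJAX_TEMPLATE = '<script type="text/javascript" src="%PATH"></script>'

def mathjax_tag(template=MATHJAX_TEMPLATE, options=MATHJAX_OPTIONS):
    """Replace the %PATH placeholder with the configured MathJax path."""
    return template.replace("%PATH", options["path"])
```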
Home page tags

The simple tag system on my home page is also implemented using Org-mode.

My home page is an Org-mode file with an outline node for each blog entry. Those blog posts have Org-mode tags:

* Blog

** [[file:blog/major-key.org][Major Key (a puzzle)]]                                                  :misc:
** Age-aware Data Structures                                           :algs:
*** [[file:blog/age-aware/array.org][Age-aware Array Search]] (Part 1 of 4)
*** [[file:blog/age-aware/tree.org][Tree Lookup]] (Part 2 of 4)
** [[file:blog/stream-fusion.org][Stream Fuse Carefully]]                                                :plt:

I make sure that only top-level headings (like “Blog”) are exported as headings, using an OPTIONS line; deeper entries become list items:

#+OPTIONS: H:1 toc:nil num:nil
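To show what H:1 does to the outline above, here is a Python sketch that reads Org headings, pulls off the trailing :tag: list, and marks which headings export as headings and which become list items. It is a simplified model (Org's real tag syntax and export logic are richer), and parse_outline is a made-up name.

```python
import re

# A heading: leading stars, the title, and an optional trailing :tag1:tag2: list
HEADING = re.compile(r"^(\*+)\s+(.*?)(?:\s+:([\w@#%:]+):)?\s*$")

def parse_outline(text, h_level=1):
    """Return (level, title, tags, exported_as) for each Org heading in text."""
    result = []
    for line in text.splitlines():
        m = HEADING.match(line)
        if not m:
            continue  # blank lines and body text
        stars, title, tags = m.groups()
        level = len(stars)
        # With #+OPTIONS: H:n, headings deeper than n are exported as list items
        kind = "heading" if level <= h_level else "list item"
        result.append((level, title, tags.split(":") if tags else [], kind))
    return result
```

Run on the Blog outline, the “Blog” heading comes back as a heading and every post entry as a list item carrying its tags, which is what the tag chooser relies on.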

Finally, I use some JavaScript, which you can find in /etc/blog.js, to make the tag-chooser, and some CSS (found in main.css:216–270) to style it.

RSS

I used to generate a full RSS feed for my site using ox-rss. However, recent versions of Org-mode seem to have broken ox-rss. Plus, I like to generate full-text RSS feeds and that meant ox-rss re-exporting every blog post to the RSS file. That re-export took forever and also couldn't be cached. As a result, editing a published post took between ten and twenty minutes—I can't believe I just lived with that for years!

I now generate the RSS file by scanning the front page HTML for a list of posts and then reading each post's HTML for its contents. A little Python script does all of this in seconds, and since Org-mode caches its HTML output, publishing an edit now takes only a few seconds in total.

My Python code is quick-and-dirty: it neither uses a real HTML parser to read the posts nor a proper XML library to generate the RSS. Instead, it splits strings to read the HTML and fills in string templates to write the RSS. Luckily, I am not dealing with adversarial HTML, so this seems to work.³

Getting a list of posts

To gather the list of posts, I look for links to blog/ in the front page HTML. I'm helped by the fact that Org-mode HTML export nicely inserts newlines between different things on the page:

if '"blog/' in line:
    link = line.split("href=")[1].split('"')[1]
    _, _, spans = line.partition('class="tag"')  # "" for entries without tags
    tags = [t.split('"')[0] for t in spans.split('class="')[1:]]
    posts.append((link, tags))

Here I extract both the link (from the href attribute) and the tags for each post (from the span.tag that Org-mode generates). The tags are inserted into the RSS feed, where I guess they probably help someone somewhere? I'm not an RSS power user.

The blog/ technique is the same one I use for getting a list of published posts for Magit.

Getting post content

Getting post content is even easier; I just go line by line through the file until I reach the h1, at which point the post begins:

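def stopAt(lines, end):
    # Minimal version of the filter: yield lines until the one equal to `end`
    for line in lines:
        if line == end:
            return
        yield line

def read_post(file):
    # `file` must be an open file (an iterator), so stopAt resumes after the h1
    date = title = author = None
    for line in file:
        if line.startswith('<h1 class="title">'):
            break  # the post body starts here
        elif line.startswith("<header>"):
            date = line.split("time>")[1][:-2].strip()
        elif line.startswith("<title>"):
            title = line.split("title>")[1][:-2]
        elif line.startswith('<meta name="author"'):
            author = line.split("content=")[1].split('"')[1]

    return (date, title, author, stopAt(file, "</body>\n"))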

Note that the last element of the returned tuple is the file object itself, wrapped in a little filter that cuts it off when the document body ends. That ensures that I generate valid close tags; RSS is XML, after all.

Generating RSS

With the posts and post contents collected, all that's left is putting the pieces together and dumping them to RSS. I won't replicate the code here, but it's just some nasty string interpolation. The publication date of the RSS feed is the most recent last-modified time of any post:

import email.utils, os
pubDate = email.utils.formatdate(max(os.path.getmtime(file) for file, tags in posts))

I think that helps some feed readers, though I'm not sure.
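For a rough idea of what that string interpolation involves, here is a sketch in Python. It is not the script I actually run: render_rss, rfc822_date, and the (link, title, date, tags, body) tuple layout are hypothetical, and post dates are assumed to use the "%d %B %Y" format from the sidebar section. One deliberate difference is that it entity-encodes each post's HTML with html.escape, which RSS 2.0 allows in an item's description and which keeps the feed well-formed XML even when a post embeds scripts or SVGs. The pub_date argument would be the formatdate string computed above.

```python
import email.utils
import html
from datetime import datetime, timezone

def rfc822_date(org_date):
    """Convert a date like '14 May 2011' to the RFC 822 form RSS expects."""
    parsed = datetime.strptime(org_date, "%d %B %Y").replace(tzinfo=timezone.utc)
    return email.utils.format_datetime(parsed, usegmt=True)

def render_rss(site_title, site_link, posts, pub_date):
    """Fill string templates with the channel header and one <item> per post.

    posts holds (link, title, date, tags, body_html) tuples; link is relative
    to site_link and date looks like '14 May 2011'.
    """
    esc = html.escape  # escapes &, <, > and both kinds of quotes
    items = []
    for link, title, date, tags, body in posts:
        categories = "".join(f"<category>{esc(tag)}</category>" for tag in tags)
        items.append(
            "<item>"
            f"<title>{esc(title)}</title>"
            f"<link>{esc(site_link + link)}</link>"
            f"<pubDate>{rfc822_date(date)}</pubDate>"
            f"{categories}"
            # Entity-encoded HTML keeps the feed well-formed XML
            f"<description>{esc(body)}</description>"
            "</item>"
        )
    return (
        '<?xml version="1.0" encoding="utf-8"?>\n'
        '<rss version="2.0"><channel>'
        f"<title>{esc(site_title)}</title>"
        f"<link>{esc(site_link)}</link>"
        f"<description>{esc(site_title)}</description>"
        f"<pubDate>{esc(pub_date)}</pubDate>"
        + "".join(items)
        + "</channel></rss>\n"
    )
```

Feed readers decode the escaped description back into HTML before rendering it, so escaping costs nothing on the reader's side.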

Footnotes:

¹ Yes.

² Every once in a while I need to add another extension to the list. I should fix that at some point, because every time I forget I have a broken link on my site.

³ It is often not valid RSS because I embed scripts, SVGs, or IFRAMEs into the document, but RSS readers seem to handle those oddities without issue, so I'll leave it as is.

By Pavel Panchekha

14 May 2011

Using Org-mode to Publish a Web Site

Publishing on `git push`

Compiling Org-mode to HTML

Styling pages in Org-Publish

What to show

Default styles and cruft

Drawers

Example

The sidebar

MathJax

Home page tags

RSS

Getting a list of posts

Getting post content

Generating RSS

Footnotes:
