## By Pavel Panchekha

### 14 May 2011

Share under CC-BY-SA.

# Using Org-mode to Publish a Web Site

This blog is written with Org-mode from Emacs, and deployed using a git hook. Here’s my setup.

## Publishing on git push

Org-mode is an Emacs outliner, markup language, personal organizer, spreadsheet, and literate programming system.[1] I use it for my blog because of the rich markup language it provides. Org-mode's markup language can be compiled to HTML, and its publish system allows systematically compiling a whole interlinked website.

Since Org-mode is an Emacs package, I use Emacs in batch mode to run it, instructing it to load the list of tasks publish.el, open the index page index.org, and run the tasks (org-publish-all).

emacs -Q --batch \
-l /tmp/www-in/etc/publish.el \
/tmp/www-in/index.org \
--funcall org-publish-all


Note that the index page and the list of tasks are located in www-in. That directory is a clean checkout of my website's git repository:

cd /tmp

# Copy repository and make directories
[ -d /tmp/www-in  ] || git clone /home/git/repositories/www.git /tmp/www-in
[ -d /tmp/www-out ] || mkdir /tmp/www-out


This script is run every time I push to the git repository as a post-receive hook. If /tmp/www-in already exists, it fetches any new changes:

cd /tmp/www-in
unset GIT_DIR
GIT_WORK_TREE=/tmp/www-in/ git fetch
GIT_WORK_TREE=/tmp/www-in/ git checkout -f origin/master


Git sets GIT_DIR in the environment when it runs a hook, which would point these commands at the bare repository rather than the checkout. That’s why I unset GIT_DIR and set GIT_WORK_TREE explicitly.

After the hook copies/updates the repository to www-in, it runs the emacs invocation above and then copies the HTML output from /tmp/www-out to the web server root.

cp -r /tmp/www-out/* ~www/www


There’s one final trick involved for updating the hook itself. I keep the hook in the same repository as the website, and copy it over on every push:

cp /tmp/www-in/etc/post-receive.hook /home/git/repositories/www.git/hooks/post-receive


I make sure edits to the hook are in separate commits from edits to anything else, so that I'm always running the current version of the hook.

## Compiling Org-mode to HTML

The hook above ultimately centers around calling the Emacs command org-publish-all. This command executes a list of tasks, each of which calls a different publishing function. My site has three tasks: one master task and then two child tasks for compiling Org files to HTML and for publishing static pages.

(setq org-publish-project-alist
      '(("www"
         :components ("www-pages" "www-static"))
        ("www-pages" ...)
        ("www-static" ...)))


The task for static pages is simple; it copies files from www-in to www-out using the org-publish-attachment function:

("www-static"
:base-directory "/tmp/www-in"
:base-extension "css\\|js\\|png\\|..."
:publishing-directory "/tmp/www-out"
:publishing-function org-publish-attachment
:recursive t)


The properties here should be self-explanatory; base-extension indicates which files to copy. I've replaced my long list of file extensions with a ... there.[2]

The task for Org-mode pages is fairly similar, using the org-html-publish-to-html function instead:

("www-pages"
:base-directory "/tmp/www-in"

:base-extension "org"
:recursive t

:publishing-directory "/tmp/www-out"
:publishing-function org-html-publish-to-html
...
)


My ellipsis hides a long list of options that format the output—let's talk about that.

## Styling pages in Org-Publish

The HTML Org-mode generates is ugly, but it gets the job done, and it's pretty configurable using options in the task definition.

### What to show

The first group of options tells Org-mode what information to include:

:headline-level 4
:section-numbers nil
:with-toc nil
:with-author t
:with-creator nil


Most of these are self-explanatory; headline-level is how many levels of the outline are exported as HTML headings. Drawers, which I also export, are explained below.

Since I display the author, I need to set it:

(setf user-full-name "Pavel Panchekha")


### Default styles and cruft

I turn off Org-mode's default styles and set the home link:

:html-head-include-default-style nil
:html-link-home "/"


Instead, I use my-head-extra:

(setf org-html-coding-system 'utf-8-unix)
(setf my-head-extra
      (concat
       "<meta name=\"viewport\" content=\"width=device-width, initial-scale=1\">\n"
       ...))


This way I use CSS of my choosing and turn on mobile-friendly browsing.

I also turn off the various links Org inserts into the header:

(setf org-html-home/up-format "")
(setf org-html-scripts "")


### Drawers

I also turn on Org-mode's "drawers" in export:

:with-drawers t


Org-mode publishes drawers as code blocks by default, which is odd. I override org-html-format-drawer-function to export them as text blocks instead:

(defun my-org-export-format-drawer (name content)
(concat "<div class=\"drawer " (downcase name) "\">\n"
"<h6>" (capitalize name) "</h6>\n"
content
"\n</div>"))
(setq org-html-format-drawer-function 'my-org-export-format-drawer)


I style these blocks to show up as large colored boxes, like this one:

###### Example

Hi! I’m a drawer!

### The sidebar

Org-mode’s HTML compiler has a notion of a preamble and a postamble. I use the preamble to generate the sidebar that sits on the right-hand side of every page:

(setf org-html-preamble t)
(setf org-html-postamble nil)

(setf org-export-date-timestamp-format "%d %B %Y")
(setf org-html-preamble-format
(list
(list
"en"
(concat
...))))


### MathJax

Org-mode uses MathJax to render inline TeX on my blog. The defaults are mostly fine, but I self-host MathJax:

(setf org-html-mathjax-options
'((path "/etc/MathJax/MathJax.js?config=TeX-AMS-MML_HTMLorMML")
(scale "100") (align "center") (indent "2em") (mathml nil)))
(setf org-html-mathjax-template
"<script type=\"text/javascript\" src=\"%PATH\"></script>")


## Tags

The simple tag system on my home page is also implemented using Org-mode.

My home page is an Org-mode file with an outline node for each blog entry. Those blog posts have Org-mode tags:

* Blog

** [[file:blog/major-key.org][Major Key (a puzzle)]]                                                  :misc:
** Age-aware Data Structures                                           :algs:
*** [[file:blog/age-aware/array.org][Age-aware Array Search]] (Part 1 of 4)
*** [[file:blog/age-aware/tree.org][Tree Lookup]] (Part 2 of 4)
** [[file:blog/stream-fusion.org][Stream Fuse Carefully]]                                                :plt:


I use an OPTIONS line so that only the top-level headings (like “Blog”) are exported as headings:

#+OPTIONS: H:1 toc:nil num:nil


Finally, I use some JavaScript, which you can find in /etc/blog.js, to make the tag-chooser, and some CSS (found in main.css:216–270) to style it.

## RSS

I used to generate a full RSS feed for my site using ox-rss. However, recent versions of Org-mode seem to have broken ox-rss. Plus, I like to generate full-text RSS feeds and that meant ox-rss re-exporting every blog post to the RSS file. That re-export took forever and also couldn't be cached. As a result, editing a published post took between ten and twenty minutes—I can't believe I just lived with that for years!

I now generate an RSS file by parsing the front page HTML to get a list of posts and then parsing each post's HTML to get its contents. A little Python script does all of this in seconds, and since Org-mode caches its HTML output, publishing takes a few seconds.

My Python code is quick-and-dirty: it neither parses the HTML nor uses a proper XML library to generate the RSS. Instead, everything is string templates. Luckily, I am not dealing with adversarial HTML, so this seems to work.[3]
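As a rough illustration of the string-template approach (this is not the actual script), a feed entry can be assembled with plain `str.format`. The template, the `render_item` name, and the choice to wrap the body in CDATA are all assumptions made for this sketch:

```python
from xml.sax.saxutils import escape

# Hypothetical template for one feed entry; the field names are illustrative.
ITEM_TEMPLATE = """<item>
  <title>{title}</title>
  <link>{link}</link>
  <pubDate>{date}</pubDate>
{categories}
  <description><![CDATA[{body}]]></description>
</item>"""

def render_item(title, link, date, tags, body):
    """Fill the template with one post's fields using plain string formatting."""
    categories = "\n".join(
        "  <category>{}</category>".format(escape(tag)) for tag in tags)
    return ITEM_TEMPLATE.format(
        title=escape(title), link=escape(link), date=escape(date),
        categories=categories, body=body)
```

The body is pasted in unescaped, which is the quick-and-dirty part: a post containing `]]>` would break the CDATA section, so this only works for HTML you trust.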

### Getting a list of posts

To gather the list of posts, I look for links to blog/ in the front page HTML. I'm helped by the fact that Org-mode HTML export nicely inserts newlines between different things on the page:

if "\"blog/" in line:
    tags = [t.split('"')[0] for t in line.split('class="tag"')[1].split('class="')[1:]]


Here I extract both the link (from the href attribute) and also the tags for each post from the span.tag that Org-mode generates. The tags are inserted into the RSS feed, where I guess they probably help someone somewhere? I'm not an RSS power user.
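A fuller sketch of that per-line extraction might look like the following. The function name `parse_post_line` and the guard for lines without tags are my assumptions; the tag markup it expects is the nested `span` structure that Org-mode's HTML export produces for heading tags:

```python
def parse_post_line(line):
    """Return (href, tags) for a front-page line linking into blog/, else None."""
    if '"blog/' not in line:
        return None
    # The link target is whatever follows the opening quote of "blog/...".
    href = "blog/" + line.split('"blog/')[1].split('"')[0]
    # Tags look like <span class="tag"><span class="NAME">NAME</span></span>.
    tags = []
    if 'class="tag"' in line:
        tag_html = line.split('class="tag"')[1]
        tags = [t.split('"')[0] for t in tag_html.split('class="')[1:]]
    return href, tags
```

Lines that link to a post but carry no tags, such as the parts of a series, come back with an empty tag list rather than raising an `IndexError`.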

The blog/ technique is the same one I use for getting a list of published posts for Magit.

Getting post content is even easier; I just go line by line through the file until I reach the h1, at which point the post begins:

def read_post(file):  # illustrative name
    date = title = author = None
    for line in file:
        if line.startswith('<h1 class="title">'):
            break
        elif "time>" in line:  # assumed test for the line holding the date
            date = line.split("time>")[1][:-2].strip()
        elif line.startswith("<title>"):
            title = line.split("title>")[1][:-2]
        elif line.startswith('<meta name="author"'):
            author = line.split("content=")[1].split('"')[1]

    return (date, title, author, stopAt(file, "</body>\n"))


Note that the last element of the returned tuple is the file object itself, wrapped in a little filter that cuts it off at the closing body tag. That keeps the page's trailing close tags out of the feed, so the tags in the output stay balanced; RSS is XML, after all.
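The filter itself isn't shown here, but a minimal version could be a generator like the one below. This is a guess at its shape, assuming it compares whole lines against the sentinel; only the name `stopAt` and its two arguments come from the call above:

```python
import io

def stopAt(lines, sentinel):
    """Yield lines from an iterator until one equals `sentinel` (which is dropped)."""
    for line in lines:
        if line == sentinel:
            return
        yield line

# Usage: the caller has already consumed the header lines, so the
# wrapper only sees what is left of the same file object.
page = io.StringIO('<h1 class="title">Hi</h1>\n<p>Body</p>\n</body>\n</html>\n')
next(page)  # skip the <h1> line, as the header loop would
body = list(stopAt(page, "</body>\n"))
```

This works because a file object is its own iterator: the header loop and the filter share one read position, so the filter picks up exactly where the loop stopped.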

With the posts and post contents collected, all that's left is putting the pieces together and dumping it to RSS. I won't replicate the code here, but it's just some nasty string interpolation. The publication date on the RSS file is computed as the last-modified date of any post:

pubDate = email.utils.formatdate(max(os.path.getmtime(file) for file, tags in posts))

I think that helps some feed readers, though I'm not sure.
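To see what that expression produces, here is a small self-contained demonstration. The `feed_pub_date` wrapper and the temporary files are purely illustrative; `posts` is assumed to be a list of (path, tags) pairs, as in the line above:

```python
import email.utils
import os
import tempfile

def feed_pub_date(posts):
    """Format the newest modification time among (path, tags) pairs as an RFC 2822 date."""
    newest = max(os.path.getmtime(path) for path, tags in posts)
    return email.utils.formatdate(newest)

# Demonstration with two throwaway files whose mtimes are set by hand.
with tempfile.TemporaryDirectory() as tmp:
    posts = []
    for name, mtime in [("old.html", 1000), ("new.html", 2000)]:
        path = os.path.join(tmp, name)
        open(path, "w").close()
        os.utime(path, (mtime, mtime))
        posts.append((path, ["misc"]))
    pub_date = feed_pub_date(posts)
```

By default `formatdate` emits UTC with a `-0000` offset; passing `usegmt=True` writes `GMT` instead.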

## Footnotes:

1

Yes.

2

Every once in a while I need to add another extension to the list. I should fix that at some point, because every time I forget I have a broken link on my site.

3

It is often not valid RSS because I embed scripts, SVGs, or IFRAMEs into the document, but RSS readers seem to handle those oddities without issue, so I'll leave it as is.