By Pavel Panchekha

Share under CC-BY-SA.

Any opinions, findings, and conclusions or recommendations expressed in this material are those of the author and do not necessarily reflect the views of the National Science Foundation.

Using Org-mode to Publish a Web Site

This blog is written with Org-mode from Emacs, and deployed using a git hook. Here’s my setup.

Publishing on git push

Org-mode is an Emacs outliner, markup language, personal organizer, spreadsheet, and literate programming system.[1] I use it for my blog because of the rich markup language it provides. Org-mode's markup language can be compiled to HTML, and its publish system allows systematically compiling a whole interlinked website.

Since Org-mode is an Emacs package, I use Emacs in batch mode to run it, instructing it to open the index page index.org, load the list of tasks publish.el, and run the tasks (org-publish-all).

emacs -Q --batch \
    -l /tmp/www-in/etc/publish.el \
    /tmp/www-in/index.org \
    --funcall org-publish-all

Note that the index page and the list of tasks are located in www-in. That directory is a clean checkout of my website's git repository:

cd /tmp

# Copy repository and make directories
[ -d /tmp/www-in  ] || git clone /home/git/repositories/www.git /tmp/www-in 
[ -d /tmp/www-out ] || mkdir /tmp/www-out

This script runs as a post-receive hook every time I push to the git repository. If /tmp/www-in already exists, it fetches any new changes:

cd /tmp/www-in
unset GIT_DIR
GIT_WORK_TREE=/tmp/www-in/ git fetch
GIT_WORK_TREE=/tmp/www-in/ git checkout -f origin/master

I set GIT_WORK_TREE explicitly because git reads that variable and it is explicitly set in hooks; that’s also why I unset GIT_DIR.

After the hook copies/updates the repository to www-in, it runs the emacs invocation above and then copies the HTML output from /tmp/www-out to the web server root.

cp -r /tmp/www-out/* ~www/www

There’s one final trick involved for updating the hook itself. I keep the hook in the same repository as the website, and copy it over on every push:

cp /tmp/www-in/etc/post-receive.hook /home/git/repositories/www.git/hooks/post-receive

I make sure edits to the hook are in separate commits from edits to anything else, so that I'm always running the current version of the hook.

Compiling Org-mode to HTML

The hook above ultimately centers around calling the Emacs command org-publish-all. This command executes a list of tasks, each of which calls a different publishing function. My site has three tasks: one master task and then two child tasks for compiling Org files to HTML and for publishing static pages.

(setq org-publish-project-alist
      `(("www"
         :components ("www-pages" "www-static"))
        ("www-pages" ...)
        ("www-static" ...)))

The task for static pages just copies files from www-in to www-out using the org-publish-attachment command:

("www-static"
 :base-directory "/tmp/www-in"
 :base-extension "css\\|js\\|png\\|..."
 :publishing-directory "/tmp/www-out"
 :publishing-function org-publish-attachment
 :recursive t)

The properties here should be fairly self-explanatory. base-extension indicates which files to copy; I've replaced my long list of file extensions with a ... there.[2]

The task for Org-mode pages is fairly similar, using the org-html-publish-to-html command instead:

("www-pages"
 :base-directory "/tmp/www-in"

 :base-extension "org"
 :recursive t

 :publishing-directory "/tmp/www-out"
 :publishing-function org-html-publish-to-html
 ...
 )

My ellipsis hides a long list of options that format the output—let's talk about that.

Styling pages in Org-Publish

The HTML Org-mode generates is ugly, but it gets the job done, and it's pretty configurable using options in the task definition.

What to show

The first group of options tells Org-mode what information to include:

:headline-level 4
:section-numbers nil
:with-toc nil
:with-author t
:with-creator nil

Most of these are self-explanatory; headline-level is how deep in the heading hierarchy to go. The option for "drawers" is explained below.

Since I display the author, I need to set it:

(setf user-full-name "Pavel Panchekha")
(setf user-mail-address "me@pavpanchekha.com")

Default styles and cruft

I turn off Org-mode's default styles:

:html-link-home "/"
:html-head-include-default-style nil
:html-head-include-scripts nil
:html-head-extra ,my-head-extra

Instead, I use my-head-extra:

(setf org-html-coding-system 'utf-8-unix)
(setf my-head-extra
      (concat
       "<meta name=\"viewport\" content=\"width=device-width, initial-scale=1\">\n"
       "<link rel='stylesheet' href='/etc/main.css' />"
       ))

This way I use CSS of my choosing and turn on mobile-friendly browsing.

I also turn off the various links Org inserts into the header:

(setf org-html-home/up-format "")
(setf org-html-link-up "")
(setf org-html-link-home "")
(setf org-html-scripts "")

Drawers

I also turn on Org-mode's "drawers" in export:

:with-drawers t

Org-mode publishes drawers as code blocks by default, which is odd. I set org-html-format-drawer-function to export them as text blocks instead:

(defun my-org-export-format-drawer (name content)
  (concat "<div class=\"drawer " (downcase name) "\">\n"
          "<h6>" (capitalize name) "</h6>\n"
          content
          "\n</div>"))
(setq org-html-format-drawer-function 'my-org-export-format-drawer)

I style these blocks to show up as large colored boxes, like this one:

Example

Hi! I’m a drawer!

The sidebar

Org-mode’s HTML compiler has a notion of a preamble and a postamble. I use the preamble to generate the sidebar that sits on the right-hand side of every page:

(setf org-html-preamble t)
(setf org-html-postamble nil)

(setf org-html-metadata-timestamp-format "%d %B %Y")
(setf org-export-date-timestamp-format "%d %B %Y")
(setf org-html-preamble-format
      (list
       (list
        "en"
        (concat
         ...))))

MathJax

Org-mode uses MathJax to render inline TeX on my blog. The defaults are mostly fine, but I self-host MathJax:

(setf org-html-mathjax-options
      '((path "/etc/MathJax/MathJax.js?config=TeX-AMS-MML_HTMLorMML")
        (scale "100") (align "center") (indent "2em") (mathml nil)))
(setf org-html-mathjax-template
      "<script type=\"text/javascript\" src=\"%PATH\"></script>")

Home page tags

The simple tag system on my home page is also implemented using Org-mode.

My home page is an Org-mode file with an outline node for each blog entry. Those blog posts have Org-mode tags:

* Blog

** [[file:blog/major-key.org][Major Key (a puzzle)]]                                                  :misc:
** Age-aware Data Structures                                           :algs:
*** [[file:blog/age-aware/array.org][Age-aware Array Search]] (Part 1 of 4)
*** [[file:blog/age-aware/tree.org][Tree Lookup]] (Part 2 of 4)
** [[file:blog/stream-fusion.org][Stream Fuse Carefully]]                                                :plt:

I use OPTIONS to make sure that only the top-level headings (like “Blog”) are treated as headings:

#+OPTIONS: H:1 toc:nil num:nil

Finally, I use some JavaScript, which you can find in /etc/blog.js, to make the tag-chooser, and some CSS (found in main.css:216–270) to style it.

RSS

I used to generate a full RSS feed for my site using ox-rss. However, recent versions of Org-mode seem to have broken ox-rss. Plus, I like to generate full-text RSS feeds and that meant ox-rss re-exporting every blog post to the RSS file. That re-export took forever and also couldn't be cached. As a result, editing a published post took between ten and twenty minutes—I can't believe I just lived with that for years!

I now generate an RSS file by parsing the front page HTML to get a list of posts and then parsing each post's HTML to get its contents. A little Python script does all of this in seconds, and since Org-mode caches its HTML output, publishing takes a few seconds.

My Python code is quick-and-dirty: it neither parses the HTML nor uses a proper XML library to generate the RSS. Instead, everything is string templates. Luckily, I am not dealing with adversarial HTML, so this seems to work.[3]

Getting a list of posts

To gather the list of posts, I look for links to blog/ in the front page HTML. I'm helped by the fact that Org-mode HTML export nicely inserts newlines between different things on the page:

if "\"blog/" in line:
    link = line.split("href=")[1].split('"')[1]
    tags = [t.split('"')[0] for t in line.split('class="tag"')[1].split('class=\"')[1:]]
    posts.append((link, tags))

Here I extract both the link (from the href attribute) and also the tags for each post from the span.tag that Org-mode generates. The tags are inserted into the RSS feed, where I guess they probably help someone somewhere? I'm not an RSS power user.

The blog/ technique is the same one I use for getting a list of published posts for Magit.
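As a rough illustration of the link-and-tag extraction above, here is a self-contained sketch. The sample line is only an assumption about the shape of Org-mode's output (a link followed by a span.tag wrapping one span per tag), and the function name extract_post is hypothetical. Unlike the snippet above, it returns an empty tag list when a line has no span.tag at all.

```python
def extract_post(line):
    """Return (link, tags) for a front-page line linking to blog/, else None."""
    if '"blog/' not in line:
        return None
    # The link is the first quoted value after href=.
    link = line.split("href=")[1].split('"')[1]
    # Tags sit inside <span class="tag">, with one inner span per tag.
    tags = []
    if 'class="tag"' in line:
        tag_html = line.split('class="tag"')[1]
        tags = [t.split('"')[0] for t in tag_html.split('class="')[1:]]
    return (link, tags)


# Assumed sample of exported HTML; real output may differ in detail.
sample = ('<h3><a href="blog/stream-fusion.html">Stream Fuse Carefully</a> '
          '<span class="tag"><span class="plt">plt</span></span></h3>')
posts = [p for p in map(extract_post, [sample, "<p>About me</p>"]) if p]
print(posts)  # → [('blog/stream-fusion.html', ['plt'])]
```

Like the original, this relies on each post link and its tags sitting on a single line, so it would need rethinking if the exporter ever wrapped lines differently.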

Getting post content

Getting post content is even easier; I just go line by line through the file until I reach the h1, at which point the post begins:

def read_post(file):  # function name is illustrative
    # Scan the page header for metadata; the post body starts at the h1.
    date = title = author = None
    for line in file:
        if line.startswith('<h1 class="title">'):
            break
        elif line.startswith("<header>"):
            date = line.split("time>")[1][:-2].strip()
        elif line.startswith("<title>"):
            title = line.split("title>")[1][:-2]
        elif line.startswith('<meta name="author"'):
            author = line.split("content=")[1].split('"')[1]

    # The rest of the file, cut off at the closing body tag, is the content.
    return (date, title, author, stopAt(file, "</body>\n"))

Note that the last element of the returned tuple is the file object itself, wrapped in a little filter that cuts it off when the document ends. That ensures that I generate valid close tags; RSS is XML after all.
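The stopAt filter itself isn't shown here. One plausible shape for it, purely as a guess, is a small generator that passes lines through until it sees the sentinel line. The name stop_at and the in-memory sample page below are assumptions for illustration.

```python
import io


def stop_at(lines, sentinel):
    """Yield lines from an iterable until one equals sentinel (not yielded)."""
    for line in lines:
        if line == sentinel:
            return
        yield line


# Usage with an in-memory stand-in for an exported post.
page = io.StringIO("<p>Hello</p>\n<p>World</p>\n</body>\n</html>\n")
body = list(stop_at(page, "</body>\n"))
print(body)  # → ['<p>Hello</p>\n', '<p>World</p>\n']
```

Because a file object is its own iterator, a filter like this picks up wherever an earlier loop over the same file stopped, which is what lets the header scan and the body share one file handle.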

Generating RSS

With the posts and post contents collected, all that's left is putting the pieces together and dumping it to RSS. I won't replicate the code here, but it's just some nasty string interpolation. The publication date on the RSS file is computed as the last-modified date of any post:

pubDate = email.utils.formatdate(max(os.path.getmtime(file) for file, tags in posts))

I think that helps some feed readers, though I'm not sure.
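To make the date computation concrete, here is a small runnable sketch. The helper name latest_pub_date and the temporary files are assumptions for illustration. It also passes usegmt=True, which the one-liner above does not, so the zone is written as "GMT" instead of "-0000".

```python
import email.utils
import os
import tempfile


def latest_pub_date(paths):
    """Format the newest modification time among paths as an RFC 2822 date."""
    newest = max(os.path.getmtime(path) for path in paths)
    # usegmt=True writes the zone as "GMT" rather than "-0000".
    return email.utils.formatdate(newest, usegmt=True)


# Usage with two temporary files whose mtimes are set by hand.
with tempfile.TemporaryDirectory() as tmp:
    paths = []
    for name, mtime in [("a.html", 0), ("b.html", 86400)]:
        path = os.path.join(tmp, name)
        with open(path, "w"):
            pass
        os.utime(path, (mtime, mtime))
        paths.append(path)
    pub_date = latest_pub_date(paths)

print(pub_date)  # → Fri, 02 Jan 1970 00:00:00 GMT
```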

Footnotes:

[1] Yes.

[2] Every once in a while I need to add another extension to the list. I should fix that at some point, because every time I forget I have a broken link on my site.

[3] It is often not valid RSS because I embed scripts, SVGs, or IFRAMEs into the document, but RSS readers seem to handle those oddities without issue, so I'll leave it as is.