Grammars in Racket

I do most of my research in Racket (Herbie, Cassius). I work in programming languages, so most of my code manipulates programs as data. Racket is a wonderful language for this thanks to its extensive support for S-expression program representations: pattern matching, quasi-quoting, and a bevy of list manipulation functions all help manipulate programs. However, writing down type signatures or contracts for S-expression based data structures is a pain.

This post is about a helpful macro, define-by-match to make that nicer. Use like so:

(define-by-match selector?
  `*
  `(id ,(? symbol?))
  `(class ,(? symbol?))
  `(tag ,(? symbol?))
  `(and ,(? selector?) ...)
  `(desc ,(? selector?) ...)
  `(child ,(? selector?) ...))

This defines the selector? predicate, which you can also use in a contract. This predicate matches things like (id foo), *, and (and (tag div) (class content)); it's used in Cassius, my tool for reasoning about CSS.

Each of the quasi-quoted lines is one type of valid selector. In each line, the unquoted pieces, like ,(? symbol?), represent a predicate on the value in that position in the S-expression. The ellipses in the and, desc, and child branches match any number elements that match the thing before the ellipses. More generally, any syntax supported by Racket's match function can be used in the macro.

The macro implementation is very simple:

(define-syntax-rule (define-by-match name patterns ...)
  (define name
    (flat-named-contract
     'name
     (λ (var)
       (let name ([var var])
         (match var
           [patterns true] ...
           [_ false]))))))

The define-syntax-rule form is Racket's shorthand for defining macros. This macro expands into a define of a named contract. The function corresponding to that contract just contains a match form. That match returns true for any listed pattern, and false for any patterns left unlisted. The one trick is the named let, thanks to which the predicate being defined can call itself recursively in patterns.

For example, the selector? definition above expands into:

(define selector?
  (flat-named-contract
   'selector?
   (λ (var)
     (let selector? ([var var])
       (match var
         [`* true]
         [`(id ,(? symbol?)) true]
         [`(class ,(? symbol?)) true]
         [`(tag ,(? symbol?)) true]
         [`(and ,(? selector?) ...) true]
         [`(desc ,(? selector?) ...) true]
         [`(child ,(? selector?) ...) true]
         [_ false])))))

My most used type of pattern for pattern matching is the quasi-quoted list, like above. I toyed with making the quasiquote automatic—you can always escape it with a comma. But I ultimately decided that that made the code less readable, since it would break the standard that every unquote must be inside a quasi-quote. If you don't agree, feel free to make that minor tweak by adding a backtick before pattern in the macro body.

Since Racket has quite powerful pattern matching facilities, the define-by-match macro is a powerful way of writing down grammars for programs. I hope you, too, find it useful.

By on . Share it—it's CC-BY-SA licensed.

Any opinions, findings, and conclusions or recommendations expressed in this material are those of the author and do not necessarily reflect the views of the National Science Foundation.