By Pavel Panchekha

Share under CC-BY-SA.

Multi-command-line in Racket

Cassius, my tool for formal reasoning about web page layout, can be used in a lot of ways. You can use it to lay out web pages, much like a browser; you can use it to find bugs like overlapping text; you can use it to debug web pages; or you can use it to synthesize CSS files from examples. For each of these uses, Cassius provides a convenient command-line tool. Since most of these tools use the same back-end, these command-line tools share many options and flags.

The old way: multiple files and duplicated code

When I first started writing these tools, I settled on a convenient way to keep all these tools separate: each got its own file, where it would parse its own command-line arguments however it wanted. Then, to provide the impression that these tools were all part of a single project, I had an additional shell script called cassius, which would choose one of those tools to run. For example, if you wanted to run the synthesize tool, you could run

racket src/tools/synthesize.rkt <file> <problem>

or, you could run

cassius synthesize <file> <problem>

The second is obviously cleaner and easier to remember, and it makes the similarity between the different tools much more apparent.

While keeping each tool in its own file initially made sense, it forced me to duplicate a lot of code. The most important duplicated chunk was the definition of the various command-line flags: there is no easy way, in Racket, to abstract over a set of flag definitions in the built-in command-line library. I found that every time I added a new tool to Cassius, I ended up copying and pasting an existing tool and modifying it. This is terrible! Not only did it mean I duplicated a lot of code, it also meant that simple changes elsewhere in Cassius often required changing dozens of locations.

I wanted to move all of the tools into one file. This file would have a command line interface much like my cassius script: it would take a tool name and call that tool. All of the flags would be defined in one place, and each of the tools could easily share code since all were in the same file.

Limitations in the Racket command-line library

However, the Racket library for parsing command-line arguments doesn't have any mechanism for handling subcommands, like the argument synthesize in

cassius synthesize <file> <problem>

Using Racket's command-line library looks like this:

(define debug '())

(command-line
 #:program "cassius-synthesize"

 #:multi
 [("-d" "--debug") type "Turn on debug information"
  (set! debug (set-add debug (string->symbol type)))]
 [("+x") name "Set an option"
  (set-flag! (string->symbol name))]
 [("-x") name "Unset an option"
  (unset-flag! (string->symbol name))]

 #:args (file problem)
 (do-synthesize file problem debug))

The command-line invocation is what causes Racket to parse the command-line arguments and run code. That invocation receives several arguments.

First, it is passed a program name by the #:program keyword.

Next, the #:multi section lists flags; the name describes the fact that each flag can be specified multiple times. Each of these command-line flags has a list of different names (the first one can be called either -d or --debug), then a variable name to store a flag argument in (if there is a flag argument), and then a documentation string. After those three things, each flag has code to run when that flag is passed. For example, the -x flag runs code to unset an option, while the -d flag adds to the set of debug symbols. Note that the debug symbols are stored in a debug variable defined just before the command-line invocation. That variable is later used when the tool is executing.

Finally, after all of the flags are specified, the code for the tool to execute is listed. To do so, an #:args keyword gives variable names to the expected command line arguments, and the code that the tool executes is given after that. Note that this code uses the variables file and problem, which are bound to command-line arguments, and also the variable debug, which was modified by the -d command-line flag. The -x and +x flags manipulate a global data structure, so they, too, affect the main tool code.

To help parse command lines, Racket enforces a strict discipline on the flags and arguments. Each flag can only begin with - or +; all flags and their arguments must precede the true command-line arguments; true command-line arguments that begin with - or + must come after a -- separator. These are all common limitations of BSD-style command lines.
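A small experiment makes this discipline concrete. The sketch below is only an illustration: parse is a hypothetical helper, the program name "demo" and the file names are made up, and it uses the #:argv keyword of command-line to feed in an explicit list of strings instead of the real process arguments.

```racket
#lang racket/base
(require racket/cmdline)

;; Hypothetical helper: parse `argv` with a single repeatable flag and
;; return what command-line treated as flag values and as arguments.
(define (parse argv)
  (define debug '())
  (command-line
   #:program "demo"
   #:argv argv
   #:multi
   [("-d" "--debug") type "Turn on debug information"
    (set! debug (cons type debug))]
   ;; A bare identifier after #:args collects all remaining arguments.
   #:args positional
   (list (reverse debug) positional)))

;; Flag parsing stops at the first non-flag argument, so the second
;; "-d" is treated as a plain argument rather than as a flag.
(parse '("-d" "z3" "page.html" "-d" "layout"))
;; => '(("z3") ("page.html" "-d" "layout"))

;; After "--", nothing is treated as a flag.
(parse '("--" "-d" "x"))
;; => '(() ("-d" "x"))
```

The first call shows the behavior that matters for the rest of this post: once a non-flag argument appears, everything after it is handed over untouched.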

Adding subcommands

The limitations in the built-in command-line library meant that if I wanted to recognize a tool name like synthesize among the arguments, I would have to make that one of the true command-line arguments. However, that would force common flags to precede the tool name, yielding ugly commands like

cassius -x optimize synthesize --prettify-values <file> <problem>

Splitting the flags into those before and after the tool name makes little sense and would be hard for users to remember.

Instead, I would like to mix the common flags with the per-tool flags, like this:

cassius synthesize -x optimize --prettify-values <file> <problem>

To do this sort of rearrangement, I wrote the first version of the multi-command-line macro:

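(require (for-syntax racket/base syntax/parse)
         racket/cmdline racket/match)

(define-syntax (multi-command-line stx)
  (syntax-parse stx
   [(_ args ... #:subcommands [name:str subargs ...] ...)
    ;; Outer parse: no flags, just the tool name plus everything after it.
    #'(command-line
       #:args (tool . rest)
       (match tool
         [name
          ;; Inner parse: the shared flags, then this tool's own clauses.
          (command-line
           #:argv rest
           args ...
           subargs ...)] ...))]))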

I would use it like so:

(define debug '())

(multi-command-line
 #:multi
 [("-d" "--debug") type "Turn on debug information"
  (set! debug (set-add debug (string->symbol type)))]
 [("+x") name "Set an option"
  (set-flag! (string->symbol name))]
 [("-x") name "Unset an option"
  (unset-flag! (string->symbol name))]

 #:subcommands
 ["synthesize"
  #:args (file problem)
  (do-synthesize file problem debug)]
 ["verify"
  #:args (file problem)
  (do-verify file problem debug)]
 ...)

This code would expand into nested uses of command-line:

(define debug '())

(command-line
 #:args (tool . rest)
 (match tool
  ["synthesize"
   (command-line
    #:argv rest
    #:multi
    [("-d" "--debug") type "Turn on debug information"
     (set! debug (set-add debug (string->symbol type)))]
    [("+x") name "Set an option"
     (set-flag! (string->symbol name))]
    [("-x") name "Unset an option"
     (unset-flag! (string->symbol name))]
    #:args (file problem)
     (do-synthesize file problem debug))]
  ["verify"
   (command-line
    #:argv rest
    #:multi
    [("-d" "--debug") type "Turn on debug information"
     (set! debug (set-add debug (string->symbol type)))]
    [("+x") name "Set an option"
     (set-flag! (string->symbol name))]
    [("-x") name "Unset an option"
     (unset-flag! (string->symbol name))]
    #:args (file problem)
    (do-verify file problem debug))]))

This expansion uses one additional feature of command-line: you can invoke it with #:argv <expr> to ask it to evaluate expr and use the result as the list of command-line arguments. The multi-command-line macro uses this to effectively “save” the unparsed command-line flags and then “restore” them in the nested command-line invocation.

To summarize, the main tricks of the multi-command-line macro are:

  • First, parse a simple command-line with no flags, a single argument denoting the tool name, and save the rest of the arguments.
  • Branch on the tool name to determine which nested command-line invocation to use.
  • Copy the common flags into each nested invocation and parse the saved arguments.
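The three steps can also be tried out without any macro. The following is a minimal sketch under some simplifying assumptions: run-tool and dispatch are hypothetical names, both tools take the same two arguments, and the "tools" only report what they received. Sharing the flags through a function works here only because of those simplifications; the macro is what lets each tool have its own flags and arguments.

```racket
#lang racket/base
(require racket/cmdline racket/match)

;; Step 3: parse the saved arguments with the common flags.
;; Hypothetical stand-in for a real tool: it just returns its inputs.
(define (run-tool tool argv)
  (define debug '())
  (command-line
   #:program (string-append "demo " tool)
   #:argv argv
   #:multi
   [("-d" "--debug") type "Turn on debug information"
    (set! debug (cons (string->symbol type) debug))]
   #:args (file problem)
   (list tool (reverse debug) file problem)))

(define (dispatch argv)
  ;; Step 1: a flagless parse that splits off the tool name
  ;; and saves the remaining arguments in `rest`.
  (command-line
   #:program "demo"
   #:argv argv
   #:args (tool . rest)
   ;; Step 2: branch on the tool name.
   (match tool
     ["synthesize" (run-tool "synthesize" rest)]
     ["verify" (run-tool "verify" rest)])))

(dispatch '("synthesize" "-d" "z3" "page.html" "p1"))
;; => '("synthesize" (z3) "page.html" "p1")
```

Because the outer parse declares no flags, the flags must follow the tool name, which is exactly the ordering wanted above.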

Extra features

On top of this basic structure, I'd like to add a few extra features.

  • If you specify #:program "foo" in multi-command-line, then the subcommand "bar" will specify #:program "foo bar" in its nested command-line instance. This way, if you invoke foo --help, it will call itself foo, but if you invoke foo bar --help, it will call itself foo bar. This makes it less confusing when foo bar has more flags available.
  • If you name a tool that does not exist, you see an error message listing available tools.
  • Common flags show up in the auto-generated help even if you do not name a tool. This doesn't happen with the macro above, since the top-level command-line invocation doesn't have any flags. However, command-line allows adding extra lines to the help screen (for example with #:usage-help, #:help-labels, or #:ps), so the common flags can be documented there even without declaring them as flags.
  • Tools have documentation and are listed in the top-level help screen.
  • Instead of invoking command-line, each subcommand invokes multi-command-line, allowing multiple levels of subcommands.
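As an illustration, here is one way the first two features might be layered onto the macro. This is a sketch, not the version used in Cassius or Herbie. It assumes, for simplicity, that #:program and #:argv are always given, in that order, before the common flags. The main function, the program name "demo", and the tool bodies are made up for the example.

```racket
#lang racket/base
(require racket/cmdline racket/match racket/string
         (for-syntax racket/base syntax/parse))

(define-syntax (multi-command-line stx)
  (syntax-parse stx
    [(_ #:program prog:expr #:argv argv:expr
        args ...
        #:subcommands [name:str subargs ...] ...)
     #'(let ([program prog])
         (command-line
          #:program program
          #:argv argv
          #:args (tool . rest)
          (match tool
            [name
             (command-line
              ;; Feature 1: the nested parser calls itself "foo bar".
              #:program (string-append program " " name)
              #:argv rest
              args ...
              subargs ...)] ...
            ;; Feature 2: unknown tools get an error listing the known ones.
            [_ (raise-user-error
                (string->symbol program)
                "unknown tool ~a; available tools: ~a"
                tool
                (string-join (list name ...) ", "))])))]))

;; Hypothetical entry point that takes argv explicitly so it can be tested.
(define (main argv)
  (define debug '())
  (multi-command-line
   #:program "demo"
   #:argv argv
   #:multi
   [("-d" "--debug") type "Turn on debug information"
    (set! debug (cons (string->symbol type) debug))]
   #:subcommands
   ["synthesize"
    #:args (file problem)
    (list 'synthesize file problem (reverse debug))]
   ["verify"
    #:args (file problem)
    (list 'verify file problem (reverse debug))]))

(main '("verify" "-d" "z3" "page.html" "p1"))
```

Calling (main '("frobnicate")) raises a user error whose message names the unknown tool and lists synthesize and verify as the available ones.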

I've implemented some of these, because multi-command-line is also used in Herbie, my tool for automatically improving the accuracy of floating-point computations. Herbie is more polished than Cassius, so its command line needs to be more user-friendly.