Pavel Panchekha

By

Share under CC-BY-SA.

Any opinions, findings, and conclusions or recommendations expressed in this material are those of the author and do not necessarily reflect the views of the National Science Foundation.

Sub-specifications

My work on formally specifying web page layouts often has people asking how web page layouts—which after all are esthetic as much as they are functional—could possibly follow a specification in pure logic. I understand the confusion: we too often think of a program as having an end-to-end specification, and that the point of verification is to allow us to treat the program as a black box, secure in the understanding that it follows the spec. That is an unusual special case. Most programs have not one specification, but many specifications on different levels.

Implementing Fibonacci

Consider a simple example: computing the Fibonacci numbers. The specification for this function is pretty simple:

Inductive Fib : nat -> nat -> Prop :=
| fib0 : Fib 0 1
| fib1 : Fib 1 1
| fib_plus : Fib n x -> Fib (S n) y -> Fib (S (S n)) (x + y)

Then you go ahead and implement some function fib and prove that it works by proving Fib n (fib n).11 Maybe you also go and prove some facts about the spec, like making sure that Fib is a functional relation. This works,22 The induction principle might be a little tricky. and you can now ignore fib and reason directly about the Fib specification.

But consider that there are lots of ways to implement the Fibonacci function: plain recursion, memoization, an array, a two-variable loop, the matrix method, the implicit matrix method, or a number field method. The specification I proved above tells us that we correctly implemented the Fibonacci function, but it doesn't tell us that we correctly implemented, say, the implicit matrix method.

What can be confusing here is that the implicit matrix method satisfies the same end-to-end specification as even the naive Fibonacci function.33 It has different asymptotics, but this isn't enough to distinguish, say, the implicit matrix from the number field methods. Instead, it has a different specification at a different layer: an implementation of implicit matrix Fibonacci has a data structure for representing implicit matrices (the representation can be verified), for adding and multiplying them (the operations can be verified), for implementing fast exponentiation (perhaps verified against the specification for multiplication), and so on. And of course you would also need to prove that the matrices chosen correctly implement the Fibonacci function.

These intermediate specifications are probably not accessible to callers of the Fibonacci function, but they are accessible to different parts of the Fibonacci implementation. In fact this implicit matrix Fibonacci function has a rich internal structure, with a lower level implementing implicit matrices and a higher level implementing Fibonacci, and the lower level ought to be opaque to the higher level.

Designer choices

Given this richly-structured set of specifications, verification can offer many tools even when it cannot do anything for the highest-level goals like attractiveness: lower-level specifications can often be formalized even the high-level ones can't. A good example of this is the Neutrons project at UW,44 Done by my very talented colleagues Stuart Pernsteiner, now at Galois, and Calvin Loncaric, now at Oracle, with a lot of the UW PLSE faculty advising them. the goal of which was to formally verify the software that control's UW Medicine's neutron therapy machine. No one knows enough about the world to prove that the machine cures cancer,55 Let alone “often enough”. but it is definitely possible to prove that the beam turns on and off when it's supposed to, and that's definitely part of the plan the neutron therapy machine's designers developed for how the machine ought to work. Neutrons combines multiple formally-verified static analysis tools with a very domain-generic Alloy framework for combining these tools together. Here Alloy plays the role of allowing less-formal reasoning where each sub-specification an be proven not only using different tools but using different models of the world and different assumptions.

In web page layouts, the same pattern occurs. The overall goal of the page involves things like attractiveness, usability, certain functionality, and accessibility. This is the high-level, end-to-end specification—unfortunately, it is, as far as we know now, unformalizable. To meet that specification, the designer makes certain decisions, like the choice to have a two-column layout with toolbars and a footer. These decisions aren't yet a program: they are a new, more refined specification. To implement them, there are yet further lower-level specifications: the footer has links on the left in three columns and the address and contact info on the right; there are three toolbars, corresponding to "store-wide", "department-wide", and "item-specific" actions; and so on. These levels go lower and lower, until you have specifications like "enticing color" and a concrete implementation like a particular shade of blue from a color palette. Again, even when the highest-level decisions are difficult to review or understand, the lower-level ones can be verified, and mistakes often caught.

Sub-specifications in synthesis

A useful perspective is to think of writing a program (whether manually or using program synthesis) as the task of developing finer and finer sub-specifications, until the specifications get as granular as "increment x". This perspective explains how even large programs can be synthesized by humans, even though there are a near-infinite number of even 100-line programs, let alone the million-line monstrosities frequently written by teams: instead of writing a million lines of code at once, people write a dozen lines of specification at a time, thousands of times.

This idea is being pursued more and more in automated program synthesis as well. N. Polikarpova and I. Sergey's recent SuSLik paper is a good example: to synthesize mutating, heap-manipulating programs, SuSLik starts with a high-level specification for the behavior of the program, and then transforms the specification using a variety of rules. Some of those rules add code to the program; others don't. N. Polikarpova's earlier Synquid project did something similar, except instead of Separation Logic for specifying heap-manipulating imperative code, it used Refinement Types for specifying functional programs.

Synthesis as planning

I've been reading classic AI papers that separate planning into “plans as execution” and “plans as communication”. When you think of plans as execution, you think about plans as programs, and using a plan as automatic. Then a plan has to have a complete model of the outside world in it, in order to respond to the complexity of the outside world, and a plan is usually very large to account for the unpredictability of the world. When you think instead of plans as communication, you stop believing that executing a plan must be purely mechanical. Since the executor is allowed to improvise and use domain-specific knowledge, the plan itself can be simpler and shorter, in an asymptotic sense.

As I read these papers, more and more I am thinking about these synthesis tools not as ways of synthesizing programs, but as ways of synthesizing useful sub-specifications. These sub-specifications are not executable—they might require further synthesis—but they communicate important information about how to accomplish a task and how to organize information. I try to imagine: what if the synthesis task was interrupted half-way through. What information could it "hand off" to a human, so as to pass along as much useful information as possible.

Useful communication

Now, what useful means there is tricky. It is always possible to refine a specification into “do the thing, then do nothing”. In purely-enumerative synthesis approaches the only measure of usefulness might be implicit, with useless specifications just failing to eventually produce a program. But in a more directed tool like SuSLik, there are a variety of ways to tell that a specification is more useful, like if it operates on sub-heaps, or less useful, like if the input heap provably does not the information necessary to produce the output heap. It might be possible to have SuSLik return, even when it fails to synthesize a program, a set of sub-specifications that it believes are possible to satisfy, for a human to fill in.66 Note that this is the opposite of sketching, where the human decomposes the high-level specification into low-level specifications and leaves the computer to do the final bit of synthesis.

I'd like to spend some more time thinking about how to tell whether a sub-specification has usefully decomposed the parent specification, or whether it has simply hidden the problem elsewhere. It's tricky. For one, there has to be some notion of termination: if the sub-specification requires solving the overall problem, it probably didn't help (though what if it established a useful precondition?). Then, there's some notion of well-foundedness: you can't turn a specification about lists of length \(n\) into special cases for length 1, length 2, and so on forever (but it can be helpful to have a few). There is a informational challenge: each sub-specification has to have enough information in its precondition to establish its post-condition. What else is there?

I think there is also an interesting challenge in how to represent specifications. Even when high-level specifications are concise, lower-level specifications often include steps like “come up with a data structure to represent X”. Those steps are hard to represent in simple logics.

Footnotes:

1

Maybe you also go and prove some facts about the spec, like making sure that Fib is a functional relation.

2

The induction principle might be a little tricky.

3

It has different asymptotics, but this isn't enough to distinguish, say, the implicit matrix from the number field methods.

4

Done by my very talented colleagues Stuart Pernsteiner, now at Galois, and Calvin Loncaric, now at Oracle, with a lot of the UW PLSE faculty advising them.

5

Let alone “often enough”.

6

Note that this is the opposite of sketching, where the human decomposes the high-level specification into low-level specifications and leaves the computer to do the final bit of synthesis.