Pavel Panchekha


Shared under CC-BY-SA.

Verification Beyond Functions

As I teach Software Verification, I am leaning on an analogy: symbolic execution corresponds to expressions in a programming language, while Hoare logic corresponds to statements. It's vague, but to the extent that your language has "pure" and "imperative" bits to it, symbolic execution and Hoare logic are frameworks for reasoning about them. But I've long felt that the theory beyond that level is poorly settled.
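To make the symbolic-execution side of the analogy concrete, here is a toy sketch in Python. The tuple-based program representation is my own invention, not any real tool's format: executing a conditional forks the state, and each fork records the branch condition in its path condition, which a real tool would hand to a solver to check feasibility.

```python
# A tiny symbolic executor, for illustration only. Programs are nested
# tuples; variables are strings; a symbolic state is an environment plus
# a path condition (a list of symbolic branch conditions).

def sym_eval(expr, env):
    """Evaluate an expression to a symbolic term under `env`."""
    if isinstance(expr, str):
        return env.get(expr, expr)        # unknown variables stay symbolic
    if isinstance(expr, tuple):
        return (expr[0],) + tuple(sym_eval(arg, env) for arg in expr[1:])
    return expr                           # literals evaluate to themselves

def sym_exec(stmt, env, path_cond):
    """Return a list of (env, path_cond) pairs, one per explored path."""
    kind = stmt[0]
    if kind == "assign":                  # ("assign", x, e)
        _, var, expr = stmt
        new_env = dict(env)
        new_env[var] = sym_eval(expr, env)
        return [(new_env, path_cond)]
    if kind == "seq":                     # ("seq", s1, s2)
        _, s1, s2 = stmt
        return [final
                for mid_env, mid_pc in sym_exec(s1, env, path_cond)
                for final in sym_exec(s2, mid_env, mid_pc)]
    if kind == "if":                      # ("if", c, s1, s2): fork the state
        _, cond, s1, s2 = stmt
        c = sym_eval(cond, env)
        return (sym_exec(s1, env, path_cond + [c]) +
                sym_exec(s2, env, path_cond + [("not", c)]))
    raise ValueError(f"unknown statement kind: {kind}")

# x := a; if x > 0 then y := x else y := 0
prog = ("seq",
        ("assign", "x", "a"),
        ("if", (">", "x", 0),
               ("assign", "y", "x"),
               ("assign", "y", 0)))
for env, pc in sym_exec(prog, {}, []):
    print(env, pc)
```

A real symbolic executor would prune paths whose path condition is unsatisfiable; this sketch just enumerates them all.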

The way I see it, most programs have structure at roughly four levels: expressions, statements, functions (or modules), and packages.

Analysis of expressions mostly means compilation to a logic of some kind, together with path conditions to determine feasibility. Analysis of statements means propagating pre-/post-conditions, like with weakest precondition generation. I understand both well enough to sketch a canonical example, write sample code, and describe key issues (function calls, termination).
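The statement side is just as easy to sketch. Here is a toy weakest-precondition calculator in Python; the tuple representation of statements and formulas is again my own devising (operators and variables share the string namespace, which is fine for a sketch).

```python
# A toy weakest-precondition calculator over a tuple-encoded language.

def subst(formula, var, expr):
    """Replace every occurrence of variable `var` in `formula` by `expr`."""
    if formula == var:
        return expr
    if isinstance(formula, tuple):
        return tuple(subst(part, var, expr) for part in formula)
    return formula

def wp(stmt, post):
    """The weakest precondition of `stmt` with respect to `post`."""
    kind = stmt[0]
    if kind == "skip":
        return post
    if kind == "assign":                  # ("assign", x, e): wp is post[e/x]
        _, var, expr = stmt
        return subst(post, var, expr)
    if kind == "seq":                     # ("seq", s1, s2): compose right to left
        _, s1, s2 = stmt
        return wp(s1, wp(s2, post))
    if kind == "if":                      # ("if", c, s1, s2): case split on c
        _, cond, s1, s2 = stmt
        return ("and",
                ("implies", cond, wp(s1, post)),
                ("implies", ("not", cond), wp(s2, post)))
    raise ValueError(f"unknown statement kind: {kind}")

# wp(x := x + 1, x > 0)  is  x + 1 > 0
print(wp(("assign", "x", ("+", "x", 1)), (">", "x", 0)))
```

Loops are the key missing case: they need an invariant, which is exactly where the interesting design questions (and the termination issues mentioned above) come in.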

However, functions and packages are, I think, under-theorized.


For functions/modules, there is some foundational work, but I don't think it's been fully synthesized.

  • One half is work on generic programming and ML-style modules. These approach isolation and access through abstraction; for example, a linked-list module should have no access to the contents of the list. There's a long and important tradition here, with the crown jewel being some kind of understanding of parametricity.
  • The other half is work on separation logic, which describes isolation and access in terms of effects; for example, the logging module should have no access to the database. The crown jewel here is an understanding of higher-order and concurrent separation logics using rely/guarantee reasoning.
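The abstraction half is easy to see in code. In this Python sketch (the generics are illustrative; Python does not enforce them at runtime), the linked list is parametric in its element type T. Because no method ever inspects a T, parametricity says the module behaves uniformly for every element type: the isolation is by types, not by memory.

```python
# A linked list that is generic in its element type T. The list stores
# each value opaquely and never looks inside it, so its behavior cannot
# depend on the contents of the list.
from dataclasses import dataclass
from typing import Generic, Optional, TypeVar

T = TypeVar("T")

@dataclass
class Node(Generic[T]):
    value: T
    next: "Optional[Node[T]]" = None

class LinkedList(Generic[T]):
    """A stack-like linked list; T is completely opaque to it."""

    def __init__(self) -> None:
        self.head: Optional[Node[T]] = None

    def push(self, value: T) -> None:
        self.head = Node(value, self.head)

    def pop(self) -> T:
        assert self.head is not None, "pop from an empty list"
        value, self.head = self.head.value, self.head.next
        return value

lst: LinkedList[int] = LinkedList()
lst.push(1)
lst.push(2)
print(lst.pop())   # 2
```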

These two approaches ought to be unified; unfortunately, they usually speak different languages. The canonical separation logic target is a low-level imperative language like C, and separating pointer access is the "standard" use case. Higher-order functions are "luxury" features. Meanwhile, the canonical modules paper imagines a higher-order programming language. Generally abstraction happens over types, not memory locations.

Some kind of separation logic for generic effects is what I imagine a synthesis to look like.
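To make the separation-logic half equally concrete, here is a miniature model of separating conjunction in Python, my own toy encoding rather than any real logic's: heaps are finite dicts from locations to values, and P * Q holds of a heap exactly when it splits into two disjoint parts satisfying P and Q.

```python
# A toy model of separating conjunction over heaps-as-dicts.

def points_to(loc, val):
    """The assertion `loc |-> val`: the heap is exactly {loc: val}."""
    return lambda heap: heap == {loc: val}

def sep(p, q):
    """Separating conjunction P * Q: some disjoint split satisfies both."""
    def check(heap):
        locs = list(heap)
        # Try every way of splitting the heap into two disjoint halves.
        for mask in range(2 ** len(locs)):
            left = {l: heap[l] for i, l in enumerate(locs) if mask >> i & 1}
            right = {l: heap[l] for l in locs if l not in left}
            if p(left) and q(right):
                return True
        return False
    return check

heap = {1: "a", 2: "b"}
both = sep(points_to(1, "a"), points_to(2, "b"))
print(both(heap))        # True: {1: "a"} and {2: "b"} are disjoint
print(both({1: "a"}))    # False: no disjoint split satisfies both halves
```

The disjointness baked into `sep` is what makes frame-style reasoning sound: an assertion about one part of the heap cannot be invalidated by code that only touches the other part.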


Packages are even harder to theorize. Because changes to an external package are impossible, analyses have to suggest work-arounds rather than flag errors; because packages are not under your control, analyses have to contend with changes that arrive without warning and without any attempt to avoid breakage. Packages also usually abstract over effects and concepts: a formally-verified package is valuable, but its spec relies on a universe of concepts outside the package user's ken. For example, a program that uses the AWS API likely has no internal concept of money; the AWS API itself does. So, while functions/modules glue together in a rely/guarantee fashion, there is a language barrier between packages that rely/guarantee doesn't help with.

Packages can be understood in a kind of economic sense. AWS and Azure are substitutes, and share a concept of money, which is probably alien to their users. Likewise two encryption libraries may share concepts (of security), as might two databases (of persistence). Packages rarely share an API in any meaningful sense, yet it is possible to suggest a program switch from MySQL to Postgres, and possible to talk about one database as being more distributed than another.

It's hard for me to imagine what "verification at the package level" even is. The DeepSpec effort is aiming in something like this direction, but as just one example it doesn't make clear what to generalize and what to abstract.