Verification Beyond Functions
As I teach Software Verification, I find myself leaning on an analogy: symbolic execution corresponds to the expressions of a programming language, while Hoare logic corresponds to its statements. It's vague, but to the extent that your language has "pure" and "imperative" parts, symbolic execution and Hoare logic are the frameworks for reasoning about them. I've long felt, though, that the theory beyond that level is poorly settled.
The way I see it, most programs have structure at roughly four levels:
- Expressions, which describe computation and are usually mostly pure. The prototypical example may be addition or multiplication. In an organizational analogy these are the lowest level of workers that do the actual work of an organization. A bunch of expressions are bound up in a statement.
- Statements, which describe control flow and handle mutation. The prototypical example may be a while loop. In an organizational analogy these are middle managers, overseeing the actual work and directing it. A bunch of statements are bound into a function or module.
- Functions or modules, which describe commonality, access, and isolation. The prototypical example may be a logging module or a database abstraction. In an organizational analogy this is the C-suite's role in determining which departments answer to which other departments. A bunch of functions or modules form a package.
- Packages, which describe externally-available functionality. Packages are written by independent authors and have weak links—they rarely share source control and coordinated changes are hard. The prototypical example may be a package to interface with Facebook or AWS. In an organizational analogy this is the supply chain. Packages must be collected together to make a program.
Analysis of expressions mostly means compilation to a logic of some kind, together with path conditions to determine feasibility. Analysis of statements means propagating pre-/post-conditions, like with weakest precondition generation. I understand both well enough to sketch a canonical example, write sample code, and describe key issues (function calls, termination).
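To make the statement-level story concrete, here is a toy weakest-precondition generator. Everything in it — the tiny statement language, the class names, the naive string-based substitution — is my own illustration, not any standard tool:

```python
# Toy weakest-precondition generator for a tiny statement language.
# All names here (Assign, Seq, If, wp) are illustrative, not a real library.
from dataclasses import dataclass

@dataclass
class Assign:          # x := e
    var: str
    expr: str

@dataclass
class Seq:             # s1; s2
    first: object
    second: object

@dataclass
class If:              # if cond then s1 else s2
    cond: str
    then: object
    orelse: object

def wp(stmt, post: str) -> str:
    """Compute the weakest precondition of stmt w.r.t. postcondition post.
    Formulas are plain strings, and substitution is a naive replace —
    fine for a sketch, wrong in general (it ignores variable capture)."""
    if isinstance(stmt, Assign):
        # wp(x := e, Q) = Q[e/x]: substitute e for x in the postcondition.
        return post.replace(stmt.var, f"({stmt.expr})")
    if isinstance(stmt, Seq):
        # wp(s1; s2, Q) = wp(s1, wp(s2, Q))
        return wp(stmt.first, wp(stmt.second, post))
    if isinstance(stmt, If):
        # wp(if b then s1 else s2, Q) = (b -> wp(s1,Q)) and (not b -> wp(s2,Q))
        return (f"(({stmt.cond}) -> {wp(stmt.then, post)}) and "
                f"(not ({stmt.cond}) -> {wp(stmt.orelse, post)})")
    raise TypeError(f"unknown statement: {stmt!r}")

prog = Seq(Assign("y", "x + 1"), Assign("x", "y * 2"))
print(wp(prog, "x > 0"))  # ((x + 1) * 2) > 0
```

A real tool would represent formulas as terms and hand the result to an SMT solver; the "key issues" above show up immediately — loops need invariants to break the recursion, and function calls need contracts to summarize their bodies.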
However, functions and packages are, I think, under-theorized.
Functions
For functions/modules, there is some foundational work, but I don't think it's been fully synthesized.
- One half is work on generic programming and ML-style modules. These approach isolation and access through abstraction; for example, a linked-list module should have no access to the contents of the list. There's a long and important tradition here, whose crown jewel is some kind of understanding of parametricity.
- The other half is work on separation logic, which describes isolation and access in terms of effects; for example, the logging module should have no access to the database. The crown jewel here is the understanding of higher-order and concurrent separation logics via rely/guarantee reasoning.
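The first half's notion of abstraction can be loosely illustrated even in Python, with the caveat that Python only checks this discipline through an external type checker, so it is an illustration of the idea rather than the real guarantee:

```python
# Illustrative only: parametricity in miniature, via Python's typing module.
# A generic stack is abstract in its element type T; a type checker will
# reject any code inside Stack that inspects or fabricates a T. Parametricity
# is the theorem that such code can therefore only shuffle elements around.
from typing import Generic, TypeVar

T = TypeVar("T")

class Stack(Generic[T]):
    def __init__(self) -> None:
        self._items: list[T] = []

    def push(self, item: T) -> None:
        self._items.append(item)

    def pop(self) -> T:
        return self._items.pop()

# The same code works for ints, strings, database handles — nothing inside
# Stack can branch on what T turns out to be.
s: Stack[int] = Stack()
s.push(1)
s.push(2)
print(s.pop())  # 2
```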
These two approaches ought to be unified; unfortunately, they usually speak different languages. The canonical separation logic target is a low-level imperative language like C, and separating pointer access is the "standard" use case. Higher-order functions are "luxury" features. Meanwhile, the canonical modules paper imagines a higher-order programming language. Generally abstraction happens over types, not memory locations.
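For concreteness, the signature move of separation logic is the frame rule: a command verified against only the resources it touches can be run in a larger heap, and the extra resource is guaranteed untouched. In the usual notation (with the standard side condition that $C$ does not modify variables free in $R$):

```latex
\[
\frac{\{P\}\; C \;\{Q\}}
     {\{P * R\}\; C \;\{Q * R\}}
\quad \text{where } \mathrm{mod}(C) \cap \mathrm{fv}(R) = \emptyset
\]
```

This is exactly the "isolation through effects" story: the logging module's footprint $P$ is separate from the database's $R$, so verifying the logger never has to mention the database at all.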
Some kind of separation logic for generic effects is what I imagine a synthesis to look like.
Packages
Packages are even harder to theorize. Because changes to an external package are impossible, an analysis has to suggest workarounds, not just flag errors; because packages are not under your control, an analysis has to contend with the fact that they may change without warning and without any effort to avoid breakage. Packages usually abstract over effects and concepts: a formally verified package is valuable, but its spec relies on a universe of concepts outside the package user's ken. For example: a program that uses the AWS API likely has no internal concept of money; the AWS API itself does. So while functions and modules glue together in a rely/guarantee fashion, there is a language barrier between packages that rely/guarantee reasoning doesn't help with.
Packages can be understood in a kind of economic sense. AWS and Azure are substitutes, and share a concept of money, which is probably alien to their users. Likewise two encryption libraries may share concepts (of security), as might two databases (of persistence). Packages rarely share an API in any meaningful sense, yet it is possible to suggest a program switch from MySQL to Postgres, and possible to talk about one database as being more distributed than another.
It's hard for me to imagine what "verification at the package level" even is. The DeepSpec effort is aiming at something like this direction, but as it is just one example, it is not clear what to generalize and what to abstract.