A Software Verification Project
I just finished teaching Utah's Software Verification course, and I've been thinking how to improve the final project to be more straightforward and informative. This draft is something I might assign in the future; it is intended to be completed over multiple weeks, with regular check-ins to make sure students are on track.
Overview
This assignment involves writing a verifier for a simple programming language. The verifier converts code to verification conditions, uses Z3 to test those verification conditions, and then outputs either a counterexample if necessary. The ultimate goal is to verify Quicksort, so the verification condition uses the theories of booleans, equality, integers, and arrays (including the array property fragment). The class covers solvers for all of those theories, but this assignment uses Z3 as a black box.
The language is a subset of Python. This has two benefits. First,
programs can always be run with the normal Python interpreter, so you
don't try to verify programs that are wrong. Second, Python's ast
module provides a complete Python parser; experience shows that
parsing is a major struggle for students.
In particular, our Verified Python will allow the subset of Python where:
- Each module starts with
from verified import *
. These define some constants so that Verified Python programs are runnable as normal Python. - Modules contain a sequence of function definitions and then a "main
block", which is the usual
__name__
conditional. - The main block is exactly three lines, verbatim. The first line is
import sys
. The second line isvalues = list(map(int, sys.argv[1:]))
. The third line isprint(main(values))
. In other words, a Verified Python program takes a list of integers on the command line, runs amain
function, and prints the results. This is pretty limited but enough to test, say, sorting algorithms. This main block is the only use of slicing,sys
, attributes,map
, and so on. Since this block is always the same, it doesn't need to be type-checked. - Function definitions annotate types for every variable, those types
being
int
,bool
, andlist[int]
. No other types are allowed anywhere in Verified Python, except of course the main block. - The user may define any number of functions, but they definitely
need to define one function named
main
with signaturelist[int] -> list[int]
. - Function bodies contain a sequence of statements. Only the last
statement may be a
return
, and that statement can't be inside a condiitonal or a loop or anything like that. - The first line of a function may be a call to
PRE
. The last line may be a call toPOST
. Only first line of a while loop may be a call toINV
. Only the last line of a while loop may be a call toDEC
. IfPRE
,POST
, orINV
are omitted, they are implicitlyTrue
. IfDEC
is omitted it is implicitly0
. - Other allowed statements are assignments, while loops, if statements, and function calls.
- Function calls can only use positional arguments, and never use
*args
. - Built-in functions are those defined in
verified
module pluslen
. Calls toprint
are allowed as statements, but not as expressions. Though the main block uses other operations those operations are not legal anywhere else in the program. - Boolean operators are and, or, and not, and arithmetic comparisons
- Arithmetic operators are addition, subtraction, negation, and multiplication
- Array operators are subscripting (by a single integer: no slicing), element assignment, and the
len
function - Boolean constants and numeric constants are supported
- Inside
PRE
,POST
, andINV
, two more constructs are allowed:<=
comparison on booleans (it means implication) and theFORALL(lambda x: ...)
construct, which quantifies over integers. Additionally, insidePOST
theRETURN
constant is allowed, which represents the return value of the function. - Conditions in
while
loops are further restricted, in that they may only be a single arithmetic comparison on constants and variables. - No expressions besides the above, variables, and literals are allowed.
The verified
module contains:
from typing import * def PRE(cond: bool): pass def POST(cond: bool): pass def INV(cond: bool): pass def DEC(val: int): pass RETURN = None def FORALL(fn: Callable[[int], bool]): return True
Since PRE
, POST
, INV
, DEC
, and FORALL
do nothing, it is safe to run a
Verified Python program in ordinary Python. However, these functions
are identified by Verified Python and used to generate verification
conditions.
Here's an example function in Verified Function; it computes the square of a number in a very inefficient way:
from typing import * from verified import * def square(x : int) -> int: PRE(x >= 0) s = 0 i = 0 while i < x: INV(x >= 0 and 0 <= i and i <= x and s == i * x) j = 0 while j < x: INV(x >= 0 and 0 <= j and j <= x and s == i * x + j) s = s + 1 j = j + 1 DEC(x - j) i = i + 1 DEC(x - i) return s POST(RETURN == x*x) def abs(x : int) -> int: y = x if x <= 0: y = -x return y POST((RETURN == x or RETURN == -x) and RETURN >= 0) def main(values : list[int]) -> list[int]: PRE(len(x) >= 1) x = values[0] xx = square(abs(x)) return xx POST(RETURN == values[0] * values[0]) if __name__ == "__main__": import sys values = list(map(int, sys.argv[1:])) print(main(values))
Tasks
Here's a step-by-step list of tasks that builds up Verified Python.
Start verify.py
. The first command-line argument to verify.py
should
name a file. Open that file, use the ast
module to parse it, and print
an error naming the file if parsing fails.
After parsing, check that it strictly follows the syntactic
restrictions laid out above. If it does not, print an error. Use the
lineno
and col_offset
fields on AST nodes to indicate the problem.
Check function bodies to make sure that none assign to PRE
, POST
, or
INV
, or to list
or int
or bool
(used for argument types), or to len
.
That would lead to weird and tricky code. Also check that no user
functions are called any of those names, or map
, list
, and int
, which
are used in the main block. Also check that there is a function named
main
.
Check to ensure that all variables are defined before they are used,
in the same function. We will have no global variables; every variable
will have function scope. Check that all function calls are to
functions defined in the file (or are built-ins). Also check that no
variable (either assigned or used) or function names begin with an
underscore. We'll use underscore variables for temporaries. Test that
the variable bound by a FORALL
does not shadow any existing variables.
Test that the POST
condition doesn't refer to any local variables,
only to function arguments and RETURN
.
Check to make sure PRE
, POST
, INV
, DEC
, and print
are never called
except as a statement. Check that FORALL
is never called, and RETURN
is never used, except inside PRE
, POST
, or INV
.
After all checks, use compile
to compile the AST and run it. Delete
the first command-line argument, so that the code, when run, accesses
the second and onward command line arguments, like it would be if run
directly by Python. You can test later steps by running files under
both Verified Python and normal Python.
Modify expressions to eliminate nested subexpressions. Assignments,
function calls, and conditionals contain expressions that need to be
flattened. Pay attention to correctly flattening and
and or
to
preserve their short-circuiting behavior. while
conditions do not need
to be flattened. To flatten an expression, extract subexpressions into
a new variable, named beginning with an underscores. Skip calls to
PRE
, POST
, INV
, and DEC
. The behavior of programs should not change.
Implement a type checker. For assignments, it should infer the type of
the variable based on the assigned value. For function calls, it
should check that arguments are valid against the function's
annotation. Do not type check the main block. Allow print
to be called
with any type (it is for debugging).
Write a function to convert propositions, like the contents of PRE
and
POST
-conditions and INV
-ariants, to Z3 propositions. Be careful with
FORALL
.
Generate verification conditions for function bodies containing
assignments. Write a function that takes a function definition and
returns a one-element list of Z3 propositions, with that one element
correponding to P → WP(Q)
.
For each function, generate its verification conditions, negate each
one, and pass it to Z3. If Z3 returns sat
, extract Z3's model and
print each variable's value, skipping temporary variables. Also
identify the function that failed.
Verify an "add1" function that computes one more than its input.
Add support for print
statements. They should not do anything to
verification conditions.
Generate verification conditions for conditionals.
Verify the absolute-value function.
Generate verification conditions for loops. Generally speaking, a
function with a single loop should have three verification conditions:
one for "entry", one for the "step" case, and one for "exit". The
"step" and "exit" conditions need to be universally quantified over
the state. When verifying a loop returns sat
, identify which of its
three verification conditions failed in the error message. Also print
the line number of the loop.
Write a counter, which starts at 0 and increments until it hits an
argument n
, and verify that it returns n
.
Test the squaring function. Nested loops are tricky.
Add support for measures with DEC
. The measure expression should be an
integer that starts, stays, and ends non-negative. Each iteration of
the loop should decrease it. Add a termination verification condition
to each loop.
Allow function calls. Generate verification conditions for function
calls using those functions' pre- and post-conditions. Continue to
treat calls to PRE
, POST
, INV
, DEC
, and print
specially.
For recursive calls, check that the first argument to the function is an integer and that it decreases in recursive calls.
Check that no functions are mutually recursive. That is, build the call graph, and check that the only loops are direct recursion.
Add support for arrays and the len
function, specifically the array
created by in the main block. Arrays are tricky because they
correspond to two Z3 variables: one for the array and one for its
length. Z3 arrays are infinite-length and do not have lengths. The
main
function may have a PRE
-condition specifying some kind of
constraint on the command-line input; that's valid but won't be
enforced by Verified Python.
Add support for array subscripting. Add verification conditions to check that subscripts are in bounds.
Add support for modifying array elements. Treat the assignment a[i] =
v
as a = store(a, i, v)
.
Check that arrays are are never assigned to variables, so that there is only one reference to each array. Check that no array is passed twice to a function. There are no nested arrays.
Modify the verification conditions for functions to take into account that any arrays passed to the function may have been modified. To do so, universally quantify over those arrays in the exit half of the verification condition.
Write and verify a linear search function.
Write and verify quicksort.