Modularity for Verdi

Table of Contents

Note

This article describes my attempt, in the summer of 2015, to develop a modular version of Verdi, a framework for verified distributed systems in Coq. The dream of a modular Verdi has now been achieved, in a more complete and interesting way than in this work, by my friends Ilya Sergey, James R. Wilcox, and Zachary Tatlock in their DiSeL system (first described at SNAPL’17 and then published in full detail at POPL’18).

This article demonstrates one alternate idea, and highlights the different approaches that initially seemed possible for modular verification of distributed systems.

Verdi is a framework for verifying distributed systems in Coq, and has been used to verify the Raft consensus protocol and a simple lock server built on top of Raft. Verdi makes it possible to build distributed systems that have an iron-clad correctness guarantee. However, building such systems is currently complex, in part due to difficulties reusing components. For example, a system might be written as a front-end which talks to a message queue, a database, and a cache, but Verdi provides no framework for separately verifying the message queue, database, cache, and front-end, and then compose their correctness guarantees into a correctness guarantee for the whole system. This post describes how to structure the verification of a distributed system to allow this modular verification form. The key is a type of specification that allows us to view distributed systems as components that can be plugged into a compositional semantics.

Specifications

Core to our technique is limiting the specification of any distributed system to a description of its input-output behavior. Such specifications hide the implementation details of a system. To formalize this style of specification, we describe specifications as a type of input, a type of output, and a predicate on traces of inputs and outputs, which classifies valid and invalid traces:

Definition trace (I O : Type) :=
  list ((I * name) + (O * name)).

Record spec : Type :=
  { sI : Type ; sO : Type ;
    valid_trace : trace sI sO -> Prop ;
    valid_nil : valid_trace [] }.

Definition trace_of (s : spec) := trace (sI s) (sO s).

In this code, a trace is a list of inputs and outputs, where each input or output is labeled by the host (or, labeled by the name of the host) at which that input or output occurs. The type name, which names hosts, is just an opaque, decidable type:

Variable name : Type.
Variable name_eq_dec : forall a b : name, {a = b} + {a <> b}.

Each spec includes a predicate valid_trace on which traces it allows, and each spec is required to count the empty trace as valid. This last condition derives from the intuition that when a system boots up, neither inputs nor outputs have occurred.

Timers

One type of system is a timeout, which can fire at any point. We can describe it by making its valid inputs the empty type and its valid outputs the unit type:

Definition timer_spec : spec :=
  {| sI := False ; sO := unit ;
     valid_trace := fun _ => True ;
     valid_nil := I |}.

Another implementation might allow the user to specify a timer

Logs

A log is something you can send things to, but which never produces output of its own:

Definition log_spec (T : Type) : spec :=
  {| sI := T ; sO := False ;
     valid_trace := fun _ => True ;
     valid_nil := I |}.

Asynchronous network

To specify an asynchronous network, we manually track a bag of messages that have been sent, and allow at any point either sending a message or receiving an already-sent message.

Variable msg : Type.

Inductive async_spec_step :
  trace (name * msg) (name * msg) -> list (name * msg * name) -> Prop :=
| async_step_O :
    async_spec_step [] []
| async_step_send :
    forall trace bag m from to,
    async_spec_step trace bag ->
    async_spec_step (inl (to, m, from) :: trace) ((from, m, to) :: bag)
| async_step_receive :
    forall trace m from to bagl bagr,
    async_spec_step trace (bagl ++ (from, m, to) :: bagr) ->
    async_spec_step (inr (from, m, to) :: trace) (bagl ++ bagr).

A trace is correct as long as there is some current bag for which the trace makes sense.

Definition async_spec : spec :=
  {| sI := name * msg ; sO := name * msg ;
     valid_trace := fun t => exists bag, async_spec_step t bag ;
     valid_nil := ex_intro _ [] async_step_O |}.

Network with drops and duplicates

Adding drops and duplication is analogous to a purely asynchronous network, again tracking a bag of messages but this time adding explicit steps to duplicate and drop messages.

Inductive net_spec_step :
  trace (name * msg) (name * msg) -> list (name * msg * name) -> Prop :=
| net_step_O :
    net_spec_step [] []
| net_step_send :
    forall trace bag m from to,
      net_spec_step trace bag ->
      net_spec_step (inl (to, m, from) :: trace) ((from, m, to) :: bag)
| net_step_receive :
    forall trace m from to bagl bagr,
      net_spec_step trace (bagl ++ (from, m, to) :: bagr) ->
      net_spec_step (inr (from, m, to) :: trace) (bagl ++ bagr)
| net_step_drop :
    forall trace m from to bagl bagr,
      net_spec_step trace (bagl ++ (from, m, to) :: bagr) ->
      net_spec_step trace (bagl ++ bagr)
| net_step_dup :
    forall trace m from to bagl bagr,
      net_spec_step trace (bagl ++ (from, m, to) :: bagr) ->
      net_spec_step trace ((from, m, to) :: bagl ++ (from, m, to) :: bagr).

Definition net_spec : spec :=
  {| sI := name * msg ; sO := name * msg ;
     valid_trace := fun t => exists bag, net_spec_step t bag ;
     valid_nil := ex_intro _ [] net_step_O |}.

Contexts

To allow a modular proof architecture, we will focus on proving specifications (that is, specifications in the restricted IO-trace sense described above) correct, given implementations of the other specifications. That is, every specification will be implemented in terms of other specifications which describe the environment that implementation will run in.11 Some implementations, like the timeout or the network, would be provided by a shim or other external source, while others, like a database, would be implemented separately.

We formalize this context as a collection of specifications indexed by a type of “socket names”:

Record ctx : Type :=
  { sockets : Type ;
    sockets_eq_dec : forall a b : sockets, {a = b} + {a <> b} ;
    specs : sockets -> spec }.

Contexts are analogous to a list of specifications; we formalize them as a map from socket names to specifications only to make working with them in Coq easier. Like lists, contexts can be consed onto and appended to.

Definition empty_ctx : ctx :=
  {| sockets := void ; sockets_eq_dec := void_eq_dec ;
     specs := void_rect _ |}.

Definition sum_eq_dec {A B}
	   (A_eq_dec: forall a b : A, {a = b} + {a <> b})
	   (B_eq_dec: forall a b : B, {a = b} + {a <> b})
	   : forall a b : A + B, {a = b} + {a <> b}.
  decide equality.
Defined.

Definition unit_eq_dec : forall a b : unit, {a = b} + {a <> b}.
  decide equality.
Defined.

Definition ctx_cons (G : ctx) (g : spec) : ctx :=
  {| sockets := (sockets G) + unit ;
     sockets_eq_dec := sum_eq_dec (sockets_eq_dec G) unit_eq_dec ;
     specs := fun (s : (sockets G) + unit) =>
		match s with
		  | inl s' => specs G s'
		  | inr _ => g
		end |}.

Definition ctx_app (G H : ctx) : ctx :=
  {| sockets := (sockets G) + (sockets H) ;
     sockets_eq_dec := sum_eq_dec (sockets_eq_dec G) (sockets_eq_dec H) ;
     specs := fun (s : (sockets G) + (sockets H)) =>
		    match s with
		      | inl s' => specs G s'
		      | inr s' => specs H s'
		    end |}.

Notation "<>" := empty_ctx.
Infix ":::" := ctx_cons (left associativity, at level 98).
Infix "+++" := ctx_app (left associativity, at level 98).

Given a context, we can collect the traces of each system in the context; such a joint trace is valid as long as the trace of each system is valid.

Variable stack : ctx.

Definition traces : Type := forall (n : sockets stack), trace_of (specs stack n).
Definition valid (w : traces) := forall (n : sockets stack), valid_trace (specs stack n) (w n).
Definition traces0 : traces := fun _ => [].
Lemma traces0_valid : valid traces0.
  intro n; apply valid_nil.
Qed.

Distributed systems are implemented by handlers, who have to respond to input and also messages from any of the systems in their context. In response, they can produce output or send messages to the systems in their context.

Record handler : Type :=
  { state : Type ; hI : Type ; hO : Type ;
    state0 : name -> state ;
    handle_msg :
      forall n : sockets stack,
	sO (specs stack n) ->
	state ->
	(forall n : sockets stack, list (sI (specs stack n))) *
	list hO *
	state ;
    handle_io :
      hI -> state ->
      (forall n : sockets stack, list (sI (specs stack n))) *
      list hO *
      state }.

Handlers provide a type for local state (and an initial state state0 for all nodes) and also types for system input and output. They must be able to respond both to inputs from the outside world (in which case they receive system input hI) also messages from each context system (in which case they receive a message of the appropriate type). In each case they produce a list of messages list (sI (specs stack n)) to send to each context system specs stack n, a list of outputs to the outside world list hO, and an updated state.

Here are some helper functions:

Variable h : handler.

Definition handler_output : Type :=
  ((forall n : sockets stack, list (sI (specs stack n))) * list (hO h) * (state h))%type.

Definition hSent (ho : handler_output) := fst (fst ho).
Definition hOutput (ho : handler_output) := snd (fst ho).
Definition hState (ho : handler_output) := snd ho.
Definition states : Type := name -> state h.
Definition io_trace : Type := trace (hI h) (hO h).

Semantics

We now define the semantics of a helper function. The semantics is defined in terms of an abstract machine, which tracks the trace of each context system, as well as the trace of global input/output and the state of each node.

Inductive step :
  (traces * states * io_trace) -> (traces * states * io_trace) -> Prop :=
| step_msg : <<step_msg_semantics>>
| step_io : <<step_io_semantics>>

Inductive step' : 
  (traces * states * io_trace) -> (traces * states * io_trace) -> Prop :=
| step'O :
    forall w st t, step' (w, st, t) (w, st, t)
| step'S :
    forall w st t w' st' t' w'' st'' t'',
      step' (w, st, t) (w', st', t') ->
      step (w', st', t') (w'', st'', t'') ->
      step' (w, st, t) (w'', st'', t'').

When the handler receives input input from the outside world at node i, it first executes handle_io h input (st i). It then updates the global state and appends sent messages and global output to the relevant traces:

forall (w : traces) (st : states) (t : io_trace) (i : name) (input : hI h),
let result := handle_io h input (st i) in
step (w, st, t)
     ((fun n' : sockets stack =>
	 (map (fun m => inl (m, i)) (hSent result n')) ++ w n'),
      (fun i' : name => if name_eq_dec i i' then hState result else st i'),
      (map (fun m => inr (m, i)) (hOutput result)) ++ inl (input, i) :: t).

The two calls to map decorate sent messages and output as output sent by the appropriate node. Note that both global outputs and the global input are added to the global IO trace.

Responding to messages from a context system is similar, but more complex:

forall (w : traces) (st : states) (t : io_trace) (i : name) (n : sockets stack) (m : sO (specs stack n)),
let result := handle_msg h n m (st i) in
step (w, st, t)
     ((fun n' : sockets stack =>
	 (map (fun m => inl (m, i)) (hSent result n')) ++ get_one i n m w n'),
      (fun i' : name => if name_eq_dec i i' then hState result else st i'),
      (map (fun m => inr (m, i)) (hOutput (handle_msg h n m (st i)))) ++ t)

Not only must sent messages but also received messages need to be added to the traces of each context system, which is done by get_one. Because that function is dependent and because I'm not so good at doing dependently-typed programming in Coq, I implemented it in tactics:

Definition get_one (i : name) (n : sockets stack) (m : sO (specs stack n)) (w : traces) : traces.
  intro n'.
  destruct (sockets_eq_dec stack n n').
  + rewrite <- e.
    exact (inr (m, i) :: w n).
  + exact (w n').
Defined.

The semantics here assume the handler can always send or receive the messages being sent or received. However, we are only interested in handlers that only send valid messages, assuming they receive valid messages.

Definition valid_handler :=
  forall (w w' : traces) (st st' : states) (t t' : io_trace),
    valid w -> step (w, st, t) (w', st', t') -> valid w'.

Notation "stack |-- handler 'HANDLER'" := (valid_handler stack handler) (at level 1).

Hypothesis valid_h : stack |-- h HANDLER.

A handler specification is just a predicate on its input-output trace which is true of the null trace. This is basically a specification, just tied to the input and output of a particular handler.

Record h_prop :=
  { property : trace (hI stack h) (hO stack h) -> Prop ; property0 : property [] }.

Variable hprop : h_prop.

Definition h_spec : spec :=
  {| sI := hI stack h ; sO := hO stack h ;
     valid_trace := property hprop ;
     valid_nil := property0 hprop |}.

Notation " [[ prop ]] " := (h_spec _ _ prop).

Notation "ctx |-- handler :::: prop" := (satisfies_spec ctx handler prop) (at level 1).

Composition

Once we have a handler which implements a specification, users of that specification can discharge that specification from their context. Intuitively, users do this by running the handler they rely upon as a slave hander, and routing messages from the user handler to the slave handler when necessary. In symbols, suppose \(\Gamma \vdash h_1 : P\), where \(\Gamma\) is the context, \(h\) is the context, and \(P\) is the property. Further suppose \(\Delta, P \vdash h_2 : Q\). Then we should be able to derive a handler \(h_2[h_1]\) such that \(\Gamma, \Delta \vdash h_2[h_1] : Q\).

In fact, this is not quite right. If the user handler sends messages to the slave handler, which responds be sending messages back, we could end up in an infinite loop with the user and slave handlers only talking to themselves but never fully handling global IO. So messages from the slave to the user handler are buffered, and a timer is used to empty the buffer:

Variable stack2 : ctx.
Definition stack2' := stack2 ::: [[ hprop ]].
Variable h2 : handler stack2'.
Variable prop2 : h_prop stack2' h2.

Definition substitute_stack : ctx := stack +++ stack2 ::: timer_spec.
Definition substitute : handler substitute_stack := <<substitute_def>>.

The joined handler \(h_2[h_1]\) takes the same input and output as the user handler, and its state is the user state, slave state, and buffer of messages from the slave to the user:

Definition substitute : handler substitute_stack :=
  {| hI := hI stack2' h2 ; hO := hO stack2' h2 ;
     state := (state stack h) * (state stack2' h2) * (list (hO stack h)) ;
     state0 := fun n => (state0 stack h n, state0 stack2' h2 n, []) ;
     handle_io := <<substitute_io>> ;
     handle_msg :=
       fun (n : sockets substitute_stack) =>
	   match n with
	     | inl (inl n') => <<substitute_msg_slave>>
	     | inl (inr n') => <<substitute_msg_user>>
	     | inr n' => <<substitute_msg_buffer>>
	   end
  |}.

When global inputs are received, we run the user handler and then run handle all messages the user handler sent to the slave. Since the states and contexts are disjoint, this operation is verbose but simple.

fun (m : hI stack2' h2) (st : state stack h * state stack2' h2 * list (hO stack h)) =>
  let sub1 := handle_io stack2' h2 m (snd (fst st)) in
  let sub2 := handle_multiple_io stack h (hSent _ _ sub1 (inr tt)) (fst (fst st)) in
  ((fun x =>
      match x return list (sI (specs substitute_stack x)) with
	| inl (inl x') => hSent _ _ sub2 x'
	| inl (inr x') => hSent _ _ sub1 (inl x')
	| inr _ => []
      end),
   hOutput _ _ sub1,
   (hState _ _ sub2,
    hState _ _ sub1,
    snd st ++ hOutput _ _ sub2)) ;

The slave processes multiple inputs with the handle_multiple_io helper, which just threads collects sent messages and threads state over multiple invocations.

Fixpoint handle_multiple_io (s : ctx) (h' : handler s)
	 (inps : list (hI s h')) (st : state s h') : handler_output s h' :=
  match inps with
    | nil => (fun _ => [], [], st)
    | inp :: inps' =>
      let sub1 := handle_io s h' inp st in
      let sub2 := handle_multiple_io s h' inps' (hState s h' sub1) in
      ((fun x => hSent _ _ sub1 x ++ hSent _ _ sub2 x),
       hOutput _ _ sub1 ++ hOutput _ _ sub2,
       hState s h' sub2)
  end.

Handling messages sent to the user handler is much like handling input:

fun (m : sO (specs stack2 n'))
    (st : state stack h * state stack2' h2 * list (hO stack h)) =>
  let sub1 := handle_msg stack2' h2 (inl n') m (snd (fst st)) in
  let sub2 := handle_multiple_io stack h (hSent _ _ sub1 (inr tt)) (fst (fst st)) in
  ((fun x =>
      match x return list (sI (specs substitute_stack x)) with
	| inl (inl x') => hSent _ _ sub2 x'
	| inl (inr x') => hSent _ _ sub1 (inl x')
	| inr _ => []
      end),
   hOutput _ _ sub1,
   (hState _ _ sub2,
    hState _ _ sub1,
    snd st ++ hOutput _ _ sub2))

Handling messages to the slave handler is even simpler, since output produced by the slave is simply added to the buffer:

fun (m : sO (specs substitute_stack (inl (inl n'))))
    (st : state stack h * state stack2' h2 * list (hO stack h)) =>
  let sub1 := handle_msg stack h n' m (fst (fst st)) in
  ((fun x =>
      match x return list (sI (specs substitute_stack x)) with
	| inl (inl x') => hSent _ _ sub1 x'
	| inl (inr x') => []
	| inr _ => []
      end),
   [],
   (hState _ _ sub1, snd (fst st),
    snd st ++ hOutput _ _ sub1))

Finally, when the timer fires, one message is taken from the buffer and given to the user handler. The rest of the code is much like any other messages for the user handler.

fun (_ : unit)
    (st : state stack h * state stack2' h2 * list (hO stack h)) =>
  match snd st with
    | nil => (fun _ => [], [], st)
    | m :: buf' =>
      let sub1 := handle_msg stack2' h2 (inr tt) m (snd (fst st)) in
      let sub2 := handle_multiple_io stack h (hSent _ _ sub1 (inr tt)) (fst (fst st)) in
      ((fun x =>
	  match x return list (sI (specs substitute_stack x)) with
	    | inl (inl n') => hSent _ _ sub2 n'
	    | inl (inr n') => hSent _ _ sub1 (inl n')
	    | inr _ => []
	  end),
       hOutput _ _ sub1,
       (hState _ _ sub2,
	hState _ _ sub1,
	buf' ++ hOutput _ _ sub2))
  end

This substituted handler must satisfy two properties: it must be a valid handler, and must still satisfy the specification \(Q\). These two properties form the main theorems of this post. Both proofs are admitted, but at least stand some chance of being true. Both theorems would be proven by converting a step of the joined handler into a step of either the user or slave handler, and using their validity.

Definition prop2' : h_prop substitute_stack substitute :=
  {| property :=
       property _ _ prop2
       : trace (hI _ substitute) (hO _ substitute) -> Prop ;
     property0 := property0 _ _ prop2 |}.

Hypothesis valid_h : stack |-- h HANDLER.
Hypothesis spec_h :  stack |-- h :::: prop.

Hypothesis valid_h2 : stack2' |-- h2 HANDLER.
Hypothesis spec_h2  : stack2' |-- h2 :::: prop2.

Theorem substitution_valid :
  substitute_stack |-- substitute HANDLER.
Admitted.

Theorem substitution_maintains :
  substitute_stack |-- substitute :::: prop2'.
Admitted.

Additional thoughts

It should be possible to prove weakening and reordering, but not idempotency (for example, one database is not two databases), so that \(\Gamma \vdash h : P\) forms a linear logic. Then the substitution theorems are the cut-elimination theorems for that logic.

There should be a predicate for "runnable" specs, potentially with some type-class magic to derive that judgement automatically, where runnable specs include networks (given a port number), logs (given file names), and timers (given firing frequency). This would also make very concrete the correctness criterion for the shim and the extraction procedure: given a supported context \(\Gamma\) and \(\Gamma \vdash h : P\), does the extraction of \(h\) actually only admit traces that satisfy \(P\)?

To handle failure, a rule could be added to the compositional semantics that clears a node's state and runs a special reboot handler. To give nodes some possibility for persistent state, a new supported device could be added to represent a file. Writing handlers would get harder in this case (callback hell), but that could be dealt with using all the tricks programmers have already invented for that.

There is a different form of substitution, where instead of sending a message to a slave handler running on the same device, a node sends a message to a different node. This works in principle, but I'm not sure this is common. For example, with the lock server, a client is running on each node, even though the master server is elsewhere. Likewise, you can describe a name server or KVS as having a local client to handle the protocol, and a database as having a local driver, and so on.

In this model, each system implicitly forms a VST. For example, the resend-and-dedup VST that turns a faulty network into an asynchronous one.

1
Some implementations, like the timeout or the network, would be provided by a shim or other external source, while others, like a database, would be implemented separately.

By on . Share it—it's CC-BY-SA licensed.

Any opinions, findings, and conclusions or recommendations expressed in this material are those of the author and do not necessarily reflect the views of the National Science Foundation.