### New Role Model

Tea Leoni as Dr. Elizabeth Adams McCord in the TV show Madam Secretary. At this point she is a professor of history at the University of Virginia…

Formal methods in software engineering and teaching computer science

Tea Leoni as Dr. Elizabeth Adams McCord in the TV show Madam Secretary. At this point she is a professor of history at the University of Virginia…

In my course *Formal Methods in Software Engineering*, I spend a few weeks reviewing formal logic. I like to give an assignment where I make the students play the role of human validity-checker/model-finder. In these questions I specify a first order vocabulary; a context, i.e., a set of formulas assumed to hold; and a query, which is another formula. The goal is to determine whether the query is valid, i.e., will it evaluate to *true* under all models in which all of the context formulas evaluate to *true*. In case it is not valid, an explicit counter-model must be provided, i.e., a model in which the context holds but the query doesn’t.

Here is one of the problems: the vocabulary consists of a single constant , and a predicate and function , both of arity 2.

- Context:
- Query: .

I’m not sure what I was thinking when I made up that problem, but I probably thought I had a proof that it was valid — perhaps I meant to add another formula to the context. In any case, when I looked at it again the day the assignment was due, I couldn’t figure it out. On the one hand, I didn’t think it was valid, on the other, all my attempts to come up with a model failed. (You might try to answer this question yourself before reading further.)

So I tried asking CVC4, an automated theorem prover which we had covered in class. Here is the input file:

T : TYPE;

p : (T,T) -> BOOLEAN;

f : (T,T) -> T;

c : T;

```
```

`ASSERT FORALL (x,y:T): p(f(x,y),f(y,x));`

ASSERT FORALL (x:T): EXISTS (y:T): p(f(x,y),c);

QUERY EXISTS (x:T): p(f(x,c),c);

And CVC4’s terse response:

unknown (INCOMPLETE)

This really isn’t that surprising: the validity question for FOL is undecidable in general, and in my experience CVC4 (and most SMT-style theorem provers) usually have trouble with all but the simplest queries involving quantifiers.

In class, the students were in an uproar—apparently they all spent hours trying to solve this problem, had tried CVC4 unsuccessfully, just as I had — and none of them could come up with a proof or counter-model. They were really curious about the answer. I had to admit I didn’t know! — and went on with the lecture — about Alloy.

Somewhere near the end of the lecture it struck me we might use Alloy to answer that question. With 5 minutes to go, I whipped up the following Alloy model:

abstract sig Bool {} one sig T, F extends Bool {} sig U { f : U -> U, -- function from UxU to U P : U -> Bool -- function from UxU to Bool ("predicate") } one sig c in U {} fact { all x,y : U | one f[x,y] all x,y : U | one P[x,y] all x,y : U | P[f[x,y],f[y,x]]=T all x : U | some y : U | P[f[x,y],c]=T } assert query { some x : U | P[f[x,c],c]=T } check query for 3

I’m constantly surprised by the flexibility of the Alloy language. It has been used to represent such a wide variety of domains. Once you get the hang of it, it is quite easy to use. Like CVC4, the checking/solving part is fully automatic. Unlike CVC4, Alloy can’t really prove things — for example, it cannot prove that the query above is valid. Instead, the user specifies a bounded scope, which in this case means placing an upper bound on the size of the domain U. With bounded scope, the validity problem becomes decidable: assuming you don’t run out of memory (or patience), Alloy can tell whether or not there exists a model within the scope satisfying the specified properties, and in the case where such a model does exist, it will show you one (or all) of them. The magic works by converting the problem to a boolean satisfiability (SAT) problem, which is passed off to a standard SAT solver; if the SAT solver reports the formula is satisfiable, Alloy translates the solution back to an instance of your Alloy model.

Here is a really interesting recent application of Alloy: Pamela Zave of AT&T Labs used Alloy to investigate the Chord protocol, a widely-used distributed hash-table algorithm. Published papers about chord specify the algorithm carefully and “prove” (in the traditional informal sense) it satisfies several properties. Using Alloy, Dr. Zave discovered that none of those properties actually holds. She also showed how to fix Chord to make it correct. If you happen to be in Delaware on Friday, Nov. 6, 2015 come to the SIG-SYS seminar (12:15 PM, Smith Hall, Room 101), where Dr. Zave will be talking about this work.

Back to the FOL assignment: Alloy found a counter-model with U of size 3 (but not less than 3). It took less then two tenths of a second. Here it is: , , and

We checked by hand that the context holds and the query doesn’t in this model.

The students now have an admiration for Alloy bordering on awe….I wish I could say I planned it that way ðŸ˜‰