Pavel Panchekha


Share under CC-BY-SA.

Cryptic Mastermind with Z3

On May 5–6th, I took part in Microsoft's Puzzlehunt. I love these events (as you can imagine, from past puzzles I've written), but this year's had a puzzle I also managed to apply some of my Z3 skills to. You see, in a puzzle hunt, you're allowed to use any tools you wish,1 [1 Except for people not on your team…] including programming and Z3. But it's rare that I find the opportunity…

How the Puzzle Worked

This year, one of the puzzles was a cryptic mastermind game. In the usual mastermind game, you need to discover a password—say, a 5-letter word—and are allowed some number of guesses. Every time you guess, you're told how many of the letters were: correct, and in the right place; correct, but in the place in the word; totally incorrect. For example, if the correct answer is HEART, the word HORSE has one correct letter in right place (the H), and two correct letters in the wrong place (the E2 [2 Which should be in second position, not fifth.] and the R3 [3 Which should be in fourth position, not third.]). In the nomenclature of the puzzle, this would be one black letter (the H), two white letters (the E and R), and to red ones (the O and S). In a normal mastermind game, you don't know which letters are which category, you just know the count. You have to use logic to figure it out.

Unlike a normal mastermind game, the Microsoft Puzzlehunt had a cryptic mastermind game. Instead of giving you the counts for each category, it gave five cryptic crossword clues. For example, the response for HORSE included clues like:

  • “Hardwood from Webb County, even (5)”, for ebony meaning black
  • “Poet Walt loses masculinity and is followed by head of empathetic shade (5)”, for white4 [4 Because Walt Whitman, minus masculinity "man" leaves "Whit", add on the first letter (head) of "empathetic shade" leaves "white".]

These clues were sometimes possible to solve, since they were always cluing black, white, or red, but they were still pretty hard, so for the majority we were not able to solve them. After a while of banging our head against the system, I came up with the idea of writing a solver, in Z3, for mastermind, which would be able to make use of partial information.

Encoding Mastermind in Z3

A mastermind solver contains a password, must count black, white, and red letters, and has constraints in that the password is a valid word and satisfies the already-seen clues. All of these could be implemented in Z3.

The password was represented by five "letter" variables, l0 through l4:

(declare-datatypes ()
  ((t A B C D E F G H I J K L M N O P Q R S T U V W X Y Z)))
(declare-const l0 t)
(declare-const l1 t)
(declare-const l2 t)
(declare-const l3 t)
(declare-const l4 t)
(assert (distinct l0 l1 l2 l3 l4))

The requirement that the letters be distinct was spelled out in the puzzle. Note that I defined a letter data type. I obviously could have used the numbers from 0 to 25, but this made it a lot easier to debug, and during a puzzle hunt that's crucial, since it doesn't have to be good code, it just has to work.

Next, I had three functions to count the number of black, white, and red letters. I represent the test word by letters w0 through w4:

(define-fun countb ((w0 t) (w1 t) (w2 t) (w3 t) (w4 t)) Int
  (+ (ite (= l0 w0) 1 0)
     (ite (= l1 w1) 1 0)
     (ite (= l2 w2) 1 0)
     (ite (= l3 w3) 1 0)
     (ite (= l4 w4) 1 0)))

Counting red letters is counting the number of wn variables that are distinct from all ln variables:

(define-fun countr ((w0 t) (w1 t) (w2 t) (w3 t) (w4 t)) Int
  (+ (ite (distinct w0 l0 l1 l2 l3 l4) 1 0)
     (ite (distinct w1 l0 l1 l2 l3 l4) 1 0)
     (ite (distinct w2 l0 l1 l2 l3 l4) 1 0)
     (ite (distinct w3 l0 l1 l2 l3 l4) 1 0)
     (ite (distinct w4 l0 l1 l2 l3 l4) 1 0)))

I've found that distinct constraints are fairly efficient for small problems, and are really easy to write and debug, so I love to use them for puzzles like this. They're not good for big problems, though. I had a bad experience recently trying to use them in Cassius, for example.

Since there are five letters total, the white letters can be computed by subtracting:

(define-fun countw ((w0 t) (w1 t) (w2 t) (w3 t) (w4 t)) Int
  (- 5 (countb w0 w1 w2 w3 w4) (countr w0 w1 w2 w3 w4)))

Defining B and R directly, and W in terms of them, it also important since it's tricky to define a W counter that doesn't also count B's, while there's obviously no overlap between B and R.

Next, I need to add the constraint that the password is a valid word. Here I'm using my Linux distro's word list. Obviously the "true" list of words differs a lot depending on who you ask, but these puzzles never use "controversial" words anyway:

WORDS = set([w.strip().upper()
             for w in open("/usr/share/dict/words")
             if len(w.strip()) == 5
             and all(l in "abcdefghijklmnopqrstuvwxyz"
                     for l in w.lower().strip())])

def print_wordlist(fin, fout):
    fin.write("(assert (or " + 
              " ".join(
                  "(and " + " ".join("(= l{} {})".format(i, l)
                                     for i, l in enumerate(word)) + ")"
                  for word in WORDS) + "))\n") 

This code produces a giant or expression that just enumerates every possible word. It's not pretty, but the whole problem is small enough that Z3 still only takes a few seconds per query, which is enough for me. The core of the query just equates every ln variable with the letter in the word.

Finally, I need to have some library of words I've tried and their responses. I decided to store responses as a five-letter string like B_WR_, where B, W, and R represent colors while underscores are clues that I couldn't understand. Again, I'm optimizing here for a representation that's easy to debug, not "good programming style". Also, using the string instead of manually counting B, W, and R clues means less room for mistakes and easier checking after the fact. I was terrified I'd go through a whole process here and end up with unsat.

I stored the library in the RESULTS variable, like this:

    "WORST": "_R___",
    "HOUSE": "RR___",
    "WORSE": "__RW_",
    "ZONES": "RR___",
    "HORSE": "B_WR_",
    "EBONY": "RRWRR",

Each of these could be turned into a constraint on the true password:

def word_constraints(word, result):
    word = word.upper()
    result = result.upper()
    numb = result.count("B")
    numw = result.count("W")
    numr = result.count("R")
    num_ = result.count("_")
    assert numb + numw + numr + num_ == len(result)

    body = []
    body.append("(assert (<= {} (countb {}) {}))"
                .format(numb, " ".join(word), numb + num_))
    body.append("(assert (<= {} (countw {}) {}))"
                .format(numw, " ".join(word), numw + num_))
    body.append("(assert (<= {} (countr {}) {}))"
                .format(numr, " ".join(word), numr + num_))
    return "\n".join(body)

Note the assert here. I was being very defensive, again because I was worried about going down a blind path, but actual error handling wasn't a concern, so there isn't even an explanatory message on the error. Each result turns into a constraint that counts the true blacks, whites, and red, and asserts a range for them: as few as the number actually seen, but as many as the if all unknown results were that color.

With all of these pieces in place, the interaction with Z3 was supposed to be simple: assert all known constraints on the password; get a model; and output the password it represents. I would send that password to the mastermind game, interpret the cryptic clues as best I could, input those to the program, which would call Z3 again in a loop:

def talk(fin, fout): # fin and fout talk to Z3

    print_wordlist(fin, fout)

    for word, result in RESULTS.items():
        fin.write(word_constraints(word, result) + "\n")

    while True:
        word = get_model(fin, fout)
        if word:
            result = input(word + " > ").strip()
            RESULTS[word] = result
            fin.write("(assert (not (and " +
                      " ".join("(= l{} {})".format(i, l)
                               for i, l in enumerate(word)) +
                      " )))\n")
            fin.write(word_constraints(word, result) + "\n")

The assert line requires that, once I've tried a word, it isn't suggested again. This wasn't a problem at first, but turned out to be essential because we got a few clues that neither I nor Talia could figure out (like for SHORE). Those don't add any constraints, so we'd get the same word suggested over and over again in a loop.

Most of the interaction with Z3 itself happens in that get_model call:

def get_model(fin, fout):

    is_sat = fout.readline().strip()
    assert is_sat == "sat", is_sat

    out = {}
    for i in range(5):
        i = fout.readline()[15]
        l = fout.readline()[4]
        out[i] = l
    word = out["0"] + out["1"] + out["2"] + out["3"] + out["4"]

    if word not in WORDS:
        fin.write("(assert (not (and " +
                  " ".join("(= l{} {})".format(i, l)
                           for i, l in enumerate(word)) +
        print("# Not", word)
        return False
        return word

This code is terrible. The lines from out = {} to word = ... count exact numbers of lines and pull exact characters out of Z3's formatting of the model. Z3 really communicates in S-expressions, and it would be great if this actually parsed them. For whatever reason, I wrote the code in Python, instead of Racket, and so didn't have an easy S-expression parser handy, so I had to do this ugly thing. Note that there's some error checking code in there, in the if word not in WORDS code. It was useful when I was first writing the code, and had some bugs, but it's not possible to trigger it with the code in this post.


Before I started coding, we'd already tried 22 words, from ARGON to HYENA, and I had to do 22 more rounds with Z3 before it spit out the correct answer, HEART. Some of it's guesses, like HERTZ and HERZL, lead to totally uninterpretable clues. Here's the full list of Z3's 22 rounds, with the word and the responses I recorded:

Word Result   Word Result   Word Result

It's clear that through this entire process Z3 was using the sort of complex logical thinking humans can't easily do. For one, it only saw two clues with B's: HORSE, which I'd tried before Z3, and PRATE, which came quite late. It doesn't seem to have figured out which letter in HORSE was the B; Z3 was comfortable just working with the disjunction it had. That's hard for people. Also, most of the words only yielded R clues; I guess we found the W and B cryptic clues a lot harder for some reason. Negative clues like this are really hard for people, since we're not thinking about the whole set of possible words and paring it down. Z3 is.

It was not just luck that Z3 found HEART so quickly. The full search space, after all the words above plus the 22 we started with, is only 10 words. So it turns out that cryptic mastermind is very solvable by Z3, even when relatively few clues were actually decoded. Future puzzle writers could make the password a little longer: there are roughly twice as many seven-letter words as five-letter words, for example. On the other hand, that would also make the puzzle harder for humans, which might not be right.

Feel free to reuse the full script. Happy puzzling.



Except for people not on your team…


Which should be in second position, not fifth.


Which should be in fourth position, not third.


Because Walt Whitman, minus masculinity "man" leaves "Whit", add on the first letter (head) of "empathetic shade" leaves "white".