Pavel Panchekha


Share under CC-BY-SA.

Any opinions, findings, and conclusions or recommendations expressed in this material are those of the author and do not necessarily reflect the views of the National Science Foundation.

Selective Crossword Submissions

In my previous post on crossword science, I looked at whether Joe got better at crosswords by doing several hundred crosswords in one weekend. Today, I want to look not at getting better at crosswords, but at pretending to be better at crosswords.

The model

If you recall, my model of crossword performance contains four factors:

  • The difficulty of that day's crossword
  • The skill of the solver
  • Crosswords on Saturdays are harder
  • Beginners underperform their true skill

I was interested in determining whether any solver was pretending to be more skillful than they truly were. In particular, some avid crossworders suspected Sam, who has posted several stunning crossword times, of either lying about his times or only posting the good ones.[1] Now that I had a rigorous model of crossword performance, it was possible to test these claims statistically. In particular, I wanted to see whether anyone was submitting selectively: doing a crossword, and deciding whether or not to post it based on their time.

The way I thought about it, a selective submitter would post their good times but not their mediocre ones. Now, for any user, observing their own crossword time is easy, but observing the crossword's difficulty is harder, especially if other users haven't posted their times yet. So users who try to post only good times won't be able to correct for the day's difficulty, and will end up posting both times where they overperformed and times for easy crosswords. Thus, for a selective submitter, the days they post will consistently be easier than the days they could have posted but didn't.
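To see why, here's a toy simulation (the numbers are my own made-up parameters, not the model's): difficulty is drawn from a standard normal, a solver's time is difficulty plus performance noise, and a selective submitter posts only when the time beats a fixed cutoff. The posted days come out consistently easier than the skipped ones.

```python
import random

random.seed(0)
posted, skipped = [], []
for _ in range(5000):
    difficulty = random.gauss(0, 1)           # that day's difficulty
    time = difficulty + random.gauss(0, 1)    # solver's time on that day
    # selective submitter: only post times below a fixed cutoff
    (posted if time < 0 else skipped).append(difficulty)

# the mean difficulty of posted days comes out well below skipped days
```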

Searching for selective submitters

This suggested a plan to find selective submitters. First, I would create, for each user, a list of days when they submitted and a list of days when they could have but didn't submit. Then, I'd compute the average difficulty of both lists. Finally, I'd check whether the means were sufficiently far apart to be statistically significant.

For the first step, listing the submitted dates was easy, but listing dates when a user could have but didn't submit was trickier. Who knows what crosswords someone could have done! But I decided that we could use someone's first and last crosswords as a pretty good proxy for the range of dates where they were interested in doing the crossword; dates in between that they didn't do were potential crossword dates. I also threw out dates where they hadn't done the crossword for a week prior. If you take a week-long break, you've probably forgotten about the crossword or something. In code, the algorithm looked like this:

mi, ma = min(submitted_dates), max(submitted_dates)
out = []
since_good = 0
for d in sorted(all_dates):
    if mi < d < ma:
        if d in submitted_dates: since_good = 0
        else: since_good += 1
        if since_good <= 7:
            out.append(d)

Here I step through the set of all dates, using the since_good variable to track how many days it's been since the last submitted crossword; dates more than a week after the last submission are thrown out, and the rest are collected into out.

With this set of potential dates in mind, I built two lists (play and dont) of submitted and unsubmitted dates. Now, it turns out that the difficulty of crosswords is, in my model, normally distributed. If the user is not a selective submitter, each list is just a set of random samples from that normal distribution, so its mean is also normally distributed. If play has length \(P\) and dont has length \(D\), then \[ m_D - m_P \sim \mathcal{N}\left(0, \sigma \sqrt{\frac1P + \frac1D}\right) \] where \(\sigma\) is the standard deviation of the difficulty distribution and where \(m_P\) and \(m_D\) are the averages of play and dont.
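Splitting the candidate dates into those two lists is straightforward; here's a sketch with made-up sample data, where candidate_dates stands in for the output of the algorithm above and difficulty maps each day to its estimated difficulty:

```python
import datetime

# made-up sample data for illustration
submitted_dates = {datetime.date(2024, 1, 1), datetime.date(2024, 1, 3)}
candidate_dates = [datetime.date(2024, 1, 1),
                   datetime.date(2024, 1, 2),
                   datetime.date(2024, 1, 3)]
difficulty = {d: 0.5 * i for i, d in enumerate(candidate_dates)}

# difficulties on days the user played vs. days they could have but didn't
play = [difficulty[d] for d in candidate_dates if d in submitted_dates]
dont = [difficulty[d] for d in candidate_dates if d not in submitted_dates]
```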

So, it's enough to compute those averages, take their difference, and divide by that fairly complicated square root term; the result is a z-score, which converts to a \(p\) value for whether a given player is a selective submitter.
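In code, the test might look like this (a sketch; I'm using the standard-normal CDF via math.erf, which may not match whatever statistics library the original analysis used):

```python
import math

def selectivity_p_value(play, dont, sigma):
    """One-sided p value for the gap between unplayed and played difficulty."""
    P, D = len(play), len(dont)
    m_P = sum(play) / P
    m_D = sum(dont) / D
    # z-score for the difference of the two sample means
    z = (m_D - m_P) / (sigma * math.sqrt(1 / P + 1 / D))
    # probability of a gap at least this large under the null hypothesis,
    # i.e. one minus the standard normal CDF at z
    return 1 - 0.5 * (1 + math.erf(z / math.sqrt(2)))
```

A selective submitter should show \(m_D\) well above \(m_P\), so a \(p\) value below the 1% threshold flags them.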


When I run the numbers for every crossword player, searching for a \(p\) value of less than 1%, I actually find several players. Aha! Caught red-handed!

Actually, instead of immediately publishing the list of names to shame them, I interviewed several of these players about how they did crosswords and how they used the crossword submission bot. It quickly became clear that the majority simply did not know that the bot lets you submit a fail time, indicating that you were unable to solve the crossword; they would simply get stuck and have no time to submit. I let them know that the bot has that function. One also said that they wanted to save the crosswords for later, so they could go back and solve them, and didn't want to submit a time until then.

In short, far from catching cheaters, this exercise uncovered several sharp edges in the crossword bot that we could potentially work on in the future.

That said, if anyone has any ideas how to work selective submission into the model, so we don't overestimate these crossworders' skills, let me know.

[1] For the record, I did not believe this likely.