Pavel Panchekha


Share under CC-BY-SA.

Any opinions, findings, and conclusions or recommendations expressed in this material are those of the author and do not necessarily reflect the views of the National Science Foundation.

You Start Off Bad at Crosswords

People want to get better at crosswords. There's something about recording a number every day that makes you invested in it, you know?

In my last post on crossword science, I talked about a competitive NYT Minicrossword club I'm a part of, and how I used Stan to learn a statistical model of different people's skill at the crossword. However, that model viewed skill as a single, unchangeable attribute. In this post, I want to fix that.

Investigating improvement

The model described in the last post has a single skill term per user. That skill applies to all of the submissions that user makes, which means that if a user gets better over time (or worse), their skill will just represent their average skill over all of their submissions. For an improving user, this means the skill estimate will lag their current skill, and the model will consistently over-estimate their recent times.

I wanted to correct for this, and to do so, I needed some model of how people improve over time. So first: what did I mean by time? I decided that the simplest measure of "time doing crosswords" is just how many crosswords you have already submitted. So, I added a variable named nth that, for each user, just counted up from 1 every time they submitted a crossword. (Crosswords come with a submission timestamp, so we use that to handle people who go back and do crosswords they missed out of order.) nth and date are distinct, because people start doing crosswords on different days, sometimes miss days, and sometimes go back and fill in days that they missed.

data {
    int<lower=1,upper=Ds> nth[Ss];
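
As a rough sketch of how nth could be computed before the data is handed to Stan: the function name assign_nth and the (user, timestamp) input layout are assumptions made for illustration, not the post's actual preprocessing code. The idea is only that each user's submissions are ranked by submission timestamp rather than by puzzle date.

```python
from collections import defaultdict

def assign_nth(submissions):
    """Hypothetical helper: submissions is a list of (user, submitted_at)
    pairs, where submitted_at is any sortable timestamp. Returns a list of
    nth values aligned with the input order."""
    by_user = defaultdict(list)
    for idx, (user, submitted_at) in enumerate(submissions):
        by_user[user].append((submitted_at, idx))

    nth = [0] * len(submissions)
    for entries in by_user.values():
        # Rank by when the crossword was actually submitted, so that
        # back-filled puzzles count in the order they were solved.
        for rank, (_, idx) in enumerate(sorted(entries), start=1):
            nth[idx] = rank
    return nth

# Made-up example data; ISO-style strings sort chronologically.
submissions = [
    ("ann", "2018-01-03T09:00"),
    ("bob", "2018-01-03T09:05"),
    ("ann", "2018-01-01T08:00"),
    ("ann", "2018-01-02T08:30"),
]
print(assign_nth(submissions))  # [3, 1, 1, 2]
```

Because the count starts at 1 for every user, two people solving the same day's puzzle can have very different nth values, which is what separates nth from the date.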

Now, I wanted to get a general sense of how nth affects crossword times, without imposing any model. So I added a vector of effects, one for each possible nth value:

parameters {
    vector[Ds] nth_effect;

model {
    secs ~ lognormal(... + nth_effect[nth], sigma);

Of course, Stan is going to have a lot of difficulty estimating this nth_effect. For one, not too many people have done a lot of crosswords; only five or six people have done several hundred. Also, nth_effect and difficulty_effect alias, in that submissions on later days mostly also have larger nth values. But still, Stan does produce some estimate, and if you smooth those estimates, you get the following chart (nth on the horizontal, nth_effect on the vertical):


In this figure, focus on the middle (yellow) line. The other lines are 2.5th, 25th, 75th, and 97.5th percentiles. The thick yellow line is the smoothed nth_effect; if you zoom in, you can see little yellow specks, and those are the individual estimates. It is pretty clear that there is some sort of smooth improvement for the first, say, hundred crosswords, and after that point improvement is pretty slow. (I don't really know why the yellow line varies around -0.25, instead of around 0; it's probably due to the aliasing.)
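
The post does not say which smoother produced the thick line. As one simple possibility, here is a centered moving average over the per-nth estimates; the function name and the window size are assumptions chosen for illustration only.

```python
def moving_average(values, window=15):
    """Centered moving average; the window shrinks at the edges so the
    output has the same length as the input."""
    half = window // 2
    smoothed = []
    for i in range(len(values)):
        lo = max(0, i - half)
        hi = min(len(values), i + half + 1)
        chunk = values[lo:hi]
        smoothed.append(sum(chunk) / len(chunk))
    return smoothed

# Tiny made-up series, just to show the behavior.
print(moving_average([0, 3, 0, 3, 0], window=3))  # [1.5, 1.0, 2.0, 1.0, 1.5]
```

A wider window gives a smoother line but blurs the fast early drop, so the window size matters most for the first hundred or so nth values.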

The natural interpretation of this data seems to be that there is a significant “beginner's handicap”: people get rapidly better at crosswords, to the tune of exp(.33) - 1 ≈ 40% faster, over the first two or three months of crosswords. After that point, improvement is harder won; it's not even clear that things continue going down after that point.
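
Since the model is lognormal, an additive effect on log-seconds corresponds to a multiplicative effect on seconds. A minimal sketch of that conversion, where the helper name is hypothetical and 0.33 is the drop read off the chart:

```python
import math

def log_effect_to_percent(effect):
    """Convert an additive effect on log-seconds into a percent change
    in seconds."""
    return (math.exp(effect) - 1) * 100

# A beginner carrying +0.33 log-seconds takes about 39% longer...
print(round(log_effect_to_percent(0.33), 1))   # 39.1
# ...which is the same as an experienced solver's time being about 28% shorter.
print(round(log_effect_to_percent(-0.33), 1))  # -28.1
```

The two numbers describe the same gap from opposite directions, so it is worth being explicit about which time is the baseline.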

I decided to add this beginner handicap to the model, so that it could be estimated simultaneously with the other parameters. Estimating simultaneously would mean that beginner handicaps wouldn't taint skill estimates, which would otherwise handicap players until they'd done many crosswords. In Stan:

parameters {
    real beginner_gain;
    real<lower=0> beginner_decay;

transformed parameters {
    vector[Ds] nth_effect;
    for (j in 1:Ds) nth_effect[j] = beginner_gain * exp(-j / beginner_decay);

Here I am using an exponential decay model for the beginner gain. On day 1 (that is, your first crossword, since j indexes nth), you have a handicap of approximately beginner_gain log-seconds, and that handicap becomes e times smaller every beginner_decay days.
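
To make the decay concrete, here is a small sketch that mirrors the Stan expression above and converts the e-folding constant into a half-life (half-life = beginner_decay × ln 2). The parameter values used below are made up for illustration and are not the fitted estimates.

```python
import math

def nth_effect(j, beginner_gain, beginner_decay):
    """Beginner handicap in log-seconds at the j-th crossword, using the
    same formula as the transformed parameters block."""
    return beginner_gain * math.exp(-j / beginner_decay)

def half_life(beginner_decay):
    """Number of crosswords over which the handicap halves."""
    return beginner_decay * math.log(2)

# Hypothetical values, chosen only to illustrate the shape of the curve.
gain, decay = 0.5, 50.0
for j in (1, 50, 100):
    print(j, round(nth_effect(j, gain, decay), 3))

# After one half-life, half of the handicap is left.
print(round(nth_effect(half_life(decay), gain, decay), 3))
```

Note that at j = 1 the handicap is beginner_gain × exp(-1 / beginner_decay), slightly below beginner_gain, which is why the handicap on the first crossword is only approximately beginner_gain.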


When I estimate this model, I find that beginners are, on average, 68% worse than they'll eventually be, and that handicap decays with a half-life of 49 days. That seems to capture all of the nth effect:


This plot shows the prediction error versus nth when the beginner effect is subtracted out. The red line is a smoothed version of the blue point cloud. As you can see, there's no clear trend. So this seems to fully capture the effect of improvement over time.

So, overall, you don't get better at crosswords over time, except for a quick shedding of the beginner's handicap over the first few months.

In my next post, I hope to explore a curious incident where Joe did several hundred crosswords over three days in order to try to get better.

Addendum: I've explored many other modifications to the model, including different learning effects and also bigger modifications like trying to find latent factors behind each crossword's difficulty, and ultimately the model described here seems to be best.


