Analyzing 100+ Games of NYT Connections

5 min read · Jul 7, 2025

As part of our morning routine, my wife and I enjoy playing different brain games together to help us wake up. We started playing NYT Connections pretty soon after it became available in the NYT Games app, and I decided to start tracking our results in February of 2024. For each game we played, I tracked the amount of time we took to solve each puzzle, along with our guesses. Here are some interesting findings from our Connections history!

Top Level Statistics

The main statistic I was interested in when I started tracking our results was Number of Wins vs. Losses. I very quickly realized that was not going to be a very exciting metric:

Connections was a big part of our weekday mornings as we got ready for work / school, but sometimes we would skip it on the weekends:

We recorded fewer games in May as our morning routine fell out of sync due to travel and a busy end of Monica’s semester at grad school. In June we moved to Seattle for the summer, and Connections again became a big part of our mornings, before slowly decreasing when we returned to Philly.

I was very curious to dig into our Time stats. Unsurprisingly, the average Time to solve each puzzle was positively correlated with Number of Guesses:
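
Here is a minimal sketch of how that relationship could be checked with a Pearson correlation. The rows in `games` are hypothetical placeholders, not our real results, and `pearson` is a helper written for this example.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


# Hypothetical rows: (number of guesses, solve time in seconds)
games = [(4, 95), (4, 120), (5, 150), (6, 240), (7, 310)]

guesses = [g for g, _ in games]
times = [t for _, t in games]

# A value close to 1 means more guesses tend to go with longer solve times
r = pearson(guesses, times)
print(f"correlation between guesses and time: {r:.2f}")
```

A value near 1 would match the chart: games that needed more guesses also took longer.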

Guessing Patterns

Our main goal when playing Connections each morning was just to finish each game as fast as possible. We did not try to guess the categories in any particular color order, so our most common guess patterns generally aligned with color difficulty (Yellow, Green, Blue, Purple). An initial ordering of our guessing patterns by count didn’t reveal anything too surprising:

  1. Yellow-Green-Purple-Blue (13 times)
  2. Yellow-Blue-Green-Purple (13 times)
  3. Green-Yellow-Blue-Purple (11 times)
  4. Yellow-Green-Blue-Purple (9 times)
  5. Blue-Green-Yellow-Purple (5 times)
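
A ranking like the one above can be produced with a simple counter. This is only a sketch: the `games` list is made-up sample data, and storing each game as a list of colors in solve order is an assumption about the log format.

```python
from collections import Counter

# Hypothetical log: the order in which each game's categories were solved
games = [
    ["Yellow", "Green", "Purple", "Blue"],
    ["Yellow", "Blue", "Green", "Purple"],
    ["Yellow", "Green", "Purple", "Blue"],
    ["Green", "Yellow", "Blue", "Purple"],
]

# Collapse each game into a "Color-Color-Color-Color" pattern and count them
pattern_counts = Counter("-".join(order) for order in games)

for rank, (pattern, count) in enumerate(pattern_counts.most_common(5), start=1):
    print(f"{rank}. {pattern} ({count} times)")
```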

To analyze our guessing patterns in more detail, I generated a Sankey diagram mapping each game from start to finish.

This is my favorite chart. Although it looks crazy, it's a neat visualization of some of our most fascinating patterns:

  • Although Blue and Purple felt like similarly difficult final colors in games when we picked them last (46 vs 74), it's clear that Purple was much more difficult to discern at the start of the game (6 vs 21).
  • We definitely fell for the “obvious category” trap the Connections authors like to set up at the start of the game.
  • You can see us floundering quite a bit when our third guess is “incorrect” — half of those led to an additional “incorrect” guess.
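
The data behind a Sankey diagram is just a count of how often one step leads to the next. The sketch below builds those link counts and the "incorrect third guess" share from hypothetical games. The guess sequences and the `"Incorrect"` label are assumptions about how the log is stored, and the drawing step itself is left out.

```python
from collections import Counter

# Hypothetical games: each guess is either a solved color or "Incorrect"
games = [
    ["Yellow", "Green", "Incorrect", "Incorrect", "Blue", "Purple"],
    ["Yellow", "Blue", "Incorrect", "Green", "Purple"],
    ["Green", "Yellow", "Blue", "Purple"],
]

# Each Sankey link goes from (step, outcome) to (step + 1, outcome)
links = Counter()
for guesses in games:
    for step, (src, dst) in enumerate(zip(guesses, guesses[1:]), start=1):
        links[((step, src), (step + 1, dst))] += 1

# Of the games with an incorrect third guess, how many missed the fourth too?
third_wrong = [g for g in games if len(g) > 3 and g[2] == "Incorrect"]
followed = sum(1 for g in third_wrong if g[3] == "Incorrect")
share = followed / len(third_wrong)

print(f"incorrect third guess followed by another miss: {share:.0%}")
```

The `links` counter maps each source and target pair to a weight, which is the shape most Sankey plotting tools expect as input.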

Another data point I tracked was whether or not we successfully guessed the final category. In other words, once we got down to just four words left, did we correctly deduce the category those words belong to or not? Here, the contrast between Blue and Purple is also more apparent — we were decently more likely to have figured out the Blue category by the end of the game compared to Purple.

Difficulty Rating

Another statistic I was interested in is whether or not the puzzles we struggled with were actually more difficult than the average NYT Connections puzzle. Luckily, I found a Reddit post where a user had already manually copied the difficulty score based on the “Connections Companion” tester difficulty for several months' worth of games in 2024, so I only had to manually fill in the last 20 data points for my dataset. Big shout out to u/FadeVanity for compiling this data.

Matching the dates of the difficulty with the dates of our Connections games did seem to confirm our worst performances were correlated with higher difficulty games:

However, there clearly were some outliers. I tried to normalize our Time data and generated a chart with our Biggest Over-performances and Worst Under-performances relative to the Confirmed Difficulty:

Now we know exactly which days we crushed it (marked in green) and which days we totally bombed (marked in red). We were pretty consistently good in June and September. In August we had some major flops — 8/21 and 8/22 must have been rough.
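
One way to compute an over/under-performance score like this is sketched below. It assumes min-max normalization of both solve time and difficulty, which may not be the exact method behind the chart above. The game labels and numbers are hypothetical.

```python
def min_max(values):
    """Scale values to the 0-1 range (assumes they are not all identical)."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]


# Hypothetical rows: (label, solve time in seconds, tester difficulty)
games = [
    ("A", 90, 2.5),
    ("B", 300, 2.0),
    ("C", 120, 4.0),
    ("D", 240, 3.0),
]

norm_time = min_max([t for _, t, _ in games])
norm_difficulty = min_max([d for _, _, d in games])

# Positive: solved faster than the difficulty suggests (over-performance)
# Negative: solved slower than the difficulty suggests (under-performance)
diffs = {
    label: d - t
    for (label, _, _), t, d in zip(games, norm_time, norm_difficulty)
}

best = max(diffs, key=diffs.get)
worst = min(diffs, key=diffs.get)
print(f"biggest over-performance: {best}, worst under-performance: {worst}")
```

Under this scheme, a hard puzzle solved quickly scores high, and an easy puzzle that took a long time scores low.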

Our two failed puzzles were 6/30/24 (difficulty 3.7, normalized difference ~0.2) and 10/07/24 (difficulty 3.2, normalized difference ~0), which are above the average difficulty for the period of 2.9 and below the average normalized difference of 0.34. These were definitely not our best games, but also not totally unexpected based on their difficulty.

I went back and replayed some of our biggest flubs, and I can confirm that “Fried Appetizer, Informally” is still a garbage category. Not once, not ever, has anyone referred to mozzarella sticks as just “sticks.”

Big shout out to my one true Connection, my beautiful wife Monica, for all the happy mornings together.

All of the chart generation was done with ChatGPT using the GPT-4o model, and QA’d by me reviewing the code and checking a few data points and kinda just going “yeah that seems good enough.”
