Experiments in Wordle Solving
Like most of the internet, I’ve become a little obsessed with Wordle as of late. My first step in wordle obsession was to write my own wordle variant — Nordle (Wordle^nth) which is like dordle or quordle but, in my humble opinion, better. Once I was done with that, I got interested in wordle-solving algorithms.
Why Wordle Hard Mode is Dumb
This story begins when I learned about “hard-mode,” the wordle setting that locks you into guessing only words that match the current pattern you’ve found. I was shocked to learn this mode existed and horrified to learn it was called “hard mode,” which I think implies it’s better. I have lost exactly one wordle game so far, and like most folks, it was a result of my going down what we call “rhyming hell,” in which what looks like great luck in 3 or 4 green squares turns out to be a curse in disguise when you have more words that fit the pattern than guesses remaining.
Consider, for example, the following guess:
⬛🟩🟩🟩🟩 sight
At this point, you have way more than 5 words left: eight/fight/light/might/night/right/tight, not to mention the more obscure bight/hight/wight.
I don’t want to live in a world where the “advanced” way to play a board like this is to just throw the words at the game and hope you get lucky — obviously the superior play is to realize you only have one letter left to guess and a game that gives you information about up to 5 letters per turn.
A guess like “tabor” would let you eliminate “tight”, “bight” and “right” in one fell swoop, or you could go with “limns” and eliminate “light”, “might” and “night,” but clearly what you should not do with your precious guesses is waste your time on one of the words that might actually be correct and be guaranteed to learn nothing from 4 of your 5 letters.
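To see the difference concretely, you can group the remaining words by the colours each guess would return. This is a minimal sketch assuming standard Wordle colouring rules; `feedback` and `split_by_feedback` are hypothetical helper names made up for illustration, not code from my solver.

```python
from collections import Counter, defaultdict

def feedback(guess, answer):
    """Colour a guess: G = green, Y = yellow, B = grey (standard rules assumed)."""
    colours = ["B"] * len(guess)
    unmatched = Counter()  # answer letters not used up by a green square
    for i, (g, a) in enumerate(zip(guess, answer)):
        if g == a:
            colours[i] = "G"
        else:
            unmatched[a] += 1
    for i, g in enumerate(guess):
        if colours[i] != "G" and unmatched[g] > 0:
            colours[i] = "Y"
            unmatched[g] -= 1
    return "".join(colours)

def split_by_feedback(guess, candidates):
    """Group candidates by the colour pattern they would produce for this guess."""
    groups = defaultdict(list)
    for word in candidates:
        groups[feedback(guess, word)].append(word)
    return dict(groups)

ight_words = ["eight", "fight", "light", "might", "night", "right", "tight",
              "bight", "hight", "wight"]
print(split_by_feedback("fight", ight_words))  # a hard-mode style guess
print(split_by_feedback("tabor", ight_words))  # a guess that cannot be the answer
```

The hard-mode guess can only split the list into "it was fight" and "it wasn't," while "tabor" puts tight, bight and right each into a group of its own.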
Building a Hard-Mode Computer Solver
My first step was to build a “hard-mode” computer solver and try to prove that you couldn’t do all that well with hard mode, even knowing all the words. I proceeded to code up a solver that does the following:
- Look at all possible matches and calculate the frequency of letters in each position.
- Look at all matching guesses and calculate the odds of each one getting a “green” square. Use this to compute a score for the word.
- While we’re at it, calculate the odds of getting a “yellow” square and factor that in as well.
- Pick the highest scoring word and guess it!
- If we didn’t get it, go back to step 1 and repeat until we have the word.
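The steps above can be sketched roughly as follows. This is an illustration under assumptions, not the actual solver: it uses a single word list for both guesses and answers, assumes 5-letter words, and the green/yellow weights are placeholder values. The names `feedback`, `make_scorer` and `solve_hard_mode` are hypothetical.

```python
from collections import Counter

def feedback(guess, answer):
    """Colour a guess: G = green, Y = yellow, B = grey (standard rules assumed)."""
    colours = ["B"] * len(guess)
    unmatched = Counter()
    for i, (g, a) in enumerate(zip(guess, answer)):
        if g == a:
            colours[i] = "G"
        else:
            unmatched[a] += 1
    for i, g in enumerate(guess):
        if colours[i] != "G" and unmatched[g] > 0:
            colours[i] = "Y"
            unmatched[g] -= 1
    return "".join(colours)

def make_scorer(candidates, green_weight=1.0, yellow_weight=0.5):
    """Build a scoring function from letter frequencies (weights are assumptions)."""
    n = len(candidates)
    at_pos = [Counter(w[i] for w in candidates) for i in range(5)]
    anywhere = Counter(c for w in candidates for c in set(w))

    def score(word):
        total = 0.0
        seen = set()
        for i, c in enumerate(word):
            total += green_weight * at_pos[i][c] / n  # rough odds of a green
            if c not in seen:  # credit the yellow odds once per letter
                total += yellow_weight * (anywhere[c] - at_pos[i][c]) / n
                seen.add(c)
        return total

    return score

def solve_hard_mode(answer, words):
    """Only ever guess words that fit all the feedback seen so far."""
    candidates = list(words)
    guesses = []
    while candidates:
        guess = max(candidates, key=make_scorer(candidates))
        guesses.append(guess)
        if guess == answer:
            break
        pattern = feedback(guess, answer)
        candidates = [w for w in candidates if feedback(guess, w) == pattern]
    return guesses

laser_family = ["laser", "lader", "lager", "laker", "lamer",
                "later", "laver", "laxer", "layer"]
print(solve_hard_mode("layer", laser_family))
```

Because every guess has to fit the pattern, a family of words that differ in one letter can only be worked through one word at a time.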
I fed this algorithm my full dictionary of “allowable” words for my nordle game and then ran it against my smaller dictionary of words as answers in my game. The results suggest that you will usually win if you play hard mode, but occasionally you will do really badly: here’s a summary of games the computer solved this way.
Here are the words that brought the hard-mode computer solver to its knees:
- 7 guesses: jolly,tasty,yummy,wiped,otter,lusty,payer,worse,spoon,hatch,shave,mixer,mouse,daddy,tally,wiper,stamp,merry,aging,penny,dried,upper,loner,boxer,wound,lower,waxed,oasis
- 8 guesses: brass,molly,taunt,mower,wowed,coded,joker,liner,piper,piped,waded,lilly,woods
- 9 guesses: dated,tight,layer,water
- 10 guesses: eater,class
To be clear, 97+% of games were solved by my hard-mode solver. But the games that it loses look pretty boring, like this:
⬛🟨🟨⬛🟨 crane
🟩🟩⬛🟩🟩 laser
🟩🟩⬛🟩🟩 lader
🟩🟩⬛🟩🟩 lager
🟩🟩⬛🟩🟩 laker
🟩🟩⬛🟩🟩 lamer
🟩🟩⬛🟩🟩 later
🟩🟩⬛🟩🟩 laver
🟩🟩⬛🟩🟩 laxer
🟩🟩🟩🟩🟩 layer
Building a Smarter Solver: focusing on Information
When I finished my hard-mode solver, it was less bad than I had expected. 97% of words could be solved by the computer using hard-mode, and that’s a computer with a ridiculously large word-list. A human narrowing to words that would actually be used in the puzzle could presumably do much better.
Nonetheless, I wanted to prove that I could do better solving wordles my way.
My new solver took a new approach to making guesses, one that went like this:
- Make a good first guess*
- Make a list of words that could now fit your guess.
- Now look at all possible words as guesses, even ones you know are wrong, and focus on which squares give you the most information. Squares that can tell you something are worth something. Squares that tell you nothing don’t matter. Score each word and make the guess that provides the most information. (The scoring itself gets a bit complicated — we preferentially want to guess letters we think will be present, but not if we know they’re present, so the way I score guesses is that if we think there’s more than a 50% chance of a letter being in a position, guessing that letter becomes less useful — guessing a 70% likely letter scores the same as guessing a 30% likely letter.)
- Once we are down to one or two words, just guess the word!
My computer solver aims never to “guess” until it has a 50% chance of being right, in which case guessing one of the two words provides the same amount of information as any other guess.
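Here is a rough sketch of that "70% scores the same as 30%" idea. It is an illustration under assumptions rather than the solver's exact scoring: the names `fold`, `info_score` and `choose_guess` are hypothetical, and the "letter anywhere in the word" term is an extra assumption added so that yellow squares count for something.

```python
def fold(p):
    """Value of testing something that is true with probability p.

    Peaks at 50% and is symmetric, so 70% and 30% score the same,
    and a square we are already sure about (0% or 100%) is worth nothing.
    """
    return min(p, 1.0 - p)

def info_score(guess, candidates):
    n = len(candidates)
    score = 0.0
    # Positional term: how unsure are we that this letter sits in this square?
    for i, c in enumerate(guess):
        score += fold(sum(w[i] == c for w in candidates) / n)
    # Presence term (an assumption of this sketch): how unsure are we
    # that this letter appears anywhere in the word?
    for c in sorted(set(guess)):
        score += fold(sum(c in w for w in candidates) / n)
    return score

def choose_guess(candidates, allowed):
    """Guess from the whole allowed list until one or two candidates remain."""
    if len(candidates) <= 2:
        return candidates[0]
    return max(allowed, key=lambda w: info_score(w, candidates))

ight_words = ["eight", "fight", "light", "might", "night", "right", "tight",
              "bight", "hight", "wight"]
for guess in ["sight", "fight", "tabor"]:
    print(guess, info_score(guess, ight_words))
print(choose_guess(ight_words, ["sight", "fight", "tabor"]))
```

With this scoring, a guess that repeats the four known letters earns almost nothing, while a guess that cannot be the answer can still come out ahead.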
Once I got the good first guess (see details in the next section), this algorithm was pretty damned good, solving 99.95% of games in 6 guesses, with just one exception!
The new solver is fun to watch: look at how it solves “joked,” systematically eliminating letters. (I coded my solver to show how many words remain at each step; once there are few enough, it lists them so you can better understand what it’s doing.)
⬛⬛⬛⬛🟨 irate => 567 words remain
🟨🟩⬛🟩⬛ doles => 34 words remain
⬛⬛🟨⬛⬛ chomp => 13 words remain
⬛⬛⬛🟨⬛ bawdy => 4 remain: foxed,joked,oozed,zoned
⬛⬛⬛🟨⬛ zinky => 1 joked
🟩🟩🟩🟩🟩 joked
By the second to last guess, the computer had narrowed to four possible words. Notice how the yellow “k” in “zinky” told the solver the answer was “joked,” where a green “z” would have given away “zoned,” a yellow “z” would have given it “oozed,” and a grey z and k would have told it “foxed” was the answer — pretty smooth, computer, pretty smooth.
Here are a few more examples so you can appreciate the elegance of the information-driven computer solver:
🟨⬛⬛⬛⬛ irate => 415
🟨🟨⬛⬛⬛ linos => 15
🟩⬛⬛🟨⬛ cupid => 5 chili,chill,click,cliff,climb
⬛⬛⬛🟨⬛ shelf => 2 click,climb
🟩🟩🟩⬛⬛ click => 1 climb
🟩🟩🟩🟩🟩 climb 1
At first I thought “shelf” wasn’t such a good guess, but you can see the yellow “l” (as opposed to a green “l”) actually helps the computer narrow it to “click” and “climb” vs. “chili” and “chill,” which would have triggered a green “l” (and “cliff” would have had a green “f” as well).
Here’s another example:
⬛⬛⬛⬛⬛ irate => 632
⬛🟩⬛⬛⬛ loups => 54
⬛⬛⬛⬛🟩 candy => 14
⬛⬛⬛🟩🟨 gumbo => 3 bobby,booby,hobby
🟨⬛🟨⬛🟨 broth => 1 hobby
🟩🟩🟩🟩🟩 hobby 1
Once again, the computer uses letter position to its advantage: the “o” in “broth” lets the computer distinguish between “bobby” and “booby,” while also getting the “h” in there to tell it whether the word was “hobby” — in all three cases, the computer would know the word by guess #6!
Sidenote: The Best Starting Word
I only got to my 99.95% success rate once I settled on the ideal start word. And friends, that took some doing.
My first approach was to look at every word and compute the odds of every letter being “green” or “yellow.” When I fed the computer my full dictionary of allowable words, I usually landed on “cares” as the best starting word. The final “s” betrays a difference between the full list of 5-letter words and the list likely to be actual wordle words, since many 5-letter words ending in “s” are plurals or verb forms not likely to be used as answers in the game. When I run only my smaller list of likely answer words through the algorithm instead, I get a word like “crane” instead of “cares.” However, starting with “crane” I was still ending up with a handful of 7-guess solves each time I tried my algorithm, so I thought I’d work to do better.
It occurred to me that the information-rich way to think about first words would be to ask not just where are the common letters, but more specifically, which words help you narrow down the word list most quickly. Given that goal, I came up with the following algorithm for picking a starting word.
- Pick a large, random set of wordle words to use as a test set (I picked 200 words at random — I could have run all the words instead, but this actually takes a fair amount of computer time, and I was impatient)
- For each possible starting word, play one turn of wordle against each test word. Then, filter the word list based on the squares you get back to see how many candidate words remain.
- Compute the average number of words left for all the trials, and pick the word that does the best job narrowing the list down.
When I ran the algorithm above, I got “tares” as the best starting word if I used a full dictionary and “oater” as the best starting word if I limited my words to the ones likely to actually be a wordle answer.
While these did very well, it occurred to me that in trying to beat wordle, I wasn’t interested in the average performance of these words but in the worst-case performance. In other words, I wanted the word that gave me the smallest number of remaining words when it did the worst at narrowing the words. When I tweaked my algorithm to select the least bad worst-case performance rather than the best average performance, I got “aloes” when I used all words and “irate” when I took into account wordle-y words.
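Both selection rules, best average and least-bad worst case, can be sketched in a few lines. This is a minimal illustration under assumptions: the helper names `feedback`, `remaining_after` and `best_start` are hypothetical, and the toy word list is far too small to reproduce the starting words reported here.

```python
import random
from collections import Counter

def feedback(guess, answer):
    """Colour a guess: G = green, Y = yellow, B = grey (standard rules assumed)."""
    colours = ["B"] * len(guess)
    unmatched = Counter()
    for i, (g, a) in enumerate(zip(guess, answer)):
        if g == a:
            colours[i] = "G"
        else:
            unmatched[a] += 1
    for i, g in enumerate(guess):
        if colours[i] != "G" and unmatched[g] > 0:
            colours[i] = "Y"
            unmatched[g] -= 1
    return "".join(colours)

def remaining_after(start, answers, pool):
    """For each test answer, count the pool words that survive one guess of `start`."""
    pattern_sizes = Counter(feedback(start, w) for w in pool)
    return [pattern_sizes[feedback(start, a)] for a in answers]

def best_start(starts, answers, pool, worst_case=False):
    """Pick the start word with the lowest average (or worst-case) remaining count."""
    def cost(start):
        left = remaining_after(start, answers, pool)
        return max(left) if worst_case else sum(left) / len(left)
    return min(starts, key=cost)

# Toy word list for illustration only.
pool = ["irate", "aloes", "crane", "tight", "sight", "fight", "light",
        "might", "night", "joked", "climb", "hobby", "rider", "mixer"]
test_set = random.sample(pool, k=min(200, len(pool)))
print(best_start(pool, test_set, pool))                   # best on average
print(best_start(pool, test_set, pool, worst_case=True))  # least bad worst case
```

The only difference between the two rules is whether the per-answer counts are averaged or reduced with `max`, which is why the two can disagree on the best word.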
Using “irate” as the starting word got my list of 7-guess words (words that the computer would lose wordle to) down to a single stubborn word — “tight.” Every other word on my list could be solved in 6 guesses starting with “irate.”
Note: if I use “aloes” as my start, it leaves me with just two other words that take 7 guesses to solve: “rider” and “mixer.”
Tight: the Thorn in My Side
So what makes “tight” such a rough word? What’s kind of fascinating is that after the second guess, my computer had narrowed the word to a mere 9 options:
Target : tight solved in 7
🟨⬛⬛🟨⬛ irate => 125
⬛🟩🟩🟩🟩 sight => 9 bight,dight,fight,hight,light,might,night,tight,wight
⬛⬛🟨⬛⬛ blind => 5 fight,hight,might,tight,wight
⬛⬛🟨🟨⬛ motif => 3 hight,tight,wight
⬛🟨🟨🟨⬛ white => 2 hight,tight
⬛🟩🟩🟩🟩 hight => 1 tight
🟩🟩🟩🟩🟩 tight 1
Now a normal person could probably eliminate three of those 9 options — bight, dight and hight are not words anyone I know uses.
That said, you’d think the computer could manage to guess 9 letters in three words, but for 2 of those words (hight and tight) it has to handle duplicated letters (meaning it needs to guess the “h” or “t” in the first position or guess a word with two “h”s or two “t”s). I don’t think my code was clever enough to know this, and it shows: those are the two words the computer was left with by guess 6 when it had to guess randomly, leading to the 7-guess loss.
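The duplicated-letter problem is easy to check directly. A small sketch, assuming the standard rule that each letter in the answer can colour at most one square of the guess; `feedback` and `tells_apart` are hypothetical helper names.

```python
from collections import Counter

def feedback(guess, answer):
    """Colour a guess: G = green, Y = yellow, B = grey (standard rules assumed)."""
    colours = ["B"] * len(guess)
    unmatched = Counter()
    for i, (g, a) in enumerate(zip(guess, answer)):
        if g == a:
            colours[i] = "G"
        else:
            unmatched[a] += 1
    for i, g in enumerate(guess):
        if colours[i] != "G" and unmatched[g] > 0:
            colours[i] = "Y"
            unmatched[g] -= 1
    return "".join(colours)

def tells_apart(guess, word_a, word_b):
    """True if the guess would colour differently for the two words."""
    return feedback(guess, word_a) != feedback(guess, word_b)

for guess in ["motif", "white", "tabor", "otter"]:
    print(guess, tells_apart(guess, "hight", "tight"))
```

A single "t" or "h" outside the first square colours the same for both words, so "motif" and "white" cannot separate them. A "t" in the first square ("tabor") or a doubled "t" ("otter") can.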
Not surprisingly, if I eliminate the obscure “dight” and “hight” from my wordlist, I end up with 100% of words solved in 6 guesses! Of course, changing the dictionary changes the calculations, and the computer comes up with a totally new approach:
🟨⬛⬛🟨⬛ irate => 123
⬛🟨⬛⬛⬛ phons => 12
⬛🟩🟩🟩🟩 wight => 5 bight,fight,light,might,tight
⬛⬛🟨⬛⬛ blimp => 2 fight,tight
⬛🟩🟩🟩🟩 fight => 1 tight
🟩🟩🟩🟩🟩 tight 1
Two Other Thorns
When I play with “aloes” (which seems more fair to me, since I only arrived at “irate” by using my list of likely answer words, which seems like information my solver shouldn’t have), there are two other words beyond “tight” that I can’t solve in 6.
The first word is rider, which isn’t too surprising since it involves a very common -i-er pattern with a doubled letter.
⬛⬛⬛🟩⬛ aloes => 280
⬛🟩🟨🟩🟨 tired => 11 bider,cider,dicer,diker,dimer,diner,diver,eider,hider,rider,wider
🟨🟨🟨⬛⬛ dreck => 5 bider,eider,hider,rider,wider
⬛🟨🟨⬛⬛ bergh => 3 eider,rider,wider
⬛🟨🟨🟨🟨 weird => 2 eider,rider
⬛🟩🟩🟩🟩 eider => 1 rider
🟩🟩🟩🟩🟩 rider 1
In this case, I can see a guess that would have been better than what the computer landed on — if I’d guessed “ewers” instead of “weird,” it would have eliminated both “eider” and “wider” and left the computer with “rider” as the answer in 6. Clearly my algorithm could still use some tweaking!
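One way to catch a case like this is to count how many distinct colour patterns a guess can produce across the remaining words. This is a different heuristic from the scoring described earlier, sketched here only to check the "ewers" claim; `feedback` and `distinct_patterns` are hypothetical helper names.

```python
from collections import Counter

def feedback(guess, answer):
    """Colour a guess: G = green, Y = yellow, B = grey (standard rules assumed)."""
    colours = ["B"] * len(guess)
    unmatched = Counter()
    for i, (g, a) in enumerate(zip(guess, answer)):
        if g == a:
            colours[i] = "G"
        else:
            unmatched[a] += 1
    for i, g in enumerate(guess):
        if colours[i] != "G" and unmatched[g] > 0:
            colours[i] = "Y"
            unmatched[g] -= 1
    return "".join(colours)

def distinct_patterns(guess, candidates):
    """Number of different colour patterns the guess produces over the candidates."""
    return len({feedback(guess, w) for w in candidates})

left = ["eider", "rider", "wider"]
for guess in ["weird", "ewers"]:
    print(guess, distinct_patterns(guess, left))
```

A guess that yields as many patterns as there are candidates pins down the answer on the next turn. Here "ewers" gives three patterns for the three words, while "weird" gives only two.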
The next word is mixer, which isn’t so rough at first glance, but the pattern the computer landed on required it to eliminate a bunch of words with doubled consonants (piper/mimer/fifer).
⬛⬛⬛🟩⬛ aloes => 280
⬛🟩🟨🟩⬛ tired => 25
⬛⬛⬛⬛⬛ pavan => 11 biker,fiber,fifer,fixer,giber,hiker,jiber,mimer,mixer,ricer,rifer
⬛🟩⬛🟨🟨 fibre => 4 hiker,mimer,mixer,ricer
🟩⬛⬛⬛⬛ macho => 2 mimer,mixer
🟩🟩⬛🟩🟩 mimer => 1 mixer
🟩🟩🟩🟩🟩 mixer 1
fibre isn’t such a bad guess since it eliminates 3 “f” words and 3 “b” words in one fell swoop. “macho” successfully knocks out “hiker” and “ricer,” but the computer couldn’t find a way to eliminate three options with one guess, with only the “x” or the doubled “m” distinguishing mimer and mixer.
Notice how much better “mixer” goes with “irate” as the first guess, though, truth be told, it feels a little like dumb luck that it went better this time, since at guess 3 it has 13 options vs. the 11 before.
🟨🟨⬛⬛🟨 irate => 122
⬛🟩⬛🟩⬛ fides => 30
⬛🟨⬛🟨⬛ levin => 13 biker,giber,hiker,hirer,jiber,mimer,mixer,piker,piper,ricer,riper,wiper,wirer
⬛⬛⬛🟨⬛ whomp => 2 mimer,mixer
🟩🟩⬛🟩🟩 mimer => 1 mixer
🟩🟩🟩🟩🟩 mixer 1