Bannalia: trivial notes on themes diverse: Start Wordle with TARES

Monday, January 17, 2022

Start Wordle with TARES

There have been some discussions on what the best first guess is for the game Wordle, but none, to the best of my knowledge, has used the following approach. After each guess, the game answers back with a matching result like these:

■■■■■ (all letters wrong),

■■■■■ (two letters right, one mispositioned),

■■■■■ (all letters right).

There are 3⁵=243 possible answers. From an information-theoretic point of view, the word we are trying to guess is a random variable (selected from a predefined dictionary), and the information we are obtaining by submitting our query is measured by the entropy formula

H(guess) = − ∑ p_i log₂ p_i bits,

where p_i is the probability that the game returns the i-th answer (i = 1, ... , 243) for our particular guess. So, the best first guess is the one for which we get the most information, that is, the associated entropy is maximum. Intuitively speaking, we are going for the guess that yields the most balanced partition of the dictionary words as grouped by their matching result: entropy is maximum when all p_iare equal (this is impossible for our problem, but gives an upper bound on the attainable entropy of log₂(243) = 7.93 bits).

Let's compute then the best guesses. Wordle uses a dictionary of 2,315 entries which is unfortunately not disclosed; in its place we will resort to Stanford GraphBase list. I wrote a trivial C++17 program that goes through each of the 5,757 words of Stanford's list and computes its associated entropy as a first guess (see it running online). The resulting top 10 best words, along with their entropies are:

TARES    6.20918
RATES    6.11622
TALES    6.09823
TEARS    6.05801
NARES    6.01579
TIRES    6.01493
REALS    6.00117
DARES    5.99343
LORES    5.99031
TRIES    5.98875

3 comments :

Nathan LJanuary 17, 2022 at 2:58 PM
This comment has been removed by the author.
ReplyDelete
Replies
randybruderJanuary 17, 2022 at 8:18 PM
For what it's worth, the Wordle JavaScript file includes the dictionary which is 10,657 entries. It's the array named Ta. (Although spoiler warning: the JavaScript file contains another array La that includes the answers, including a bunch of future answers.)
ReplyDelete
Replies
David KleinMarch 23, 2022 at 7:12 AM
Instead of maximizing entropy, wouldn't you want to minimize the expected number of target words remaining? I.e. minimize \sum p_i n_i = N \sum p_i^2?

I actually calculated the above expectation on a depth two search (guess one word then for each possible result guess a second word that minimizes the above expectation over all targets that match both words). I happened to have used the same Stanford GraphBase list you did.

It turns out that the two ply search minimizing the expected number of words remaining to guess, is "tares" :-)
ReplyDelete
Replies

Add comment