# Another solution to the Wordle game in Python

Mickey Petersenwritten in perfect Python, uses a statistical analysis of the letters of the English alphabet and quite successfully copes with the task.

2) https://habr.com/ru/post/647391/ — translation of the solver from Tom Lockwood, which solves the English game in 99.4% of cases. The author explored the insides of the game and tried to make the most of the information received about possible hidden words and possible input words, but in the end it all comes down to statistical analysis. Perhaps in the future I will use the information extracted from the game to improve my algorithm.

What is the game about?

The computer guesses a word of 5 letters. The user enters his word of 5 letters, the computer highlights gray letters that are not included in the hidden word, yellow – enters, but is not in its place, green – the letter enters and is in its place. There are 6 attempts to guess Where to play:

https://www.powerlanguage.co.uk/wordle/ — English language, one word per day for all players in the world

https://wordle.belousov.one/ — Russian-language adaptation, similarly one word per day

https://www.wordleunlimited.com/ — English version with no limit on the number of hidden words

I reasoned differently. What do we get by putting the word into the game? We receive information, and the amount of information may vary depending on the input word. So, for example, by entering the word “fuzzy” we are unlikely to greatly narrow the search circle. But what is the strongest move? We need a word that can “cut” the entire list of possible words into as small sublists as possible. We will call the mask the color combination that the game returns after entering the word. Theoretically, everything is possible various masks. Thus, each new word ideally divides the set of all possible hidden words into 243 sublists.

In fact, due to the peculiarities of natural languages, not everything is so rosy: cutting into sublists is uneven and it turns out that they are actually not 243, but much less and differ for different words. The idea of ​​my solver is just to choose a word at each step that divides the dictionary into as many piles as possible.

My game model is this:

1. The dictionary of words allowed for input and the dictionary of possible hidden words are the same and match the dictionary from `/usr/share/dict/american-english `(as in the article on the first link).

2. The partitioning obtained at each step is close to uniform and the sizes of each heap of words are approximately the same (the assumption is obviously naive: after all, for the “five green letters” mask there corresponds a heap of only one word).

## We write the code

Warning: bad code style

My level of Python proficiency is noticeably lower than the author of the first solver. Code examples may contain sub-optimal solutions, bad codestyle and make your eyes bleed 🙂

``````dictpath="/usr/share/dict/american-english"
# Пользуюсь вариантом первого решателя

validwords=set()

letters="abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZйцукенгшщзхъфывапролджэячсмитьбюЙЦУКЕНГШЩЗХЪФЫВАПРОЛДЖЭЯЧСМИТЬБЮ"

for w in allwords:
if("'" in w):continue
if(len(w)!=5):continue
for i in range(6):
if(i<5 and w[i] not in letters):break

try:
validwords.remove('clint') # Игра такое слово не принимает
except:pass

validwords=list(validwords)``````

In the bottom line, we have 5904 English words. Let’s define some useful functions:

Checking that the word can be guessed according to the data entered into the game:

``````def isThisSecretAvailable(testword,mask,secret):
'''
mask: G,Y,N -- green, yellow, none
Return True if secret can be secret word with this testword and mask
'''
for i in range(len(mask)):
if(mask[i]=='N' and testword[i] not in secret):continue
if(mask[i]=='Y' and testword[i] in secret and testword[i]!=secret[i]):continue
return False
return True

#isThisSecretAvailable("price","GGGNG","prize")
#returns True``````

Get a mask for a pair of entered (testword) and hidden (secret) words:

``````def getMask(testword,secret):
'''
Returns mask of NYG symbols for typed testword and secretword
'''
for i in range(len(testword)):

#returns "YNYNY"``````

Get a sublist of possible hidden words from the available data from the list (wordlist), this function will be used to narrow the search circle:

``````def getAvailableWordsByMask(testword,mask,wordlist):
'''
Return list of available words by typed testword and mask
'''
validsecrets=[]
for w in wordlist:
validsecrets.append(w)
return validsecrets``````

And now let’s find out which words are closest to the mathematical limit of splitting into 243 sublists:

``````testwordmasks=dict() # Сделаем словарь: слово -> множество возможных масок
for i in validwords:
for s in validwords:

masksvariances=[] # Сделаем лист с информацией о количестве разных масок
for i in validwords:

maxvariancewords1=[]
for i in range(len(validwords)):
print(validwords[i])
maxvariancewords1.append(validwords[i]) # И выпишем лучшие слова``````

This code worked for me for 37 seconds and gave the following result: `maxvariancewords1=['tares', 'tears']` , `maxmasksvariances=188`.

Now I wrap this logic of finding the best options in a separate function and write a minimalistic user interface:

``````def getBestSteps(wordlist,allwords=None):
'''
Get best step for find word in wordlist by allwords dictionary
HardMode on if allwords=None or allwords=wordlist
'''
if(allwords is None):
allwords=wordlist
for i in allwords:
for s in wordlist:
for i in allwords:
maxvariancewords=[]
maxvariancewords2=[]
maxvariancewords3=[]
for i in range(len(allwords)):
print(allwords[i])
maxvariancewords.append(allwords[i])
# На случай, если в maxvariancewords будет всего одно слово и его не будет в словаре игры
maxvariancewords2.append(allwords[i])
maxvariancewords3.append(allwords[i])
# Среди лучших вариантов я бы поставил на первое место те, в которых буквы не повторяются
maxvariancewords.sort(key=lambda x:-len(set(x)))
maxvariancewords2.sort(key=lambda x:-len(set(x)))
maxvariancewords3.sort(key=lambda x:-len(set(x)))
return maxvariancewords+maxvariancewords2+maxvariancewords3

# Main algorithm:
def mainloop():
print("Enter one of next words:",maxvariancewords1,"("+str(maxmasksvariances)+" different masks)")
newwordlist=getAvailableWordsByMask(input("What word did you type: ").lower(),input("What mask did you get: ").upper(),validwords)
print("Found",len(newwordlist),"available words")
beststeps=getBestSteps(newwordlist,validwords)
if(len(beststeps)>7):beststeps=beststeps[:7]
print("Please, type one of next words:",beststeps)
while(len(newwordlist)>1):
newwordlist=getAvailableWordsByMask(input("What word did you type: ").lower(),input("What mask did you get: ").upper(),newwordlist)
print("Found",len(newwordlist),"available words")
beststeps=getBestSteps(newwordlist,validwords)
if(len(beststeps)>7):beststeps=beststeps[:7]
print("Please, type one of next words:",beststeps)

All code can be viewed and downloaded here.

Now let’s try and play:

I played a few more games on the site https://www.wordleunlimited.com/ and the program guessed the word “gamma” on the 5th try, “heard” on the 4th try, “pores” on the 3rd try, “amber” on the 4th try, “peace” on the 4th try 3rd. On the first iteration after the word “tears”, the program narrows the search from 5904 words to less than a hundred (from 17 to 98 words).

What else can be done:

• Take a Russian-language dictionary (for example, from here) and test the algorithm on the Russian-language Wordle (spoiler: the program took the word on January 25 on the third attempt, and the machine considers the word “spanking” to be the best first move)

• Load into the program the dictionaries described in the article by Tom Lockwood, which I referred to at the beginning

• Investigate the maximum number of attempts the algorithm takes to guess the words? What percentage of wins can he guarantee?

• Investigate whether it would improve the situation by changing tactics from choosing words that cut the dictionary into the maximum number of sublists, to choosing words that cut into approximately equal sublists, but in fewer? Or some intermediate option? Which option is statistically better?

• Investigate the quality of the game in hard mode, when you have to choose words that fall under all hints (spoiler: the program has successfully solved dozens of examples)