2) https://habr.com/ru/post/647391/ — translation of the solver from Tom Lockwood, which solves the English game in 99.4% of cases. The author explored the insides of the game and tried to make the most of the information received about possible hidden words and possible input words, but in the end it all comes down to statistical analysis. Perhaps in the future I will use the information extracted from the game to improve my algorithm.
What is the game about?
The computer guesses a word of 5 letters. The user enters his word of 5 letters, the computer highlights gray letters that are not included in the hidden word, yellow – enters, but is not in its place, green – the letter enters and is in its place. There are 6 attempts to guess Where to play:
https://www.powerlanguage.co.uk/wordle/ — English language, one word per day for all players in the world
https://wordle.belousov.one/ — Russian-language adaptation, similarly one word per day
https://www.wordleunlimited.com/ — English version with no limit on the number of hidden words
I reasoned differently. What do we get by putting the word into the game? We receive information, and the amount of information may vary depending on the input word. So, for example, by entering the word “fuzzy” we are unlikely to greatly narrow the search circle. But what is the strongest move? We need a word that can “cut” the entire list of possible words into as small sublists as possible. We will call the mask the color combination that the game returns after entering the word. Theoretically, everything is possible various masks. Thus, each new word ideally divides the set of all possible hidden words into 243 sublists.
In fact, due to the peculiarities of natural languages, not everything is so rosy: cutting into sublists is uneven and it turns out that they are actually not 243, but much less and differ for different words. The idea of my solver is just to choose a word at each step that divides the dictionary into as many piles as possible.
My game model is this:
The dictionary of words allowed for input and the dictionary of possible hidden words are the same and match the dictionary from
/usr/share/dict/american-english(as in the article on the first link).
The partitioning obtained at each step is close to uniform and the sizes of each heap of words are approximately the same (the assumption is obviously naive: after all, for the “five green letters” mask there corresponds a heap of only one word).
We write the code
Warning: bad code style
My level of Python proficiency is noticeably lower than the author of the first solver. Code examples may contain sub-optimal solutions, bad codestyle and make your eyes bleed 🙂
dictpath="/usr/share/dict/american-english" # Пользуюсь вариантом первого решателя allwords=open(dictpath,"r").read().split('n') validwords=set() letters="abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZйцукенгшщзхъфывапролджэячсмитьбюЙЦУКЕНГШЩЗХЪФЫВАПРОЛДЖЭЯЧСМИТЬБЮ" for w in allwords: if("'" in w):continue if(len(w)!=5):continue for i in range(6): if(i<5 and w[i] not in letters):break elif(i==5):validwords.add(w.lower()) try: validwords.remove('clint') # Игра такое слово не принимает except:pass validwords=list(validwords)
In the bottom line, we have 5904 English words. Let’s define some useful functions:
Checking that the word can be guessed according to the data entered into the game:
def isThisSecretAvailable(testword,mask,secret): ''' mask: G,Y,N -- green, yellow, none Return True if secret can be secret word with this testword and mask ''' for i in range(len(mask)): if(mask[i]=='N' and testword[i] not in secret):continue if(mask[i]=='G' and testword[i]==secret[i]):continue if(mask[i]=='Y' and testword[i] in secret and testword[i]!=secret[i]):continue return False return True #isThisSecretAvailable("price","GGGNG","prize") #returns True
Get a mask for a pair of entered (testword) and hidden (secret) words:
def getMask(testword,secret): ''' Returns mask of NYG symbols for typed testword and secretword ''' mask="" for i in range(len(testword)): if(testword[i]==secret[i]):mask+="G" elif(testword[i] in secret):mask+="Y" else:mask+="N" return mask #getMask("argon","slang") #returns "YNYNY"
Get a sublist of possible hidden words from the available data from the list (wordlist), this function will be used to narrow the search circle:
def getAvailableWordsByMask(testword,mask,wordlist): ''' Return list of available words by typed testword and mask ''' validsecrets= for w in wordlist: if(isThisSecretAvailable(testword,mask,w)): validsecrets.append(w) return validsecrets
And now let’s find out which words are closest to the mathematical limit of splitting into 243 sublists:
testwordmasks=dict() # Сделаем словарь: слово -> множество возможных масок for i in validwords: testwordmasks[i]=set() for s in validwords: testwordmasks[i].add(getMask(i,s)) masksvariances= # Сделаем лист с информацией о количестве разных масок for i in validwords: masksvariances.append(len(testwordmasks[i])) maxmasksvariances=max(masksvariances) maxvariancewords1= for i in range(len(validwords)): if(masksvariances[i]==maxmasksvariances): print(validwords[i]) maxvariancewords1.append(validwords[i]) # И выпишем лучшие слова
This code worked for me for 37 seconds and gave the following result:
maxvariancewords1=['tares', 'tears'] ,
Now I wrap this logic of finding the best options in a separate function and write a minimalistic user interface:
def getBestSteps(wordlist,allwords=None): ''' Get best step for find word in wordlist by allwords dictionary HardMode on if allwords=None or allwords=wordlist ''' if(allwords is None): allwords=wordlist testwordmasks=dict() for i in allwords: testwordmasks[i]=set() for s in wordlist: testwordmasks[i].add(getMask(i,s)) masksvariances= for i in allwords: masksvariances.append(len(testwordmasks[i])) maxmasksvariances=max(masksvariances) print("Different masks:",maxmasksvariances) maxvariancewords= maxvariancewords2= maxvariancewords3= for i in range(len(allwords)): if(masksvariances[i]==maxmasksvariances): print(allwords[i]) maxvariancewords.append(allwords[i]) if(maxmasksvariances==1):break elif(masksvariances[i]==maxmasksvariances-1): # На случай, если в maxvariancewords будет всего одно слово и его не будет в словаре игры maxvariancewords2.append(allwords[i]) elif(masksvariances[i]==maxmasksvariances-2): maxvariancewords3.append(allwords[i]) # Среди лучших вариантов я бы поставил на первое место те, в которых буквы не повторяются maxvariancewords.sort(key=lambda x:-len(set(x))) maxvariancewords2.sort(key=lambda x:-len(set(x))) maxvariancewords3.sort(key=lambda x:-len(set(x))) return maxvariancewords+maxvariancewords2+maxvariancewords3 # Main algorithm: def mainloop(): print("Enter one of next words:",maxvariancewords1,"("+str(maxmasksvariances)+" different masks)") newwordlist=getAvailableWordsByMask(input("What word did you type: ").lower(),input("What mask did you get: ").upper(),validwords) print("Found",len(newwordlist),"available words") beststeps=getBestSteps(newwordlist,validwords) if(len(beststeps)>7):beststeps=beststeps[:7] print("Please, type one of next words:",beststeps) while(len(newwordlist)>1): newwordlist=getAvailableWordsByMask(input("What word did you type: ").lower(),input("What mask did you get: ").upper(),newwordlist) print("Found",len(newwordlist),"available words") beststeps=getBestSteps(newwordlist,validwords) if(len(beststeps)>7):beststeps=beststeps[:7] print("Please, type one of next words:",beststeps) print("Your word is",newwordlist)
All code can be viewed and downloaded here.
Now let’s try and play:
I played a few more games on the site https://www.wordleunlimited.com/ and the program guessed the word “gamma” on the 5th try, “heard” on the 4th try, “pores” on the 3rd try, “amber” on the 4th try, “peace” on the 4th try 3rd. On the first iteration after the word “tears”, the program narrows the search from 5904 words to less than a hundred (from 17 to 98 words).
What else can be done:
Take a Russian-language dictionary (for example, from here) and test the algorithm on the Russian-language Wordle (spoiler: the program took the word on January 25 on the third attempt, and the machine considers the word “spanking” to be the best first move)
Load into the program the dictionaries described in the article by Tom Lockwood, which I referred to at the beginning
Investigate the maximum number of attempts the algorithm takes to guess the words? What percentage of wins can he guarantee?
Investigate whether it would improve the situation by changing tactics from choosing words that cut the dictionary into the maximum number of sublists, to choosing words that cut into approximately equal sublists, but in fewer? Or some intermediate option? Which option is statistically better?
Investigate the quality of the game in hard mode, when you have to choose words that fall under all hints (spoiler: the program has successfully solved dozens of examples)