How to Solve the Popular Wordle Puzzle in 2022 in Python

Ahead of the launch of our Full stack development in Python course, we explain how to solve Wordle, a new puzzle game that has captured the attention of many people around the world. Read on for the details.


Have you heard of Wordle? Its simplicity is deceptive. You have to guess a five-letter English word. After each guess you get hints: a cell turns green if the letter is in the right place, yellow if the letter is in the word but in a different place, and gray if it is not in the word at all. It seems simple, but it's still hard! Let's write a Wordle solver. It takes set comprehensions, Python list comprehensions, and a little bit of luck! To keep things simple, from here on the background color of a cell will be called the color of its letter.

Task

Wordle picks a new word every day. We only get six attempts, and the site uses cookies to track progress, so choose carefully! Fortunately, there are some clues to work with:


  1. The word consists of five English letters.

  2. There are no punctuation marks, numbers or other symbols.

  3. After each attempt, hints are given:

      1. The background behind a letter turns green if both the letter and its position in the word are correct.

      2. The background turns yellow if the letter is in the word, but in a different position.

      3. The background turns gray if the letter is not in the word.

  4. The allowed words are limited to those in Wordle's dictionary.

Using Wordle's own word list would make this too easy, so instead we'll take a free Linux dictionary: /usr/share/dict/american-english. It contains one word per line.

Loading and generating words

First, let's pick a dictionary (you can use your own). The following code sets out the rules of the game:

import string

DICT = "/usr/share/dict/american-english"

ALLOWABLE_CHARACTERS = set(string.ascii_letters)
ALLOWED_ATTEMPTS = 6
WORD_LENGTH = 5

We get six attempts, the word is five letters long, and we allow every letter of the alphabet.

We convert the allowed characters into a set() so that we can use set operations such as membership and subset checks; more on this later. Next, we generate the set of words that satisfy the rules of the game:

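from pathlib import Path

# Keep only words of WORD_LENGTH characters that are spelled entirely
# with allowed characters (this drops apostrophes, accented letters, etc.)
WORDS = {
    word.lower()
    for word in Path(DICT).read_text(encoding="utf-8").splitlines()
    if len(word) == WORD_LENGTH and set(word) <= ALLOWABLE_CHARACTERS
}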

To build the set of valid words, we use a set comprehension together with the Path class, which reads the file directly. I recommend getting to know Path: it is very handy.

We filter the dictionary so that only words of the desired length remain, and only those whose characters form a subset of ALLOWABLE_CHARACTERS. In other words, we keep only the words that can be spelled using the allowed characters.
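To see the length and subset test in isolation, here is a minimal sketch that applies the same filter to a small, made-up list of lines instead of the system dictionary. The helper name is_playable and the sample list are hypothetical:

```python
import string

ALLOWABLE_CHARACTERS = set(string.ascii_letters)
WORD_LENGTH = 5

def is_playable(word):
    """Hypothetical helper: right length and only ASCII letters."""
    return len(word) == WORD_LENGTH and set(word) <= ALLOWABLE_CHARACTERS

# Made-up stand-in for the lines of /usr/share/dict/american-english
sample_lines = ["Arose", "Ann's", "cafés", "query", "apple", "abc"]
playable = {word.lower() for word in sample_lines if is_playable(word)}
print(sorted(playable))  # ['apple', 'arose', 'query']
```

"Ann's" fails because of the apostrophe, "cafés" because of the accented letter, and "abc" because of its length. Since a five-letter word can never contain all 52 ASCII letters, the proper-subset test `<` and the subset test `<=` give the same result here.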

Frequency analysis of the English alphabet

Letters are not evenly distributed in English words: E, for example, appears far more often than X. So we should prefer guesses built from the most frequent letters, which gives us a better chance of matching letters in the Wordle answer. A good strategy, then, is to rank letters by how often they occur in English, and having a dictionary makes that easy!

from collections import Counter
from itertools import chain

LETTER_COUNTER = Counter(chain.from_iterable(WORDS))

The Counter class is a dictionary that counts items. Each distinct value passed into it becomes a key, and the number of times it occurs is stored as that key's value. This letter frequency is exactly what we need.

To feed it letters, we use chain from the itertools module. It has a lesser-known class method, from_iterable, that takes a single iterable of iterables and treats it as one long chain of their elements. An example will make this clear:

>>> list(chain.from_iterable(["inspired", "python"]))
['i', 'n', 's', 'p', 'i', 'r', 'e', 'd', 'p', 'y', 't', 'h', 'o', 'n']
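Putting the two together on a tiny input shows what LETTER_COUNTER holds. In this sketch, two sample words stand in for the full WORDS set:

```python
from collections import Counter
from itertools import chain

# Two sample words standing in for the full WORDS set
sample_counter = Counter(chain.from_iterable(["arose", "query"]))
print(sample_counter["r"])            # 2: once in each word
print(sample_counter["z"])            # 0: missing keys count as zero
print(sample_counter.most_common(1))  # [('r', 2)]
```

When counts tie, most_common() lists elements in the order they were first encountered, which is why 'r' comes before 'e' here.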

Strings are iterable too, and WORDS is a set of strings, so chain.from_iterable breaks the set (or a list, etc.) down into the individual characters of its words. The same property lets us pass a string to set() to get the unique characters in a word:

>>> set("hello")
{'e', 'h', 'l', 'o'}
  • Sets are modeled on their mathematical namesakes: they contain only unique values (no repetitions) and are unordered.

That is why the characters in the set are not in the same order as in the string. Sets offer a lot of functionality, such as checking for a subset (whether one set is entirely contained in another), finding the elements two sets have in common (intersection), combining two sets (union), and so on.
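Here is a short illustration of those operations on the letters of two words; the words themselves are arbitrary examples:

```python
guess = set("arose")
answer = set("query")

print(sorted(guess & answer))  # intersection: ['e', 'r']
print(set("ore") <= guess)     # subset check: True
print(sorted(guess | answer))  # union: ['a', 'e', 'o', 'q', 'r', 's', 'u', 'y']
print(sorted(guess - answer))  # difference: ['a', 'o', 's']
```

The results are passed through sorted() only because sets are unordered, so printing them directly could show the letters in any order.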

Here are the resulting letter counts:

>>> LETTER_COUNTER
Counter({'h': 828,
         'o': 1888,
         'n': 1484,
         'e': 3106,
         's': 2954,
         'v': 338,
         # ... etc ...
        })

But this only gives absolute counts. It's more useful to express each count as a fraction of the total. For that, we use the Counter class's total() method, which returns the total number of letter occurrences.

Let's turn these counts into a frequency table:

LETTER_FREQUENCY = {
    character: value / LETTER_COUNTER.total()
    for character, value in LETTER_COUNTER.items()
}

Python 3.10 introduced the Counter.total() method, so if you’re working with older Python, you can replace it with sum(LETTER_COUNTER.values()).
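If the code has to run on both older and newer Python versions, one option is a small helper that uses total() when it exists and falls back to sum() otherwise. This is a sketch; the helper name letter_frequencies and the two-word input are made up for illustration:

```python
from collections import Counter
from itertools import chain

def letter_frequencies(words):
    """Hypothetical helper: map each letter to its share of all letters."""
    counter = Counter(chain.from_iterable(words))
    # Counter.total() exists on Python 3.10+; sum() works on any version
    total = counter.total() if hasattr(counter, "total") else sum(counter.values())
    return {char: count / total for char, count in counter.items()}

freq = letter_frequencies(["arose", "query"])
print(freq["e"])  # 0.2: 'e' is 2 of the 10 letters
```

Computing the total once, before the comprehension, also avoids re-summing the counter for every letter.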

Here we use a dictionary comprehension to iterate over each key and value of the LETTER_COUNTER counter and divide each value by the total:

>>> LETTER_FREQUENCY
{'h': 0.02804403048264183,
 'o': 0.06394580863674852,
 'n': 0.050262489415749366,
 'e': 0.10519898391193903,
 's': 0.10005080440304827,
 # ... etc ...
 }

The result is a letter-frequency table built from a subset of the dictionary: rather than the entire dictionary, we used only the words that are valid in Wordle. This is unlikely to change the frequencies much, but it means they reflect exactly the words we will be choosing from.

Next, every word needs a score so that we can rank the most likely candidates. Using the frequency table, we write a scoring function that estimates how common the letters in a word are:

def calculate_word_commonality(word):
    score = 0.0
    for char in word:
        score += LETTER_FREQUENCY[char]
    return score / (WORD_LENGTH - len(set(word)) + 1)

Once again we treat the string as an iterable and loop over each character in the word. We look up the frequency of each letter, add them up, and divide the sum by the word length minus the number of unique characters, plus one (so the divisor is 1, not 0, when all five letters are unique).

This scoring function is simple: words made of frequent letters score higher, and each repeated letter increases the divisor, so words with duplicate letters are penalized. Ideally, you want as many unique, frequent letters as possible to maximize the chance of hitting green or yellow in Wordle.

A quick test confirms that a word with rare and repeated letters scores much lower than a word with frequent, unique ones:

>>> calculate_word_commonality("fuzzy")
0.04604572396274344

>>> calculate_word_commonality("arose")
0.42692633361558
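To make the penalty for repeated letters concrete, here is the same scoring formula applied to a toy frequency table. The table values are invented (all equal, and exact in binary floating point) so the arithmetic is easy to follow; they are not taken from the real dictionary:

```python
WORD_LENGTH = 5
# Invented, equal frequencies so only the divisor changes between words
TOY_FREQUENCY = {"a": 0.125, "b": 0.125, "c": 0.125, "d": 0.125, "e": 0.125}

def score(word, frequency=TOY_FREQUENCY):
    total = sum(frequency[char] for char in word)
    # Divisor is 1 with five unique letters, 2 with four, ..., 5 with one
    return total / (WORD_LENGTH - len(set(word)) + 1)

print(score("abcde"))  # 5 unique letters: 0.625 / 1 -> 0.625
print(score("abcdd"))  # 4 unique letters: 0.625 / 2 -> 0.3125
print(score("aaaaa"))  # 1 unique letter:  0.625 / 5 -> 0.125
```

Note that the penalty divides the whole word's score, so "abcdd" is halved even though only one letter repeats.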

Now we need a way to sort and display these words:

import operator

def sort_by_word_commonality(words):
    sort_by = operator.itemgetter(1)
    return sorted(
        [(word, calculate_word_commonality(word)) for word in words],
        key=sort_by,
        reverse=True,
    )

def display_word_table(word_commonalities):
    for (word, freq) in word_commonalities:
        print(f"{word:<10} | {freq:<5.2}")

sort_by_word_commonality returns a list of (word, score) tuples sorted by score in descending order.

To use the score, the element at index 1 of each tuple, as the sort key, operator.itemgetter(1) is more concise than a lambda expression.
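The two forms are interchangeable as sort keys. A quick comparison on a few (word, score) pairs, where the scores are made-up round numbers:

```python
import operator

scored = [("fuzzy", 0.05), ("arose", 0.43), ("liter", 0.34)]

by_itemgetter = sorted(scored, key=operator.itemgetter(1), reverse=True)
by_lambda = sorted(scored, key=lambda pair: pair[1], reverse=True)

print(by_itemgetter == by_lambda)  # True
print(by_itemgetter[0])            # ('arose', 0.43)
```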

We also add a quick display function that prints the words and their scores as a simple table. Now let's move on to the solver itself.

Writing a problem solver for Wordle

For a simple console application, we use the input() and print() functions:

def input_word():
    while True:
        word = input("Input the word you entered> ")
        if len(word) == WORD_LENGTH and word.lower() in WORDS:
            break
    return word.lower()


def input_response():
    print("Type the color-coded reply from Wordle:")
    print("  G for Green")
    print("  Y for Yellow")
    print("  ? for Gray")
    while True:
        response = input("Response from Wordle> ")
        if len(response) == WORD_LENGTH and set(response) <= {"G", "Y", "?"}:
            break
        else:
            print(f"Error - invalid answer {response}")
    return response

The functionality is simple. We ask the user for the WORD_LENGTH-letter word they entered in Wordle, and then for Wordle's response. Each letter can get one of three colors (green, yellow or gray), so we encode the response as a string of WORD_LENGTH characters, each of which is G, Y or ?.

I also added validation in case the user makes a mistake when typing. The loop repeats until a valid sequence is entered: once again I convert the input to a set and check that it is a subset of the valid response characters.
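Because input() makes these functions awkward to test, one possible refactoring is to pull the checks out into pure functions. This is a sketch: the names is_valid_word and is_valid_response are hypothetical, and the tiny word set stands in for the real WORDS:

```python
WORD_LENGTH = 5
VALID_RESPONSE_CHARS = {"G", "Y", "?"}
WORDS = {"arose", "liter", "nerdy", "query"}  # stand-in for the real WORDS

def is_valid_word(word):
    return len(word) == WORD_LENGTH and word.lower() in WORDS

def is_valid_response(response):
    # Exactly one G, Y or ? symbol per letter of the guess
    return len(response) == WORD_LENGTH and set(response) <= VALID_RESPONSE_CHARS

print(is_valid_word("Arose"))      # True
print(is_valid_response("?Y??Y"))  # True
print(is_valid_response("?Y?X"))   # False: too short and contains X
```

input_word() and input_response() could then simply loop until the corresponding check returns True.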

Filtering green, yellow and gray letters with a word vector

According to the rules, a letter turns green if both the letter and its position are correct, yellow if it is in the word but in a different position, and gray if it is not in the word at all. Another way to look at the rules: until Wordle tells us which letters are green, yellow or gray, every letter is possible in every position:

word_vector = [set(string.ascii_lowercase) for _ in range(WORD_LENGTH)]

Here we create a list of sets, one per position, so its length equals the word length, 5. Each element starts out as the set of all lowercase English letters. As Wordle's responses come in, we remove letters from these sets once they are ruled out for a position:

  • Green: if, say, the second letter is green, we replace the second set with a set containing only that letter.

  • Yellow: any letter except this one can be in this position. Removing the letter from this position's set guarantees that we never pick a word with exactly this letter in this position.

  • Gray: the letter is not in the word, so we remove it from the set of every position.
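Here is a sketch of what the three rules do to a fresh word vector. The feedback for the guess "arose" is hypothetical, chosen so that all three colors appear. set.discard() is used because, unlike remove(), it does not raise KeyError when the letter is already gone:

```python
import string

WORD_LENGTH = 5
word_vector = [set(string.ascii_lowercase) for _ in range(WORD_LENGTH)]

# Hypothetical feedback for "arose": a green, o yellow, r/s/e gray
word_vector[0] = {"a"}       # green: only 'a' is possible in position 1
word_vector[2].discard("o")  # yellow: 'o' is in the word, but not in position 3
for letter in "rse":         # gray: remove the letter from every position
    for position in word_vector:
        position.discard(letter)

print([len(position) for position in word_vector])  # [1, 23, 22, 23, 23]
```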

Now we need a function that checks whether a word matches the word vector. Here is a simple one:

def match_word_vector(word, word_vector):
    assert len(word) == len(word_vector)
    for letter, v_letter in zip(word, word_vector):
        if letter not in v_letter:
            return False
    return True

It uses zip to pair each character of the word with the set at the same position in the word vector.

If a letter is not in the set for its position, we return False immediately. Otherwise we keep going, and if the loop finishes normally we return True, meaning the word matches.
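A quick check on a hypothetical two-position vector keeps the example short. The second function, match_word_vector_all, is an equivalent formulation with all() that is not part of the original code:

```python
import string

def match_word_vector(word, word_vector):
    assert len(word) == len(word_vector)
    for letter, v_letter in zip(word, word_vector):
        if letter not in v_letter:
            return False
    return True

def match_word_vector_all(word, word_vector):
    # Same check written with all(): True only if every letter is allowed
    assert len(word) == len(word_vector)
    return all(letter in allowed for letter, allowed in zip(word, word_vector))

# Hypothetical vector: 'q' first, then anything except 'x'
vector = [{"q"}, set(string.ascii_lowercase) - {"x"}]
for candidate in ["qu", "qx", "au"]:
    print(candidate, match_word_vector(candidate, vector))
# qu True
# qx False
# au False
```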

Word matching

With the rules implemented, let’s write a search function that filters the list of words based on the responses received from Wordle:

def match(word_vector, possible_words):
    return [word for word in possible_words if match_word_vector(word, word_vector)]

The matcher combines everything discussed above into a single list comprehension: each word is tested against word_vector using match_word_vector.
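To see match in action without the full dictionary, here is a sketch with a short hand-picked word list and a hypothetical constraint (the word ends in 'y' and contains no 'a'):

```python
import string

def match_word_vector(word, word_vector):
    assert len(word) == len(word_vector)
    for letter, v_letter in zip(word, word_vector):
        if letter not in v_letter:
            return False
    return True

def match(word_vector, possible_words):
    return [word for word in possible_words if match_word_vector(word, word_vector)]

# Hypothetical constraint: no 'a' anywhere, and 'y' in the last position
vector = [set(string.ascii_lowercase) - {"a"} for _ in range(5)]
vector[4] = {"y"}
print(match(vector, ["query", "nerdy", "arose", "chewy", "early"]))
# ['query', 'nerdy', 'chewy']
```

The comprehension preserves the order of possible_words, so the result lists the survivors in their original order.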

Response iteration

Finally, we need a small user interface that repeatedly asks for a guess and Wordle's response:

def solve():
    possible_words = WORDS.copy()
    # One set of still-possible letters per position
    word_vector = [set(string.ascii_lowercase) for _ in range(WORD_LENGTH)]
    for attempt in range(1, ALLOWED_ATTEMPTS + 1):
        print(f"Attempt {attempt} with {len(possible_words)} possible words")
        display_word_table(sort_by_word_commonality(possible_words)[:15])
        word = input_word()
        response = input_response()
        for idx, letter in enumerate(response):
            if letter == "G":
                # Green: this position can only hold the guessed letter
                word_vector[idx] = {word[idx]}
            elif letter == "Y":
                # Yellow: the letter is in the word, but not at this position
                word_vector[idx].discard(word[idx])
            elif letter == "?":
                # Gray: the letter is not in the word, so remove it everywhere
                for vector in word_vector:
                    vector.discard(word[idx])
        # Keep only the words that still fit every position
        possible_words = match(word_vector, possible_words)

The solver starts by setting up its state: a copy of WORDS and a fresh word vector. It then loops over attempts 1 through ALLOWED_ATTEMPTS (hence the + 1 in range()), printing the attempt number and the number of remaining candidate words. Next, display_word_table prints a table of the top 15 matches, and we ask for the word entered and the response received from Wordle.

We then walk through the response with enumerate, keeping track of each letter's position, and apply the matching rule (green, yellow or gray) to word_vector.

Finally, we replace possible_words with the new list of matches from match and repeat the loop, which now displays a smaller set of candidates.
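The body of the response loop can also be pulled out into a pure function, which makes the rules testable without typing anything into input(). This is a sketch: apply_response is a hypothetical name, and the short candidate list stands in for WORDS. It applies the same three rules as solve(), here to the third guess from the sample run below:

```python
import string

WORD_LENGTH = 5

def apply_response(word_vector, word, response):
    """Apply one Wordle response to word_vector in place."""
    for idx, letter in enumerate(response):
        if letter == "G":
            word_vector[idx] = {word[idx]}
        elif letter == "Y":
            word_vector[idx].discard(word[idx])
        elif letter == "?":
            for vector in word_vector:
                vector.discard(word[idx])
    return word_vector

def match(word_vector, possible_words):
    return [
        word for word in possible_words
        if all(letter in allowed for letter, allowed in zip(word, word_vector))
    ]

candidates = ["query", "nerdy", "chewy", "liter", "arose"]
vector = [set(string.ascii_lowercase) for _ in range(WORD_LENGTH)]
apply_response(vector, "nerdy", "?YY?G")
print(match(vector, candidates))  # ['query', 'chewy']
```

As in the sample run, "chewy" survives even though 'r' was yellow: these rules only remove a yellow letter from its own position and do not require it to appear elsewhere in the word.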

Trying it out

We run the solver by calling solve(). The responses below are the ones Wordle gave for each guess (part of the output is omitted for brevity):

>>> solve()
Attempt 1 with 5905 possible words
arose      | 0.43
raise      | 0.42

   ... etc ...

Input the word you entered> arose
Type the color-coded reply from Wordle:
  G for Green
  Y for Yellow
  ? for Gray
Response from Wordle> ?Y??Y
Attempt 2 with 829 possible words
liter      | 0.34
liner      | 0.34

   ... etc ...

Input the word you entered> liter
Response from Wordle> ???YY
Attempt 3 with 108 possible words
nerdy      | 0.29
nehru      | 0.28

   ... etc ...

Input the word you entered> nerdy
Response from Wordle> ?YY?G
Attempt 4 with 25 possible words
query      | 0.24
chewy      | 0.21

   ... etc ...

Input the word you entered> query
Response from Wordle> GGGGG
Attempt 5 with 1 possible words
query      | 0.24

Conclusion

  • Set comprehensions, list comprehensions, dictionary comprehensions and the like are powerful Python tools that combine looping and filtering. But cramming too many for or if clauses into one makes the code harder to understand, so limit each comprehension to a few of them.

  • Sets are one of Python's great strengths.

  • Knowing when to use set membership and set operations makes code more robust, more clearly correct and more concise. They are used very effectively here, so don't neglect sets!

  • Sets are ideal for finding matching (or mismatching) characters. There is still more to explore, though: think about how you could rewrite the matcher and the word vector using regular expressions.

  • You can do a lot with plain Python if you know the standard library well. itertools is especially good for lazy, iterator-based evaluation.
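As one possible starting point for the regular-expression exercise (an assumption, not the article's solution), each position's set can become a character class, and the whole word vector a pattern that re.fullmatch tests against each word. The helper names word_vector_to_regex and match_with_regex are hypothetical:

```python
import re
import string

def word_vector_to_regex(word_vector):
    """Hypothetical helper: one character class per position."""
    parts = []
    for allowed in word_vector:
        if allowed:
            parts.append("[" + "".join(re.escape(c) for c in sorted(allowed)) + "]")
        else:
            # An empty set can never match; (?!) is a pattern that always fails
            parts.append("(?!)")
    return re.compile("".join(parts))

def match_with_regex(word_vector, possible_words):
    pattern = word_vector_to_regex(word_vector)
    # fullmatch also enforces exactly one letter per position
    return [word for word in possible_words if pattern.fullmatch(word)]

vector = [set(string.ascii_lowercase) for _ in range(5)]
vector[0] = {"q"}       # hypothetical green 'q' in the first position
vector[4].discard("e")  # hypothetical yellow 'e' in the last position
print(match_with_regex(vector, ["query", "quote", "nerdy"]))  # ['query']
```

Compiling the pattern once per filtering pass keeps the per-word cost to a single fullmatch call.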
