Beating Wordle with Python

7 min read · Jan 12, 2022

Wordle is a simple game that has seen a meteoric rise in popularity over the past month. You have 6 attempts to guess a 5-letter word. If a letter is in the correct position, it lights up green. If it is in the word but in the wrong position, it lights up yellow. Otherwise it stays grey.

It has been featured in the New York Times, the Guardian and my family’s WhatsApp Group.

I try to play optimally, picking words with common letters and no duplicates, but I still lose occasionally, while other people who take a less optimal approach win.

I decided to use Python to see if Wordle can be solved every time.

Approach

First we will select a secret word. For example, let’s say the word is “stare”.

We are going to get our pool of 5-letter words from here.

We are going to score each word using Scrabble letter values and add a small penalty for duplicate letters. This is similar to the approach in my previous fantasy football post.

We are going to pick a word and store the result.

We want to know if each letter was correct and in position, correct and out of position or incorrect.

For each correct and in-position letter, we want to filter out all words from our pool that do not have that letter in that position.

For each correct but out of position letter we want to remove all words from the pool that do not contain that letter.

For each incorrect letter, we want to remove all words that contain that letter.

We then repeat the process, picking words at random from the pool until we get the correct one or fail.

Part 1: Scoring the Pool

I’m going to add up the Scrabble score for each letter and add 2 more points if the word has any duplicate letters.

For example, the word “later” would receive a score of 5 as all 5 letters are worth 1 point in Scrabble but a word like “pzazz” would receive 3+10+1+10+10 and an additional 2 point penalty for having duplicates for a score of 36. P is worth 3 points, A is worth 1 and Z is worth 10.

To do this,

Make a DataFrame out of the word file.
Make a dictionary out of the letter values.
Split each letter into its own column.
Map each of the 5 letter columns to a value using the letter value dictionary.
Add up the value columns to get a score.
Check for duplicates and add the penalty.
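The steps above can be sketched as follows. This is a minimal illustration rather than the code from the repository: the function name `score_pool`, the constant names and the tiny three-word pool are assumptions, while the column names match the ones used in the filtering step later on. The letter values are the standard English Scrabble values.

```python
import pandas as pd

# Standard English Scrabble letter values.
LETTER_VALUES = {
    **dict.fromkeys("aeilnorstu", 1),
    **dict.fromkeys("dg", 2),
    **dict.fromkeys("bcmp", 3),
    **dict.fromkeys("fhvwy", 4),
    "k": 5,
    **dict.fromkeys("jx", 8),
    **dict.fromkeys("qz", 10),
}
COLUMNS = ["first", "second", "third", "fourth", "fifth"]
DUPLICATE_PENALTY = 2  # extra points for words that repeat a letter


def score_pool(words):
    """Build a DataFrame with one column per letter and a total score."""
    df = pd.DataFrame({"words": list(words)})
    # Split each letter into its own column.
    for i, column in enumerate(COLUMNS):
        df[column] = df["words"].str[i]
    # Map every letter column to its value and add the values up.
    df["score"] = df[COLUMNS].apply(lambda col: col.map(LETTER_VALUES)).sum(axis=1)
    # Add the penalty to words that contain a repeated letter.
    has_duplicates = df["words"].apply(lambda word: len(set(word)) < len(word))
    df.loc[has_duplicates, "score"] += DUPLICATE_PENALTY
    return df


# Hypothetical mini pool; the real pool is read from the word file.
pool = score_pool(["later", "pzazz", "stare"])
```

With this sketch, “later” and “stare” score 5 and “pzazz” scores 36, matching the worked example above.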

Part 2: Picking a Word from the Pool

We get the lowest score in the DataFrame and filter out every word that scores higher. We then pick one of the lowest-scoring words at random.

On our first attempt, there are 213 words with a score of 5 (the lowest possible score). We pick out “liers” at random.
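A minimal sketch of this selection step, assuming a DataFrame with `words` and `score` columns. The function name `pick_word` and the small example pool are hypothetical, and `random_state` is only there to make the example repeatable.

```python
import pandas as pd


def pick_word(df, random_state=None):
    """Return one of the lowest-scoring words in the pool, chosen at random."""
    lowest = df["score"].min()
    # Keep only the words that share the lowest score.
    candidates = df.loc[df["score"] == lowest, "words"]
    return candidates.sample(n=1, random_state=random_state).iloc[0]


# Hypothetical mini pool with precomputed scores.
pool = pd.DataFrame(
    {"words": ["liers", "later", "champ", "pzazz"], "score": [5, 5, 14, 36]}
)
guess = pick_word(pool, random_state=0)  # either "liers" or "later"
```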

Part 3: Checking a Word against the Hidden Word

If the word is correct, we end the game and start again with a new word.

We want to loop through the guess and check whether each letter is correct and in the right position, correct but in the wrong position, or incorrect.

We are going to return 3 things.

A list of dictionaries with letter and position keys. The position will either be a number or a question mark.
A list of invalid letters.
A list of invalid positions for letters that are in the word but not in the correct position.

These will be decided by:

If the letter and position are correct we add a dictionary with the position and letter.
If the letter is correct but the position is incorrect, we return the letter with a question mark as its position. We also add the letter and the position it was tried in to the list of invalid positions.
If the letter is incorrect we add it to the list of invalid letters.

“Liers” is not the correct guess (as a reminder, the word is “stare”). Our method loops through each letter and returns the following:

Known_positions: [{}, {}, {'letter': 'e', 'position': '?'}, {'letter': 'r', 'position': 3}, {'letter': 's', 'position': '?'}]
Invalid_letters: ['l', 'i']
Invalid_Positions: [{}, {}, {'letter': 'e', 'position': 2}, {}, {'letter': 's', 'position': 4}]
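The checking step can be sketched in plain Python as follows. The function name `check_guess` is an assumption, and the sketch follows the three rules exactly as listed above, so it treats repeated letters more simply than the real game does.

```python
def check_guess(guess, secret):
    """Compare a guess with the secret word, letter by letter."""
    known_positions = []    # one dict per guessed letter
    invalid_letters = []    # letters that are not in the secret word
    invalid_positions = []  # where an out-of-position letter cannot go
    for position, letter in enumerate(guess):
        if secret[position] == letter:
            # Correct letter in the correct position.
            known_positions.append({"letter": letter, "position": position})
            invalid_positions.append({})
        elif letter in secret:
            # Correct letter, wrong position: the true position is unknown.
            known_positions.append({"letter": letter, "position": "?"})
            invalid_positions.append({"letter": letter, "position": position})
        else:
            # The letter does not appear in the secret word at all.
            known_positions.append({})
            invalid_positions.append({})
            invalid_letters.append(letter)
    return known_positions, invalid_letters, invalid_positions


known, invalid, misplaced = check_guess("liers", "stare")
```

For the guess “liers” against “stare”, this returns the three lists shown above.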

Part 4: Filtering the DataFrame

We know that “L” and “I” are not in the word, so we can build a regex character class from the invalid letters and use “contains” to remove every word that matches it.

# Build a character class such as "[li]" and drop every word that matches it.
if invalid_letters:
    regex_string = "[" + "".join(invalid_letters) + "]"
    df = df.loc[~df.words.str.contains(regex_string, regex=True)]

We start with 5757 words in our pool, once we remove all words containing “L” and “I”, it falls to 3161.

We then loop through the list of known letters. If the position is a question mark, we remove every word that does not contain that letter. Here, that means removing all words that do not contain both an “E” and an “S”. This can easily be done with “contains” like before.

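# Keep only the words that contain every out-of-position letter.
for items in letters:
    # .get() skips the empty dicts left by incorrect letters.
    if items.get('position') == '?':
        df = df[df['words'].str.contains(items['letter'])]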

Our pool of potential words falls from 3161 to 1601 after this filter.

We know that there is an R in position 3 so we remove all words that do not have an R in position 3. This reduces the possible words from 1601 to just 43.

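columns = {0: "first", 1: "second", 2: "third", 3: "fourth", 4: "fifth"}
for items in letters:
    # Skip empty entries and letters whose position is still unknown ('?').
    if items.get('position') in columns:
        column = columns[items['position']]
        df = df.loc[df[column] == items['letter']]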

We know that there is an S and an E in the word but also that the E is not in position 2 and S is not in position 4. We remove all words with an E in position 2 and an S in position 4. This reduces the possible words down to just 13.
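This last filter could look something like the sketch below. The function name `drop_invalid_positions` and the three-word pool are hypothetical; the column names follow the mapping used for the known-position filter.

```python
import pandas as pd

COLUMNS = {0: "first", 1: "second", 2: "third", 3: "fourth", 4: "fifth"}


def drop_invalid_positions(df, invalid_positions):
    """Remove words that have a letter in a position already ruled out."""
    for item in invalid_positions:
        if not item:  # empty dicts carry no information
            continue
        column = COLUMNS[item["position"]]
        df = df.loc[df[column] != item["letter"]]
    return df


# Hypothetical mini pool, split into one column per letter.
df = pd.DataFrame({"words": ["stare", "sherd", "euros"]})
for index, column in COLUMNS.items():
    df[column] = df["words"].str[index]

invalid_positions = [
    {}, {}, {"letter": "e", "position": 2}, {}, {"letter": "s", "position": 4},
]
remaining = drop_invalid_positions(df, invalid_positions)
```

Here “sherd” is dropped for its E in position 2 and “euros” for its S in position 4, leaving only “stare”.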

After 1 guess, we have eliminated more than 99% of possible words.

The remaining pool looks like this.

Part 5: Repeat

On our second attempt, 4 words share the minimum score: “store”, “stare”, “snore” and “snare”. The random sample picks out “stare” and the game is won.

I decided to run this algorithm for all 5757 words in our pool. It solved for 5237 words with the following distribution:

It solved most words in 4 guesses but almost 10% of words were not solved. I would like to declare victory and say that Wordle is not a 100% solvable game but unfortunately, the solution can be improved.

Part 6: Adding a heuristic

Looking through the logs, I noticed that words like “water” were not being solved. The algorithm would guess “later” around attempt 3 and be left with “cater”, “dater”, “hater”, “mater”, “pater” and “water”. Its only option would then be to try each one in turn and hope to hit the answer.

If it were to try a word like “champ” it could eliminate “cater”, “hater”, “mater” and “pater” with 1 guess, leaving just “dater” and “water” to try on the next attempt.

So I decided to add a check when recommending a word: if 3 or more letters are already known and we have 4 or fewer guesses, we use a different method to pick a word.

I decided to rescore the pool of words and reward a word for each letter it contains that is still possible in one of the unsolved columns.

We pass in the chosen letters, the letter value dictionary, the vacant positions and a column map.

We derive the remaining letters by looking at the vacant positions and taking every unique letter from the column.

We change the letter values so that every remaining letter gets -20 instead of its original score. We map and add up the scores like we did previously and add a 40-point penalty for duplicate letters.

We finally take the best scoring word and return that as our chosen word. For “water” this is “champ”.
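A plain-Python sketch of this rescoring idea, under a few assumptions: the function name `pick_eliminator` is hypothetical, the “best” score is taken to be the lowest one (as in Part 2), and words are drawn from a broader pool than the remaining candidates, since “champ” is not itself one of them.

```python
# Standard English Scrabble letter values.
LETTER_VALUES = {
    **dict.fromkeys("aeilnorstu", 1),
    **dict.fromkeys("dg", 2),
    **dict.fromkeys("bcmp", 3),
    **dict.fromkeys("fhvwy", 4),
    "k": 5,
    **dict.fromkeys("jx", 8),
    **dict.fromkeys("qz", 10),
}


def pick_eliminator(full_pool, candidates, vacant_positions):
    """Pick the word that tests the most still-possible letters at once."""
    # Every unique letter the candidates have in a vacant position.
    remaining_letters = {word[i] for word in candidates for i in vacant_positions}
    # Remaining letters are rewarded with -20 instead of their usual value.
    values = {**LETTER_VALUES, **dict.fromkeys(remaining_letters, -20)}

    def score(word):
        total = sum(values[letter] for letter in word)
        if len(set(word)) < len(word):
            total += 40  # penalty for duplicate letters
        return total

    return min(full_pool, key=score)


candidates = ["cater", "dater", "hater", "mater", "pater", "water"]
# Hypothetical mini pool standing in for the full word list.
guess = pick_eliminator(["later", "champ", "stare", "mommy"], candidates, [0])
```

In this small example “champ” wins because it covers four of the six remaining first letters without repeating any letter.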

Using this method, the number of unsolved words falls from roughly 500 to roughly 150. The distribution changes significantly, as the majority of words are now solved on the 6th attempt. I can now solve 97.5% of words on Wordle.

However, 97.5% is not 100%. If Wordle were to choose a word like “kooks”, it would be almost impossible to solve with any rational approach, as it has only 3 unique letters and contains the 5th least frequent letter in English. You would have to play Wordle like my younger sister and get lucky by picking “looks” and then guessing “kooks” instead of “books” or “cooks”.

You can find all the code on my GitHub here. Feel free to give this a clap and connect with me on LinkedIn.