Using deep learning to guess countries from photos in GeoGuessr

During the last lockdown in the UK, my wife and I played GeoGuessr. It is slower-paced than the games we usually play, but it suits a family with an 11-week-old who is getting more active every day.

GeoGuessr is a geographic exploration game. You are dropped at a random point in Google Street View, and your task is to mark your location on a map. You can explore the surroundings, zoom in, and follow the path of the Street View car along local streets.


We got hooked on GeoGuessr's Daily Challenge. We started visiting the site every day and trying to set a new record. In the Daily Challenge format, each round lasts three minutes, which we spent either frantically clicking through the Australian bush (sometimes mistaking it for South Africa) or debating whether Swedish has the letter ø.

I have now accumulated a lot of “I know it when I see it” knowledge. I can identify Greenland at a glance. My forgotten knowledge of national flags has come back, and I have picked up new knowledge too: the flags of the US states, which countries drive on the left and which on the right, and which use kilometers or miles. I know almost all the country domain names (they often appear on roadside billboards), and I won't forget .yu for a long time.

Did you know that black-and-white road barriers are common in Russia and Ukraine? Or that the blue EU stripe on license plates can be made out despite Google Street View's blurring? You can read more in this 80,000-word guide: Geoguessr – the Top Tips, Tricks and Techniques.

A downward-pointing red-and-white striped arrow indicates that you are in Japan, most likely on the island of Hokkaido, or possibly on Honshu near the mountains.


A bit of deep learning

I once read that machine learning can already do anything a person can do in under a second: recognize a face, pick out text in an image, swerve to avoid another car. This got me thinking, and thinking led me to a paper titled Geolocation Estimation of Photos using a Hierarchical Model and Scene Classification by Eric Müller-Budack, Kader Pustu-Iren, and Ralph Ewerth. The paper treats geolocation as “a classification problem in which the Earth is subdivided into geographic cells.”

The model predicts the GPS coordinates of photos.


Even from photographs taken indoors! (The GeoGuessr Daily Challenge often drops the player inside museums.)
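To make the "classification over geographic cells" idea concrete, here is a minimal sketch. It assumes a fixed equal-angle grid with a made-up cell size (`CELL_DEG`), which is a simplification for illustration and not the partitioning used in the paper. Each cell index acts as a class label, and a predicted label is turned back into coordinates by taking the cell's centre.

```python
# Hypothetical cell size in degrees, chosen only for illustration.
CELL_DEG = 10.0
COLS = int(360 / CELL_DEG)
ROWS = int(180 / CELL_DEG)


def latlng_to_cell(lat, lng):
    """Map a coordinate to the index of the grid cell (class label) containing it."""
    # Clamp so that lat == 90 or lng == 180 falls into the last row/column.
    row = min(int((lat + 90.0) // CELL_DEG), ROWS - 1)
    col = min(int((lng + 180.0) // CELL_DEG), COLS - 1)
    return row * COLS + col


def cell_to_latlng(cell):
    """Map a class label back to the centre of its cell."""
    row, col = divmod(cell, COLS)
    lat = -90.0 + (row + 0.5) * CELL_DEG
    lng = -180.0 + (col + 0.5) * CELL_DEG
    return lat, lng


cell = latlng_to_cell(44.002556, -72.988518)
print(cell, cell_to_latlng(cell))
```

A classifier trained this way can never be more precise than its cell size, which is one reason to expect a real system to use smaller cells where photos are dense.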

The authors recently released a PyTorch implementation and provided weights for the trained base(M, f*) model, which uses a ResNet50 architecture internally.

I assumed the trained model would not be a great fit for the photosphere fragments I could capture from GeoGuessr. As training data, the authors used “a subset of the 100 million Yahoo Flickr Creative Commons (YFCC100M) photograph dataset”. It included “roughly five million Flickr geotagged images and vague photographs such as indoor shots, food and people whose location is difficult to predict.”

The curious thing was that on the Im2GPS dataset, humans located an image with country-level accuracy (within 750 km) 13.9% of the time, while the Individual Scene Networks managed it 66.7% of the time!
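As a rough illustration of what "within 750 km" means in practice, the sketch below computes the great-circle distance between a guess and the true location with the haversine formula and compares it to the threshold. The function names and the spherical-Earth radius are my own assumptions; the paper may measure distance differently.

```python
import math

EARTH_RADIUS_KM = 6371.0   # mean Earth radius (spherical approximation)
COUNTRY_LEVEL_KM = 750.0   # country-level threshold quoted above


def great_circle_km(lat1, lng1, lat2, lng2):
    """Haversine distance in kilometres between two lat/lng points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lng2 - lng1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def is_country_level(guess, truth):
    """True if the guess lands within the country-level threshold of the truth."""
    return great_circle_km(*guess, *truth) <= COUNTRY_LEVEL_KM


# Five degrees of longitude along the equator is roughly 556 km: a hit.
print(is_country_level((0.0, 0.0), (0.0, 5.0)))
# Ten degrees is roughly 1,112 km: a miss.
print(is_country_level((0.0, 0.0), (0.0, 10.0)))
```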


So the question arose: who is better at GeoGuessr, my wife (an amazing player) or the machine?

Automating GeoGuessr with Selenium

To scrape screenshots from the current in-game location, I wrote a Selenium program that does the following four times:

  • Saves a screenshot of the canvas
  • Takes a step forward
  • Rotates the view by about 90 degrees

The number of repetitions of these actions can be configured through NUMBER_OF_SCREENSHOTS in the code below.

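'''
Given a GeoGuessr map URL, take a number of screenshots, each one step
further down the road and rotated ~90 degrees.
Usage: python <this script> <GeoGuessr map URL>
'''
from selenium import webdriver
from selenium.webdriver.common.action_chains import ActionChains
from selenium.webdriver.common.by import By
import time
import sys

# How many screenshot/step/rotate cycles to run (four in this post).
NUMBER_OF_SCREENSHOTS = 4
geo_guessr_map = sys.argv[1]

driver = webdriver.Chrome()
driver.get(geo_guessr_map)

# let JS etc. load (the 2-second pause is a guess; adjust as needed)
time.sleep(2)

def screenshot_canvas():
    '''
    Take a screenshot of the streetview canvas.
    '''
    canvas = driver.find_element(By.TAG_NAME, 'canvas')
    # 'xb' = exclusive binary write: fail rather than overwrite an existing file
    with open(f'canvas_{int(time.time())}.png', 'xb') as f:
        f.write(canvas.screenshot_as_png)

def rotate_canvas():
    '''
    Drag and click the <main> elem a few times to rotate us ~90 degrees.
    '''
    main = driver.find_element(By.TAG_NAME, 'main')
    for _ in range(0, 5):
        action = ActionChains(driver)
        action.move_to_element(main) \
            .click_and_hold(main) \
            .move_by_offset(118, 0) \
            .release(main) \
            .perform()

def move_to_next_point():
    '''
    Click one of the next point arrows, doesn't matter which one
    as long as it's the same one for a session of Selenium.
    '''
    next_point = driver.find_element(By.CSS_SELECTOR, '[fill="black"]')
    action = ActionChains(driver)
    action.click(next_point).perform()

for _ in range(0, NUMBER_OF_SCREENSHOTS):
    screenshot_canvas()
    move_to_next_point()
    rotate_canvas()

driver.quit()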

The screenshots also contain the GeoGuessr interface, but I didn’t bother cropping it out.

Approximate geolocation

I checked out the PyTorch branch, downloaded the trained model, and installed the dependencies with conda. I liked the repository's README: the requirements section was clear enough, and I had no problems on a fresh Ubuntu 20.04 install.

To settle the contest between human and machine, I chose the World map in GeoGuessr. I passed the URL to my Selenium program and then ran the model on the four screenshots it captured.

Below is the machine's abbreviated output.

python -m classification.inference --image_dir ../images/

                                lat        lng
canvas_1616446493 hierarchy     44.002556  -72.988518
canvas_1616446507 hierarchy     46.259434  -119.307884
canvas_1616446485 hierarchy     40.592514  -111.940224
canvas_1616446500 hierarchy     40.981506  -72.332581

I showed the same four screenshots to my wife. She guessed that the location was in Texas. It was actually in Pennsylvania. The machine made a different guess for each of the four screenshots. All of its guesses were in the United States: two fairly close to each other and two farther away.


If we take the average location, then the machine wins this round!
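One way to take that average is sketched below. It assumes the four predictions listed in the output above and averages them as unit vectors on a sphere rather than averaging latitudes and longitudes directly, which behaves better when points are far apart. The helper name `mean_location` is my own and is not part of the paper's code.

```python
import math

# The four model predictions from the inference output above.
predictions = [
    (44.002556, -72.988518),
    (46.259434, -119.307884),
    (40.592514, -111.940224),
    (40.981506, -72.332581),
]


def mean_location(points):
    """Average lat/lng points by summing their unit vectors on a sphere."""
    x = y = z = 0.0
    for lat, lng in points:
        phi, lmb = math.radians(lat), math.radians(lng)
        x += math.cos(phi) * math.cos(lmb)
        y += math.cos(phi) * math.sin(lmb)
        z += math.sin(phi)
    # Convert the summed vector back to latitude and longitude.
    lat = math.degrees(math.atan2(z, math.hypot(x, y)))
    lng = math.degrees(math.atan2(y, x))
    return lat, lng


print(mean_location(predictions))
```

Feeding the result and the true location into a great-circle distance function would give the error for the round.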

We played two more rounds, and the final score was 2-1 in favor of the machine. The machine got fairly close to a street in Singapore but failed to identify a snow-covered street in Canada (Madeline named the city within seconds).

After writing this post, I learned about an amazing piece of prior work comparing human and machine performance on the GeoGuessr battlefield. In the paper PlaNet – Photo Geolocation with Convolutional Neural Networks, Tobias Weyand, Ilya Kostrikov, and James Philbin tried to determine the location of a photograph using just its pixels.

To find out how PlaNet compares to human intuition, we allowed it to compete with ten highly traveled people in Geoguessr (

In total, people and PlaNet played 50 rounds. PlaNet won 28 of 50 rounds with a median localization error of 1,131.7 km, while the median human error was 2,320.75 km.

Web demo

The authors of Geolocation Estimation of Photos using a Hierarchical Model and Scene Classification built a rather neat web tool. I tested it on one of the Selenium screenshots.

A graphical demo in which you can compete against the deep learning system described in the article is here:… We also created a multifunctional web tool that supports loading and analyzing custom images:


GeoGuessr learnability

There are many reasons why beating GeoGuessr with machine learning (by which I mean regularly outperforming humans) may be easier than geolocating an arbitrary photograph taken by a person.

Unlike general-purpose geolocation, in GeoGuessr we are (almost always) trying to work out which road we are on. This means more effort can go into recognizing features that are always present, such as road markings and the makes and models of cars (both often give away the country). The model could also travel along the roads looking for road signs, which reveal the country's language, and the text on the signs could be looked up in a table.
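The table lookup could be as simple as the sketch below. It assumes the sign text has already been read (for example by OCR, which is not shown) and uses a deliberately tiny, non-exhaustive table of distinctive letters that I put together for illustration; `LETTER_HINTS` and `candidate_languages` are hypothetical names.

```python
# Illustrative, non-exhaustive table: a distinctive letter mapped to
# some of the languages that use it.
LETTER_HINTS = {
    "ø": {"Danish", "Norwegian"},
    "ß": {"German"},
    "ł": {"Polish"},
    "ő": {"Hungarian"},
    "ğ": {"Turkish"},
}


def candidate_languages(sign_text):
    """Narrow down candidate languages from distinctive letters in sign text."""
    candidates = None
    for ch in sign_text.lower():
        hint = LETTER_HINTS.get(ch)
        if hint is None:
            continue
        # Intersect the hints so every distinctive letter must agree.
        candidates = set(hint) if candidates is None else candidates & hint
    return candidates or set()


print(candidate_languages("Københavns Lufthavn"))
print(candidate_languages("Hauptstraße"))
print(candidate_languages("Main Street"))
```

An empty result just means the text contained no letter from the table, so a real system would need a far larger table plus other signals such as domain names on billboards.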

There are other clues (some in the GeoGuessr community consider them cheating) that a learning system could pick up on.

If you look down in Street View, you can see part of the car that captured the current photosphere. For example, in Kenya a black pipe is visible at the front of the car. Most of Vietnam was filmed from a motorcycle, and the rider’s helmet is often visible. Countries are often covered by a single car with a distinctive color or antenna.


In some places there is a spot in the sky where the stitched photosphere appears torn (mainly in Senegal, Montenegro, and Albania). In Africa, the Street View car is sometimes followed by escort vehicles. There are different generations of cameras, with different resolutions, halo types, colors, and blur at the bottom of the sphere. In the bottom corner of the photosphere there is an attribution notice, usually “Google” and a year, though sometimes the photographer's name appears as well.

Given these clues, I wouldn’t be surprised if a machine someday beat even the best GeoGuessr players in a timed competition. In fact, I believe we are just one research grant away from playing GeoGuessr significantly worse than machines.

