History and development of CAPTCHA

We started with distorted-text CAPTCHAs and ended up with a simple checkbox, improving the system after each failure.

You go to a website to buy plane tickets. Before you click the “Submit” button, you need to check the box with the question: “Are you a robot?”

At first glance this looks ironic. Why do I need to prove that I am human, and to a computer of all things?

And even if I check this box, how does that prove that I'm human? After all, a robot can also tick a box, right?

This is similar to a jury asking a murderer whether he committed the crime. “Of course, I didn’t kill anyone,” the accused would answer.

So what is the point of such a question? Why do CAPTCHAs exist at all? And how do they check if the user is a real person based on simple queries?

In this article, we'll take a closer look at why CAPTCHAs are needed, how they have evolved over time, their different versions, and much more.

What is CAPTCHA?

CAPTCHA stands for “Completely Automated Public Turing test to tell Computers and Humans Apart.”

Quite a complicated acronym, isn't it? We'll simplify this now.

What is the Turing Test?
Legendary British mathematician and computer scientist Alan Turing argued with his colleagues and critics about whether a machine (digital computer) could ever achieve a level of intelligence comparable to a human.

To prove his point, Turing proposed a game. In this game, an “interrogator” puts questions to a person and a computer via text chat. If the interrogator cannot tell their answers apart, so that the computer successfully passes for a human, the computer passes the test. Turing called it “the imitation game.”

Who would have thought that decades later the same test principle would be used to create CAPTCHA, but now to distinguish people from machines.

In 2000, 22-year-old Luis von Ahn, together with his professor Manuel Blum, developed CAPTCHA to prevent automated programs from abusing networks and websites.

Why do we need CAPTCHAs?
You've probably heard about bots. Bots are programs that can perform specific tasks based on a script given to them. They often imitate human behavior and can complete tasks much faster.

There are useful bots, such as search engines that crawl web pages to index content, or chatbots that simulate human conversation.

However, there are also malicious bots that can interfere with users by spreading spam, hijacking accounts, or even bringing down large websites through DDoS attacks.

Here are some of the malicious actions that such bots can perform:

  • Credential Stuffing

  • Content Scraping

  • DoS or DDoS attacks

  • Collecting email addresses

  • Spam content

  • Hacking passwords using brute force methods

If left unchecked, these bots can cause many problems, such as:

  • Undermining the credibility of online surveys.

  • Hacking online accounts using password brute force attacks.

  • Ticket scalping: mass buying of tickets for subsequent resale.

In one such case, the large retailer Target suffered a data breach in 2013 that affected up to 70 million people. At the time, Target's supplier portal did not have a CAPTCHA. The breach began with a phishing email aimed at one of its suppliers.

That's why we need CAPTCHA – to prevent system manipulation, actions that could affect millions of users on the Internet and lead to major fraud.

How does CAPTCHA work?

Most CAPTCHAs rely on visual tests, taking advantage of the fact that automated bots do not have the same degree of understanding of visual data as humans.

It all started with having to type strange, distorted text in order to access a site or leave a comment.

Strictly speaking, the “completely automated” part refers to the test itself: a program generates and grades it without human involvement, even though the user still types the answer by hand.

Types of CAPTCHA

CAPTCHAs are divided into three main categories:

  1. Text CAPTCHAs

  2. Image CAPTCHAs

  3. Audio CAPTCHAs

Let's look at each of them.

Text CAPTCHAs

This is the oldest form of CAPTCHA. It uses familiar words or phrases, random strings of characters, combinations of numbers and letters, and so on.

These characters are presented in an unusual way, making them difficult for automated programs to understand.

Text CAPTCHAs may include garbled characters, rotation, uneven scaling, and other effects. Some CAPTCHAs may also use symbol overlays with graphic elements such as color, background noise, lines, etc.
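The server-side part of a text CAPTCHA can be sketched in a few lines. This is only a minimal illustration under stated assumptions: the function names `new_challenge` and `answer_matches` and the alphabet are hypothetical choices, and the distortion step (rotation, noise, overlays) is omitted because it would need an image library.

```python
import hmac
import secrets

# Characters that are easy to confuse (0/O, 1/I/L) are left out.
# This alphabet is an illustrative choice, not a standard.
ALPHABET = "ABCDEFGHJKMNPQRSTUVWXYZ23456789"


def new_challenge(length: int = 6) -> str:
    """Pick a random string that would then be rendered as a distorted image."""
    return "".join(secrets.choice(ALPHABET) for _ in range(length))


def answer_matches(expected: str, typed: str) -> bool:
    """Compare the typed answer with the challenge, ignoring case and spaces."""
    return hmac.compare_digest(
        expected.upper().encode(), typed.strip().upper().encode()
    )
```

The expected string would be kept on the server (for example in the user's session) and never sent to the browser as text; only the distorted image is.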

Image-based CAPTCHA

Many of us have encountered CAPTCHAs that require us to tag images of specific objects, such as traffic lights, cars, or other objects.

These CAPTCHAs are easier for people, but much more difficult for bots, as they require not only image recognition, but also its semantic interpretation.

Audio CAPTCHA

Text and visual CAPTCHAs are not suitable for visually impaired users, so audio CAPTCHAs were developed.

Audio CAPTCHAs are used in combination with text or image CAPTCHAs and provide an audio recording of a series of letters or numbers. These recordings usually contain background noise, making them difficult for bots to recognize.

The emergence of reCAPTCHA

When millions of people took these tests every day, everything worked as it should. But as we know, innovation has no boundaries, and Luis von Ahn noticed another opportunity.

What if, instead of wasting time recognizing random garbled texts, you used old, unreadable book fragments?

In an interview with the magazine The Walrus, Luis said he had created a system that “wasted millions of hours of an invaluable resource – the human brain – ten seconds at a time.”

And it's true! Typing 200 million CAPTCHAs a day, at about ten seconds each, added up to roughly 500,000 hours of human effort every day.
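The estimate is easy to check with a quick calculation. The sketch below assumes round figures: 200 million solves a day and about ten seconds per solve (the latter taken from the “ten seconds at a time” quote); the function name is just for illustration.

```python
# Rough check of the time estimate; both inputs are round figures.
SOLVES_PER_DAY = 200_000_000
SECONDS_PER_SOLVE = 10  # assumption based on "ten seconds at a time"


def hours_spent_per_day(solves: int = SOLVES_PER_DAY,
                        seconds_each: int = SECONDS_PER_SOLVE) -> float:
    """Total human time spent on CAPTCHAs per day, in hours."""
    return solves * seconds_each / 3600


print(f"{hours_spent_per_day():,.0f} hours per day")
```

With these inputs the result is a little over 555,000 hours a day, which matches the “about half a million hours” figure.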

So he came up with the idea of using real texts from old books that could not be recognized using optical character recognition (OCR) technologies. At that time, OCR could not correctly read about 20% of scanned words.

Luis's new idea was to put human effort to good use: users would unknowingly help the OCR system decipher these difficult words and add them to the database.

This new version of CAPTCHA was called reCAPTCHA. The first work digitized using this method was the archive of The New York Times, which began publishing back in 1851 and currently includes 13 million articles.

How does reCAPTCHA work?

It is based on the principle of crowdsourcing. The book is first scanned, and the scans are run through OCR. For each challenge the program selects two words: one that the OCR system has already read and recognized, and another that OCR could not recognize.

The user must type both words into the reCAPTCHA field. If the user enters the first (recognized) word correctly, the program assumes that the second word is probably correct as well and records it as a candidate transcription.

The second word (which OCR was unable to read) is then shown to other users. The program compares all the answers and, once it has collected enough matching answers, accepts the word with a high degree of confidence.

Thus, the program solves two problems at once: it checks whether the user is a person, and digitizes words that the OCR system could not recognize, adding them to the general body of knowledge.
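The two-word scheme described above can be sketched as a toy model. Everything here is an assumption made for illustration: the class name `WordDigitizer`, the vote threshold, and the simple case-insensitive comparison are not taken from the real reCAPTCHA implementation.

```python
from collections import Counter, defaultdict
from typing import Optional

# Hypothetical threshold: how many matching answers are required before a
# transcription is accepted. The article does not give the real value.
MIN_VOTES = 3


class WordDigitizer:
    """Toy model of the control-word / unknown-word scheme."""

    def __init__(self, min_votes: int = MIN_VOTES) -> None:
        self.min_votes = min_votes
        self.votes = defaultdict(Counter)  # unknown word id -> answer counts

    def submit(self, control_truth: str, control_answer: str,
               unknown_id: str, unknown_answer: str) -> bool:
        """Return True if the user passes; record their guess if so."""
        if control_answer.strip().lower() != control_truth.lower():
            return False  # failed the known word: do not trust the guess
        self.votes[unknown_id][unknown_answer.strip().lower()] += 1
        return True

    def resolved(self, unknown_id: str) -> Optional[str]:
        """Return the agreed transcription, or None if still uncertain."""
        if not self.votes[unknown_id]:
            return None
        word, count = self.votes[unknown_id].most_common(1)[0]
        return word if count >= self.min_votes else None
```

In this sketch a word counts as digitized only after `min_votes` users who passed the control word typed the same answer, which mirrors the “enough matching answers” idea from the text.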

Google and reCAPTCHA

In 2009, Google saw the potential and acquired reCAPTCHA for use in the Google Books project. Google used reCAPTCHA to get people to recognize words or characters that their image processing algorithms couldn't identify, thus making the process much easier.

The Google Books project was an ambitious initiative to digitize all the books in the world and create a huge digital library accessible to everyone. According to Wikipedia, by October 2019, Google had scanned more than 40 million books. However, the project faced a number of legal issues related to copyright, which made it difficult to implement.

Problems with reCAPTCHA

As the system became more secure, attackers became more creative, fueling the constant evolution of CAPTCHA.

A 2014 Google study showed that modern artificial intelligence technologies can solve even the most distorted texts with 99.8% accuracy, and numbers in images with 90% accuracy. This made processing visual data an unreliable verification method. It was necessary to find a new approach.

NoCAPTCHA reCAPTCHA

Then the revolutionary API that we use today appeared – NoCAPTCHA reCAPTCHA. This is the same simple checkbox that we talked about at the beginning. All you need to do is simply check the box and you can continue working.

How does NoCAPTCHA reCAPTCHA work?

In fact, it is much more complex than it seems. NoCAPTCHA is powered by an advanced risk analysis API that continuously monitors user behavior. The system analyzes all interactions with the CAPTCHA: cursor movement before the click on the checkbox, during verification, and after the box is checked. The combination of these signals determines whether the user is treated as human.

Why does NoCAPTCHA reCAPTCHA work?

The basic idea is that automated malicious bots use pre-written scripts to perform functions. If a bot tries to “slide” through and check a box, it will simply perform the programmed action without the natural cursor movement of a human.

This way, the NoCAPTCHA reCAPTCHA program can determine whether the function was performed manually or through a script.
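To make the idea of “natural cursor movement” concrete, here is a toy heuristic. It is not Google's method, which is not public; the function names and both thresholds are hypothetical, and a real system would combine many more signals. The sketch only shows one plausible signal: a scripted pointer tends to jump straight to the target, while a human path wanders a little.

```python
import math


def path_straightness(points):
    """Ratio of straight-line distance to travelled distance (at most ~1.0)."""
    if len(points) < 2:
        return 1.0
    travelled = sum(math.dist(a, b) for a, b in zip(points, points[1:]))
    if travelled == 0:
        return 1.0
    return math.dist(points[0], points[-1]) / travelled


def looks_scripted(points, min_points=5, max_straightness=0.995):
    """Flag paths that are too short or too perfectly straight.

    Both thresholds are illustrative assumptions, not measured values.
    """
    return len(points) < min_points or path_straightness(points) > max_straightness
```

For example, a path such as `[(0, 0), (3, 1), (5, 4), (8, 5), (9, 9), (12, 10)]` is not flagged, while a single jump `[(0, 0), (100, 100)]` is.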

However, even this method is not completely safe. There are programs that can simulate mouse clicks and automatically check a checkbox. Therefore, Google may also take into account other data that users provide unintentionally, such as IP addresses and cookies, which helps establish that you are human.

Google does not disclose all the exact methods it uses to identify bots (we’ll leave that to them).

What if that's not enough?

Even with such a sophisticated security system, uncertainty can remain. With this in mind, Google has added an extra step to check when the system is not sure of the result – image identification.

When in doubt, the program can ask the user to prove their humanity using an old-style CAPTCHA (texts and numbers) on desktop computers or an image CAPTCHA on mobile devices.

There is also a form expiration timer running in the background to prevent bots from solving the CAPTCHA after a long time.
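An expiration timer of this kind can be sketched with a signed timestamp. This is an assumption about how such a timer might be built, not a description of reCAPTCHA internals; `SECRET_KEY`, `MAX_AGE_SECONDS`, and the function names are hypothetical.

```python
import hashlib
import hmac
import time

SECRET_KEY = b"replace-with-a-real-secret"  # hypothetical server-side key
MAX_AGE_SECONDS = 120                       # hypothetical form lifetime


def _sign(issued: str) -> str:
    """HMAC-SHA256 signature of the issue time, as a hex string."""
    return hmac.new(SECRET_KEY, issued.encode(), hashlib.sha256).hexdigest()


def issue_token(now=None) -> str:
    """Create a token of the form '<unix time>.<signature>' for a new form."""
    issued = str(int(time.time() if now is None else now))
    return f"{issued}.{_sign(issued)}"


def token_is_valid(token: str, now=None) -> bool:
    """Accept the token only if it is untampered and not older than the limit."""
    issued, _, signature = token.partition(".")
    if not hmac.compare_digest(signature.encode(), _sign(issued).encode()):
        return False  # forged or corrupted token
    current = time.time() if now is None else now
    age = current - int(issued)
    return 0 <= age <= MAX_AGE_SECONDS
```

The token would be embedded in the form when the CAPTCHA is shown and checked again on submission, so a solution that arrives long after the form was issued is rejected.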

Next CAPTCHA Innovations

Innovation in this area does not stop. We started with a text CAPTCHA and ended up with a simple checkbox, adapting after each failure.

Every CAPTCHA failure leads to the development of artificial intelligence. Why? Because in order for the test to fail, someone had to come up with new ways for the computer to solve the test.

This stimulates further development and the emergence of new types of CAPTCHA.

One such innovation is the Honeypot Method.

How does the Honeypot method work?

It uses deception to make bots reveal themselves. When we create a form, automated programs are likely to fill out all the fields. A person will fill in only the fields visible to them.

What if we add fields that are invisible to users but still present in the form?

Bots will also fill in these hidden fields, thereby giving themselves away.
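On the server, the check itself is tiny. The sketch below assumes the form contains an extra field that is hidden from people with CSS; the field name `website` and the function name are hypothetical.

```python
# Name of the hidden field; people never see it, so it should stay empty.
# "website" is a hypothetical choice that looks tempting to a bot.
HONEYPOT_FIELD = "website"


def is_probably_bot(form: dict) -> bool:
    """Return True if the hidden honeypot field was filled in."""
    return bool(str(form.get(HONEYPOT_FIELD, "")).strip())
```

A submission such as `{"email": "a@example.com", "website": ""}` passes, while one where `website` contains any text is treated as coming from a bot.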

Honeypot method – double benefit

The Honeypot method delivers a double benefit: it simplifies the verification process for users and effectively catches malicious bots. Simplicity for humans and a trap for bots is what makes this method effective in fighting spam.

What is the future of cybersecurity?

However, as we discussed earlier, everything that is created can be hacked – if the enemy has the motivation to do so.

CAPTCHAs were originally developed to protect against spam bots. But today, bots don't stop there – they attack servers, steal data and commit fraud. The emergence of threats such as CAPTCHA farms (organized groups of people solving CAPTCHAs for payment) and smarter AI bots is calling the effectiveness of CAPTCHAs into question.

CAPTCHAs – Are They Still Effective?

CAPTCHAs often act as “speed bumps”: obstacles in the attackers' way that slow attacks down but do not stop them completely. Bots are getting smarter, and technologies like machine learning can often bypass even complex CAPTCHAs. On the other hand, making CAPTCHAs more complex can irritate customers. No one wants to waste time picking out pictures or typing random text just to register or log in.

This is clearly not a long-term solution, and increasing the difficulty of the CAPTCHA only slightly delays attackers.

New direction in cybersecurity

When thinking about the future, it is important to look for new solutions. Online businesses must invest in technology that can effectively detect bots while still providing a seamless user experience.

There is a possibility that new solutions are already being developed. Perhaps in a few years we will see a completely different approach to online security that will go beyond CAPTCHA.
