CAPTCHA: killing conversion

CAPTCHA is widely treated as the standard defense against DDoS attacks, automated registrations and spam. We at Variti analyzed how effective it really is and concluded that it is an inconvenient and ineffective way to protect against bots: it hurts conversion, and the pages that host a captcha are themselves an attack surface.

Below are the reasons to drop captcha in favor of solutions that are more reliable and less annoying for users. We have split them into marketing and technical ones.

Marketing

It infuriates!

A captcha has to be examined carefully and sometimes entered several times. A Stanford study reports that its subjects spent on average 9.8 seconds recognizing and entering a visual captcha and 28.4 seconds on the audio version, with 50% of users refusing to solve it. In 2018 the Baymard Institute, which conducts UX research, found that users fail to solve text CAPTCHAs in about 8% of cases. This figure rises to 29% if the CAPTCHA is case sensitive.

First of all, this is a usability problem: the captcha forces the user to perform an extra action (and a captcha does not always fit the page design or look good in it). The problem is most visible when an incorrect answer reloads the entire page: for example, a user spends a long time typing a long comment, and the comment disappears because the solution was wrong. The chance that the person will start all over again is small.

In addition, several captcha products on the market already place ads inside the captcha (for example, they ask the user to assemble a puzzle of a company logo). This inevitably affects the user's mood.

Finally, captcha is very inconvenient for people with impaired coordination or vision, and even for those who are color-blind, because not every site owner who deploys a visual captcha adds an audio version. Captcha is also especially annoying for older audiences and for audiences with a large share of people who have low computer literacy or do not know English.

Poor conversion

As a rule, every extra field a visitor has to fill in on a site lowers conversion. One interesting study showed that removing captcha increased conversion by 3.2%. Every site can measure the exact effect of captcha on its own conversion, because the results depend on the site's specifics and audience. But if you look at the problem in terms of lost profit, you need to calculate the costs and effectiveness for both cases: is it really much more profitable to enable captcha than to get rid of spam by other means? And other means do exist.
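The comparison can be sketched as simple arithmetic. This is an illustration only: the visitor count, baseline conversion, order value and the cost of the alternative protection are hypothetical numbers, and the 3.2% figure is treated here as a relative lift, which is an assumption about how the study measured it.

```python
def monthly_profit(visitors, conversion_rate, avg_order_value, protection_cost):
    """Revenue from converted visitors minus the cost of the anti-spam measure."""
    return visitors * conversion_rate * avg_order_value - protection_cost

# All inputs below are hypothetical and only illustrate the comparison.
VISITORS = 100_000        # monthly visitors reaching the form
BASE_CONVERSION = 0.020   # conversion with captcha enabled
AVG_ORDER = 50.0          # average order value
LIFT = 0.032              # 3.2% from the study, assumed to be a relative lift

with_captcha = monthly_profit(VISITORS, BASE_CONVERSION, AVG_ORDER,
                              protection_cost=0.0)
without_captcha = monthly_profit(VISITORS, BASE_CONVERSION * (1 + LIFT), AVG_ORDER,
                                 protection_cost=2_000.0)

# The most you could spend on other protection before captcha becomes cheaper.
break_even_cost = VISITORS * BASE_CONVERSION * LIFT * AVG_ORDER

print(f"with captcha:    {with_captcha:.2f}")
print(f"without captcha: {without_captcha:.2f}")
print(f"break-even cost: {break_even_cost:.2f}")
```

With these made-up inputs, removing captcha pays off as long as the alternative protection costs less than the extra revenue it unlocks. Real numbers have to come from your own A/B test.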

CAPTCHAs have become harder

Over the years CAPTCHA has become smarter, but bots have evolved faster and become more sophisticated. In the early 2000s simple images with text were enough to stop most spam bots, but every year the text has to be distorted more and more to stay ahead of character recognition programs. You may have noticed that in captchas that ask you to select several images, after a few failed attempts the target objects are obscured or distorted, new classes of objects are added and the number of pages to get through grows. As captchas get more complex, the number of failures among real users grows too. Of course, Google also pursues its own goals here, using these tasks to teach its robots to recognize objects in images, and is unlikely to give them up. But so far it looks as if all a captcha does is weed out not-so-smart bots and inattentive people.

Back in 2014 Google pitted its best algorithm for solving the most distorted texts against people: the computer recognized the text correctly in 99.8% of cases, and people in only 33%.

Technical

Captcha is easy to get around

CAPTCHA does not fulfill its main function: it does not rid site owners of bots. Spammers have more than one way to “fight” captcha.

Recognition Systems and Neural Networks

OCR (Optical Character Recognition) systems are now quite accurate and easily recognize both printed text and text in images. Adding a “noisy” background, extra colors and lines, or distorting or duplicating the text does little to prevent this, but it makes the captcha harder for a real person to pass.

With the development of machine learning and deep neural networks, making captchas visually more complex looks futile. A fully convolutional neural network, which takes an image as input and outputs the desired image or several images (center maps), recognizes text captchas in most cases. Such a network also solves captchas that ask you to pick the right pictures, because detecting and classifying objects is exactly what neural networks do (including the neural network behind Google's own reCAPTCHA). And some of the libraries for working with neural networks are themselves developed at Google (for example, TensorFlow).

There are hacking services that take the audio version of a captcha and transcribe it. With the progress of speech recognition systems, audio captcha is no longer a problem for experienced spammers either. There are algorithms and scripts, such as the Cocke-Younger-Kasami (CYK) algorithm applied to a two-dimensional grammar, that can recognize more than 50% of captchas. There are other ways to bypass validation:

Number generators and other brute-force enumeration. For example, if a captcha always uses the same set of 10 pictures, merely rearranged at random, and asks you to find something specific among them, there are only 1024 possible variations
Recovering the characters from log data
“Peeking” into the scripts that call the captcha, for example,
Reusing user session identifiers
Finally, spammers plug the latest FineReader-style recognizers into their self-learning spam bots.
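The enumeration case is easy to demonstrate. The sketch below assumes one reading of the 1024 figure: each of the 10 pictures is either selected or not, which gives 2^10 combinations. The `check` callback and the `secret` answer are hypothetical stand-ins for a real captcha's validation.

```python
from itertools import product

PICTURES = 10  # size of the fixed picture set in the example above

def all_selections(n):
    """Yield every possible answer as a tuple of booleans (picked / not picked)."""
    return product((False, True), repeat=n)

def brute_force(check, n=PICTURES):
    """Try every selection until `check` accepts one; return (answer, attempts)."""
    for attempts, selection in enumerate(all_selections(n), start=1):
        if check(selection):
            return selection, attempts
    return None, 2 ** n

# Hypothetical target: pictures 2, 5 and 7 are the "correct" ones.
secret = tuple(i in (2, 5, 7) for i in range(PICTURES))
answer, attempts = brute_force(lambda s: s == secret)

total = sum(1 for _ in all_selections(PICTURES))
print(total)     # size of the whole search space
print(attempts)  # never more than `total`
```

A search space this small is exhausted in well under a second, so this kind of captcha only holds if the number of attempts per client is limited.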

Mystery business

There is a whole market of services that offer to bypass captcha, and it is very cheap. Thousands of real people work in this industry: residents of India or China who solve the tests for a small fee. Specialized marketplaces such as Amazon Mechanical Turk offer dozens of solved captchas for a few cents, and competition among numerous services keeps pushing this price down. They constantly create new “clean” accounts by the thousands, and such accounts pass sites' anti-spam checks most easily and quickly.

Finally, there are online resources with “interesting” content such as games or adult content. Before users can see the next batch of content, the system makes a backend request to Yahoo or Google, grabs a captcha from there and shows it to the user. As soon as the user answers, the hacker submits the solved captcha to the target site. It is not hard to build a popular site if you scrape (or simply steal) interesting content from a number of “legitimate” portals (we often run into such “copy-pasters” in our work). As a result, the hacker gets a large audience that solves other people’s captchas without suspecting it.

Doesn’t distinguish between good and bad bots

In addition to bad bots there are good ones: search engine and browser robots, and useful corporate bots of various services that search for or post information, or help users by automating a company's technical support or selling its services. According to Globaldots, human traffic currently accounts for 62.1%, bad bots for 20.4% and good bots for 17.5% (so good bots are not far behind bad ones). Unfortunately, CAPTCHA does not distinguish between good and bad bots and blocks them all equally, even though “good” bots could be useful.

Resource for Attacks

Most captchas are third-party: provided by Google or by other developers of captcha solutions. But in many cases they are generated by the same server that hosts the site, and then captcha generation becomes a weak spot for attacks.

Generating some types of captcha is a fairly resource-intensive and slow operation, because it calls third-party libraries and works with images. If caching is not provided by default or is disabled for some reason, things get even worse. If an attacker deliberately floods the server with captcha generation requests, the server may not be able to keep up.

However, this problem can be solved:

Choose a type of captcha that does not have this problem
Host the captcha on a separate resource

The only question is whether the site owner has the resources to hire a developer who will do this well.

Slows down the site

A slight slowdown may not seem like a big deal, but ignoring it is a mistake. According to one study, while a fifth of marketers do not think that loading time affects the conversion rate, almost 70% of people admit that page speed affects how likely they are to make a purchase.

How can captcha affect speed?

Generating complex images is quite resource-intensive, especially since not all of the generated codes are ever used. As a result, captcha services and the logs and cookies that come with them can slow the site down.
The code and the key are verified on the backend, where transferring large files can be difficult. One-time links also have to be checked at the backend level, which creates extra load. Captcha can loop and clutter the backend, in which case you need a mechanism that caches unused images so they can be shown to other users.
In addition, many captcha services have an inconvenient API for both the captcha widget and the server, and the developer has to struggle with that too.
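The caching mechanism mentioned above could look roughly like the following. It is a minimal in-memory sketch, not a production design: `CaptchaPool`, its method names and the `generate` callback are all hypothetical, and a real deployment would also need expiry and shared storage.

```python
import secrets
from collections import deque

class CaptchaPool:
    """Keeps generated-but-unsolved challenges so they can be served again."""

    def __init__(self, generate, max_size=100):
        self._generate = generate   # expensive callable: () -> (image_bytes, answer)
        self._unused = deque()      # challenges nobody attempted
        self._issued = {}           # one-time token -> (image, answer)
        self._max_size = max_size
        self.generated = 0          # how many times the expensive step ran

    def issue(self):
        """Return (token, image), reusing a cached challenge when one exists."""
        if self._unused:
            image, answer = self._unused.popleft()
        else:
            image, answer = self._generate()
            self.generated += 1
        token = secrets.token_urlsafe(16)
        self._issued[token] = (image, answer)
        return token, image

    def verify(self, token, response):
        """One-time check: the token is consumed whether or not the answer is right."""
        entry = self._issued.pop(token, None)
        return entry is not None and response == entry[1]

    def abandon(self, token):
        """Return an unattempted challenge to the pool (page closed, token expired)."""
        entry = self._issued.pop(token, None)
        if entry is not None and len(self._unused) < self._max_size:
            self._unused.append(entry)

# Usage with a fake generator standing in for real image rendering.
pool = CaptchaPool(lambda: (b"fake-image", "x7k2"))
first_token, _ = pool.issue()
pool.abandon(first_token)          # the user left without answering
second_token, _ = pool.issue()     # served from the pool, nothing is regenerated
ok = pool.verify(second_token, "x7k2")
replayed = pool.verify(second_token, "x7k2")
```

Even in this small form, the bookkeeping (tokens, one-time checks, returning unused images) shows how much extra backend logic a self-hosted captcha drags in.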

Is that all?

Unfortunately, no. There are a few more points.

First, captcha can break the logic of the site, especially when filling out a form ends with a captcha and the user is not warned about it in advance. At the same time, the option “show captcha only at login” does not solve the problem of protection against spammers, because after passing it once they can do whatever they want.

Second, consider search engines. If search engines are whitelisted by User-Agent, captcha is ineffective, because any client can send a search engine's User-Agent string. If captcha is shown to everyone, search engine crawlers will see it too, and the site will have problems with indexing.
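Why a User-Agent whitelist defeats the captcha is easy to see in code. The check below is a deliberately naive sketch; the function name and the marker list are hypothetical, and the header values are ordinary strings that any client can send.

```python
SEARCH_BOT_MARKERS = ("Googlebot", "bingbot", "YandexBot")

def skips_captcha(headers):
    """Naive whitelist: trust whatever the client puts in the User-Agent header."""
    user_agent = headers.get("User-Agent", "")
    return any(marker in user_agent for marker in SEARCH_BOT_MARKERS)

# An ordinary browser gets the captcha.
browser = {"User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64)"}

# A spam bot simply copies a crawler's User-Agent string and is waved through.
spam_bot = {
    "User-Agent": "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
}

print(skips_captcha(browser))   # False
print(skips_captcha(spam_bot))  # True
```

The header is fully controlled by the client, so this check stops only the bots that do not bother to change it.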

Not by captcha alone

There are many other forms of protection, some of them even more effective against bots. On the front end, for example, this can be a minimum form fill time, below which only a bot could have completed the form, or a hidden field (display: none) that a person will not see but a bot will fill in.
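Both front-end checks fit into a few lines on the server side. This is a minimal sketch under stated assumptions: the 3-second threshold, the `website` field name and the `looks_like_bot` function are hypothetical, and the render timestamp is assumed to be stored or signed by the server so the client cannot forge it.

```python
import time

MIN_FILL_SECONDS = 3.0       # assumed threshold; tune it per form
HONEYPOT_FIELD = "website"   # hypothetical field hidden with display: none

def looks_like_bot(form, rendered_at, submitted_at=None):
    """Flag a submission that filled the hidden field or arrived too quickly."""
    if submitted_at is None:
        submitted_at = time.time()
    # A person never sees the hidden field, so any value in it is suspicious.
    if form.get(HONEYPOT_FIELD, "").strip():
        return True
    # Nobody reads and fills a real form faster than the minimum time.
    return (submitted_at - rendered_at) < MIN_FILL_SECONDS

human = {"name": "Ann", "comment": "Nice post", "website": ""}
bot = {"name": "x", "comment": "buy now", "website": "http://spam.example"}

print(looks_like_bot(human, rendered_at=1000.0, submitted_at=1012.5))  # False
print(looks_like_bot(bot, rendered_at=1000.0, submitted_at=1012.5))    # True
print(looks_like_bot(human, rendered_at=1000.0, submitted_at=1000.4))  # True
```

Neither check costs the real user a single extra action, which is the main advantage over captcha.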

At the network level, this can be HTML obfuscation or encryption, blocking of certain user agents, and various traps on the web server side: for example, creating invisible sections of the site that only robots wander into, after which they are banned by IP, or filtering anonymous proxies.

And finally, there is the method that we use at Variti: complete traffic filtering, which we consider the only full-fledged approach to protecting against bots and DDoS attacks. We pass all traffic going to the client’s website or application through our clusters, where specially tuned, self-learning algorithms identify legitimate traffic from live users and “good” bots and pass it on. No IP blocking is required in this process. We will explain why we also consider IP blocking harmful in upcoming articles.
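The trap idea can be sketched as a small request filter. Everything here is hypothetical: the `TrapFilter` class, the trap path and the in-memory ban set. A real setup would live in the web server or a shared store and would give bans a lifetime.

```python
# Hypothetical hidden section: linked only through markup a person never sees.
TRAP_PATHS = {"/catalog-archive/"}

class TrapFilter:
    """Bans any IP that requests a trap path and rejects it from then on."""

    def __init__(self):
        self.banned = set()

    def allow(self, ip, path):
        if ip in self.banned:
            return False
        if path in TRAP_PATHS:
            # Only a crawler that follows invisible links ends up here.
            self.banned.add(ip)
            return False
        return True

flt = TrapFilter()
print(flt.allow("203.0.113.7", "/"))                  # True: normal page
print(flt.allow("203.0.113.7", "/catalog-archive/"))  # False: fell into the trap
print(flt.allow("203.0.113.7", "/"))                  # False: banned from now on
print(flt.allow("198.51.100.2", "/"))                 # True: other clients unaffected
```

One practical assumption worth making explicit: the trap path should also be disallowed in robots.txt, so that well-behaved search crawlers never reach it.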