Protecting a site from scrapers and behavioral bots using a DNS proxy

Hello! I am Gregory, and for the last 10 years I have been building bots. Now, however, I have developed KillBot, which blocks bots and protects sites from them.

In this article I will show how to set up an intermediate proxy server that checks a visitor for being a bot before letting them onto the site, and how to reliably identify both simple HTTP bots and high-level JS bots.

The antibot should detect two types of bots:

  1. Simple HTTP bots that fetch pages without executing JavaScript;

  2. High-level JS bots that run a real browser engine and imitate user behavior.

There are at least three technical approaches to detect them:

  1. Installing a JS script on the website that analyzes the browser environment to identify a bot;

  2. Using a PHP preprocessor to analyze HTTP traffic before the site loads;

  3. Hiding the site behind a proxy server, similar to Cloudflare.

The first method does not protect against scraping by simple HTTP bots. The second helps against HTTP bots but does not detect bots that support JavaScript. The third option can protect against bots of any type, since traffic first passes through the intermediate server and only then reaches the site. It also prevents vulnerability scanning, since the first interaction happens with the proxy server rather than with the site itself.

How to identify simple bots that pretend to be Yandex but are not?

This is easy to do – just perform a reverse DNS lookup on the bot's IP address.

The IP addresses of search-engine bots and other popular services always reverse-resolve to their parent domains. For example, the host from which Yandex's YaDirectFetcher bot runs: 213-180-203-16.spider.yandex.com.

Below is an example of PHP code that checks that the hostname belongs to Yandex:

function checkIfTrueYandexBot($user_agent, $client_ip) {
    // Only apply the check to clients that claim to be a Yandex bot
    if ((stripos($user_agent, 'bot') !== false) && (stripos($user_agent, 'yandex') !== false)) {
        // Reverse DNS: resolve the client IP to a hostname
        $hostname = gethostbyaddr($client_ip);
        if ($hostname && $hostname !== $client_ip) {
            $yandex_domains = ['yandex.ru', 'yandex.net', 'yandex.com'];
            foreach ($yandex_domains as $domain) {
                // Require a dot before the domain so e.g. "fakeyandex.com" does not pass
                $suffix = '.' . $domain;
                if (substr($hostname, -strlen($suffix)) === $suffix) {
                    // Forward-confirm: the hostname must resolve back to the same IP,
                    // otherwise the PTR record may be forged
                    return gethostbyname($hostname) === $client_ip;
                }
            }
        }
        return false; // claims to be Yandex, but is not
    }
    return true; // does not claim to be a Yandex bot — nothing to verify
}

Thus, to protect against simple HTTP scraping, you can maintain your own list of services whose bots are allowed to access the site and deny all others. You also need a whitelist of callback servers the site interacts with (notifications about successful payment, for example). This approach fully protects against HTTP scraping without causing any harm.

How to block behavioral JS bots?

First, let me convey the idea behind identifying bots.

Let's say there is a visit to a website with a Mobile Safari user agent. But, by analyzing the parameters, I see that it is actually a desktop with a screen resolution of 1920×1080.

Can I say this is a bot?
No, I can't – perhaps the user is simply using an extension that changes the user agent.

What if 3000 out of 4000 sessions show this pattern?
Yes, in that case I can say that all (or almost all) of those 3000 visits are bots. You can then either block such visits or show a captcha (in case one of them is real).

Therefore, if the browser environment does not match any real browser (as in the example above), and the number of visits from it is statistically significant, then it is a bot.
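The mismatch from the example above can be expressed as a trivial client-side check. The regex, the 1280-px threshold, and the function names are my illustrative assumptions; a single flagged session proves nothing — it only becomes evidence in aggregate, as described above:

```javascript
// Illustrative sketch: flag a session whose user agent claims a mobile
// browser while the reported screen looks like a desktop.
function uaClaimsMobile(userAgent) {
  return /iPhone|Android.+Mobile|Mobile Safari/i.test(userAgent);
}

function looksLikeDesktopScreen(width, height) {
  // 1280+ CSS pixels on the larger side is a typical desktop resolution;
  // phones report far smaller CSS viewport sizes.
  return Math.max(width, height) >= 1280;
}

function isSuspiciousSession(userAgent, width, height) {
  return uaClaimsMobile(userAgent) && looksLikeDesktopScreen(width, height);
}
```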

Modern bots do not allow such obvious inconsistencies as in the example above – I was only conveying the idea. Nevertheless, KillBot can build a stable browser fingerprint for all modern bots.

Example. Here: https://www.youtube.com/@Berzserkk/videos you can buy a bot program written in BAS that uses behavioral data. This bot program has its own main fingerprint: 3083615002, and if I see visits with that fingerprint in my site's statistics, it means these are visits from Comrade Dronov's bot program.

Below is a screenshot showing 31,879 sessions from the bot program with fingerprint 3083615002 (each of these sessions is unique, with its own user agent and other parameters). Fingerprint 3083615002 does not belong to any real browser, and the number of visits is very large (second row from the top), so this is a bot program:

Interface for managing bots of the KillBot system, an example of blocking a bot from which there are many visits
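The detection shown in the screenshot boils down to grouping sessions by engine fingerprint and flagging fingerprints that are both unknown and statistically massive. The known-fingerprint set and the threshold below are illustrative assumptions:

```javascript
// Sketch: count sessions per browser-engine fingerprint and flag
// fingerprints that do not belong to any real browser and exceed a
// volume threshold. KNOWN_ENGINE_FINGERPRINTS and THRESHOLD are
// illustrative values, not KillBot's real data.
const KNOWN_ENGINE_FINGERPRINTS = new Set(["1111111111", "2222222222"]);
const THRESHOLD = 1000; // sessions before an unknown fingerprint is flagged

function findBotFingerprints(sessions) {
  const counts = new Map();
  for (const s of sessions) {
    counts.set(s.fingerprint, (counts.get(s.fingerprint) || 0) + 1);
  }
  const flagged = [];
  for (const [fp, n] of counts) {
    if (!KNOWN_ENGINE_FINGERPRINTS.has(fp) && n >= THRESHOLD) {
      flagged.push(fp);
    }
  }
  return flagged;
}
```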

How can I set up DNS traffic proxying myself, similar to what Cloudflare does?

DNS proxying of traffic means the user first reaches an intermediate proxy server, where the visit is analyzed; based on the analysis, a decision is made whether to let the visitor through to the site.

Setting up a JS verification page that checks the user for being a bot

Here we will create a verification page – the botcheck.html file. Before entering the site, the user is redirected to botcheck.html on the intermediary server. If the check succeeds, traffic is proxied on to the protected site; if it fails, the user can be blocked, shown a captcha, or redirected to another site – i.e., any action you like.

I will show an example of setting up a proxy server on Ubuntu with Apache.

  1. Install Apache and enable the required modules:

sudo apt install apache2
sudo a2enmod proxy
sudo a2enmod proxy_http
sudo a2enmod rewrite

  2. Create an Apache configuration file: it will check whether the user needs to be redirected to the botcheck.html verification page or can be allowed through to the site:

sudo nano /etc/apache2/sites-available/kill-bot.net.conf
<VirtualHost *:80>
    ServerName kill-bot.net
    ServerAlias www.kill-bot.net

    # Serve the verification page from the proxy itself, never from the backend
    ProxyPass /botcheck.html !

    RewriteEngine On
    # If the botcheck cookie is not equal to 1, send the visitor to the bot check
    RewriteCond %{HTTP_COOKIE} !botcheck=1
    RewriteCond %{REQUEST_URI} !^/botcheck\.html$
    RewriteRule ^(.*)$ /botcheck.html [L]

    ProxyPreserveHost On
    ProxyPass / http://234.234.234.234/
    ProxyPassReverse / http://234.234.234.234/

</VirtualHost>

In the example above: if the botcheck cookie is not equal to 1, we redirect to the botcheck.html page (located on the proxy server itself) – this page must implement all the verification logic and set the cookie that grants access to the site. The IP "234.234.234.234" must be replaced with the real IP of your site; traffic is proxied there once the botcheck cookie equals 1.

Connecting such an intermediary proxy server to the site is done by replacing the DNS A record of the main site with the IP of the proxy server itself.

Example botcheck.html file:

<html>
<head>
    <meta charset="UTF-8">
    <meta name="viewport" content="width=device-width, initial-scale=1.0">
    <title>Checking the user...</title>
    <script>
        function setBotCheckCookie() {
            // Set the botcheck cookie for one year, then reload the page
            // so the proxy lets the request through to the site
            var date = new Date();
            date.setTime(date.getTime() + (365 * 24 * 60 * 60 * 1000));
            var expires = "; expires=" + date.toUTCString();
            document.cookie = "botcheck=1" + expires + "; path=/";
            window.location.reload();
        }
    </script>
</head>
<body>
    <h1>Confirm that you are not a bot</h1>
    <button onclick="setBotCheckCookie()">I'm not a bot</button>
</body>
</html>

When the "I'm not a bot" button is clicked, the botcheck.html page sets a cookie via JS – this simple check keeps out bots without JS support and anyone who does not click the button.


Above I demonstrated a simple proxy-server setup to show how the approach works. To this logic you need to add the HTTP-bot check described above, JS-engine analysis, the ability to display a captcha, and so on. The cookie value must also be made dynamic so that it cannot be stolen.

If you don't want to set all this up yourself

KillBot has a ready-made script that deploys a server for proxying traffic and blocking HTTP bots according to the scheme described above, with KillBot's integrated protection from high-level JS bots. All of this can be integrated into your own solution: you can add your own code, customize it, etc. – all the benefits of API access.

Building a unique UserID without using cookies

This is the topic of the latest post in my Telegram channel: https://t.me/KillBotRus. The UserID stays the same even in incognito mode. One of the next posts will be dedicated to digital personal identification, so subscribe to the channel so as not to miss it.
