Legitimate traffic on the DDoS-Guard network recently exceeded one hundred gigabits per second. Customer web services now generate 50% of all our traffic: many tens of thousands of very different domains, most of which require an individual approach.
Below the cut: how we manage front nodes and issue SSL certificates for hundreds of thousands of sites.
Setting up a front for one site, even a very large one, is easy: take nginx, haproxy, or lighttpd, configure it according to the guides, and forget about it. If something needs to change, run a reload and forget about it again.
Everything changes when you process large volumes of traffic on the fly, evaluate the legitimacy of requests, compress and cache user content, and at the same time change settings several times per second. A user wants to see the result on all external nodes immediately after changing settings in their personal account. A user can also upload several thousand (sometimes tens of thousands of) domains through the API, each with individual traffic-processing parameters. All of this must take effect immediately in America, Europe, and Asia, which is not a trivial task, given that Moscow alone has several physically separated filtering nodes.
Why so many geographically distributed sites around the world?
Quality of customer traffic service: requests from the USA should be processed in the USA (this also applies to attacks, scraping, and other anomalies), not hauled to Moscow or Europe, which would unpredictably increase processing latency.
Attack traffic must be localized: transit operators can degrade during attacks, whose volume often exceeds 1 Tbps. Transporting attack traffic over transatlantic or trans-Asian links is not a good idea. We have had real cases where Tier-1 operators told us: "The attack volumes you absorb are dangerous for us." That is why we accept incoming flows as close to their sources as possible.
Strict requirements for service continuity: scrubbing centers must not depend on each other, nor on local events in our rapidly changing world. All 11 floors of MMTS-9 disconnected for a week? No problem. Not a single client without a physical connection at that location would notice, and web services would not be affected under any circumstances.
How to manage all this?
Service configurations should be distributed as quickly as possible (ideally instantly) to all front nodes. You cannot simply rebuild text configs and reload daemons on every change: nginx, for example, keeps old worker processes shutting down for several more minutes (and possibly hours if there are long WebSocket sessions).
When reloading the nginx configuration, a picture like this is quite normal:
Old workers consume memory, including memory that does not scale linearly with the number of connections; this is normal. When client connections close, this memory will be freed.
Why was this not a problem when nginx was just starting to develop? There was no HTTP/2, no WebSocket, no mass of long keep-alive connections. 70% of our web traffic is HTTP/2, and these are very long-lived connections.
The solution is simple: do not use nginx, do not manage fronts with text files, and certainly do not push zipped text configurations over trans-Pacific channels. The channels are, of course, guaranteed and redundant, but no less transcontinental for it.
We have our own front server and balancer, whose internals I will cover in the following articles. The main thing it can do is apply thousands of configuration changes per second on the fly, without restarts, reloads, or spikes in memory consumption. This is very similar to Hot Code Reload in Erlang, for example. The data is stored in a geo-distributed key-value database and is immediately read by agents on the fronts. That is, you upload an SSL certificate via the web interface or API in Moscow, and a few seconds later it is ready to serve traffic at our scrubbing center in Los Angeles. If a world war suddenly breaks out and the Internet disappears all over the world, our nodes will continue to work autonomously and repair the split-brain as soon as any of the dedicated channels (Los Angeles-Amsterdam-Moscow, Moscow-Amsterdam-Hong Kong-Los Angeles) or at least one of the backup GRE overlays becomes available.
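The push model described above can be sketched as a key-value store with change feeds that every front node subscribes to. This is a minimal, hypothetical illustration (an in-memory stand-in for the real geo-distributed database; all names are invented), not the production implementation:

```python
# Sketch: front nodes subscribe to config keys and apply every change
# on the fly, with no daemon restart or config-file rebuild.

from typing import Callable, Dict, List


class ConfigStore:
    """Toy stand-in for a replicated key-value store with change feeds."""

    def __init__(self) -> None:
        self._data: Dict[str, str] = {}
        self._watchers: List[Callable[[str, str], None]] = []

    def watch(self, callback: Callable[[str, str], None]) -> None:
        self._watchers.append(callback)

    def put(self, key: str, value: str) -> None:
        self._data[key] = value
        for cb in self._watchers:       # fan out to every front node
            cb(key, value)


class FrontNode:
    """A front node keeps a live config map; updates apply instantly."""

    def __init__(self, name: str, store: ConfigStore) -> None:
        self.name = name
        self.config: Dict[str, str] = {}
        store.watch(self._on_change)

    def _on_change(self, key: str, value: str) -> None:
        self.config[key] = value        # hot apply: no reload, no restart


store = ConfigStore()
fronts = [FrontNode(city, store) for city in ("moscow", "amsterdam", "la")]

# An SSL certificate uploaded in Moscow...
store.put("cert:example.com", "-----BEGIN CERTIFICATE----- ...")

# ...is immediately visible on the Los Angeles front.
print(fronts[2].config["cert:example.com"][:27])
# prints "-----BEGIN CERTIFICATE-----"
```

In the real system the fan-out crosses continents over replicated channels, but the essential property is the same: a write in one region becomes live configuration everywhere without any process restart.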
The same mechanism allows us to issue and renew Let's Encrypt certificates almost instantly. Greatly simplified, it works like this:
As soon as we see at least one HTTPS request for a client's domain that has no certificate (or an expired one), the external node that accepted the request notifies our internal certification authority.
If the user has not disabled Let's Encrypt issuance, the certification authority generates a CSR, receives a validation token from LE, and sends it to all fronts over an encrypted channel. From that moment, any node can answer the validation request from LE.
Within moments we receive the certificate and private key and distribute them to the fronts the same way, again without restarting any daemons.
Seven days before the expiration date, the certificate renewal procedure is initiated.
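The key trick in this flow is replicating the validation token to every front, so whichever node Let's Encrypt's validator happens to reach can serve the HTTP-01 challenge. A rough sketch under that assumption (structure and names are hypothetical, not the production code):

```python
# Sketch of the HTTP-01 validation fan-out: the internal CA pushes the
# ACME token to every front, so any node can answer a request to
# /.well-known/acme-challenge/<token> for the domain being validated.

from typing import Dict, List, Optional


class Front:
    def __init__(self, name: str) -> None:
        self.name = name
        self.challenges: Dict[str, str] = {}   # token -> key authorization

    def serve_challenge(self, token: str) -> Optional[str]:
        # What this node would return for
        # GET /.well-known/acme-challenge/<token>
        return self.challenges.get(token)


def distribute_token(fronts: List[Front], token: str, key_auth: str) -> None:
    """The internal CA fans the token out over the encrypted channel."""
    for front in fronts:
        front.challenges[token] = key_auth


fronts = [Front(n) for n in ("usa", "europe", "asia")]
distribute_token(fronts, "tok123", "tok123.account-thumbprint")

# Let's Encrypt may validate against any region; every node answers.
assert all(f.serve_challenge("tok123") == "tok123.account-thumbprint"
           for f in fronts)
```

This matters precisely because the domain is anycast across regions: there is no way to predict which scrubbing center the ACME validator will hit.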
Right now we rotate 350k certificates in real time, completely transparently to users.
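The seven-day renewal rule above reduces to a simple check against the certificate's expiry timestamp. A minimal sketch, assuming the expiry (`not_after`) is already parsed from the certificate; the helper name is invented:

```python
# Deciding whether a certificate has entered the renewal window
# (the 7-day threshold mentioned above).

from datetime import datetime, timedelta, timezone

RENEW_BEFORE = timedelta(days=7)


def needs_renewal(not_after: datetime, now: datetime) -> bool:
    """True once fewer than 7 days of validity remain."""
    return now >= not_after - RENEW_BEFORE


now = datetime(2019, 6, 1, tzinfo=timezone.utc)
expiring = datetime(2019, 6, 5, tzinfo=timezone.utc)   # 4 days left
fresh = datetime(2019, 9, 1, tzinfo=timezone.utc)      # ~3 months left

print(needs_renewal(expiring, now))  # True
print(needs_renewal(fresh, now))     # False
```

At 350k certificates this check runs continuously; the early window leaves room for retries if validation or the CA is temporarily unavailable.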
In the following articles of this series I will cover other aspects of real-time processing of large web traffic: for example, RTT analysis on incomplete data to improve service quality for transit clients, protecting transit traffic from terabit-scale attacks in general, delivery and aggregation of traffic information, WAF, our nearly unlimited CDN, and many mechanisms for optimizing content delivery.