Enterprise Dominoes. 0x13 Harmful Tips for a Ninja Developer

Almost any modern enterprise system (by which we mean software that users work in constantly throughout the working day) tends to grow, together with the business it serves, into a highly loaded web solution, like our VLSI.

It is understandable: access from any device with a browser and minimal investment “at the start” are everything a business loves so much. But as the system develops, not only does its size grow, but so does the complexity of the solution’s architecture, and with it the cost of any error, which immediately sets off a cascade of possible problems, the “domino effect”.

When, where and how can those dominoes be toppled by a ninja developer hiding on the team?

Growing architecture

We start growing our solution from the classic three-tier setup:

  • the client, that is, the browser

  • the server, a.k.a. the business logic, a.k.a. the BL

  • the database, which is… well, the database is the database

In the simplest initial version, the working logic and the display logic are not separated at all and live in a single code space (apparently, the ninja has already had a hand in this), in the classic PHP style, no offense to the language itself:
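The original snippet did not survive, so here is a minimal TypeScript sketch of the same anti-pattern (the orders table and its fields are made up for illustration): data access, business rules and HTML markup all crammed into one request handler.

```typescript
// A sketch of the "single code space" anti-pattern: data access,
// business rules and HTML markup all live in one request handler.
import http from "http";
import { Pool } from "pg";

const db = new Pool(); // connection settings come from the PG* environment variables

http.createServer(async (_req, res) => {
  const orders = await db.query("SELECT id, paid FROM orders"); // data access
  let html = "<html><body><table>";                             // presentation
  for (const o of orders.rows) {
    const status = o.paid ? "paid" : "awaiting payment";        // business rule
    html += `<tr><td>${o.id}</td><td>${status}</td></tr>`;
  }
  html += "</table></body></html>";
  res.writeHead(200, { "Content-Type": "text/html" });
  res.end(html);
}).listen(8080);
```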

Let’s separate the presentation logic from the working logic. This can be either a logical separation of the display templates within the code, or a physical move of such static content to a resource independent of the BL:
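As a sketch of the same handler after the separation (names are still made up): the BL returns plain data, and the template becomes a pure function over it that could just as well be shipped as a static resource.

```typescript
// The handler after separation: business logic returns data only,
// presentation is a pure function over that data.
import http from "http";
import { Pool } from "pg";

const db = new Pool();

// presentation logic: data in, markup out, no database in sight
const renderOrders = (orders: { id: number; paid: boolean }[]) =>
  `<html><body><table>${orders
    .map(o => `<tr><td>${o.id}</td><td>${o.paid ? "paid" : "awaiting payment"}</td></tr>`)
    .join("")}</table></body></html>`;

// business logic: data only, no markup
async function getOrders(): Promise<{ id: number; paid: boolean }[]> {
  return (await db.query("SELECT id, paid FROM orders")).rows;
}

http.createServer(async (_req, res) => {
  res.writeHead(200, { "Content-Type": "text/html" });
  res.end(renderOrders(await getOrders()));
}).listen(8080);
```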

Problem: blocking business logic

Now our ninja is ready to strike the first blow: write code so slow that it blocks the business logic entirely, for example code containing an infinite loop or ever-growing memory consumption. Even if such a request blocks only one BL process, an impatient user, never getting an answer, will “pull” it again and again until everything is blocked:

Look how beautifully all the areas where the failure “played out” turned red on the diagram: not a single client gets anything, the ninja is happy, the sabotage was a success!
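In Node.js terms, the whole sabotage can be as small as this sketch (the /report route is hypothetical): one synchronous infinite loop freezes the process’s entire event loop, so every request served by this process hangs, not just the poisoned one.

```typescript
// The ninja's first blow: a single synchronous infinite loop blocks
// the whole event loop of this BL process.
import http from "http";

http.createServer((req, res) => {
  if (req.url === "/report") {
    while (true) { /* "slow" business logic that never yields control */ }
  }
  res.end("ok"); // soon unreachable for everyone, not just /report
}).listen(8080);
```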

But let’s add a dispatcher (for example, based on nginx) that knows how to “shoot” a specific node if it stops responding, and let’s run several BL nodes ourselves:

Problem: monolithic base

The cunning ninja has noticed that the deeper the layer he strikes, the greater the effect. In our current scheme, the most convenient such place is the database. Let’s get ahead of him by dividing the data according to some application criteria, for example, current operational data versus statistical reports:

In fact, at this point we have already arrived at a service architecture. How “micro” it will turn out is still a question, but the services are already getting their applied “color differentiation of pants”.

Problem: tight service interdependency

Look at the diagram above: in the event of problems in service Y, service X will also begin to suffer, although in most cases it would not have to. For example, if after placing an order we need to send it to the client by email, a synchronous call is not needed here at all.

Therefore, we can break some of the call chains and “spread” the peak load over time by making them asynchronous via an interservice bus:
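For illustration, a sketch of such a broken chain using RabbitMQ via amqplib (the queue name and connection URL are assumptions): placing an order only publishes an event, and the mail service consumes it at its own pace, so a slow or dead mailer no longer stalls order placement.

```typescript
// Breaking the synchronous chain: fire-and-forget an event to the bus
// instead of calling the mail service directly.
import amqp from "amqplib";

async function placeOrder(orderId: number) {
  // ... the order itself is written to the database synchronously ...

  const conn = await amqp.connect("amqp://localhost");
  const ch = await conn.createChannel();
  await ch.assertQueue("order.placed", { durable: true });
  ch.sendToQueue("order.placed", Buffer.from(JSON.stringify({ orderId })));
  await ch.close();
  await conn.close();
}
```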

Problem: DDoS against the database

If you chose PostgreSQL as the database for your solution (more features than MySQL, almost as “enterprise” as Oracle, reliable, and even free!), then sooner or later you will realize that it is uncomfortable with the large numbers of short-lived connections typical of web systems. After all, each connection to PG is a separate process on the DBMS server, which allocates at least 8 MB of memory for itself at startup.

To mitigate these negative effects, smart people came up with connection poolers, for example pgbouncer or Odyssey. They accept many connections on the input side and multiplex them into a small number of constantly active connections to PostgreSQL.

As a result, our architecture turns into something like this:

Here we have additionally taught the interservice bus to communicate with the client, so that operational events can be reported from the server side, and we have also moved the presentation logic “behind the dispatcher”.

The ninja joins the fight

So, in terms of user interaction, we have the following chain:

  • browser

  • dispatcher

  • business logic

  • service bus

  • connection pooler

  • database

The lower the link in this chain that gets “knocked out”, the more users will suffer. But let’s start at the very top…

“Dropping” the browser

There are (at least!) three simple ways to make Chrome sluggish:

  1. Make it download a lot of traffic.

  2. Feed it a fat JS file… Since JS is still parsed and compiled in the same single thread that executes it, “let the whole world wait”.

  3. Make it generate tons of HTTP requests… Even if they all hit the cache, somewhere around the 700th request Chrome “breaks” and starts serving files from the disk cache with multi-second delays.

TODO #01: Use minification.

TODO #02: Optimize the amount of code.

TODO #03: Apply batching. And make sure it is actually used.
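A possible shape of such a batcher on the client, sketched under the assumption of a hypothetical /api/records endpoint that accepts a list of ids: calls issued within one tick are coalesced into a single HTTP request.

```typescript
// A client-side batching sketch: requests made within one tick are
// merged into one HTTP call.
const pending = new Map<number, (v: unknown) => void>();
let flushScheduled = false;

function fetchRecord(id: number): Promise<unknown> {
  return new Promise(resolve => {
    pending.set(id, resolve); // NB: a repeated id overwrites the resolver
    if (!flushScheduled) {
      flushScheduled = true;
      queueMicrotask(flush); // flush once the current tick's calls are in
    }
  });
}

async function flush() {
  flushScheduled = false;
  const batch = new Map(pending);
  pending.clear();
  const resp = await fetch(`/api/records?ids=${[...batch.keys()].join(",")}`);
  const records: { id: number; data: unknown }[] = await resp.json();
  for (const r of records) batch.get(r.id)?.(r.data); // resolve each caller
}
```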

“Dropping” the dispatcher

How can you “squeeze” the dispatcher? Basically, only with a lot of requests and/or traffic in a short time.

To achieve this effect, you already have to put in some effort: synchronize a large number of client browsers, since you cannot generate a significant load from a single machine.

The best way to do this is to…

  • send out some event to all users at once

  • attach “instant” processing to its arrival

  • do it from every open tab, no less

  • …and have each of them request something bulky from the server, either incompressible (images, for example) or dynamically compressed

TODO #04: Use caching headers as aggressively as possible, so that at least some requests never reach the server at all.
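A minimal sketch of #04 in Node.js (the /static/ prefix is an assumption): fingerprinted static assets get long-lived caching headers, so repeat requests are served from the browser or proxy cache.

```typescript
// Long-lived caching headers for immutable, fingerprinted assets.
import http from "http";

http.createServer((req, res) => {
  if (req.url?.startsWith("/static/")) {
    res.setHeader("Cache-Control", "public, max-age=31536000, immutable");
  }
  res.end("..."); // actual file serving omitted
}).listen(8080);
```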

TODO #05: Check the response headers so that the dispatcher does not try to “compress the incompressible” and stall on it: binaries or obviously small responses.

TODO #06: Handle each event in only one tab, distributing the result to the rest via localStorage.

TODO #07: Add a randomized delay of up to a few seconds between the moment an event arrives at the client and the request to the server.
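A sketch combining #06 and #07 (event ids, key names and the /api/details endpoint are illustrative): the request is jittered by a few seconds, only the first tab to claim the key goes to the server, and sibling tabs pick the result up from localStorage.

```typescript
// #06 + #07: jitter the request, let one tab perform it, share the result.
function onServerEvent(eventId: string) {
  const delay = Math.random() * 5000; // #07: spread the stampede over ~5s
  setTimeout(async () => {
    // #06: elementary leader election, first tab to claim the key wins
    // (check-then-set is racy, but good enough as a sketch)
    if (localStorage.getItem(`evt:${eventId}`) !== null) return;
    localStorage.setItem(`evt:${eventId}`, "pending");
    const data = await (await fetch(`/api/details/${eventId}`)).text();
    localStorage.setItem(`evt:${eventId}`, data); // hand it to sibling tabs
  }, delay);
}

// the tabs that lost the election get the payload without touching the server
window.addEventListener("storage", e => {
  if (e.key?.startsWith("evt:") && e.newValue && e.newValue !== "pending") {
    console.log("received via localStorage:", e.newValue);
  }
});
```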

“Dropping” business logic

Resource exhaustion

If your BL is built around sequential synchronous processing of each incoming request in a separate process/thread (e.g. Apache, IIS), it is enough to occupy them all for the queue to start growing and execution times to climb.

If your BL is asynchronous (e.g. NodeJS), then you have to increase the impact by driving the process’s event loop into a clinch with something like an infinite loop or massive string operations.

Here the technique from the previous section, which generates a cloud of requests, comes in very handy.

TODO #08: Use rate limiters in their various forms, preferably directly on the dispatcher.
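As a sketch of the idea, a naive in-process token bucket per client IP; in production the dispatcher itself (for example, nginx’s limit_req) is a better place for this.

```typescript
// A naive per-IP token bucket: requests beyond the budget get 429.
import http from "http";

const buckets = new Map<string, { tokens: number; last: number }>();
const RATE = 10;  // tokens replenished per second
const BURST = 20; // bucket capacity

function allowed(ip: string): boolean {
  const now = Date.now();
  const b = buckets.get(ip) ?? { tokens: BURST, last: now };
  b.tokens = Math.min(BURST, b.tokens + ((now - b.last) / 1000) * RATE);
  b.last = now;
  buckets.set(ip, b);
  if (b.tokens < 1) return false; // out of tokens: reject
  b.tokens -= 1;
  return true;
}

http.createServer((req, res) => {
  if (!allowed(req.socket.remoteAddress ?? "unknown")) {
    res.writeHead(429);
    res.end("Too Many Requests");
    return;
  }
  res.end("ok");
}).listen(8080);
```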

Application-level deadlock

Another effective option is to arrange a deadlock: make your methods synchronously call methods of a third-party service, which in turn synchronously call back into the original service!

TODO #09: Model the possible call chains and look for cycles like A -> B -> A. If you find one, get rid of it or fence it off with “hard” timeouts of a few hundred milliseconds.
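One possible shape of such a “hard” timeout, sketched with the standard fetch and AbortController (Node 18+): the call either answers within the budget or fails fast, so an A -> B -> A cycle cannot hold both services’ workers forever.

```typescript
// A hard timeout around an interservice call.
async function callService(url: string, timeoutMs = 300): Promise<unknown> {
  const ctl = new AbortController();
  const timer = setTimeout(() => ctl.abort(), timeoutMs);
  try {
    const resp = await fetch(url, { signal: ctl.signal });
    return await resp.json();
  } finally {
    clearTimeout(timer); // always disarm the timer
  }
}
```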

“Dropping” the service bus

Data multiplication

Generate relatively few messages “on input”, but give each of them many recipients. Inside the bus they will “multiply”, and if the bus does not “break” on node synchronization, its output channel will go down. Didn’t go down? Add more recipients.

TODO #0A: Do you really need the common bus here? Consider setting up a dedicated, specialized bus for exchanging such events.

Mailbombing

Generate a large number of voluminous messages to another BL. Here the bus will definitely “break” on node synchronization.

TODO #0B: Transfer not the content itself, but a link to it. Or see #0A.
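A sketch of #0B with amqplib (the queue name and storage URL are illustrative): the bus carries a few dozen bytes of reference, and the consumer fetches the heavy body from shared storage.

```typescript
// Publish a reference to the payload instead of the payload itself.
import amqp from "amqplib";

async function publishReportReady(reportId: string) {
  const conn = await amqp.connect("amqp://localhost");
  const ch = await conn.createChannel();
  await ch.assertQueue("report.ready", { durable: true });
  ch.sendToQueue(
    "report.ready",
    Buffer.from(JSON.stringify({ url: `s3://reports/${reportId}.pdf` }))
  );
  await ch.close();
  await conn.close();
}
```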

“Dropping” the connection pooler

Once upon a time there lived a pgbouncer in transaction mode… Why exactly this mode? Because in most cases it is the one that utilizes connections to the database best.

And that means that for each individual transaction, a new mapping of a “client” connection to a “server” one is created.

So if you do not “wrap” a method that issues a cloud of small database queries in a transaction, each of those queries will be treated as a separate transaction and processed independently on a separate connection, and the overhead of that processing will be significantly higher than the execution time of each SQL query itself.

TODO #0C: Wrap each method in a transaction as a whole, if this does not contradict the application logic.
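A sketch of #0C with node-postgres (table and column names are made up): the cloud of small queries rides in one transaction, so the pooler maps the whole method onto a single server connection instead of one per statement.

```typescript
// Many small queries, one transaction, one pooled connection.
import { Pool } from "pg";

const pool = new Pool();

async function updateOrderItems(items: { id: number; qty: number }[]) {
  const client = await pool.connect();
  try {
    await client.query("BEGIN");
    for (const item of items) {
      await client.query(
        "UPDATE order_items SET qty = $1 WHERE id = $2",
        [item.qty, item.id]
      );
    }
    await client.query("COMMIT");
  } catch (e) {
    await client.query("ROLLBACK");
    throw e;
  } finally {
    client.release(); // hand the connection back to the pool
  }
}
```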

TODO #0D: Use session mode and prepared statements.

Auto-generated queries / data

This works best in combination with an ORM, which is only too happy to emit a query body of a couple of megabytes, or an IN (...) clause with hundreds of thousands of identifiers freshly read from the database. For the pooler, this means wasting resources on shoveling bytes between sockets.

TODO #0E: You shouldn’t use an ORM until you understand in detail how it works.
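For comparison, a sketch of how the same lookup can avoid a megabyte-sized IN (...) entirely (the documents table is hypothetical): node-postgres lets you pass the whole list as one array parameter, so the query body stays a few dozen bytes long.

```typescript
// = ANY with an array parameter instead of splicing ids into the SQL text.
import { Pool } from "pg";

const pool = new Pool();

async function loadByIds(ids: number[]) {
  const res = await pool.query(
    "SELECT * FROM documents WHERE id = ANY($1::int[])",
    [ids]
  );
  return res.rows;
}
```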

“Dropping” the database

A real pleasure for the connoisseur.

DDoS

“And let’s have our method do X against the database in several threads simultaneously!” And in several BL processes at once, to boot… and without a reasonable limit on the number of processes and threads.

Are you sure it all has to run at the same time? In such quantity? And aren’t #06 and #07 your case?..

“Okay, we’ll allow no more than Y concurrent requests…” Sure, but if each of them reads gigabytes of data from the database cache, that is where memory bandwidth runs out…

TODO #0F: Use response caching on the BL side.
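A minimal sketch of #0F (key names and TTL values are illustrative): a tiny in-memory TTL cache, so identical requests within the TTL are answered from BL memory and never reach the database.

```typescript
// A tiny TTL cache on the BL side.
const cache = new Map<string, { value: unknown; expires: number }>();

async function cached<T>(
  key: string,
  ttlMs: number,
  load: () => Promise<T>
): Promise<T> {
  const hit = cache.get(key);
  if (hit && hit.expires > Date.now()) return hit.value as T; // fresh hit
  const value = await load(); // miss: go to the database once
  cache.set(key, { value, expires: Date.now() + ttlMs });
  return value;
}

// usage: the heavy report query runs at most once per 30 seconds
// const report = await cached("daily-report", 30_000, () => buildReport());
```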

TODO #10: Finally optimize your queries! See the article “Recipes for ailing SQL queries”.

Locks

“Well then, I’ll just lock everything!.. Advisory locks weren’t given to me for nothing!.. And thousands of them!”

TODO #11: Use pg_try_advisory_xact_lock: such locks are released automatically when the transaction completes. See the article “Fantastic advisory locks, and where to find them”.
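A sketch of #11 with node-postgres (withJobLock is a made-up helper): we try to take the lock without waiting and back off on failure, and the lock disappears by itself when the transaction ends.

```typescript
// Take an advisory lock without waiting; it is scoped to the transaction.
import { Pool } from "pg";

const pool = new Pool();

async function withJobLock(jobId: number, work: () => Promise<void>) {
  const client = await pool.connect();
  try {
    await client.query("BEGIN");
    const { rows } = await client.query(
      "SELECT pg_try_advisory_xact_lock($1) AS locked",
      [jobId]
    );
    if (!rows[0].locked) {
      await client.query("ROLLBACK"); // someone else is already on it
      return;
    }
    await work();
    await client.query("COMMIT"); // the advisory lock vanishes here too
  } catch (e) {
    await client.query("ROLLBACK");
    throw e;
  } finally {
    client.release();
  }
}
```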

“I need to change a column / roll out an index right now!”, plus ALTER TABLE and company.

TODO #12: See the article “DBA: when serial almost ran out”.

TODO #13: In general, the activity of various “ninja developers” on the database should always be monitored and analyzed. How we do it at Tensor is described in the articles “Monitoring a PostgreSQL database: who is to blame and what to do” and “Bulk optimization of PostgreSQL queries”.


In general, check often that your ninjas are still working for you.
