Delayed durability will help your ORM achieve a 50% or greater performance improvement, but only if you use…

So if the barrier is removed, you can increase the load without consequences.
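For reference, here is a minimal sketch of how that barrier is removed at the DBMS level (the database name is a placeholder). Since 1C generates its own SQL and its commits cannot be edited to opt in per transaction, FORCED is presumably the only practical mode for a 1C database:

    -- Make log flushes asynchronous for ALL transactions in the database.
    ALTER DATABASE [MyDatabase] SET DELAYED_DURABILITY = FORCED;

    -- Alternatively, allow it per transaction (the default is DISABLED):
    ALTER DATABASE [MyDatabase] SET DELAYED_DURABILITY = ALLOWED;

    -- With ALLOWED, an individual transaction opts in at commit time:
    BEGIN TRANSACTION;
    -- ... DML ...
    COMMIT TRANSACTION WITH (DELAYED_DURABILITY = ON);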

Horizontal scaling comes and goes

Here I will increase the number of background jobs to 75 (I also ran 100). The point is that with 75 background jobs, the application server load already looks like this:

Beyond that, the 1C cluster will simply queue the background jobs.

To distribute background jobs evenly between two or more 1C application servers, you need to use the functionality assignment requirements mechanism with parameters, as described in "Assigning specific background jobs to a specific production server". This is a more complex option that has to be programmed.

So we start 75 background jobs

… and the result:

2138 vs. 1459 object records per second: that is 47% more!

Let's look at the latencies.

They grow only for writes to the database files; for the log, according to the graph, there are NONE. Great: that means in the next series of tests we can push the load higher and find out the limit of what this hardware can do.

An attentive reader will ask: how much do 75 background jobs give without delayed durability? See below: formally 1797, i.e. 23% growth, but at the same time the WRITELOG wait is already the limiting factor, and a further increase in load will simply queue background jobs, with no true parallelism.
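If you want to verify this on your own instance, the log-flush wait statistics can be read directly; a minimal sketch (the counters are cumulative since restart, or since they were last cleared):

    -- How much time sessions have spent waiting for transaction-log flushes.
    SELECT wait_type,
           waiting_tasks_count,
           wait_time_ms,
           wait_time_ms / NULLIF(waiting_tasks_count, 0) AS avg_wait_ms
    FROM sys.dm_os_wait_stats
    WHERE wait_type IN ('WRITELOG', 'LOGBUFFER');

    -- Optionally clear the statistics before a test series:
    -- DBCC SQLPERF('sys.dm_os_wait_stats', CLEAR);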

Below is a comparison of the measurements for 75 background jobs WITH Delayed durability and for 50 background jobs WITHOUT it, to see whether any bottlenecks remain for a further increase in load.

Note that with Delayed durability the Batch Requests/sec line (purple) has become flat, with no obvious dips; Disk Transfers/sec (blue) is about the same; and CPU usage (red) has increased.

Without Delayed durability, request processing is more erratic, with obvious dips, even under the lower load.

The picture of the queues on the SSD (Avg. Disk Queue Length, beige) has not changed with Delayed durability enabled (here is the graph for 75 background jobs). That is how it should be: writing to the log and writing to the database files are different subsystems in SQL Server.
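This separation is easy to see from SQL Server itself, since per-file IO statistics distinguish data files from the log file. A sketch:

    -- Write latency per file: ROWS (data) files and LOG files are separate IO
    -- paths, so log waits can vanish while data-file behavior stays the same.
    SELECT DB_NAME(vfs.database_id) AS database_name,
           mf.type_desc             AS file_type,   -- ROWS or LOG
           vfs.num_of_writes,
           vfs.io_stall_write_ms,
           vfs.io_stall_write_ms / NULLIF(vfs.num_of_writes, 0) AS avg_write_ms
    FROM sys.dm_io_virtual_file_stats(NULL, NULL) AS vfs
    JOIN sys.master_files AS mf
      ON mf.database_id = vfs.database_id
     AND mf.file_id     = vfs.file_id
    ORDER BY vfs.io_stall_write_ms DESC;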

But the Log Flush graph is very interesting.

Log Flush Waits (purple) disappeared when Delayed durability was enabled. Log Flushes/sec (blue) decreased even under the higher load. In other words, the bottleneck is gone.
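The same counters from the graph can be sampled in T-SQL. The per-second counters are cumulative, so two samples have to be diffed; the database name below is a placeholder:

    DECLARE @before TABLE (counter_name nvarchar(128) PRIMARY KEY, cntr_value bigint);

    INSERT @before
    SELECT RTRIM(counter_name), cntr_value
    FROM sys.dm_os_performance_counters
    WHERE object_name LIKE '%:Databases%'
      AND instance_name = 'MyDatabase'   -- placeholder database name
      AND counter_name IN ('Log Flushes/sec', 'Log Flush Waits/sec', 'Log Bytes Flushed/sec');

    WAITFOR DELAY '00:00:10';            -- 10-second sampling window

    SELECT RTRIM(p.counter_name)                AS counter_name,
           (p.cntr_value - b.cntr_value) / 10.0 AS per_second
    FROM sys.dm_os_performance_counters AS p
    JOIN @before AS b ON b.counter_name = RTRIM(p.counter_name)
    WHERE p.object_name LIKE '%:Databases%'
      AND p.instance_name = 'MyDatabase'
      AND p.counter_name IN ('Log Flushes/sec', 'Log Flush Waits/sec', 'Log Bytes Flushed/sec');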

Without Delayed durability the picture is sadder, even with fewer threads.

As you can see, combined with horizontal scaling, Delayed durability lets you push the performance limits significantly. But of course it is not the best option, as we will discuss below.

Delayed durability and the delayed effect of sanctions

Given the well-known events, Russia will have to forget about new licensed installations of MS SQL Server or Oracle Database. Legal software is not a machine that can be imported through a friendly country; now even some HP RAID levels require special license keys to be entered. Whether Postgres will become a full-fledged import substitute remains to be seen, but for now it is prudent to take regular server images so as not to have problems with reinstalling and re-registering software.

If you look at it strategically, for serious tasks where control over the system is required, I see four options (maybe someone can suggest more?):

  1. Develop alternative open source DBMSs such as Postgres.

  2. Conduct gray import of code, not software. If you read the history of Soviet missile defense ("The birth of the Soviet missile defense system. Yuditsky is building a supercomputer", topwar.ru), the USSR was always able to build specialized machines on unconventional principles, and those technologies are still protected. But for the mass market, dual-use products are needed, and that is possible only with state participation, as is done in the USA. The history of Soviet IT (https://topwar.ru/user/Sperry/) reads like a novel, but even a superficial glance makes it clear that the "catch up and overtake" cycle requires, first of all, social changes (something the USSR could not manage) that would make it technically effective. When will that be? Definitely not in this solar cycle.

  3. Use data centers and clouds in friendly countries. This is a normal option for businesses that needed proven solutions yesterday, but the risks are clear.

  4. Squeeze the maximum out of the existing system while waiting for the bright future.

The last point deserves more detail. A well-designed system with enough room to scale can live a long time. Just look at how many old systems still work in the US ("American paradox: why the US has an outdated IT infrastructure, but Biden's plan makes sense", Forbes.ru).

A system designed for horizontal scaling (see "My tongue is my enemy. An architect on the future of 1C") can work for quite a long time by adding new nodes, and this has been tested in practice. However, in the case of 1C the DBMS will be the bottleneck, especially for write operations.

Reads are easier: whatever the DBMS, replication lets you allocate additional nodes and offload the system for heavy reporting and similar work. For write load balancing this trick will not work unless you use Oracle Real Application Clusters. Therefore, for now the only way to optimize the transaction-logging subsystem is to remove its timing constraints.

Then there are two options: either 1C optimizes its code to use larger DML statements and the boundaries move back, or alternatives to Oracle Real Application Clusters appear on the horizon, because import substitution matters not only for Russia but also for China and other countries that want to maintain technological independence in today's realities.

Who does not take risks does not make a backup

Since a delayed transaction (asynchronous commit) implies the possibility of losing data on failure even for a committed transaction, you need to be prepared for surprises. The official documentation does not quantify the amount of information lost or the consequences; of course, it all depends on the load, server configuration, and so on.

For our test I can only estimate it indirectly. The graph shows Log Bytes Flushed/sec: on average, somewhere around 20 megabytes per second.

And Log Flushes/sec averages about 500 per second. Dividing: 20 MB / 500 ≈ 40 KB, which is close to the log-flush block size; see the interesting article "IO Block Size for SQL Server, Disk Block Size" from Pure Storage.
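For reproducibility, here is the same back-of-the-envelope arithmetic, using the numbers read off the graphs above (both inputs are assumptions specific to this test, including the 0.1-second loss window):

    DECLARE @log_bytes_per_sec bigint = 20 * 1024 * 1024;  -- ~20 MB/s Log Bytes Flushed/sec (from the graph)
    DECLARE @flushes_per_sec   int    = 500;               -- ~500 Log Flushes/sec (from the graph)
    DECLARE @loss_window_sec   float  = 0.1;               -- assumed failure window

    SELECT @log_bytes_per_sec / @flushes_per_sec             AS avg_flush_size_bytes,  -- ~41 KB
           @log_bytes_per_sec * @loss_window_sec / 1048576.0 AS potential_loss_mb;     -- ~2 MB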

That is, if there is an unfortunate failure and we lose 0.1 seconds, that is about 2 megabytes. ACID integrity inside the database may not be violated, but the 1C ORM also has integrity at the object level. For example, a "Document" object is often associated with "Information register" or "Accumulation register" objects; we can load or change them in different transactions, and it may happen that the information registers are updated while the document is not loaded or updated (its transaction formally committed, but did not make it to disk).

From there, the options are:

  • If the document was never created in 1C, there will be a broken reference in the information register, and reloading the document will not cure it; only a special procedure for removing broken references will. For a 5-terabyte database, finding such problems with standard tools takes a very long time.

  • If the document was not updated, there will be a logical mismatch between the data in the registers and the document. This is even harder to find.

Thus, you need to develop your own integrity-check procedures, for example processing markers for a group of ORM objects (something like ACID, but at the object level), or plan on a mandatory restore from backup after a failure. A sketch of such a check is below.
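As an illustration of what such a check might look like at the SQL level, here is a sketch with HYPOTHETICAL table and column names: _InfoRg101, _Fld102RRef and _Document55 stand in for the real storage names, which must be taken from your own configuration (e.g. via GetDBStorageStructureInfo() in 1C):

    -- Find register rows whose document reference points to a document that
    -- never made it to disk (a "broken link" after losing a delayed transaction).
    SELECT r._Fld102RRef AS orphaned_document_ref,   -- hypothetical dimension column
           COUNT(*)      AS register_rows
    FROM dbo._InfoRg101 AS r                         -- hypothetical information-register table
    LEFT JOIN dbo._Document55 AS d                   -- hypothetical document table
           ON d._IDRRef = r._Fld102RRef
    WHERE d._IDRRef IS NULL
    GROUP BY r._Fld102RRef;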

Stopping the instance should also follow a special procedure; see the rules for stopping SQL Server with Delayed durability:

“For delayed durability, there is no difference between an unexpected shutdown and an expected shutdown/restart of SQL Server. Like catastrophic events, you should plan for data loss. In a planned shutdown/restart, some transactions that have not been written to disk may be saved to disk before shutdown, but you should not plan on it. Plan as though a shutdown/restart, whether planned or unplanned, loses the data the same as a catastrophic event.”

That is, sp_flush_log is required.
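So a planned shutdown should be preceded by a forced flush in every database where delayed durability is enabled (the database name is a placeholder):

    USE [MyDatabase];       -- placeholder: repeat for each delayed-durability database
    EXEC sys.sp_flush_log;  -- flushes all pending delayed-durable transactions to disk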

To summarize: Delayed durability makes sense to enable when you can increase load and processing speed through horizontal scaling and a high degree of background-job parallelism. This is good practice that lets the architecture live for many years, even if the code itself is not quite optimal. Without horizontal scaling, the 11% gain from merely enabling Delayed durability is not worth the potential data loss on crashes and the more complex administration. With horizontal scaling, you have every opportunity to gain more than 50% as the bottleneck widens.

Optimizing the ORM code toward larger DML statements will yield even more. Why do I assume so? First, from separate tests published online. Second, I ran a test with 100 background jobs, and LOGBUFFER waits already began to appear there (see "SQL Server LOGBUFFER Wait", sqlskills.com). It is very likely that even two 1C application servers in the current cluster, issuing small DML statements, will create problems with DML processing on the DBMS server (not even with writing to disk): even when a DML statement follows a cached plan and is not recompiled, there is still overhead for parameter binding, execution, and so on. Place your bets on which gives out first on SQL Server, the CPU or the SSD IOPS limit, in our channel "1C without limits": t.me/Chat1CUnlimited.
