Threadpool Curse of Dotnet Daemons on Linux

Everyone has heard that sometimes dotnet on Linux consumes more resources than on Windows. Sometimes this difference is almost unnoticeable. But it also happens that the same application consumes 2-3 times more CPU on Linux than on Windows.

An artistic digression, a joke on a topical theme. You can skip the next few paragraphs without missing anything essential.

In the alien environment of the Immaterium, almost everything behaves unusually. Many laws of nature do not work. More precisely, they do not work as everyone is accustomed to. Even time behaves somewhat differently.

Magos Technicus G was immersed in studying how to make familiar algorithms and mechanisms work at least acceptably, by his usual standards, in the hostile environment of the Warp.

Only the High Tech-Priests have access to the ancient tablets of perf, containing arcane and forbidden spells. Only those hardened by centuries of intellectual effort directed towards the glory of the Omnissiah are prepared to use them. For to understand the spells one must delve into the depths of one's mind, touch the Warp and not let it consume oneself.

After a week of deep meditation, Magos G had an idea: it was necessary to change the spin limit of the unfair semaphore.

There are many reasons why dotnet applications consume different amounts of CPU under different operating systems, and the reasons vary widely. In short, the implementation of a large number of primitives, or even of large pieces of logic, differs. In each specific case something different may play a role.

Some of these differences (or even “problems”) are eventually fixed or improved, where that is possible on the dotnet side. For some of these “features”, certain versions of dotnet offer experimental settings that you can control yourself. Sometimes these settings become enabled by default in newer versions.

It is not surprising that most dotnet performance degradations on Linux revolve around asynchronous work and the threadpool. At some point the dotnet developers even rewrote the threadpool from native code to managed C#, so that it would at least try to behave similarly across operating systems. But the basic primitives for asynchronous work still differ a lot between operating systems; even the set of asynchronous methods in the OS APIs differs. Not every async method is truly asynchronous on every OS.

Let's describe the situation

There is a type of application that “does nothing” most of the time. Such applications wait for work and must start executing it as quickly as possible once it appears. There is no predetermined schedule for when the work arrives, and its arrival pattern is bursty: when work appears, there is a lot of it at once, enough for a bunch of threads.

Why are we interested in them? Because there are a lot of them. For example, in our company a single test system in the test environment has about 600 such daemons. They are the ones we will deal with below.

We are also interested in them because the transition from Windows to Linux increased the total resource consumption slightly more than twofold.

What was the CPU spent on?

Unfortunately, none of the widely available and popular methods of diagnosing CPU consumption worked in this case. All the tools showed that “all CPU was spent somewhere in the threadpool”. At best they narrowed it down to PortableThreadPool.WorkerThread.WorkerThreadStart(). This was not enough.

The perf tool came to the rescue. It is easy to find out how to use it, and with some effort you can analyze even a complex application with it.

We won't go into the details of studying the artifacts. But everything pointed to the CPU being spent in SpinWaits inside the semaphore.

What is SpinWait?

The documentation from the Microsoft website answers this question perfectly:

SpinWait is a lightweight synchronization type that you can use in low-level scenarios to avoid the expensive context switches and kernel transitions that are required for kernel events.

How does SpinWait work in simple terms? It simply burns a few CPU cycles with some useless work, approximately equal to several tens of nanoseconds in time.

It can be assumed that the authors of the threadpool consider the work done in WorkerThreadStart while the semaphore is held to be very short. So if the semaphore is currently occupied by someone, it is very likely that skipping just a few processor cycles is enough for it to become free. This should be much cheaper and faster than falling into a real Wait, because a real Wait yields the thread: it performs a context switch and hands the thread back to the scheduler. That is a very expensive and lengthy operation, usually much more expensive than a couple of skipped processor cycles.
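The spin-then-wait idea can be sketched in a few lines. The sketch below is a language-neutral illustration written in Python, not the dotnet implementation: the function name acquire_with_spin and its spin_limit parameter are made up for this example, and a real runtime would execute a CPU pause instruction between attempts instead of retrying immediately.

```python
import threading


def acquire_with_spin(sem: threading.Semaphore, spin_limit: int) -> str:
    """Take `sem`, spinning up to `spin_limit` times before blocking.

    Returns "spin" if the semaphore was obtained during the spin phase,
    or "wait" if the call went through the blocking path.
    (Hypothetical helper for illustration only.)
    """
    for _ in range(spin_limit):
        # Non-blocking attempt: returns True or False immediately.
        if sem.acquire(blocking=False):
            return "spin"
        # A real implementation would burn a few CPU cycles here.
    # Spin budget exhausted: block until some other thread releases.
    sem.acquire()
    return "wait"
```

With a free semaphore and spin_limit=70 the first attempt succeeds during the spin phase; with spin_limit=0 the spin phase is skipped and every acquisition goes through the blocking path.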

Why is it more expensive on Linux?

Who the hell knows. It just works differently, that's all. Not worse, not better – different. And ideally, dotnet's behavior, its use of such primitives, should be configured in different ways depending on the OS.

What are we going to do?

Judging by the code, the semaphore is limited to 70 SpinWait iterations by default. And, lo and behold, this value is configurable by an environment variable!

What happens if we decrease this number? For example, write 0 there?

We set the environment variable DOTNET_ThreadPool_UnfairSemaphoreSpinLimit=0 on all 600+ instances, release, and look at the graphs of total CPU consumption.

Is it time to rejoice?

Success? Should we run and set this environment variable for all dotnet applications on Linux? No way.

Theorizing what could go wrong

What bad things can happen now that we never SpinWait and always fall into an honest Wait? This can cause the threadpool throughput to drop sharply.

It is easy to imagine an application with a regular, stable flow of Tasks that appear often and execute very quickly. A threadpool thread in WorkerThreadStart often comes across a busy semaphore, waits a little in SpinWait, gets the semaphore, takes a task, and goes off to execute it. The ratio of useless work (SpinWait) to useful work is minimal. The “idle” time (spent not executing useful work) is minimal.

If there are no SpinWaits (or too few of them to outlast the semaphore), we can in theory often fall into an honest Wait and make a context switch. This takes a lot of time, and the share of “idle” time (spent not performing useful work) grows relative to the time spent on useful work. The shorter the Tasks are and the more of them there are, the worse this ratio becomes.
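The ratio argument can be made concrete with a toy calculation. The numbers below are illustrative assumptions, not measurements: roughly 30 ns per spin iteration (in line with the "several tens of nanoseconds" mentioned earlier), ten iterations per acquisition, and 5 microseconds for a blocking wait with its context switch. The function name idle_fraction is hypothetical.

```python
def idle_fraction(task_ns: float, overhead_ns: float) -> float:
    """Share of a worker's time spent on overhead instead of the task."""
    return overhead_ns / (overhead_ns + task_ns)


# Illustrative assumptions, not measured values.
SPIN_OVERHEAD_NS = 10 * 30    # ten spin iterations of about 30 ns each
CONTEXT_SWITCH_NS = 5_000     # one blocking wait plus wake-up

# Compare the two overheads for very short, medium and long tasks.
for task_ns in (1_000, 100_000, 10_000_000):
    spin = idle_fraction(task_ns, SPIN_OVERHEAD_NS)
    block = idle_fraction(task_ns, CONTEXT_SWITCH_NS)
    print(f"task={task_ns:>10} ns  spin={spin:.2%}  block={block:.2%}")
```

Under these assumptions the blocking path dominates the cost of very short tasks and becomes negligible for long ones, which is the shape of the trade-off described above.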

Therefore, it is highly recommended not to touch the DOTNET_ThreadPool_UnfairSemaphoreSpinLimit variable thoughtlessly. Assess all the risks first, carefully study how it will affect your application, and observe it carefully over a long period of time.
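One cautious way to study the effect is to start a single instance with the override and compare it against an untouched one, instead of changing the whole fleet at once. A minimal sketch, assuming a Python launcher script; the helper names are hypothetical, and the value is passed through as a string exactly as given.

```python
import os
import subprocess
import sys

SPIN_LIMIT_VAR = "DOTNET_ThreadPool_UnfairSemaphoreSpinLimit"


def env_with_spin_limit(limit: str, base=None) -> dict:
    """Return a copy of the environment with the spin-limit override set."""
    env = dict(os.environ if base is None else base)
    env[SPIN_LIMIT_VAR] = limit
    return env


def start_with_spin_limit(command, limit: str, **popen_kwargs) -> subprocess.Popen:
    """Start one process with the override; the parent's environment is untouched."""
    return subprocess.Popen(command, env=env_with_spin_limit(limit), **popen_kwargs)
```

For example, start_with_spin_limit(["dotnet", "MyDaemon.dll"], "0") would start one process with the limit set to 0, where MyDaemon.dll is a placeholder for your own application.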

Conclusions

  • ThreadPool is an incredibly complex abstraction that makes writing “multithreaded” code easy and effortless. But sometimes it comes at a huge cost.

  • Even the ThreadPool developers can't write it to be perfect in all corner cases. It works “imperfectly” in some special situations.

  • If your application starts consuming significantly more CPU when switching from Windows to Linux, you can experiment with the environment variable DOTNET_ThreadPool_UnfairSemaphoreSpinLimit, trying values from 0 upwards and using the default in the ThreadPool code as a reference point.

  • True, this will not help every application, and for many it will probably even hurt. After all, such defaults were chosen for a reason: they must be good “on average”.

  • This setting, and the environment variable that controls it, is not the only one available to us at the moment.

  • In each subsequent version of dotnet everything can change completely, and the environment variable can stop being used. Read the changelog.
