How not to fix concurrency problems

– What is your forte?

– Multithreading

– Here are 3 puzzles. Can you solve them by tomorrow?

– I can’t do so many tasks at the same time

You will probably agree that concurrency is one of the hardest topics in programming. Early in their careers, programmers try to avoid the topic in every possible way, but sooner or later everyone has to deal with concurrency problems: either you need to write thread-safe code, or a bug turns up in existing code.

Out of inexperience, some simply mask the problem, and it makes itself felt again later. Several such approaches are described below. Keep in mind that each of them is only a workaround and does not really fix the concurrency problem.

This article does not offer a silver bullet for fixing concurrency problems. It is about the crutches that cheerful and resourceful programmers reach for, none of which addresses the root of the problem.

Sleep – instead of a thousand words.

Sleep calls are not added only by novice programmers. I have met programmers with 3-5 years of experience who made this kind of fix both in tests and in production code. Code like this is a time bomb. It is important to understand one thing: a fix of this kind only hides the problem for the time being. Sooner or later the operating system gets updated or the hardware is replaced, the timings change, and the old problems resurface.

Note: increasing the sleep time is no way out either 🙂
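To make the difference concrete, here is a minimal Python sketch. The article names no language, so Python is an assumption, and `load_config` is a hypothetical background task invented for illustration. `fragile_read` guesses how long the worker needs, while `synchronized_read` waits for an explicit signal from the worker.

```python
import threading
import time


def load_config(result, ready):
    """Hypothetical background task: stores a value, then signals completion."""
    time.sleep(0.05)  # stands in for work whose duration we do not control
    result["value"] = 42
    ready.set()


def fragile_read():
    """Anti-pattern: sleep for a guessed duration and hope the worker is done."""
    result, ready = {}, threading.Event()
    threading.Thread(target=load_config, args=(result, ready)).start()
    time.sleep(0.2)  # correct only while the worker finishes within 0.2 s
    return result.get("value")  # None if the guess was too short


def synchronized_read():
    """Wait for the worker's signal; the timeout is a safety limit, not a guess."""
    result, ready = {}, threading.Event()
    worker = threading.Thread(target=load_config, args=(result, ready))
    worker.start()
    if not ready.wait(timeout=5):
        raise TimeoutError("worker did not signal completion")
    worker.join()
    return result["value"]
```

The second version returns as soon as the worker is done and keeps working if the worker becomes slower, which is exactly what the sleep-based version cannot promise.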

The “logs, more logs” approach masks the concurrency problem.

Sometimes it takes a long time to find the root of a problem, so programmers often add logging to study it further, which is the right thing to do. A few well-placed log messages are genuinely useful and help to solve the problem. But sometimes log messages get added on every other line. You end up with a small piece of code lined with logs, and the code miraculously stops crashing. Some time later the bug comes up for review, and it turns out that it has not been reproduced for a while. So the bug is closed with the reason “It has not been reproduced for N months, it may have already been fixed”. In fact, the bug is still there.

Let’s see why this “works” for now. Wherever logging is added, the code runs slower, and the changed timing creates the illusion that the threads are no longer out of sync and that everything is fixed. Nothing of the sort has happened. Always remember that input/output operations are expensive in terms of system resources.
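Extra logging only shifts the timing; what actually removes a race is synchronizing access to the shared state. Below is a minimal Python sketch, assuming the hidden bug is an unsynchronized update of a shared counter. The `Counter` class and the `count_in_parallel` helper are hypothetical names, not something from the original code.

```python
import threading


class Counter:
    """Shared counter whose read-modify-write step is guarded by a lock."""

    def __init__(self):
        self._value = 0
        self._lock = threading.Lock()

    def increment(self):
        with self._lock:  # without the lock, concurrent updates can be lost
            self._value += 1

    @property
    def value(self):
        with self._lock:
            return self._value


def count_in_parallel(threads=8, iterations=10_000):
    """Increment one Counter from several threads and return the final value."""
    counter = Counter()

    def work():
        for _ in range(iterations):
            counter.increment()

    workers = [threading.Thread(target=work) for _ in range(threads)]
    for worker in workers:
        worker.start()
    for worker in workers:
        worker.join()
    return counter.value
```

With the lock in place the result does not depend on how fast or slow each thread runs, so adding or removing log statements no longer changes the behavior.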

Try-catch as a fix.

Wrapping everything in a try-catch construct instead of making a “normal” fix is also not an option. I have seen only one conscious use of this approach in production.

Here is how it happened. For many years the project had contained non-thread-safe code that was used in various places. From time to time crash dumps appeared because of the missing thread synchronization in this code, which many had assumed to be thread-safe. Over time the code aged and was gradually replaced throughout the project, until it was marked “obsolete” and became a candidate for deletion. By then it was used in only one place, but crash dumps naturally kept appearing. Tickets piled up, and managers asked for the problem to be fixed at minimal cost.

Rewriting thousands of lines to synchronize obsolete code that will be removed in 2-3 months is pointless and only wastes time. The “cheap and cheerful” solution is to wrap the affected methods in a try-catch. Everyone understands that this is a clumsy solution, but there is little else to do when the fix is wanted today. Strictly speaking, it is not even a fix but a crutch that masks the problem. You must not do this in code that is still maintained.
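The contrast between the crutch and a real fix can be sketched in a few lines of Python (where try-catch is spelled `try`/`except`). The `Registry` class is a hypothetical stand-in for the legacy code, not the code from the story. `total_masked` swallows the error that concurrent modification can cause, while `total` takes the same lock as the writers.

```python
import threading


class Registry:
    """Hypothetical legacy container shared between threads."""

    def __init__(self):
        self._items = {}
        self._lock = threading.Lock()

    def add(self, key, value):
        with self._lock:
            self._items[key] = value

    def total_masked(self):
        """Crutch: iterate without the lock and swallow the resulting error."""
        try:
            total = 0
            for value in self._items.values():
                total += value
            return total
        except RuntimeError:  # e.g. "dictionary changed size during iteration"
            return 0  # no crash any more, but the answer is silently wrong

    def total(self):
        """Fix: take the same lock as the writers before iterating."""
        with self._lock:
            return sum(self._items.values())
```

The masked version trades a visible crash for an invisible wrong result, which is why it is tolerable at most for code that is about to be deleted.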

Let’s repeat N times until we win.

Sometimes a piece of production code or a test crashes intermittently. The cause cannot be found quickly, and the problem is far from obvious. Then comes the brilliant idea to wrap the code in a try-catch and, instead of merely catching the exception, rerun the failing code several times until it succeeds. This approach resembles Russian roulette: either it fires or it does not. The results seem to improve, since the number of crashes goes down. But the improvement is only temporary. The real problem is not solved, at some point the failures will become frequent again, and you are back where you started.

Whatever one may say, however you mask a concurrency problem, it will still have to be solved.
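For recognizing this pattern in a code review, here is what it typically boils down to, as a minimal Python sketch. `FlakyOperation` is a hypothetical stand-in that simulates an intermittent failure deterministically, and `retry_until_success` is an illustrative name for the retry wrapper.

```python
class FlakyOperation:
    """Hypothetical stand-in for code that fails intermittently."""

    def __init__(self, failures_before_success):
        self.remaining_failures = failures_before_success
        self.calls = 0

    def __call__(self):
        self.calls += 1
        if self.remaining_failures > 0:
            self.remaining_failures -= 1
            raise RuntimeError("simulated concurrency failure")
        return "ok"


def retry_until_success(operation, attempts=5):
    """Anti-pattern: rerun the failing code and discard the errors."""
    for attempt in range(1, attempts + 1):
        try:
            return operation(), attempt
        except RuntimeError:
            if attempt == attempts:
                raise  # out of retries: the original failure resurfaces
```

Calling `retry_until_success(FlakyOperation(2))` reports success on the third attempt, and the caller never learns about the two failures before it. Once the operation fails more often than the retry budget allows, the crash is back.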

One Comment

  1. If you just translate the article, please always add a link to the original one. Best regards, the author of the article.
