Multithreading Ambushes

In this article I describe my most recent and most vivid impressions of multithreaded programming. They are my own impressions and my own experience, and I will be glad if they prove useful to other programmers.

In the previous article I argued that more serious problems await programmers once threads start to interact. But it is one thing to assume or even predict these problems, and quite another to face them directly.

What was predicted came true, as they say, in full. I think the problems described below will not be news to many, but there will be those who are unaware of them, just as I was. That is why I wanted to record them and share what I had to face, and also to tell how I got out of situations that were not entirely familiar to me (in automata-based programming, I emphasize, they would not have arisen in principle).

So, having previously created a thread test (for more details, see [1]) and having run it many times in different modes, I noticed that, albeit rarely, incorrect results pop up. In such cases I usually blame myself. And in this case even more so because, to be honest, I have very little experience using threads.

But the experiments revealed problems that are difficult to attribute to lack of experience. Some were overcome right away, some took tinkering, but there were also problems that could not be corrected even with quite a lot of experience in programming in general. These are not a matter of experience, of course, and they are what we will discuss below. Or rather, not even problems, but rather unexpected “ambushes” that arose on the way to mastering multithreading.

Access to a shared resource

Nothing foreshadowed trouble until, as the number of threads grew and the test ran long enough, the value of the shared counter variable (hereinafter simply the counter) occasionally turned out to be incorrect. For example, with ten threads and a million cycles per thread (see also the original source of the problem in question [2]), the result was a few units short of ten million. The errors, however, were rare and negligible in size, and did not cause any particular concern. Everything was attributed to a lack of experience with threads, in the hope that the test would sort itself out later.

When it became impossible to ignore the errors, I revived a similar test that had been created outside the VKPa environment. This ruled out any possible influence of the environment, although by its underlying logic the test code was not tied to the VKPa code anyway; only VKPa's dialog and visualization capabilities were used. And suddenly the standalone test, which had worked correctly before, broke down completely. The results resembled those of a test without synchronization, although the test used a mutex. This happened after seemingly cosmetic changes to the thread code, which was simply brought in line with the similar code in VKPa and differed little from its original version. But while in VKPa the errors were rare, in the standalone analogue they appeared every time the test was launched.

Rolling the code back to the old version after these changes turned out to be difficult. But since the old test had been archived in advance, gradual and careful changes to it (a kind of “dancing with a tambourine”) showed that the problem was in the initialization of the shared resource (see the commented-out line and the same line higher up in Listing 1). In the archived version of the test the counter was cleared before the threads were launched, in the new one after. And then, “like a flash of bright light,” everything suddenly became clear… 🙂

A launched thread (see the start() method of the thread class) starts working “instantly” and, apparently, manages to run a certain number of cycles before the next threads start, and in particular before the counter is initialized (let's return to Listing 1). Thus, by the time the resource is reset, the threads have already spent some of their cycles, each of them a different number (see Listing 2). When the counter is reset before the threads are launched, they modify it as they join the work one after another, and this does not affect the final result. In the new test, where the resource was reset after the threads were launched, the situation is the opposite: the increments made before the reset are lost, so for a certain number of cycles the threads work virtually uselessly. That is why the old test works correctly and the new one does not. It is difficult to assume otherwise, because simply moving the counter initialization to another location results in flawless test performance.

Listing 1. Code for creating and running threads

void MainWindow::Process() {
    bIfLoad = false;
    ui->lineEditMaxValue->setText(QString::number(nMaxValue).toStdString().c_str());
    ui->lineEditNumberOfThreads->setText(QString::number(pVarNumberOfThreads).toStdString().c_str());
    pCSetVarThread = new CSetVarThread();
    int i;
    for (i=0; i<pVarNumberOfThreads; i++) {
        CVarThread var;
        var.pQThread = new ThCounter(&nMaxValue, this);
        string str = QString::number(i).toStdString();
        var.strName = str;
        pCSetVarThread->Add(var);
    }

    timeLotThreads.start();
    pVarExtrCounter = 0;			// counter initialization
    TIteratorCVarThread next= pCSetVarThread->pArray->begin();
    while (next!=pCSetVarThread->pArray->end()) {
         CVarThread var= *next;
         var.pQThread->start(QThread::Priority(0));
         pCSetVarThread->nActive++;
         next++;
    }
//    pVarExtrCounter = 0;		// counter initialization (the faulty placement)
    bIfLoad = true;
    ui->lineEditTime->setText("");
    ui->lineEditCounters->setText("");
}

Listing 2. Thread code

void ThCounter::run() {
//    while (!pFThCounter->bIfLoad);
    string str;
    int n=0;
    while (n<nMaxValue && bIfRun ) {
        bool bSm = pFThCounter->pIfSemaphoreLotThreads;
        bool bMx = pFThCounter->pIfMutexLotThreads;
        if (bSm || bMx) {
            if (bSm) pFThCounter->AddCounterSem();
            else {
                pFThCounter->AddCounterMx();
            }
        }
        else pFThCounter->AddCounter();
        n++;
    }
    pFThCounter->pCSetVarThread->nActive--;
    if (pFThCounter->pCSetVarThread->nActive==0) {
        pFThCounter->ViewConter(pFThCounter->pVarExtrCounter);
        pFThCounter->ViewTime();
    }
}
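The ordering issue described above can be reproduced outside Qt. What follows is only a minimal sketch under stated assumptions: it uses std::thread and std::mutex from the C++ standard library instead of the article's QThread-based classes, and the function name run_counter_test and its parameters are hypothetical. It shows the working arrangement, where the counter is reset before any thread exists.

```cpp
#include <cassert>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical stand-in for the article's test: every thread adds 1 to a
// shared counter `iterations` times, each increment guarded by a mutex.
long long run_counter_test(int num_threads, int iterations) {
    assert(num_threads > 0 && iterations >= 0);

    long long counter = 0;      // reset BEFORE any thread exists
    std::mutex counter_mutex;

    std::vector<std::thread> threads;
    threads.reserve(num_threads);
    for (int i = 0; i < num_threads; ++i) {
        // A std::thread starts running as soon as it is constructed,
        // possibly before the next loop iteration creates its neighbor.
        threads.emplace_back([&counter, &counter_mutex, iterations] {
            for (int n = 0; n < iterations; ++n) {
                std::lock_guard<std::mutex> lock(counter_mutex);
                ++counter;
            }
        });
    }
    // Resetting `counter` at this point instead would throw away every
    // increment already made (and, without the lock, would be a data race).

    for (auto& t : threads) t.join();
    return counter;             // safe to read: all threads have finished
}
```

Calling run_counter_test(10, 1000000) corresponds to the ten threads with a million cycles each mentioned earlier and should return ten million.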

What conclusion can be drawn from such results? When working with threads, you must constantly keep in mind that a thread begins its parallel work, and this is important, at the very point where it is launched. This parallelism is hidden from view, which often leads, as it did in my case, to a misconception about how the program works. But the most interesting thing happened next…

Synchronizing the start of the threads

The test worked, the threads joined the work as they were launched, and the result was consistently correct. Perhaps only one thing did not suit me: the moment when the threads started working. I wanted them to start working with the counter simultaneously. But how can this be done if the threads start asynchronously, because the code that launches them is strictly sequential and, for now, cannot be anything else? One obvious solution is to introduce a global flag that allows them to work with the resource. And, by the way, such a flag would also solve the problem of counter initialization.

This is very easy to do. Just add one line to the beginning of the thread code, where the thread waits until the flag is set (see the commented-out line at the top of Listing 2).

I added the line. Under the debugger it works without any issues. Outside the debugger the work sometimes begins, somehow clumsily, and then the test hangs hard. However, if you specify a timeout for waiting for the thread to complete, the application terminates on its own, issuing Fail Fast. The situation is a programmer's nightmare, simply because it is impossible to pin down the error by debugging (under the debugger there seems to be no error at all), and it is hard to understand where it is hidden from the program's behavior alone… And what to do in such a situation?

Think…

An application can only hang if a thread does not terminate. There seem to be no other options, because everything else behaved perfectly before the loop was introduced. That is, the loop is perhaps the only candidate against which a claim can be made. The solution is to break out of it by adding an exit based on the value of the thread's work flag, bIfRun. In other words, we reduce the loop to the form:

Listing 3. Adding a loop to the beginning of the thread

    while (!pFThCounter->bIfLoad) {
        if (!bIfRun) break;         // exit if the thread hangs
    }

The test stopped hanging. Phew, you can exhale. I guessed right. Now at least the test can be restarted with the Reset button (VKPa). That is already something, but there is little joy, because the test still doesn't work. Failure?! We think further… The following solution suggests itself: since the problem is with the loop, we will act more harshly and remove it completely. Is it possible, for example, to move the check inside the main thread loop? Let's try it by transforming the main thread loop code to look like this:

Listing 4. Eliminating the outer loop

    int n=0;
    while (n<nMaxValue && bIfRun ) {
        if (pFThCounter->bIfLoad) {
	...
            n++;
        }
    }
    pFThCounter->DecrementActive();

And, oh miracle, the test worked (see also the video for the article, link at the end). How can this be explained? I don't know. It is simply a double ambush. One is the problem of the added loop. How can it become endless if the flag is explicitly set? Yet, judging by how it reacts to the flag checked inside it, the wretched thing does hang?! The other problem is the non-equivalence of seemingly equivalent transformations. Make a claim against C++? Maybe. But this is my opinion as an amateur in compiler design. That is, digging in this direction would cost me more than it is worth. Let the experts in this matter judge. In the end, there is a simple project that confirms this “ambush” and, as it were, eliminates it. True, all this is little consolation. I personally will simply avoid such constructions in a thread for now.
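One plausible explanation, which the article's code does not confirm, is that bIfLoad is an ordinary bool written by one thread and read by others without any synchronization. In C++ that is a data race, and an optimizing compiler is then allowed to read the flag once and keep the empty wait loop spinning on a stale value, which would fit a hang that appears in release builds but not under the debugger. The sketch below shows a start gate built on std::atomic<bool>, again with standard-library threads instead of QThread; run_gated_test and all names inside it are hypothetical.

```cpp
#include <atomic>
#include <cassert>
#include <thread>
#include <vector>

// Hypothetical start gate: workers wait on an atomic flag, so every pass of
// the wait loop performs a real load and the launcher's store becomes visible.
long long run_gated_test(int num_threads, int iterations) {
    assert(num_threads > 0 && iterations >= 0);

    std::atomic<bool> start{false};      // plays the role of bIfLoad
    std::atomic<long long> counter{0};   // shared counter

    std::vector<std::thread> threads;
    threads.reserve(num_threads);
    for (int i = 0; i < num_threads; ++i) {
        threads.emplace_back([&start, &counter, iterations] {
            // Wait at the gate; yield so the spin does not monopolize a core.
            while (!start.load(std::memory_order_acquire)) {
                std::this_thread::yield();
            }
            for (int n = 0; n < iterations; ++n) {
                counter.fetch_add(1, std::memory_order_relaxed);
            }
        });
    }

    counter.store(0);                               // nobody is counting yet
    start.store(true, std::memory_order_release);   // open the gate for all

    for (auto& t : threads) t.join();
    return counter.load();
}
```

With the gate in place, the counter can be reset after the threads have been created, because no thread touches it until the flag is set. Whether this is what actually happened in the original project can only be checked against its real declarations.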

Conclusions

The article was supposed to end on a ~~sad~~ note, with a test that hangs hard… But, fortunately, a solution was found. I talked about it above. Of course, everything can be explained, including the solution found. It is not that complicated, and there would have been no problem at all had it been the original version (but then we would not have found out about the ambush?!).

But who will explain to me how the same code can be allowed to behave differently in different build modes, debug and release? This situation is perhaps more annoying than errors and/or incorrect test results. Then again, threads are threads, and the problem may lie not in the code at all but in the behavior of the threads. But then how, and with what, do you fight this?

A thread is an unpredictable and capricious partner. In a group of similar “subjects” its shortcomings only multiply, and when they communicate with each other, many times over. It is very difficult to control such a team (remember the beginning of the article [3]). The theory convinces and even proves that this is hardly possible at all. In relation to threads, of course.

So choose the right partners! Reliable, faithful, predictable. Well, you probably already understand where I'm going… However, in any case, when choosing, think with your own head and don't be fooled by advertising, even if it is promising. And there is more than enough of it these days. I, for example, am waiting for artificial intelligence to move on from writing articles (about those same threads) and political speeches (à la Volfovich) to translating technical documentation well. Do you think I'll wait long enough, huh? This is very important for me. After all, I have been advised to, supposedly, read the documentation…

Well, I wrote this article, I swear, myself. And even the code, without resorting to ChatGPT services. And I really want… freebies! 😉 I wonder what “generative intelligence” (one is tempted to say degenerative) will say about automata-based programming if asked? Someone ask it… Or, by the way, maybe it can give some useful advice about the latest ambush? My own intellect has completely switched off here… I mean, it's broken 🙁

Link to video – https://youtu.be/1-skz0PT0Zk

Literature:

1. All the secrets of multithreading. [Electronic resource]. Access mode: https://habr.com/ru/articles/818903/, free. Language: Russian (access date 06/11/2024).

2. Multithreading in Python: the obvious and the incredible. [Electronic resource]. Access mode: https://habr.com/ru/articles/764420/, free. Language: Russian (access date 06/11/2024).

3. Structured concurrency in the Go language. [Electronic resource]. Access mode: https://habr.com/ru/hubs/parallel_programming/articles/, free. Language: Russian (access date 06/11/2024).