Comparing solutions to the problem of constructing a thread pool in C and Go

netpoll

Before touching on the essence of netpoll, it is worth mentioning the scheduler in Golang and its executable entities, goroutines.

C programmers are accustomed to operating with the concept of a thread (aka LWP) as the minimal executable entity within the language. With Golang the situation is more interesting: another level of executable entities appears, the goroutine. Like threads, goroutines have their own execution context. At first glance they seem little different from threads, but on closer examination fundamental differences emerge.

Goroutines are managed by the Golang scheduler, which is part of the language runtime. Goroutines have no OS-level abstraction of their own; they are the runtime's minimal unit of execution. From the OS scheduler's point of view, goroutines execute in the context of ordinary threads.

The Golang scheduler addresses the following issues:

  • which OS thread a goroutine will run on,

  • when a goroutine should be given a quantum of processor time,

  • and when to stop its execution and put it in a waiting queue.

This is very similar to how the scheduler in an OS kernel works, except that everything happens in user space.

Interestingly, when an application starts, the language runtime already creates a default pool of threads on which goroutines will be executed (this behavior can be controlled and changed both before the application starts and while it is running).

Creating a goroutine is a much more lightweight operation that does not involve the OS scheduler. This means, for example, that you can easily do without a pool of goroutines and simply create a new one for each incoming external request: the scheduler has already created its pool of threads for the goroutines to run on. Many Go frameworks are built on this idea.

Let's return to our solution. In terms of the general approach, the logic of our control thread is very similar to netpoll. It is a separate thread that handles waiting, in particular for incoming connections. It uses the most advanced multiplexing call available on the platform (epoll on Linux). It operates on a set of file descriptors obtained from goroutines, including those waiting for new clients to connect. When events occur on these file descriptors (sockets), the corresponding goroutines are woken up by the runtime scheduler.

Just as our control thread uses an additional eventfd descriptor, netpoll keeps one special descriptor in its set for its internal logic. For example, a newly created goroutine calls the accept method on net.Listener, but there are no incoming client connections at the moment. In this situation netpoll must be informed that one more descriptor needs to be added to its set so that events on it can be tracked. This service file descriptor is used precisely to interrupt epoll and reconfigure netpoll.

More details about the basic operating logic and internal structure of netpoll on Linux can be found in the Go runtime sources.

Mutexes, or how they can be used “abnormally”

We will need mutexes to build a more lightweight thread pool.

Developers are accustomed to using mutexes to protect access to critical data in order to avoid race conditions.

How it works in C

The API glibc provides for C allows not only acquiring and releasing a lock, but also checking whether a lock is currently held. This is a fairly important detail that is rarely used. Many spears have been broken in endless holy wars about whether the entire architectural approach needs rethinking at the first attempt to use functions like trylock.

Let's look at a small example of synchronization based on trylock in C:

#include <errno.h>
#include <pthread.h>
#include <stdint.h>
#include <sys/eventfd.h>
#include <unistd.h>

struct pool_args {
    pthread_mutex_t lock;
    int stop_ev;
    int shared_data;
};

const unsigned int workers_count = 4;
const useconds_t worker_sleep_usec = 500000;
const unsigned int control_sleep_sec = 3;

void *pool_worker(void *arg) {
    uint64_t ev;
    struct pool_args *args = (struct pool_args *)arg;
    // we assume read() returns either the number of bytes read, or an error
    // with errno == EAGAIN indicating that the stop event has not been signaled yet
    while ((read(args->stop_ev, &ev, sizeof(ev)) < 0) && (errno == EAGAIN)) {
        // trylock in action: on a failed acquisition attempt we poll instead of
        // waiting; this code actively consumes CPU time and is close in
        // semantics to a spinlock
        if (pthread_mutex_trylock(&(args->lock)) != 0) {
            continue;
        }
        // simulate a useful workload
        (void)usleep(worker_sleep_usec);
        args->shared_data++;
        (void)pthread_mutex_unlock(&(args->lock));
    }
    return NULL;
}

void *control_thread(void *arg) {
    // we assume that all system and library calls succeed
    struct pool_args args = {
        .lock = PTHREAD_MUTEX_INITIALIZER,
        .stop_ev = eventfd(0, EFD_CLOEXEC | EFD_NONBLOCK | EFD_SEMAPHORE),
        .shared_data = 0
    };
    unsigned int iter = 0;
    pthread_t ids[workers_count];

    for (; iter < workers_count; iter++) {
        pthread_create(&ids[iter], NULL, pool_worker, (void *)&args);
    }

    // simulate a running process
    (void)sleep(control_sleep_sec);

    // stop our background threads
    (void)eventfd_write(args.stop_ev, workers_count);

    // wait for them to finish to avoid leaking resources
    for (iter = 0; iter < workers_count; iter++) {
        (void)pthread_join(ids[iter], NULL);
    }

    // release resources
    (void)close(args.stop_ev);

    return NULL;
}

This example shows the general idea behind trylock: if the mutex cannot be acquired, we simply move on to the next loop iteration. For simplicity we used only one shared resource here; in more complex cases there may be several. As an extension of the idea, let's use this technique to determine which thread in the pool is currently free.

Let us describe the general concept of how a thread in the pool operates: upon receiving a task, after waking up, the thread acquires "its" mutex and holds it throughout the entire processing of the request, releasing it only at the very end; this repeats cyclically for the whole lifetime of the pool.

The controlling thread, in turn, can quickly find a free thread in the pool by calling trylock when a new request arrives. Since each pool thread holds its own mutex while processing a request, a trylock on a busy thread's mutex simply fails and we do not get the lock. If trylock succeeds and the mutex is acquired, we are guaranteed to have found a thread waiting for new work.

This approach additionally solves the race over shared service data between the control thread and a pool thread when handing off the next request.

Let's write some of the source code in C using the described approach:

#include <pthread.h>
#include <sys/eventfd.h>
#include <sys/socket.h>

struct pool_thread_info {
    int job_ev;           // descriptor for incoming job events (eventfd)
    int sock_fd;          // descriptor of the connected client
    pthread_mutex_t lock; // mutex used to track the thread's status and synchronize data
};

const unsigned int workers_count = 4;
const int magic = 42;

int pick_thread(struct pool_thread_info *info) {
    // returns true (1) or false (0) depending on whether the thread is busy
    int ret = pthread_mutex_trylock(&(info->lock));
    if (ret == 0) {
        // we managed to pick a free thread
        return 1;
    } else {
        // treat this as the EBUSY case
        return 0;
    }
}

void wake_thread(struct pool_thread_info *info, int client_sock) {
    info->sock_fd = client_sock;
    (void)eventfd_write(info->job_ev, 1);
    (void)pthread_mutex_unlock(&(info->lock));
}

void *control_thread(void *arg) {
    // per-thread data structures for the pool
    struct pool_thread_info threads[workers_count];
    // descriptor of the listening server socket
    int srv_sock = -1;

    // we assume the control thread has already done the work needed to start the pool:
    // 1. The event descriptors have been created and placed into the thread structures
    // 2. The pool threads have been created, each given its structure with descriptors and other data
    // 3. The server socket has been successfully bound

    while (magic) {
        int client_sock = accept4(srv_sock, NULL, NULL, SOCK_CLOEXEC);
        unsigned int iter = 0;

        for (; iter < workers_count; iter++) {
            if (pick_thread(&threads[iter])) {
                wake_thread(&threads[iter], client_sock);
                break; // the request has been handed off to exactly one thread
            }
        }
    }
    (void)pthread_exit(NULL);
}

void *pool_thread(void *arg) {
    struct pool_thread_info *info = (struct pool_thread_info *)arg;
    eventfd_t ev;

    while (magic) {
        // after a job signal arrives, we lock the mutex, which achieves two goals at once:
        // 1. By the scheme's logic, we get exclusive access to the thread's attributes,
        //    e.g. the client connection descriptor
        // 2. In this state we "tell" the control thread that we are currently busy
        //    and not ready to accept a new request
        (void)eventfd_read(info->job_ev, &ev);
        (void)pthread_mutex_lock(&(info->lock));
        // process the data from the handed-off descriptor
        (void)pthread_mutex_unlock(&(info->lock));
    }
    (void)pthread_exit(NULL);
}

How it works in Golang

In Golang, mutexes are available out of the box as part of the sync package. They work in a similar way, and the paradigm is unchanged: mutually exclusive locking.

But there are differences too. One day in practice we ran into one detail: unlike C-like environments, you cannot create a recursive mutex in Golang. This trips up users of the Win32 API, where CRITICAL_SECTION is recursive by default; trying to apply the same pattern in Golang leads straight to a deadlock.

Here is an example that demonstrates this feature:

package main

import (
    "fmt"
    "sync"
)

func foo(v *sync.Mutex) {
    fmt.Println("on lock")
    v.Lock() // the recursive call deadlocks here: sync.Mutex is not reentrant
    defer v.Unlock()
    foo(v)
}

func main() {
    v := sync.Mutex{}
    foo(&v)
}

The Mutex type also has a TryLock method, which was added relatively recently by the language's standards, in version 1.18. At the same time, the method has a complicated history of its own.

The problem that we solved in C using trylock and a mutex is resolved much more gracefully in Golang using channels. Channels are "thread-safe" by default and can be used for synchronization on their own: multiple goroutines can receive from one channel concurrently without a race condition, and free goroutines from the pool will line up for work by themselves, as if by the wave of a magic wand.
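A sketch of that channel-based alternative (the pool size, task type, and function names are illustrative assumptions): free workers simply block on a shared channel receive, and each submitted task is delivered to exactly one of them, so no trylock-style search for an idle worker is needed.

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

// sumWithPool distributes the numbers 1..n across a pool of worker
// goroutines that all receive from the same channel.
func sumWithPool(workers, n int) int64 {
	tasks := make(chan int)
	var total int64
	var wg sync.WaitGroup

	// Start the pool: every free worker blocks on the shared channel receive.
	for w := 0; w < workers; w++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for task := range tasks {
				atomic.AddInt64(&total, int64(task)) // stand-in for real request processing
			}
		}()
	}

	// Submitting work is just a channel send; whichever worker is idle receives it.
	for i := 1; i <= n; i++ {
		tasks <- i
	}
	close(tasks)
	wg.Wait()
	return total
}

func main() {
	fmt.Println("total:", sumWithPool(4, 10)) // prints "total: 55"
}
```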

What to do if we have not found a single thread in the pool to process the task

In moving on to the theory, we set aside one of the questions raised at the beginning of the article: what to do if, when trying to process an incoming connection, there is not a single free thread in the pool. In other words, all threads are busy and the request cannot be processed at the moment.

How it works in C

There are several potential solutions to this problem; let's look at and analyze them.

The first and easiest option is an infinite loop that waits for one of the threads to be released, constantly checking their status. Behind the obviousness of this solution hides the main disadvantage of the approach: under high load, with the pool's threads constantly busy, the main thread will consume a lot of CPU time.

The second option is deferred request processing. We simply place the incoming request into an internal queue and finish the initial part of the work. Then, while handling new incoming requests, we check the queue and, if we find a pending request there, process it first. This qualitatively solves the problem of the first option: the main thread no longer wastes CPU time. However, it generates two new problems:

  • from this moment on we consume memory to store the request queue,

  • if the time gap between two adjacent requests is large enough, an unprocessed request will sit waiting all that time.

The first problem is easily solved by limiting the total amount of memory used to store queued requests, for example with a ring buffer. The second is harder, since we cannot predict the frequency of requests. However, we can estimate empirically how long processing a request takes and use that as the basis for the third option.

The third option adds another synchronization primitive to the previous solution. Since there is no guarantee that the next request will arrive soon enough, let's use a timer with a quantum equal to the time a pool thread needs to process a request; say, 10 milliseconds. If a request could not be processed right away, we put it in the queue and sleep for that period, expecting one of the background pool threads to become free in the meantime. Then we again try to find a free thread. If we fail, we re-arm the timer, go back to sleep, and continue in a loop.

If the next request arrives before the timer expires, we again try to find a free thread. On failure, we go back to the timer, first rescheduling it to a later point relative to the current moment.
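Although this section is about C, the retry-with-quantum logic is compact enough to sketch in Go under stated assumptions (the quantum value, pool size, and function names are all illustrative): a buffered channel models pool occupancy, and a failed non-blocking send means every worker is busy, so the dispatcher sleeps for one quantum and retries.

```go
package main

import (
	"fmt"
	"time"
)

const quantum = 10 * time.Millisecond // the empirically chosen processing quantum

// dispatch hands a request to the pool, modeled as a buffered channel of
// "busy slots"; if every worker is busy it sleeps for one quantum and
// retries, mirroring the timer-based third option.
func dispatch(slots chan struct{}, process func()) {
	for {
		select {
		case slots <- struct{}{}: // found a free worker slot
			go func() {
				defer func() { <-slots }() // release the slot when done
				process()
			}()
			return
		default:
			// all workers are busy: sleep one quantum, then try again
			time.Sleep(quantum)
		}
	}
}

func main() {
	slots := make(chan struct{}, 2) // a "pool" of two workers
	done := make(chan struct{}, 5)

	for i := 0; i < 5; i++ {
		dispatch(slots, func() {
			time.Sleep(quantum) // simulate request processing
			done <- struct{}{}
		})
	}
	for i := 0; i < 5; i++ {
		<-done
	}
	fmt.Println("all 5 requests processed")
}
```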

Why Golang does not have to solve this problem

In Golang, this problem is often solved differently, even without a thread pool at all. The main idea is that creating and scheduling a goroutine with the runtime scheduler is a much cheaper operation than traditional work with threads.

As I wrote above, when a new incoming request arrives, it is easier to create a new goroutine and entrust request processing to it than to write an algorithm for finding a free goroutine in a pool. This approach is used in many Go frameworks, for example the HTTP server Echo: each time it accepts an incoming connection, it simply creates a new goroutine and processes the HTTP request in it.

There are cases where using a goroutine pool is motivated by a specific task, but in the vast majority of cases the mechanism just described works well, thanks to ease of development and the good scalability of spawning new goroutines inside the language runtime.

The results of our “comparative analysis”

Even with the simplest mechanisms, the basic synchronization primitives provided by the operating system and the C standard library, it is possible to design and build multi-threaded applications that are quite flexible in functionality. However, this comes at an additional cost: sometimes you have to "manually" combine and coordinate different entities, for example multiplexing and synchronization. You have to account for the various subtleties and nuances of the objects involved to avoid potential errors, and in multi-threaded applications this is especially noticeable.

On the other hand, using a higher-level language like Go can significantly simplify development by hiding many layers of abstraction from the programmer and moving them into the standard library and the language runtime. But here, too, there are potential difficulties: given the abundance and apparent simplicity of what is available, it is important to understand the internal structure of the standard library's objects and algorithms, especially when writing high-load code. Otherwise you may run into non-trivial problems that require a comprehensive approach to debugging and profiling.

As a result, on the one hand we get simple methods that hide detailed implementations and various abstractions underneath. On the other hand, there are additional nuances that are easy to overlook, although taking them into account can yield a more reliable solution. One example is channels in Golang: their convenience is offset by the additional conditions that must be met for the application to work correctly.

However, despite the potential difficulties of using Golang, the language itself is relatively simple and has a lower entry barrier than C. It is also worth mentioning that this compiled language was built from the start around the idea of multithreading. This allows it to confidently hold its niche in the development of microservice architectures for high-load systems, where execution speed matters and other languages, such as Python, are not always suitable.

The general idea to close the text with is that the same problem can be solved in different programming languages. Sometimes the speed of developing a solution will depend on the choice of language (we will not develop this topic here, since the article is more about language functionality). But, as with everything, it is important to understand the nuances and be aware of the pitfalls. An experienced C programmer will solve the same thread-distribution problem better than a novice Go enthusiast, even though Go as a language may be considered the better fit for the solution. Obviously, the choice of language depends heavily on the task at hand and on the expertise of the specialists working in the company.
