Red and black signals in node.js

Hello again, dear readers! In this article, I will continue to explore the intricacies of how node.js mechanisms are implemented. In my previous article I looked at how timers work. This time we will talk about one of the mechanisms of interprocess communication: signals.

Getting ready to honk

I will consider how signals work in node.js on Linux, because I use it daily and understand it better than other systems, and because on Linux signals are supported at the kernel level. We will trace what happens from the moment the corresponding functions are called at the JavaScript level to the moment the system calls are made. The JavaScript specification does not cover signals, so, just like timers, they have to be supported by the environment in which JavaScript runs. In node.js, signals are implemented at the level of the libuv library. When examining signals, I decided to follow the same approach as with timers: I again took the original libuv, only this time I threw out everything except the signals. The result is a simplified analogue of libuv that supports only signals.

How signal handlers are stored

In node.js, you can set multiple handlers for the same signal. From the operating system's point of view, however, the sigaction(2) system call, which installs signal handlers, allows only one handler per signal. We also know that when a signal arrives at a node.js process, all the handlers that were set for that signal are triggered. So where are all these handlers stored, and where do they come from?

And here we again run into data structures. To store signal handlers we could, of course, use various data structures, but how efficiently we can work with them depends on which one we choose. Tree structures handle element lookup well: a binary search tree, for example, finds an element in O(log N). But in such a tree the efficiency of the search depends on the order in which elements are added. Elements can be inserted in such a way that the tree becomes skewed and, in the worst case, degenerates into a linked list, so the search actually costs O(N) even though it is a search tree.

Therefore the tree has to be kept balanced: simply put, it should be as branched as possible rather than skewed. The more branched the tree, the smaller its height, the fewer steps are needed to get from the root to the sought element, and the faster the search. There are various balancing algorithms, and the choice depends on the task: you need to know, for example, how often insertion happens compared to lookup. In my opinion, in the case of signals it is hard to say unequivocally which happens more often. You can write an application in which signal handlers are added once and then signals keep arriving without the application terminating; on the other hand, in some other application handlers are frequently added and removed, and once a signal arrives and its handlers have run, the process terminates. So we have to assume that search, insertion and removal occur with roughly the same frequency. A data structure that copes with all three operations in guaranteed O(log N) time is the red-black tree, and that is what libuv uses.
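
To feel why balancing matters, here is a minimal self-contained sketch (my own illustration, not libuv code) of a naive, unbalanced binary search tree: inserting keys in ascending order degenerates it into what is effectively a linked list, so its height, and therefore the search cost, grows linearly:

#include <stdio.h>
#include <stdlib.h>

/* A deliberately naive, unbalanced binary search tree. */
struct node {
    int key;
    struct node *left;
    struct node *right;
};

static struct node *insert(struct node *root, int key)
{
    if (root == NULL) {
        struct node *n = malloc(sizeof(*n));
        n->key = key;
        n->left = n->right = NULL;
        return n;
    }
    if (key < root->key)
        root->left = insert(root->left, key);
    else
        root->right = insert(root->right, key);
    return root;
}

static int height(const struct node *root)
{
    int l, r;
    if (root == NULL)
        return 0;
    l = height(root->left);
    r = height(root->right);
    return 1 + (l > r ? l : r);
}

int main(void)
{
    struct node *root = NULL;
    int i;

    /* Insert 1..1000 in ascending order: every new key goes to the right. */
    for (i = 1; i <= 1000; i++)
        root = insert(root, i);

    /* Prints 1000: the tree is as tall as it has elements, i.e. a list.
     * A red-black tree would keep the height around 2*log2(N), roughly 20. */
    printf("height = %d\n", height(root));
    return 0;
}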

Experiment

Let's add several handlers for different signals and see what kind of tree we get 🙂 First, I suggest writing the code in JavaScript, and then moving straight on to equivalent code in C that calls the corresponding libuv functions.

JavaScript code:

import process from "node:process";

// Begin reading from stdin so the process does not exit.
process.stdin.resume();

console.log("PID", process.pid);

// SIGUSR1
process.on("SIGUSR1", () => {
 console.log("[1] SIGUSR1 received");
});

process.on("SIGUSR1", () => {
 console.log("[2] SIGUSR1 received");
});

process.on("SIGUSR1", () => {
 console.log("[3] SIGUSR1 received");
});

// SIGUSR2
process.on("SIGUSR2", () => {
 console.log("[1] SIGUSR2 received");
});

process.on("SIGUSR2", () => {
 console.log("[2] SIGUSR2 received");
});

process.on("SIGUSR2", () => {
 console.log("[3] SIGUSR2 received");
});

// SIGINT
process.on("SIGINT", () => {
 console.log("[1] SIGINT received");
});

process.on("SIGINT", () => {
 console.log("[2] SIGINT received");
});

process.on("SIGINT", () => {
 console.log("[3] SIGINT received");
});

We install three handlers for each of the SIGUSR1, SIGUSR2 and SIGINT signals. If you run this script and send the corresponding signals to the process, the handlers will be called in the order in which they were added. I want to draw attention to the fact that as soon as we add handlers, they stop keeping the event loop alive: the unref method is implicitly called on the signal descriptor in the node.js code itself. That's why, to give the event loop something to live for, at the very beginning we call:

process.stdin.resume();
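
For comparison, here is a minimal libuv-level sketch (my own illustration, not the actual node.js source) of what that implicit unref amounts to: once uv_unref is called on the only handle, nothing keeps the loop alive and uv_run returns immediately:

#include <signal.h>
#include <stdio.h>
#include <uv.h>

static void on_sigusr1(uv_signal_t *handle, int signum)
{
    (void) handle;
    printf("signal %d received\n", signum);
}

int main(void)
{
    uv_loop_t loop;
    uv_signal_t handle;

    uv_loop_init(&loop);
    uv_signal_init(&loop, &handle);
    uv_signal_start(&handle, on_sigusr1, SIGUSR1);

    /* Roughly what node.js does implicitly for signal handles:
     * an unreferenced handle does not keep the loop alive. */
    uv_unref((uv_handle_t *) &handle);

    /* The only handle is unreferenced, so uv_run() has nothing to wait
     * for and returns immediately; the handler never gets a chance to run. */
    uv_run(&loop, UV_RUN_DEFAULT);

    printf("loop exited\n");
    return 0;
}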

And here is the full equivalent of the JavaScript example in C:

#include <stdio.h>
#include <unistd.h>
#include <uv.h>

/* SIGUSR1 */
void sigusr1_handler_1(uv_signal_t *handle, int signum)
{
   /* UNSAFE: This handler uses non-async-signal-safe function printf() */
   printf("[1] SIGUSR1 received\n");
}

void sigusr1_handler_2(uv_signal_t *handle, int signum)
{
   /* UNSAFE: This handler uses non-async-signal-safe function printf() */
   printf("[2] SIGUSR1 received\n");
}

void sigusr1_handler_3(uv_signal_t *handle, int signum)
{
   /* UNSAFE: This handler uses non-async-signal-safe function printf() */
   printf("[3] SIGUSR1 received\n");
}

/* SIGUSR2 */
void sigusr2_handler_1(uv_signal_t *handle, int signum)
{
   /* UNSAFE: This handler uses non-async-signal-safe function printf() */
   printf("[1] SIGUSR2 received\n");
}

void sigusr2_handler_2(uv_signal_t *handle, int signum)
{
   /* UNSAFE: This handler uses non-async-signal-safe function printf() */
   printf("[2] SIGUSR2 received\n");
}

void sigusr2_handler_3(uv_signal_t *handle, int signum)
{
   /* UNSAFE: This handler uses non-async-signal-safe function printf() */
   printf("[3] SIGUSR2 received\n");
}

/* SIGINT */
void sigint_handler_1(uv_signal_t *handle, int signum)
{
   /* UNSAFE: This handler uses non-async-signal-safe function printf() */
   printf("[1] SIGINT received\n");
}

void sigint_handler_2(uv_signal_t *handle, int signum)
{
   /* UNSAFE: This handler uses non-async-signal-safe function printf() */
   printf("[2] SIGINT received\n");
}

void sigint_handler_3(uv_signal_t *handle, int signum)
{
   /* UNSAFE: This handler uses non-async-signal-safe function printf() */
   printf("[3] SIGINT received\n");
}

int main()
{
   uv_loop_t loop;
   uv_signal_t sigusr1_1, sigusr1_2, sigusr1_3;
   uv_signal_t sigusr2_1, sigusr2_2, sigusr2_3;
   uv_signal_t sigint_1, sigint_2, sigint_3;

   printf("PID %d\n", getpid());

   uv_loop_init(&loop);
 
   /* SIGUSR1 */
   uv_signal_init(&loop, &sigusr1_1);
   uv_signal_start(&sigusr1_1, sigusr1_handler_1, SIGUSR1);

   uv_signal_init(&loop, &sigusr1_2);
   uv_signal_start(&sigusr1_2, sigusr1_handler_2, SIGUSR1);
   
   uv_signal_init(&loop, &sigusr1_3);
   uv_signal_start(&sigusr1_3, sigusr1_handler_3, SIGUSR1);

   /* SIGUSR2 */
   uv_signal_init(&loop, &sigusr2_1);
   uv_signal_start(&sigusr2_1, sigusr2_handler_1, SIGUSR2);

   uv_signal_init(&loop, &sigusr2_2);
   uv_signal_start(&sigusr2_2, sigusr2_handler_2, SIGUSR2);

   uv_signal_init(&loop, &sigusr2_3);
   uv_signal_start(&sigusr2_3, sigusr2_handler_3, SIGUSR2);

   /* SIGINT */
   uv_signal_init(&loop, &sigint_1);
   uv_signal_start(&sigint_1, sigint_handler_1, SIGINT);

   uv_signal_init(&loop, &sigint_2);
   uv_signal_start(&sigint_2, sigint_handler_2, SIGINT);

   uv_signal_init(&loop, &sigint_3);
   uv_signal_start(&sigint_3, sigint_handler_3, SIGINT);

   uv_run(&loop, UV_RUN_DEFAULT);

   return 0;
}

In the C example, the handles do keep the event loop alive: we did not explicitly call uv_unref. An interesting discovery for me was that the order in which the signal handlers are added to the tree, and then picked from it when a signal arrives, depends not on the order of the uv_signal_init and uv_signal_start calls, but on the order in which the variables representing the signal descriptors were defined (see the sketch after this snippet):

uv_signal_t sigusr1_1, sigusr1_2, sigusr1_3;
uv_signal_t sigusr2_1, sigusr2_2, sigusr2_3;
uv_signal_t sigint_1, sigint_2, sigint_3;
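
The reason is the comparison function used by the tree. After comparing signal numbers (and, in real libuv, also the one-shot flag and the owning loop), it falls back to comparing the handles' memory addresses, and in this experiment the stack addresses happened to follow the definition order of the variables. Below is a self-contained sketch of that ordering logic; it is my simplification loosely based on libuv's internal uv__signal_compare, not its exact code:

#include <signal.h>
#include <stdio.h>
#include <uv.h>

/* Order handles by signal number first, then by the owning loop, then by
 * the handle's address as the final tie-breaker. */
static int signal_compare(const uv_signal_t *a, const uv_signal_t *b)
{
    if (a->signum != b->signum) return a->signum < b->signum ? -1 : 1;
    if (a->loop != b->loop)     return a->loop < b->loop ? -1 : 1;
    if (a != b)                 return a < b ? -1 : 1;
    return 0;
}

static void noop(uv_signal_t *handle, int signum)
{
    (void) handle;
    (void) signum;
}

int main(void)
{
    uv_loop_t loop;
    uv_signal_t first, second;

    uv_loop_init(&loop);
    uv_signal_init(&loop, &first);
    uv_signal_init(&loop, &second);
    uv_signal_start(&first, noop, SIGUSR1);
    uv_signal_start(&second, noop, SIGUSR1);

    /* Same signum, same loop: only the handles' addresses decide the order,
     * so the result depends on where the compiler placed the two variables. */
    printf("compare(first, second) = %d\n", signal_compare(&first, &second));
    return 0;
}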

So, after installing signal handlers, the tree will look like this:

Technically, tree operations move from the top down. But to better understand what happens in the end, I suggest mentally dividing the tree from left to right with boundaries that separate the signal descriptors belonging to the same signal number. Looking at it this way and moving along the tree from left to right, it becomes immediately clear in what order the handlers will be called.

On the implementation of a red-black tree in libuv

In libuv, work with the red-black tree is organized through macros: the necessary structures and functions are produced by expanding them. Thus, unlike the heap and the queue in libuv, the red-black tree knows exactly what is stored in its nodes. If we want to store some data in a red-black tree, we use macros that eventually expand into structures and functions for a specific data type (in my signals-only version of libuv I expanded these macros by hand to show clearly what the tree-manipulation functions actually turn into).
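
To make this concrete, here is a small self-contained sketch of how such BSD-style tree.h macros are typically wired up. The element type, field and function names are illustrative, not libuv's; in libuv the uv_signal_t handle embeds a similar tree-entry field, and the tree head lives in a global variable (see the next section). The header is bundled with libuv as uv/tree.h; the same macros are also available on many systems via libbsd:

#include <stdio.h>
#include <uv/tree.h>   /* BSD-style red-black tree macros bundled with libuv */

/* Illustrative element: a fake "signal watcher" with a signal number and an
 * id that plays the role of the tie-breaker (libuv uses handle addresses). */
struct watcher {
    int signum;
    int id;
    RB_ENTRY(watcher) entry;   /* embeds the red-black tree node */
};

static int watcher_cmp(const struct watcher *a, const struct watcher *b)
{
    if (a->signum != b->signum) return a->signum < b->signum ? -1 : 1;
    if (a->id != b->id)         return a->id < b->id ? -1 : 1;
    return 0;
}

/* These two macros expand into the tree-head struct and a full set of static
 * functions (insert, find, next, ...) specialized for "struct watcher". */
RB_HEAD(watcher_tree, watcher);
RB_GENERATE_STATIC(watcher_tree, watcher, entry, watcher_cmp)

int main(void)
{
    struct watcher_tree tree = RB_INITIALIZER(&tree);
    struct watcher w[4] = {
        { .signum = 10, .id = 1 },  /* SIGUSR1 on Linux */
        { .signum = 10, .id = 2 },
        { .signum = 2,  .id = 1 },  /* SIGINT */
        { .signum = 2,  .id = 2 },
    };
    struct watcher *it;
    int i;

    for (i = 0; i < 4; i++)
        RB_INSERT(watcher_tree, &tree, &w[i]);

    /* In-order traversal: grouped by signal number, then by id. */
    RB_FOREACH(it, watcher_tree, &tree)
        printf("signum=%d id=%d\n", it->signum, it->id);

    return 0;
}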

Signals and the Event Loop

Unlike the heap for timers, the red-black tree for signals is not a field of the event loop: it is stored in a global variable, and it holds the signal descriptors from all event loops.

A signal can arrive at any time. Does that mean we have to drop everything and run the handlers?

First, let's consider when the system handlers installed via sigaction(2) are executed. Typically, a signal is delivered to a process the next time it is scheduled to run, or immediately if the process is already running (for example, if the process sends a signal to itself). We can also block the delivery of signals using a signal mask: blocked signals are delivered only when we unblock them. In libuv, the system signal handler is installed so that it cannot be interrupted by other signals, and delivery of all signals is blocked while signal descriptors are being added to or removed from the tree. This way, newly arrived signals cannot interrupt a modification of the signal descriptor tree.
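
Here is a minimal sketch of both techniques in plain POSIX (my illustration, not libuv's code): sa_mask makes the handler itself uninterruptible by other signals, and sigprocmask(2) blocks delivery while a shared structure, such as the tree of signal descriptors, is being modified:

#include <signal.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/* Only async-signal-safe functions are allowed in here; write(2) is one. */
static void on_signal(int signum)
{
    (void) signum;
    (void) write(STDOUT_FILENO, "signal received\n", 16);
}

int main(void)
{
    struct sigaction sa;
    sigset_t all, old;

    /* 1. Install the handler so it cannot be interrupted by other signals:
     *    sa_mask lists the signals blocked while the handler is running. */
    memset(&sa, 0, sizeof(sa));
    sa.sa_handler = on_signal;
    sigfillset(&sa.sa_mask);
    sigaction(SIGUSR1, &sa, NULL);

    /* 2. Block delivery of all signals while a shared structure is being
     *    modified (in libuv: the tree of signal descriptors), then restore
     *    the previous mask. Signals sent in between stay pending and are
     *    delivered right after the mask is restored. */
    sigfillset(&all);
    sigprocmask(SIG_BLOCK, &all, &old);
    /* ... modify the shared structure here ... */
    sigprocmask(SIG_SETMASK, &old, NULL);

    printf("PID %d, send me SIGUSR1\n", getpid());
    pause();  /* sleep until a signal arrives */
    return 0;
}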

Now let's see when the handlers that we, as users, installed are called. Clearly, only the system handler can initiate their call. But they end up running in a different place 🙂

To wait for I/O events on Linux, libuv uses the epoll API, namely the epoll_wait(2) system call (to be precise, epoll_pwait(2), but that is not important for understanding the essence). With this call we can track I/O events on file descriptors, and we can also block the process for a certain time (this is used for timers). The call can also be interrupted by a signal. But a situation may arise where a signal arrives just before epoll_wait is called: the signal handler runs, and then we block in epoll_wait anyway. It would be convenient if a signal arriving at the process always resulted in unblocking epoll_wait, so that event handling could be unified.

This problem is solved by an approach known as the self-pipe trick. It consists of creating a pipe (via pipe(2)) and making both of its ends non-blocking. We start monitoring the read end of the pipe via epoll, and when the signal handler fires, we write the necessary information into the pipe. epoll_wait(2) then unblocks as soon as data appears in the pipe, we read that data and perform the actions we wanted to perform when the signal was received. epoll_wait itself must be placed in a loop so that it is restarted if it is interrupted.
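
Here is a self-contained sketch of the self-pipe trick using plain epoll and POSIX calls. It is not libuv's actual implementation (libuv writes a small message per signal descriptor rather than a single byte), but the mechanics are the same:

#include <errno.h>
#include <fcntl.h>
#include <signal.h>
#include <stdio.h>
#include <string.h>
#include <sys/epoll.h>
#include <unistd.h>

static int pipefd[2];

/* The system signal handler only pushes one byte into the pipe;
 * write(2) is async-signal-safe, printf(3) is not. */
static void on_signal(int signum)
{
    char byte = (char) signum;
    (void) write(pipefd[1], &byte, 1);
}

int main(void)
{
    struct sigaction sa;
    struct epoll_event ev, out;
    int epfd, n;
    char byte;

    /* Create the pipe and make both ends non-blocking, so the write
     * inside the signal handler can never block. */
    pipe(pipefd);
    fcntl(pipefd[0], F_SETFL, O_NONBLOCK);
    fcntl(pipefd[1], F_SETFL, O_NONBLOCK);

    memset(&sa, 0, sizeof(sa));
    sa.sa_handler = on_signal;
    sigfillset(&sa.sa_mask);
    sigaction(SIGUSR1, &sa, NULL);

    /* Watch the read end of the pipe together with any other descriptors. */
    epfd = epoll_create1(0);
    ev.events = EPOLLIN;
    ev.data.fd = pipefd[0];
    epoll_ctl(epfd, EPOLL_CTL_ADD, pipefd[0], &ev);

    printf("PID %d, send me SIGUSR1\n", getpid());

    for (;;) {
        /* Restart the wait if it was itself interrupted by a signal. */
        n = epoll_wait(epfd, &out, 1, -1);
        if (n == -1 && errno == EINTR)
            continue;

        if (n == 1 && out.data.fd == pipefd[0]) {
            /* Drain the pipe and run the "user" handlers here,
             * in normal (non-signal) context. */
            while (read(pipefd[0], &byte, 1) == 1)
                printf("got signal %d via the pipe\n", (int) byte);
        }
    }
}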

In libuv, the structure that services the event loop contains a field storing an array of two descriptors that correspond to the ends of this pipe:

struct uv_loop_s {
   ...
   int signal_pipefd[2];
   ... 
};

When the system signal handler fires, the signal descriptors matching the signal number are selected from the tree and written into the pipe. When epoll_wait(2) unblocks because data has appeared in the pipe, all the signal descriptors are read from it and their handlers are triggered. So we have figured out that the handlers installed by the user are not called at a random moment, but in a controlled manner: after we return from epoll_wait(2).

Summary

To summarize, I can note the following points related to the operation of signals in node.js.

  • Signals are supported at the level of the environment in which JavaScript runs. In node.js, libuv is responsible for signal support.

  • Signal handlers that the user sets are stored in a red-black tree in a global variable.

  • We cannot know for sure when the system signal handler will fire. We can only be sure that it will not fire while the tree with signal descriptors is changing and while another handler is running.

  • The system signal handler initiates the call to the handlers that the user installed, and we know exactly when they will be triggered – after we exit epoll_wait(2).

  • To track both I/O events and signals correctly at the same time, an approach called the self-pipe trick is used.

Conclusions

From the end user's point of view, working with signals in node.js looks pretty simple: we set handlers for a signal, and when that signal arrives, the handlers are called. The question of when exactly these handlers run has already been thought through for us, even though signals are mostly asynchronous in nature. And, as often happens, at a low level everything is more complicated and more interesting. Signal descriptors are stored in the very red-black tree that some people are afraid of 🙂 so here is one more example of its practical application. And the calling of signal handlers is kept under control with a well-known approach, the self-pipe trick. It was very interesting to learn about all this! I will be very glad if someone learns something new or has an epiphany thanks to my article.
