Random number generators in different OS


“The generation of random numbers is too important to be left to chance” – Robert R. Cavue

One late summer evening I had to figure out how random number generators work in Windows and Linux. Actually, in this article I will try to bring the accumulated information, and present it in the simplest possible terms, without having to go into source codes, tutorials and articles.

Pseudo-random number generators

With the advancement of technology and security, we increasingly need truly random numbers that could not be predicted from the outside. Why? First of all, because of encryption, because every year the amount of traffic sent is growing, and at the same time we want to have a sufficient degree of security of our data. And at the moment when it is necessary to generate a random number, our computers have problems, because they are designed to be as obedient, predictable and deterministic as possible so that all results with the same input data are reproduced, otherwise the whole world would fall apart.

A way out of this situation was found and quite elegant – let’s take a “predictable” pseudo-random number generator (PRNG) and initialize it with random bits (we don’t think about where and how it was taken yet). From a small number of truly random events, we can get fairly good random numbers in large numbers. And while the “attacker” does not have access to the internal state of our computer, we can rely on the generated sequences of such numbers. Or you can control the number of bits issued and not allow more entropy to be given than there is in the initialization state, then it will be impossible to predict the next bit.

A little about entropy

In order not to overload the text, I decided not to give such definitions as information entropy and the amount of information. They are already familiar to many. Therefore, I calmly allow myself to use phrases such as “more entropy”, “entropy pool”, “entropy source”. And for those who have encountered these terms for the first time, – wiki or to perceive entropy intuitively, as a measure of randomness (or randomness itself, for example, “the source of entropy”), and the larger it is, the more “random” the resulting bits are.

A pseudo-random number generator is a function that shuffles the input bits, applying simple operations to them several times, outputs the result, which is a sequence of random bits. For example, consider the ChaCha20 algorithm used in Linux and the SP800-90 AES-CTR-DRBG in Windows.

  • ChaCha20 is an evolution of the Salsa20 algorithm. It is based on a combination of operations: 32-bit addition, XOR and bitwise rotation. In short: the 4 * 4 matrix is ​​filled with a special constant, key, counter and nonce for the current iteration, and then the bits are shuffled for 20 rounds of the algorithm. In this case, odd rounds are responsible for changing bits along the columns of the matrix, and even rounds are responsible for changes along the diagonals. The bits received at the output are the pseudo-random number, which is also the input state for the next start of the generator. But, since with this approach it would be very easy to predict all the following numbers, there is a mandatory operation of changing the key after each request;

Shuffle code in ChaCha
// linux/lib/crypto/chacha.c
for (i = 0; i < nrounds; i += 2) {
// Odd round
    x[0]  += x[4];    x[12] = rol32(x[12] ^ x[0],  16);
    x[1]  += x[5];    x[13] = rol32(x[13] ^ x[1],  16);
    x[2]  += x[6];    x[14] = rol32(x[14] ^ x[2],  16);
    x[3]  += x[7];    x[15] = rol32(x[15] ^ x[3],  16);

    x[8]  += x[12];   x[4]  = rol32(x[4]  ^ x[8],  12);
    x[9]  += x[13];   x[5]  = rol32(x[5]  ^ x[9],  12);
    x[10] += x[14];   x[6]  = rol32(x[6]  ^ x[10], 12);
    x[11] += x[15];   x[7]  = rol32(x[7]  ^ x[11], 12);

    x[0]  += x[4];    x[12] = rol32(x[12] ^ x[0],   8);
    x[1]  += x[5];    x[13] = rol32(x[13] ^ x[1],   8);
    x[2]  += x[6];    x[14] = rol32(x[14] ^ x[2],   8);
    x[3]  += x[7];    x[15] = rol32(x[15] ^ x[3],   8);

    x[8]  += x[12];   x[4]  = rol32(x[4]  ^ x[8],   7);
    x[9]  += x[13];   x[5]  = rol32(x[5]  ^ x[9],   7);
    x[10] += x[14];   x[6]  = rol32(x[6]  ^ x[10],  7);
    x[11] += x[15];   x[7]  = rol32(x[7]  ^ x[11],  7);
// Even round
    x[0]  += x[5];    x[15] = rol32(x[15] ^ x[0],  16);
    x[1]  += x[6];    x[12] = rol32(x[12] ^ x[1],  16);
    x[2]  += x[7];    x[13] = rol32(x[13] ^ x[2],  16);
    x[3]  += x[4];    x[14] = rol32(x[14] ^ x[3],  16);

    x[10] += x[15];   x[5]  = rol32(x[5]  ^ x[10], 12);
    x[11] += x[12];   x[6]  = rol32(x[6]  ^ x[11], 12);
    x[8]  += x[13];   x[7]  = rol32(x[7]  ^ x[8],  12);
    x[9]  += x[14];   x[4]  = rol32(x[4]  ^ x[9],  12);

    x[0]  += x[5];    x[15] = rol32(x[15] ^ x[0],   8);
    x[1]  += x[6];    x[12] = rol32(x[12] ^ x[1],   8);
    x[2]  += x[7];    x[13] = rol32(x[13] ^ x[2],   8);
    x[3]  += x[4];    x[14] = rol32(x[14] ^ x[3],   8);

    x[10] += x[15];   x[5]  = rol32(x[5]  ^ x[10],  7);
    x[11] += x[12];   x[6]  = rol32(x[6]  ^ x[11],  7);
    x[8]  += x[13];   x[7]  = rol32(x[7]  ^ x[8],   7);
    x[9]  += x[14];   x[4]  = rol32(x[4]  ^ x[9],   7);
}
Filling the round ChaCha array
Filling the round ChaCha array
  • SP800-90 AES CTR DRBG (CounTeR mode Deterministic Random Byte Generator) is a cryptographically strong pseudo-random number generator based on AES (Advanced Encryption Standard) block encryption in counter mode. The current state of the generator is described by three objects: a key K, a vector V, which is a counter, and a refill counter counter. In a simplified form, the process of creating an output sequence can be divided into 2 parts:

    • Creation of the key K and the initial vector V using the update algorithm and additional input data;

    • The creation of the necessary pseudo-random numbers occurs by encrypting the counter, while the counter is incremented after each step. The encryption itself takes place using the AES algorithm with a generated key. Thus, this algorithm eliminates the need for additional mixing of the initial state after each call, as it was in ChaCha.

Update algorithm
Update algorithm
Generating output
Generating output

Random events

It remains to figure out the most interesting thing – how to initialize the PRNG in order to get really random numbers? In addition, the problem is not only in the initialization, but also in the constant reinitialization, which is necessary to prevent the possibility of predicting the next state. This is where the most interesting and important job of finding randomness in events in the system begins.

Linux

Basic rules for choosing events that are considered random. First, these events must be non-deterministic. Secondly, they must be difficult to observe from the outside. These random events are added to the entropy pool (just an array of numbers), mixed with its contents using a special CRC-like hash function. It runs quickly so that it can be applied after every event of interest in the system, and is good enough if we assume that random events do not fill the pool in a malicious manner. In this case, when an event is added to the pool, the amount of entropy that has arrived is taken into account.

Currently, 4 types of sources of random events are used:

  • Information from devices, which should be different on physically different machines, for example, the MAC address of a network card. In fact, this does not add entropy to the system, but it allows in very bad cases (starting from the same image) on different devices to get different states;

  • Information from timer, interrupt, interrupt type, value;

  • Interruption time;

  • Information about the block search time on the disk. However, on modern SSDs, this is a rather bad source of randomness, since their seek times are relatively short and about the same always.

To initialize or reinitialize the PRNG, you need to get several random bytes from the entropy pool. To do this, the entire pool is hashed with the SHA-1 algorithm, and the hash sum is issued as a random set of bits. At the same time, measures are taken to ensure the safety of the generator in the future. First, the hash result is mixed with the pool so that the current state cannot be restored from the output value. Second, there is a constant assessment of the remaining amount of entropy in the pool.

Because of the latter, there are 2 ways to interact with random numbers in Linux – / dev / random and / dev / urandom. The first is blocked when the estimate for the amount of entropy falls below zero, and the second always outputs numbers, even if the pool is not replenished with random bits. However, the numbers can still be random enough for the required task.

It should be added that at many steps, where it makes sense, the code adds calls to “iron” random number generators, which work faster and give better entropy. It might not be that important, but Intel added RDRAND and RDSEED instructions to Ivy Bridge, and later AMD did it. Thus, many modern computers have a fast random number generator in the CPU. Why this is necessary will be explained in the conclusion.

Windows

In Windows, the process of creating prime numbers is subject to a rather complex tree structure. There are three types of entropy sources, which differ in purpose and quality. The entropy obtained from them is used to initialize and reinitialize the root PRNG – all random numbers are obtained from it in one way or another. Since in modern multicore computers it would be prohibitively slow to have only one generator, then for each logical CPU, its own PRNG is created, which is initialized by the root one. Further, for each user process, its own PRNG and its child generators for each logical CPU are started and initialized. This way we get something like a generator tree.

All generators, except for the root one, are reinitialized when they realize that their state is out of date. It does this by maintaining and comparing eras. The counter is incremented every time the root generator is reinitialized. Moreover, each of its “descendants” in the tree locally remembers the state of the counter during its own re-initialization. The root generator fills up on a schedule – when the system boots, and then with an exponentially growing period: 1, 3, 9, 27 seconds, etc. The maximum value for the period is 1 hour.

The sources of entropy in Windows are:

  • Interrupt time (main source) – for each interrupt, a time stamp is taken by reading from the TSC (English: TimeStamp Counter) and written to a special 256-byte array in a compact manner;

  • TPM (English: Trusted Platform Module) – gives out 40 bytes at the start and 64 bytes at each reinitialization, but due to limitations, it cannot do this more often than once every 40 minutes;

  • RDRAND / RDSEED – “iron” generators provided by the processor;

  • Seed file – an entry in the registry that is created by the OS during operation and used during the next boot;

  • External entropy is a registry entry that is made by the installer for the first start of the system, but can also be used by the user in the future to influence the initialization process;

  • ACPI-OEM0 – Created by the Hyper-V hypervisor and populates each time the guest OS starts up;

  • Data from drivers is hashed and is presented as a very bad source of entropy, which, however, allows you to initialize the system differently on different physical machines;

  • UEFI – random numbers from the UEFI driver;

  • The timestamp of the system start is not a very good source, but it reduces the likelihood that, starting from the same system image, the machines will receive the same states;

  • Unique (not random) number from Hyper-V – helps to fight against recurring state when starting a system snapshot.

Hyper-V

Typically, Hyper-V is not the only one to provide such improvements when working with Windows. Many hypervisors pretend to be Hyper-V to provide the same functionality and take advantage of built-in Windows performance enhancements.

During system startup, data from 7 sources (Seed file, external entropy, TPM, RDRAND, ACPI-OEM0, UEFI and start time) is hashed with SHA-512 and used to initialize SP800-90 AES-CTR-DRBG. Already while the system is running, the data provided by the source is placed in the pool (except for the first time, when they go immediately to reinitialize the root PRNG).

Conclusion

As you can see, many sources of random events are related to the current state of the machine, therefore, problems can begin during virtualization. On Linux, code comments sometimes openly admit this problem. In Windows with Hyper-V (or another hypervisor that “pretends” to be it), they try to deal with this, but the problem itself sometimes appears. The situation is somewhat facilitated by the fact that modern processors have “iron” random number generators, and there are also virtualized generators that slip random numbers from the host OS to the guest. After all, you can’t leave it to chance …

Literature

Linux
Windows
ChaCha20
SP800-90 AES-CTR-DRBG

Similar Posts

Leave a Reply

Your email address will not be published. Required fields are marked *