Large Prime Numbers: Weight of Sequences

Look at this picture. It is called the "Ulam tablecloth" (also known as the Ulam spiral): pixels are numbered from the center outward in a spiral, and a pixel is painted black if its number is prime. The diagonal lines immediately catch the eye, and if you look closely, you can also see horizontal and vertical lines. What is this? Are prime numbers suddenly subject to some kind of law? Is the Universe trying to tell us something? Of course not. It is simply a vivid illustration of how numerical sequences can have different weights.

The fact that some formulas generate an unusually large number of primes was noticed by the great Russian mathematician of Swiss origin, Leonhard Euler. The polynomial he discovered, x^2 - x + 41, takes prime values for all x from 1 to 40. And although composite numbers begin to appear for larger x, there are still unexpectedly many primes. The diagonal, vertical and horizontal lines on the Ulam tablecloth are described by formulas of the form 4x^2 + bx + c. The Euler polynomial itself is clearly visible on the tablecloth: it is a line starting slightly above the center and going up and to the right.
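
This is easy to check directly. A minimal sketch in Python (the is_prime helper is a naive trial-division test, sufficient for these sizes):

def is_prime(n):
    # Naive trial-division primality test; fine for small n.
    if n < 2:
        return False
    d = 2
    while d * d <= n:
        if n % d == 0:
            return False
        d += 1
    return True

def euler(x):
    return x * x - x + 41

# Prime for every x from 1 to 40:
assert all(is_prime(euler(x)) for x in range(1, 41))

# Beyond 40 composites appear, but primes remain unusually frequent:
print(sum(is_prime(euler(x)) for x in range(1, 1001)),
      "primes among the first 1000 values")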

What causes this effect? In short, it is the interaction of polynomials with small prime numbers. For each prime, formulas are known whose values are always divisible by that prime; Fermat's little theorem and Euler's theorem provide such formulas, for example. These formulas are easy to apply to polynomials. Take the polynomial 2x^2 + 1: it is divisible by 3 for every x that is not itself a multiple of 3, and by 11 only when x = 11n ± 4. Such divisibility rules exist for every polynomial; some have more of them, some fewer. As a result, small primes "sift" some polynomials thoroughly and barely touch others. These dependencies are not always obvious in algebraic form, but they are easy to see in practice, for example on the Ulam tablecloth.
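
These patterns are easy to observe empirically. A small sketch that recovers the divisibility rules for 2x^2 + 1 by brute force:

f = lambda x: 2 * x * x + 1

for p in (3, 11):
    residues = sorted({x % p for x in range(1000) if f(x) % p == 0})
    print("2x^2 + 1 is divisible by", p, "exactly when x mod", p, "is in", residues)

# Prints [1, 2] for p = 3 (every x not divisible by 3)
# and [4, 7] for p = 11 (that is, x = 11n +/- 4).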

Sifting with small primes has been well known since antiquity in the form of the algorithm called the "Sieve of Eratosthenes": from the sequence of numbers from 2 to N, all numbers divisible by 2 are crossed out, then those divisible by 3, then by 5, and so on up to some prime P, which is called the "sifting depth." How many numbers remain in the sequence? Suppose N is much larger than P. Then we can estimate the number of remaining numbers S(N, P) using probabilities. The probability that a number is not divisible by 2 is 1/2; not divisible by 3, 2/3; not divisible by 5, 4/5. Multiplying these fractions gives the probability of a number "surviving" the sifting. Interestingly, this probability can be computed much more simply using Mertens' third theorem:

\frac{S(N,P)}{N} \approx \prod_{p\leq P}{\left( 1-\frac{1}{p} \right)} \approx \frac{1}{e^{\gamma}\ln{P}},

where γ is the Euler–Mascheroni constant. Thus, the result of the sieve of Eratosthenes can be predicted using the constant e^γ, found a couple of thousand years after Eratosthenes.
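
This is straightforward to verify numerically. A sketch that sifts the integers up to N at depth P and compares the surviving fraction with Mertens' prediction (N and P here are arbitrary, chosen small enough to run instantly):

from math import exp, log

GAMMA = 0.5772156649015329  # Euler-Mascheroni constant

N, P = 1_000_000, 100
alive = [True] * (N + 1)
for p in range(2, P + 1):
    if alive[p]:                      # p has not been crossed out, so it is prime
        for m in range(p, N + 1, p):  # cross out every number divisible by p, as in the text
            alive[m] = False

survivors = sum(alive[2:])
print("surviving fraction:", survivors / N)            # ~0.12 for these N, P
print("Mertens' estimate: ", 1 / (exp(GAMMA) * log(P)))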

But can we write a similar formula for polynomials? The Bateman–Horn conjecture and some related conjectures claim that we can. For this we will need a characteristic of the sequence that we will call its "weight". It shows how many times more (or fewer) numbers remain after sifting this sequence compared to a sequence of consecutive integers. For example, if we take the even numbers, then obviously not a single number survives the first step of sifting: the weight of the even numbers is zero. If we take the odd numbers, twice as many of them survive, so their weight is two. The same principle works for polynomials and even for exponential sequences. This has not been proven, of course, and may never be, but it works remarkably well in practice. Sifting an arbitrary sequence is described by the formula:

\frac{S(f,N,P)}{N} \approx \frac{C_f}{e^{\gamma} \ln{P}}

How do we find the weight C_f of a sequence? For some sequences it can be computed exactly, but it is much easier to estimate experimentally. Let's take N = 100,000 and P = 1,000 and perform the sifting. For the Euler polynomial, 54,110 numbers remain after sifting. The weight of the Euler polynomial can therefore be estimated as

C_{x^2-x+41} \approx \frac{S(x^2-x+41,N,P)\, e^{\gamma} \ln{P}}{N} = \frac{54110 \cdot 1.78107\ldots \cdot \ln{1000}}{100000} \approx 6.66
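
Here is a sketch of that experiment, assuming sympy is available for generating the sifting primes (it reruns the sifting from scratch, so it takes a few seconds in pure Python):

from math import exp, log
from sympy import primerange

GAMMA = 0.5772156649015329
N, P = 100_000, 1_000

values = [x * x - x + 41 for x in range(1, N + 1)]
alive = [True] * N
for p in primerange(2, P + 1):
    for i, v in enumerate(values):
        if alive[i] and v % p == 0 and v != p:   # a value equal to p itself stays
            alive[i] = False

S = sum(alive)
print("S =", S, " weight ~", S * exp(GAMMA) * log(P) / N)   # roughly 6.66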

It is not surprising that this line is so clearly visible on the tablecloth. It has 6.66 times more dots than a random line.

The last statement requires clarification: how are the weight and the number of primes in a sequence related? The prime number theorem tells us that the probability of a number being prime depends only on its size. Does this change for heavy sequences? No. The probability stays the same, but the number of candidates in a heavy sequence is much larger. Summing the probabilities, the expected number of primes in a heavy sequence is several times greater than for random numbers of the same size. This can be exploited when generating primes. In addition, the probability of primality is obviously affected by the sifting depth: with a sufficiently large depth, the probability of primality reaches 1, as Eratosthenes intended. Therefore, when choosing a distributed computing project to search for large primes, one should pay attention not to the weight of the sequences checked there but to the sifting depth.
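
As a sanity check, we can sum the probabilities directly. A sketch, assuming sympy is available for primality testing; it compares the expected number of primes among the first values of the Euler polynomial (using the experimental weight 6.66) with the actual count:

from math import log
from sympy import isprime

C, N = 6.66, 10_000
f = lambda x: x * x - x + 41

# A number of size m is prime with probability ~ 1/ln(m); a sequence of
# weight C has C times more primes than random numbers of the same size.
expected = C * sum(1 / log(f(x)) for x in range(1, N + 1))
actual = sum(1 for x in range(1, N + 1) if isprime(f(x)))
print("expected ~", round(expected), " actual =", actual)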

As mentioned earlier, exponential sequences behave just like polynomial ones with respect to weight and sifting. Of particular historical interest are sequences of the form k·2^n + 1. Wacław Sierpiński proved that there are infinitely many odd k for which the weight of the sequence is zero, that is, the sequence never contains a prime. Such k are called "Sierpiński numbers". The smallest known Sierpiński number is 78,557. Proving that it is minimal is conceptually easy: just find one prime in each sequence with k < 78,557. By the early 2000s this had been done for every k except seventeen. Thanks to the efforts of the projects Seventeen or Bust (SoB) and PrimeGrid, only five k remain at the moment: 21,181, 22,699, 24,737, 55,459 and 67,607. The following table shows the approximate weights of the corresponding sequences:

k         Weight
21,181    0.189
22,699    0.078
24,737    0.200
55,459    0.266
67,607    0.067

It is clear that two of the sequences have much lower weights than the others. What does this mean in practical terms? That there are far fewer candidates for k = 22,699 and 67,607 than for the rest. Is this good or bad? On the one hand, fewer candidates means less work. On the other hand, it means that for these k the size of the candidates grows very quickly, and when the size of a number doubles, the probability of finding a prime per unit of processor time drops by a factor of 8: the probability of primality halves, the number of long multiplications doubles, and the complexity of each long multiplication more than doubles. Thus there is a high probability that the sequence for 67,607 will run off into the realm of very large numbers, where a single primality test takes weeks or months on modern hardware. Whether the minimum Sierpiński number problem is solved in our lifetime largely depends on whether we get lucky with 67,607 and 22,699.
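
The weights in the table can be reproduced with the same experimental sieve as for polynomials; the only new ingredient is that for each prime p, the exponents n with p dividing k·2^n + 1 form an arithmetic progression whose step is the multiplicative order of 2 modulo p. A sketch (sieve length and depth are chosen arbitrarily, so the estimates will deviate slightly from the table):

from math import exp, log
from sympy import primerange

GAMMA = 0.5772156649015329
N, P = 100_000, 1_000            # exponents 1..N, sifting depth P

def mult_order(a, p):
    # Multiplicative order of a modulo prime p (naive, fine for p <= 1000).
    d, t = 1, a % p
    while t != 1:
        t = t * a % p
        d += 1
    return d

def weight(k):
    alive = [True] * (N + 1)
    for p in primerange(3, P + 1):   # k*2^n + 1 is odd, so skip p = 2
        if k % p == 0:
            continue                 # then k*2^n + 1 = 1 (mod p), never divisible
        d = mult_order(2, p)
        target = -pow(k, -1, p) % p  # need 2^n = -1/k (mod p)
        t, n0 = 1, None
        for n in range(d):           # at most one solution per period
            if t == target:
                n0 = n
                break
            t = t * 2 % p
        if n0 is None:
            continue                 # p never divides this sequence
        for n in range(n0 if n0 > 0 else d, N + 1, d):
            alive[n] = False
    S = sum(alive[1:])
    return S * exp(GAMMA) * log(P) / N

for k in (21181, 22699, 24737, 55459, 67607):
    print(k, round(weight(k), 3))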

Speaking of luck: to solve the problem we only need the first prime in each sequence. We can compute a characteristic such as "the expected number of primes in the sequence at the moment the first actual prime is found", in other words, how much work was spent relative to the theoretical minimum. If the work spent is less than 1, we were lucky; if more than 1, unlucky. The prime number theorem tells us that the work is 1 on average, which practice confirms. And if we take all the k and sort them by the work expended, we get an exponential distribution, which also agrees with theory.
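
The expected count itself comes from the same summation of probabilities as above. For a sequence k·2^n + 1 of weight C_k, the n-th term has logarithm ln k + n·ln 2 ≈ n·ln 2, so, roughly,

E(N) \approx C_k \sum_{n=1}^{N} \frac{1}{\ln(k \cdot 2^n)} \approx \frac{C_k}{\ln 2} \ln N

The expectation grows only logarithmically in N, which is why a low-weight sequence that is also unlucky can run off to astronomically large exponents before its first prime appears.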

This graph shows all k < 271,129, including the current position of the sequences in which the search is still ongoing. The minimum Sierpiński number problem is marked SoB (current n = 42,480,391), the Prime Sierpiński Problem is marked PSP (current n = 33,140,576), and the Extended Sierpiński Problem is marked ESP (current n = 27,249,801). The two additional problems check all k up to 271,129.

As more candidates are sifted, the remaining k shift to the right. You can see that there is a group of "unlucky" k (position among all k above 95%) that simply have not had the luck to produce a prime yet. But these sequences have plenty of candidates, so there is confidence that sooner or later the search will succeed. There is also a group of "slow" k (below 90%) that are not especially unlucky; they just have low weight and few candidates. But if it turns out that 67,607, besides being "slow", is also "unlucky", the amount of resources needed to solve the problem could reach a galactic scale on the Kardashev scale.

The most recent success came with k = 202,705: the number 202705 · 2^21320516 + 1 turned out to be prime. This k is the unluckiest on the entire list: by the time the first prime in the sequence was found, 10.1 primes were already expected. But the weight of this sequence is 0.502 and the number of candidates was very large, so in the end it worked out.

It's scary to think what will happen if 67,607 also decides to hold out until 10.1 expected primes (n = 10^47).
