What is the danger of pure RSA? We analyze the pitfalls

Introduction

The RSA algorithm is one of the first representatives of asymmetric cryptography, which has made its way through heated discussions of mathematicians, through hundreds of successful and unsuccessful attempts to break it by cryptanalysts, through thousands of incorrect applications and implementations.

Currently, the RSA algorithm is gradually and systematically being replaced by asymmetric cryptography on elliptic curves due to the preservation of the old level of security, but with a shorter key length. However, it still occupies a leading position in its application and quite justifiably, given its relatively long history for the section of asymmetric cryptography.

On Habr, the RSA algorithm has been discussed and criticized many times, such as here, here, here, but all the analysis was reduced either to the analysis of a specific flaw/vulnerability, or to the incorrect use/application of the old standard. All this is not bad, but does not give a complete picture of what dangers it can still pose. pure mathematical structure RSA algorithm.

RSA in a nutshell

Key generation:

  1. Generate a pair of large prime numbers p,q, Where p ≠ q,

  2. Create a number n = pq,

  3. Create a number eWhere GCD(e, φ(n)) = 1, φ(n) = φ(pq) = φ(p)φ(q) = (p-1)(q-1),

  4. Find the number dWhere d ≡ e-1 (mod f(n)).

Numbers {e,n} – public key, numbers {d, n} – private key. Numbers e And d – open and closed exponents, respectively, are reciprocals of each other: d = e-1 => ed ≡ 1 (mod f(n)). f(n) – Euler function – the number of relatively prime numbers to a number n from 1 to n-1 inclusive. GCD is the greatest common divisor if GCD(a, b) = 1then it is considered that a, b – relatively prime numbers (there are no common divisors except one).

Encryption/Decryption:

  1. Encryption: c ≡ me (mod n),

  2. Decoding: m ≡ cd (mod n).

The correctness of the decryption is based mainly on Euler's theorem, which states:

Euler's theorem

If GCD(a, n) = 1That af(n) ≡ 1 (mod n)

Therefore, if GCD(m, n) = 1Then m1+ф(n) ≡ m1 ≡ m (mod n)During encryption and decryption the following occurs: cd ≡ (me)d ≡ med ≡ med+ф(n) ≡ m1+ф(n) ≡ m (mod n)There are two special cases that do not fall under the conditions of Euler's theorem, namely when GCD(m, n) = p ≠ 1 or GCD(m, n) = q ≠ 1. In that case, m ≡ 0 (mod p) or m ≡ 0 (mod q), and as a consequence, med Always ≡ m (mod p) or ≡ m (mod q). As a result, From the Chinese remainder theorem it follows that med Always ≡ m (mod n).

Chinese Remainder Theorem

Let n1n2…, nk — natural pairwise coprime numbers, and r1r2…, rk – some integers, then there exists such an integer Mthat it will be a solution to the system of comparisons:

M ≡ r1 (mod n1)
M ≡ r2 (mod n2)

M ≡ rk (mod nk)

Moreover, for any two solutions A And B this system is fair A ≡ B (mod n1n2…nk), that is, the solution of the system of comparisons exists and is unique in absolute value n1n2…nk.

RSA's security is primarily based on the factorization problem, which makes it difficult to compute f(n) without knowing the numbers p,q. Currently, all search methods f(n) are based on an algorithm for exhaustive search over a large set of numbers, but the fact that the factorization problem itself is complex has not been proven. We can only be guided by the history of this problem and attempts to solve it.

Maybe in the future an algorithm for fast finding will be found f(n)or it will be proven P = NP (which will lead to a proof of the non-existence of one-way functions), or a quantum computer with a large number of stable qubits will be created (on which further application of Shor's algorithm will be possible).

Attacks on pure RSA

The pure RSA algorithm is vulnerable to a large number of hacking methods. Let's try to analyze some of them. I will try to accompany most examples with program codes.

Brute-force attack against plaintexts

Each message in the RSA algorithm is represented by a number mnot exceeding the number n. Number n must always be large enough, which is due to the necessity of such a choice of numbers p And q. However, the number m does not have such a limitation, as a result, it can have any size in the range 0 ⩽ m < n depending on the application.

If the applied logic of using RSA is reduced, for example, to encrypting issued salaries x: c = Ek(x)then even with very large sums, enumeration will be an easy task – it is enough to just enumerate the numbers from 0 to max(salary)encrypt them and compare the results.

// ms = max_salary
// ds = dec_salary
bool decrypt(pubkey_t pk, ciphertext_t c, uint64_t ms, uint64_t *ds) {
  ciphertext_t сi;
  for (uint64_t i = 0; i < ms; ++i) {
    encrypt(&ci, pk, i);
    if (equal(c, ci)) {
      *ds = i;
      return true;
    }
  }
  return false;
}

The problem itself is based on the determinism of the encryption function E: under the same input conditions (e, n, m) will produce the same ciphertext c. This attack can be prevented by encoding the plaintext before encryption: c = E(C(m)). The encoding must have a non-deterministic property, introducing random bytes into the text, so that (c1 = C(m)) ≠ (c2 = C(m)) when used multiple times. In the same turn, there must be such a decoding function that would unambiguously return the original plaintext without distortion: D(c1) = D(c2) = mIn the modern world, the OAEP (Optimal Asymmetric Encryption with Padding) encryption scheme is used for this purpose.

Logarithm attack

Although the basic problem on which RSA is based is indeed the factorization problem, it is not the only one. When RSA performs the decryption function, the discrete logarithm problem comes into play.

The discrete logarithm problem comes down to finding a number x from the available number y ≡ gx (mod n). The difference between the usual logarithm problem and the discrete one comes down to the existence of a module nthrough which the number y = gx “is cut off”. Thus, in the logarithm problem y – this is the result of raising to a power in the discrete logarithm problem y – this is the remainder from dividing the result of raising to a power.

Just as with the factorization problem, we do not yet know of any efficient and non-quantum algorithms for solving the discrete logarithm problem. Nor is it known whether they exist at all: there is no proof of their absence or existence.

The logarithm attack on the RSA algorithm comes down to a mistake in choosing small public exponents, such as e = 3at which the situation begins to arise: me < n. This, in turn, leads to the possibility of easy recovery of the private key, as a way of bypassing the discrete logarithm problem: gx (mod n) = gxand bypassing the factorization problem: 1/e = d ≡ e-1 (mod f(n)). Thus, if there is a ciphertext cwhich was obtained by using a small open exponent e: c ≡ me (mod n)then there is a high probability that the ciphertext can be decrypted as follows: m = c1/ebypassing the module n.

uint64_t p = 17, q = 101;
uint64_t n = p*q; // 1717
uint64_t fn = (p-1)*(q-1); // 1600
uint64_t e = 3;
assert(gcd(e, fn) == 1); // НОД(e, ф(n)) = 1 => существует d
    
uint64_t d = invmod(e, fn); // d = e^(-1) (mod ф(n)) = 3^(-1) (mod 1600) = 1067
uint64_t m = 4;
uint64_t c = expmod(m, e, n); // c = m^e = 4^3 = 64
uint64_t r1 = round(pow(c, 1./e)); // c^(1/e) = 64^(1/3) = 4
uint64_t r2 = expmod(c, d, n); // c^d mod n = 64^1067 mod 1717 = 4
assert(r1 == r2);
Helper functions
// Алгоритм Евклида
uint64_t gcd(uint64_t a, uint64_t n) {
    uint64_t t;
    while(n != 0) {
      t = a;
      a = n;
      n = t%n;
    }
    return a; 
}

// Расширенный алгоритм Евклида
uint64_t invmod(uint64_t a, uint64_t n) {
    uint64_t tx = 0, x0 = 1, x1 = 0;
    uint64_t q = 0, r = 0;
    uint64_t tn = n;
    while (n != 0) {
        q = a / n, r = a % n;
        tx = x0 - q * x1;
        a = n, n = r;
        x0 = x1, x1 = tx;
    }
    return (x0 + tn) % tn;
}

// Быстрое возведение в степень по модулю
uint64_t expmod(uint64_t a, uint64_t b, uint64_t n) {
	uint64_t result = 1;
	for (uint64_t bit = 1; bit <= b; bit *= 2) {
		if (bit & b)
			result = (result * a) % n;
		a = (a * a) % n;
	}
	return result;
}

To avoid such an attack, it is necessary to choose a moderately large open exponent. e. The most commonly used open exponent is of the following form: e = 22^4+1 = 216+1 = 65537. This number is the fifth Fermat number (report from scratch) and was not chosen by chance. The point is that the binary representation of all Fermat numbers contains only two units: 1000000…0000001, which allows for fewer multiplication operations due to the use of the fast exponentiation algorithm. But this property will only be secure when encrypting or checking signatures. When decrypting or signing is used, i.e. when a private key is used, the use of fast exponentiation must always be carried out with two multiplication operations, regardless of the zeros and ones in the binary representation of the key. Otherwise, timing attacks may spread to the private key.

Single Key Attack

The RSA algorithm is capable of not only encrypting information, but also signing it. With all this, the function of signing information S is equivalent to the function of its decryption Dand the signature verification function V equivalent to the encryption function E. In other words, if there is a text xThat: S(x) = D(x) = xd mod n, V(x) = E(x) = xe mod n.

Due to this feature, there may be incorrect use of the RSA algorithm with a single key, when in the application it will be necessary to use both encryption and signing functions.

Let's assume that there is a service that is able to sign information sent to it with a private key. k-1. At the same time, secret communication with this same service is carried out thanks to its public key. k. In this case, to find out the contents of the sent secret text c = Ek(m)an attacker can simply ask the service itself to sign the ciphertext: Sk-1(c) = Sk-1(Ek(m)) = Dk-1(Ek(m)) = m.

But it can also be assumed that the service takes into account previously received ciphertexts and does not process them twice. In this case, the attacker can act more cunningly – send not cA c' ≡ rkc (mod n)Where r – random generated number with condition GCD(r, n) = 1In this scenario, the following will happen: Sk-1(c') = Sk-1(rkc) = Sk-1(rkEk(m)) = Dk-1(rkEk(m)) = Dk-1(rk)Dk-1(Ek(m))) = rmAll that remains for the attacker is to multiply the obtained result by the inverse of the random number: r-1r m ≡ m (mod n).

To avoid this problem, either different encryption keys or different coding schemes must be used. For example, in the real world, the OAEP scheme is used for encryption, and PSS for signing.

Meet-in-the-middle attack

The RSA algorithm has a multiplicative property that allows multiplication to be performed on encrypted data. In other words, the RSA cryptosystem is partially homomorphic.

Ek(m1)Ek(m2) = Ek(m1m2) <=> m1em2e ≡ (m1m2)e ≡ c (mod n)

Such an interesting property, in addition to possible further practical application, also brings a number of additional troubles, namely: what if, the number m will be composite? Such a condition is more than realistic, since the probability that the number m will turn out to be composite higher than if it turned out to be simple.

The fundamental theorem of arithmetic

Any natural number n > 1 can be factorized into prime factors, i.e. written in the form n = p1d1p2d2…pkdkWhere pi – prime factors, di – natural numbers.

Let's assume that the composite number m represented by two prime factors p1 And p2In this case, the RSA algorithm will exhibit the multiplicative property: Ek(m) = Ek(p1p2) = Ek(p1)Ek(p2) <=> p1kp2k ≡ c (mod n). Let us further assume that by the number m = p1p2 is understood as a random 64-bit symmetric encryption key. In this case, to find the correct encryption key, an attacker would need to perform not 264 encryption operations, but only 233This becomes possible due to the comparison search on both sides of the calculations: p1kp2k(xk)-1 ≡ yk (mod n) For x ⩽ 232The result of decryption will be numbers (x = p1y = p2) or (x = p2y = p1) => p1p2.

import math

p = 129765116189934447386218567135214456447225548887413779366409285893621919735749
q = 52464445694847850526312013404560730738901192255658078180137041242614031905553
n = p*q # n = 512 bit
fn = (p-1)*(q-1)
e = (1<<16)+1 # e = 2^16+1 = 65547 
assert math.gcd(e, fn) == 1

def gcd_extended(a, b):
    if a == 0 :
        return b, 0, 1
    gcd,x1,y1 = gcd_extended(b%a, a)
    x = y1 - (b//a) * x1
    y = x1
    return gcd, x, y

def decrypt(enck: int, kbits: int):
    full_maxv = 1 << kbits
    half_maxv = 1 << (kbits//2)
    is_even = enck%2 == 0 
    mapi = {}
    for y in range(2, half_maxv):
        if is_even and y%2==0: continue 
        c_exp_y = pow(y, e, n)
        mapi[c_exp_y] = y
    for x in range(2, full_maxv):
        if is_even and x%2==0: continue 
        _, inv, _ = gcd_extended(pow(x, e, n), n)
        c_inv_exp_x = (enck*inv) % n
        if c_inv_exp_x in mapi:
            return x*mapi[c_inv_exp_x]
    raise "decrypt failed"

skey = 0xB3AF1FDF1F # 40-bit secret key = 771737247519
enck = pow(skey, e, n)
print(decrypt(enck, 40))
Running the code and some reasoning

With poorly optimized Python code and a not very powerful laptop, finding a random 40-bit key encrypted with a 512-bit RSA key took me about three minutes:

$ time python3 main.py                                                                                              INT ✘  46s  
> 771737247519
> python3 main.py  179,87s user 17,59s system 99% cpu 3:19,26 total

If we compare the complexity of this 40-bit search with reality, then as an example, at the beginning of 2018, the complexity of generating one block in the Bitcoin network was 41-bit in 10 minutes. The difference in complexity between 40-bit and 41-bit = two times, therefore, my laptop could generate a block in 3 minutes (if it hypothetically contained a similar vulnerability), while the whole network would generate the same block only after 5 minutes.

To protect yourself from such vulnerability, it is necessary to remove the multiplicative property. This is also possible through the use of coding schemes such as OAEP and PSS.

Conclusion

In our small analysis, we were able to look at a number of vulnerabilities in pure RSA and see that each flaw we looked at would actually be detrimental if additional encoding schemes like OAEP and PSS did not exist, or if relatively large public exponents were not used. e.

Similar Posts

Leave a Reply

Your email address will not be published. Required fields are marked *