Why it matters that systems of linear equations can be solved faster than matrices can be multiplied

In 1998, when Google first appeared, its killer feature was the proprietary PageRank algorithm for ranking search results by popularity. Described by Stanford graduate students Brin and Page in a scientific article, it boils down to a very simple idea:

PR(p_i) = \frac{1-d}{N} + d \sum_{p_j \in M(p_i)} \frac{PR(p_j)}{L(p_j)}

where M(p_i) is the set of pages with inbound links to p_i, L(p_j) is the number of outbound links on page p_j, and N is the total number of pages on the Internet. PR(p_i) thus expresses the probability of landing on page p_i while randomly browsing the Internet, when with probability d we go to the next page by following a randomly chosen outgoing link, and with probability 1-d we close the current page and open a randomly chosen one.
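The random-surfer interpretation can be checked directly by simulation. The sketch below is a toy example (the three-page link graph and all the constants are invented for illustration): it counts how often a random walk visits each page, and the resulting frequencies approximate the PR values from the formula.

```python
import random

random.seed(42)

# Toy link graph (invented): page -> list of pages it links to.
links = {0: [1, 2], 1: [2], 2: [0]}
N = len(links)
d = 0.85  # damping factor from the PageRank formula

visits = [0] * N
page = 0
steps = 200_000
for _ in range(steps):
    visits[page] += 1
    if random.random() < d:
        page = random.choice(links[page])   # follow a random outgoing link
    else:
        page = random.randrange(N)          # close the page, open a random one
freq = [v / steps for v in visits]
print(freq)  # empirical PR(p_i); the frequencies sum to 1
```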

Brin and Page quit graduate school soon after developing PageRank: having founded Google, they were no longer interested in further mathematical research. Calculating PageRank, however, poses a non-trivial mathematical problem: we have a system of N linear equations with N unknowns. In 1998 N was in the millions; today it is in the billions. How does one solve a system of equations with tens of billions of unknowns?

We all learned at school the algorithm for eliminating unknowns one at a time: substitute the expression for PR(p_N) into all the other equations, obtain a system of N-1 equations with N-1 unknowns, and so on. This method of solving a system of equations is dead simple, but it has a couple of problems:

First, substituting one expression into one equation takes O(N) multiplications of rational numbers. Hence completely eliminating one unknown takes O(N^2) multiplications, and the whole solution takes O(N^3). When N is in the billions, O(N^3) sequential multiplications would take longer than our planet has left before the Sun expands and engulfs it. About ten thousand times longer.
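The "school" method can be sketched in a few lines. The example below is illustrative only (the 3×3 system is made up); it eliminates unknowns one at a time over exact rationals, doing the O(N) substitutions per non-zero coefficient that add up to O(N^3) overall:

```python
from fractions import Fraction

def solve_by_elimination(A, b):
    """Solve A·x = b by school-style elimination of unknowns, exactly."""
    n = len(A)
    A = [[Fraction(v) for v in row] for row in A]
    b = [Fraction(v) for v in b]
    for k in range(n):                 # eliminate x_k from the rows below
        for i in range(k + 1, n):
            factor = A[i][k] / A[k][k]
            for j in range(k, n):      # O(N) multiplications per substitution
                A[i][j] -= factor * A[k][j]
            b[i] -= factor * b[k]
    x = [Fraction(0)] * n
    for i in reversed(range(n)):       # back-substitution
        s = sum(A[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (b[i] - s) / A[i][i]
    return x

# A made-up system whose exact solution is x = (1, 1, 1):
x = solve_by_elimination([[2, 1, 0], [1, 3, 1], [0, 1, 2]], [3, 5, 3])
print(x)
```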

Fortunately, very few inbound links lead to any given page on the Internet: not a billion, not even a million. A popular Wikipedia article may be referenced by tens of thousands of pages; this article on Habr, at best by tens. In other words, almost all the coefficients in the Internet's system of equations are zeros. Each non-zero coefficient undergoes O(N) multiplications when the equation containing it is substituted into all the others. If we denote the number of non-zero coefficients by m, the whole solution requires O(N·m) multiplications; in the worst case m = O(N^2), but in the case of the Internet m = o(N^2), and a dozen or two thousand years of computer time would suffice for all the multiplications. On a distributed system of a million or two servers, like Google's, the Internet's system of equations could be solved in a week. Before the spread of cryptocurrencies, it was safe to say that most of the planet's computing resources went into solving systems of equations.
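A back-of-envelope check of these estimates; every figure below (page count, links per page, server throughput) is an assumption for illustration:

```python
# Back-of-envelope estimates; all constants are assumptions for illustration.
N = 2e10                  # pages on the Internet: tens of billions
links_per_page = 20       # assumed average number of outbound links per page
m = N * links_per_page    # non-zero coefficients in the system
ops_per_sec = 1e11        # assumed throughput of one modern server

year = 365.25 * 24 * 3600
dense_years = N ** 3 / ops_per_sec / year    # school method, dense system
sparse_years = N * m / ops_per_sec / year    # school method exploiting sparsity
cluster_days = N * m / (2e6 * ops_per_sec) / (24 * 3600)  # 2M servers

print(dense_years)   # trillions of years: hopeless
print(sparse_years)  # thousands of years on a single server
print(cluster_days)  # under a day of raw compute; a week with real overhead
```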

Second, the value of the last unknown is the result of a chain of O(N·m) multiplications and additions of rational numbers. We either need O(N·m) bits to represent this result (and each intermediate one) exactly, or we have to round the result of every operation to O(1) significant bits. Rounding errors accumulating in this way can distort the solution of the system beyond recognition. Google may be content with imprecise PageRanks, but mathematicians need some guarantee of accuracy; for example, that the computed values of the unknowns contain as many correct bits as the coefficients of the system.
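The growth of exact intermediate results is easy to observe: a chain of exact rational multiplications needs ever more bits, while the floating-point version of the same chain stays at 53 significant bits and rounds at every step. The particular product below is made up purely for illustration:

```python
from fractions import Fraction

exact = Fraction(1)
approx = 1.0
for k in range(2, 60):
    term = Fraction(k * k + 1, k * k + 2)   # an arbitrary rational factor
    exact *= term                           # exact: bits keep accumulating
    approx *= float(term)                   # float: rounded to 53 bits each step

# The exact denominator needs hundreds of bits after just 58 multiplications.
print(exact.denominator.bit_length())
print(abs(float(exact) - approx))          # the rounded result is still close here
```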

You can’t get more out of the school method for solving a system of equations. Freshmen are taught a more advanced matrix method: the PageRank system can be written as

\begin{bmatrix} 1 & -d\frac{M(p_2,p_1)}{L(p_2)} & \cdots & -d\frac{M(p_N,p_1)}{L(p_N)} \\ -d\frac{M(p_1,p_2)}{L(p_1)} & 1 & \cdots & -d\frac{M(p_N,p_2)}{L(p_N)} \\ \vdots & \vdots & \ddots & \vdots \\ -d\frac{M(p_1,p_N)}{L(p_1)} & -d\frac{M(p_2,p_N)}{L(p_2)} & \cdots & 1 \end{bmatrix} \cdot \begin{bmatrix} PR(p_1) \\ PR(p_2) \\ \vdots \\ PR(p_N) \end{bmatrix} = \frac{1-d}{N} \begin{bmatrix} 1 \\ 1 \\ \vdots \\ 1 \end{bmatrix}

or, more generally, A·x = b, whence the vector of unknowns is expressed as x = A^{-1}·b. In other words, solving the system of equations reduces to inverting the coefficient matrix and multiplying the inverse by the vector of free terms.
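On a toy scale the matrix method looks like this. The sketch is illustrative: the 3-page graph is invented, and textbook Gauss–Jordan inversion stands in for whatever a production solver would actually use:

```python
def invert(A):
    """Gauss–Jordan inversion: reduce [A | I] until the left half is I."""
    n = len(A)
    aug = [row[:] + [float(i == j) for j in range(n)] for i, row in enumerate(A)]
    for k in range(n):
        pivot = aug[k][k]
        aug[k] = [v / pivot for v in aug[k]]
        for i in range(n):
            if i != k:
                f = aug[i][k]
                aug[i] = [v - f * w for v, w in zip(aug[i], aug[k])]
    return [row[n:] for row in aug]

d, N = 0.85, 3
# Toy graph 0->1, 0->2, 1->2, 2->0: M[j][i] = 1 if p_j links to p_i.
M = [[0, 1, 1], [0, 0, 1], [1, 0, 0]]
L = [2, 1, 1]                             # outbound link counts
A = [[(1.0 if i == j else 0.0) - d * M[j][i] / L[j] for j in range(N)]
     for i in range(N)]
b = [(1 - d) / N] * N
Ainv = invert(A)
x = [sum(Ainv[i][j] * b[j] for j in range(N)) for i in range(N)]
print(x)  # the PageRank vector; with no dangling pages it sums to 1
```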

By itself, the matrix method gives neither greater speed nor greater accuracy than the school method: inverting a matrix the way students are taught still requires O(N^3) multiplications of coefficients. Outside the university curriculum, however, there are more efficient methods: since 2005 a probabilistic algorithm has been known, and since 2019 a deterministic one, for the exact solution of a system of equations in O(log^2 N) multiplications of N×N matrices; and the question of the optimal matrix multiplication algorithm remains open. The naive method ("row by column") requires O(N^3) multiplications; Strassen's algorithm (1969), O(N^{2.807}); the Coppersmith–Winograd algorithm (1987), O(N^{2.376}); its improvements (the latest in 2020), down to O(N^{2.373}). Since log N = o(N^s) for any s > 0, the factor O(log^2 N) does not affect the final asymptotic complexity of solving the system of equations: it turns out to be equal to the asymptotic complexity of matrix multiplication. Matrix multiplication itself is such a common practical problem that the search for an optimal algorithm for it is of interest to everyone, regardless of the solution of systems of equations: the 2020 improvement, for example, reduced the exponent by less than 0.00001, and is still considered an important achievement. The equality of the complexity of solving a system of equations, i.e. of matrix inversion, to the complexity of matrix multiplication was taken for granted: optimizing the solution of one of these problems seemed tantamount to optimizing the other.
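For a feel of how sub-cubic multiplication works, here is a bare-bones Strassen sketch for matrices whose size is a power of two (illustrative only; real implementations switch to the naive method below some block size):

```python
def mat_add(A, B):
    return [[a + b for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]

def mat_sub(A, B):
    return [[a - b for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]

def strassen(A, B):
    """Multiply N×N matrices (N a power of two) with 7 recursive products."""
    n = len(A)
    if n == 1:
        return [[A[0][0] * B[0][0]]]
    h = n // 2
    q = lambda M, r, c: [row[c:c + h] for row in M[r:r + h]]  # quadrant
    A11, A12, A21, A22 = q(A, 0, 0), q(A, 0, h), q(A, h, 0), q(A, h, h)
    B11, B12, B21, B22 = q(B, 0, 0), q(B, 0, h), q(B, h, 0), q(B, h, h)
    M1 = strassen(mat_add(A11, A22), mat_add(B11, B22))
    M2 = strassen(mat_add(A21, A22), B11)
    M3 = strassen(A11, mat_sub(B12, B22))
    M4 = strassen(A22, mat_sub(B21, B11))
    M5 = strassen(mat_add(A11, A12), B22)
    M6 = strassen(mat_sub(A21, A11), mat_add(B11, B12))
    M7 = strassen(mat_sub(A12, A22), mat_add(B21, B22))
    C11 = mat_add(mat_sub(mat_add(M1, M4), M5), M7)
    C12 = mat_add(M3, M5)
    C21 = mat_add(M2, M4)
    C22 = mat_add(mat_sub(mat_add(M1, M3), M2), M6)
    top = [r1 + r2 for r1, r2 in zip(C11, C12)]
    bottom = [r1 + r2 for r1, r2 in zip(C21, C22)]
    return top + bottom

# 7 products instead of 8 per halving gives O(N^{log2 7}) = O(N^{2.807}).
```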

For practical purposes the Coppersmith–Winograd algorithm, and even more so its improved versions, are inapplicable: such algorithms are nicknamed "galactic", because the whole Earth is too small to store matrices of the sizes at which they give a speed gain over simpler ones. In practice the most important special case is when the multiplied or inverted matrices are sparse: almost all of their elements are zero, as in the PageRank matrix. An algorithm for multiplying sparse matrices, published in 2005, requires O(m^{0.7} N^{1.2} + N^{2+o(1)}) operations and overtakes Coppersmith–Winograd when m = o(N^{1.6}); but for inverting sparse matrices no algorithms more efficient than for arbitrary ones were known. This is partly because the inverse of a sparse matrix is generally not itself sparse, and in practice there would simply be nowhere to store it.
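That the inverse of a sparse matrix is dense is easy to see on a tridiagonal example: the N×N second-difference matrix has about 3N non-zero entries, yet its inverse has no zeros at all. The sketch below is illustrative; it uses the Thomas algorithm (an O(N) elimination for tridiagonal systems) to recover the inverse column by column:

```python
def thomas_solve(low, diag, up, b):
    """Solve a tridiagonal system in O(N): forward elimination + back-substitution."""
    n = len(diag)
    d, rhs = diag[:], b[:]
    for i in range(1, n):                  # eliminate the subdiagonal
        f = low[i] / d[i - 1]
        d[i] -= f * up[i - 1]
        rhs[i] -= f * rhs[i - 1]
    x = [0.0] * n
    x[-1] = rhs[-1] / d[-1]
    for i in range(n - 2, -1, -1):         # back-substitution
        x[i] = (rhs[i] - up[i] * x[i + 1]) / d[i]
    return x

n = 8
low = [0.0] + [-1.0] * (n - 1)   # second-difference matrix: tridiag(-1, 2, -1)
diag = [2.0] * n
up = [-1.0] * (n - 1) + [0.0]
# Column j of the inverse is the solution of A·x = e_j.
inverse_cols = [thomas_solve(low, diag, up, [float(i == j) for i in range(n)])
                for j in range(n)]
nonzeros = sum(abs(v) > 1e-12 for col in inverse_cols for v in col)
print(nonzeros)  # all n*n entries of the inverse are non-zero
```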

Recently a note was published on Habr (with tall tales about horns and hooves, but without explaining the technical side of the matter) about how both of the named "psychological barriers" were broken by Richard Peng and Santosh Vempala, a pair of mathematicians from the Georgia Institute of Technology, whose paper at the ACM-SIAM Symposium on Discrete Algorithms in January 2021 was recognized as the best of 637 submitted. For a system of equations with m = O(N), their algorithm finds the exact solution in O(N^{2.332}) operations, beating the exponent of the asymptotic complexity of matrix multiplication by 0.04. As we remember, the "school" method of eliminating unknowns one at a time takes, when m = O(N), some O(N^2) operations, but does not guarantee the accuracy of the values found. At the heart of the new algorithm is a probabilistic approach: the values of the unknowns are "guessed", the magnitude of the error is estimated, and the "guessing" is repeated with greater precision. The complexity bound of O(N^{2.332}) thus refers to the "most likely" cases rather than the worst. Another innovation is that at each iteration several random "guesses" are evaluated at once, which reduces the total number of iterations and thereby restrains the growth of the size of the exact intermediate results.
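The guess-and-refine loop structure can be illustrated with classical iterative refinement. To be clear, this is not the Peng–Vempala algorithm, only the same correction pattern: estimate the error of the current guess via the residual, then solve (crudely) for a correction. The 3×3 system and the Jacobi-style guess are made up for illustration:

```python
def mat_vec(A, x):
    return [sum(a * v for a, v in zip(row, x)) for row in A]

def refine(A, b, approx_solve, steps=10):
    """Iterative refinement: repeatedly correct x by the residual b - A·x."""
    n = len(b)
    x = [0.0] * n
    for _ in range(steps):
        residual = [bi - ri for bi, ri in zip(b, mat_vec(A, x))]
        dx = approx_solve(residual)            # crude "guess" at the correction
        x = [xi + di for xi, di in zip(x, dx)]
    return x

# A made-up diagonally dominant system, so the crude Jacobi guess converges.
A = [[4.0, 1.0, 0.0], [1.0, 3.0, 1.0], [0.0, 1.0, 5.0]]
b = [5.0, 5.0, 6.0]
jacobi_guess = lambda r: [r[i] / A[i][i] for i in range(3)]  # divide by diagonal
x = refine(A, b, jacobi_guess, steps=40)
print(x)  # close to the exact solution [1, 1, 1]
```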

It should be understood that at one of its stages the Vempala–Peng algorithm uses the Coppersmith–Winograd algorithm for matrix multiplication, so the new algorithm is even more "galactic". Its importance lies not in practical applicability at all, but in the proof that the complexity of solving systems of equations is not bounded below by the complexity of matrix multiplication, which means that new efficient algorithms for solving sparse systems of equations should be sought in other directions, and they are guaranteed to be found. But neither the new algorithm nor any other mathematical advances of the last half-century have in any way affected the practically applicable ways of solving systems of equations: Google and everyone else will continue to use the simple, fast and imprecise iterative method.
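In its simplest form that iterative method is the power iteration: start from a uniform guess and repeatedly substitute the current estimates into the right-hand side of the PageRank formula until they stop changing. The three-page link graph below is invented for illustration:

```python
# Toy link graph (invented): page -> pages it links to.
links = {0: [1, 2], 1: [2], 2: [0]}
N, d = len(links), 0.85

pr = [1.0 / N] * N                   # start from the uniform distribution
for _ in range(100):
    new = [(1 - d) / N] * N
    for j, outs in links.items():    # each page shares its rank over its links
        for i in outs:
            new[i] += d * pr[j] / len(outs)
    pr = new
print(pr)  # converges to the solution of the linear system
```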
