The Walsh-Hadamard transform
There is a great problem on hackerrank.com: given an array short[] A, find the maximum number of its subarrays whose elements xor to the same value. That xor value itself also needs to be found.
The maximum array length is 10^5, so a quadratic algorithm does not fit into the execution time limit. At the time I could not cope with this problem and gave up, deciding to look at the author's solution. And at that moment I understood why I could not cope: the author proposed solving the problem through a discrete Fourier transform.
Detailed problem statement
First, the task should be described in detail, to be sure that everyone understands it the same way. We are given an array A of integers, each in the range 1 to 65535 (bounds included). Say the array is A = [1, 2, 3]. Let's look at all of its subarrays and their xors:

[1] -> 1, [2] -> 2, [3] -> 3
[1, 2] -> 1^2 = 3, [2, 3] -> 2^3 = 1
[1, 2, 3] -> 1^2^3 = 0

2 subarrays have xor equal to 1, and 2 other subarrays have xor equal to 3. So 2 is the maximum number of subarrays with the same xor. As the xor value itself we choose 1, the minimum among all the alternatives (1 or 3). The final answer to the problem is the pair (1, 2).
It only remains to learn how to do the same for arrays of size up to 100000. We will denote the length of the array by N.
Naive solution
A naive solution involves a double loop over an auxiliary array B, constructed as follows:
B[0] = 0
B[i] = B[i-1] ^ A[i-1]
Array B has length N + 1. With this construction, each B[i] (for i > 0) equals the xor of all elements A[k] for 0 <= k < i. This property is very convenient: the xor of the subarray [A[i], ..., A[j]] equals B[j + 1] ^ B[i], i.e. it can be computed in constant time. The array itself is built in linear time, which is also fast.
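As a quick sanity check, here is a small self-contained sketch (my own, not from the original post; the class and method names are hypothetical) that builds B for the example array and verifies the constant-time subarray-xor property:

```java
// Sketch: builds the prefix-xor array B for A = [1, 2, 3] and checks that
// B[j + 1] ^ B[i] equals the xor of the subarray A[i..j].
public class PrefixXor {
    static int[] prefixXor(int[] a) {
        int[] b = new int[a.length + 1];
        for (int i = 1; i <= a.length; i++) {
            b[i] = b[i - 1] ^ a[i - 1];
        }
        return b;
    }

    // Direct xor of the subarray [A[i], ..., A[j]], for comparison.
    static int subarrayXor(int[] a, int i, int j) {
        int x = 0;
        for (int k = i; k <= j; k++) x ^= a[k];
        return x;
    }

    public static void main(String[] args) {
        int[] a = {1, 2, 3};
        int[] b = prefixXor(a);
        for (int i = 0; i < a.length; i++)
            for (int j = i; j < a.length; j++)
                if ((b[j + 1] ^ b[i]) != subarrayXor(a, i, j))
                    throw new AssertionError();
    }
}
```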
All that remains is to count the xors of all subarrays. The pseudocode looks like this (65536 is 2^16):
int[] frequencies = new int[65536];
for (int i = 0; i < N; i++) {
for (int j = i; j < N; j++) {
++frequencies[B[j + 1] ^ B[i]];
}
}
int maxFrequency = max(frequencies);
int xor = indexOf(frequencies, maxFrequency);
We avoided a triple loop, but a double loop remains, and that is bad: it performs N*(N+1)/2 iterations, which in the worst case is approximately 5*10^9. Considering the frequencies at which processors usually operate, we can conclude that this loop will run for several seconds. In practice, the solution does not fit into the time limit configured on the site (4 seconds for Java, for example).
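For completeness, here is the naive algorithm assembled into one runnable method (a sketch of mine; the class and method names are not from the original). On the example A = [1, 2, 3] it returns the pair (1, 2):

```java
// Naive O(N^2) solution. Returns {xor, count}: the smallest xor value shared
// by the maximum number of subarrays, and that maximum number.
public class NaiveSolution {
    static long[] solve(int[] a) {
        int n = a.length;
        int[] b = new int[n + 1];
        for (int i = 0; i < n; i++) b[i + 1] = b[i] ^ a[i];
        long[] frequencies = new long[65536];
        for (int i = 0; i < n; i++)
            for (int j = i; j < n; j++)
                ++frequencies[b[j + 1] ^ b[i]];
        long max = -1;
        int imax = 0;
        for (int k = 0; k < 65536; k++)
            if (frequencies[k] > max) { max = frequencies[k]; imax = k; } // first hit = smallest xor
        return new long[]{imax, max};
    }

    public static void main(String[] args) {
        long[] r = solve(new int[]{1, 2, 3});
        System.out.println(r[0] + " " + r[1]); // prints "1 2"
    }
}
```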
What solution does the author propose?
A wrong one.

To be precise: the code given in the Editorial is correct, but the textual description from the screenshot is complete nonsense and does not match the code. I have questions about the code too, but let's skip that part and just say that it is strange.

I will explain in my own words what was actually meant, rather than what is written there, and I won't try to do it in 3 lines of text. The idea of the final algorithm is this:
long[] C = new long[65536];
for (int i = 0; i <= N; i++) {
++C[B[i]];
}
convolute(C);
C[0] -= N + 1;
long maxFrequency = max(C) / 2;
long xor = indexOf(C, maxFrequency * 2);
The solution consists of 2 parts: the convolution of the auxiliary array C (not of the array B, as stated) and the expression C[0] -= N + 1. By the time max(C) is called, the array C must contain exactly the same data as the array frequencies in the naive solution, but multiplied by 2.
Let's approach this code by thinking about how an answer can be obtained from the array C in its original form at all. This will be the key to the solution.
In the naive solution we did ++frequencies[k] for all k == B[i] ^ B[j], where 0 <= i < j <= N (here I replaced the variable j + 1 -> j so as not to carry the +1 around, to reduce confusion). By doing ++ we were, in the end, counting the number of pairs (i, j) for which B[i] ^ B[j] is the same and equal to k. In other words, for all i < j with k == B[i] ^ B[j] we sum ones to obtain frequencies[k]:

frequencies[k] = sum of 1 over all pairs 0 <= i < j <= N with B[i] ^ B[j] == k
Let's take the same example that was shown in the detailed statement of the problem: A = [1, 2, 3]. Then

B = [0, 1, 3, 0] ([0, 0^1, 0^1^2, 0^1^2^3])
C = [2, 1, 0, 1, 0, ...]

(C[x] is the number of times the value x occurs in B). The frequencies, with their corresponding pairs (i, j) written next to them, are:

frequencies[0] = 1: (0, 3)
frequencies[1] = 2: (0, 1), (1, 3)
frequencies[2] = 1: (1, 2)
frequencies[3] = 2: (0, 2), (2, 3)

This agrees with the values that we saw earlier.
Let's start transforming this formula until we get something worthwhile. First of all, we should get rid of i < j: such intricate indexing under the sum makes the formula very hard to work with. Consider an alternative quantity:

f[k] = sum of 1 over all pairs 0 <= i, j <= N with B[i] ^ B[j] == k

Here the order of i and j is ignored, and i == j is also allowed. The connection can be shown explicitly:

f[k] = 2 * frequencies[k] + (N + 1) * [k == 0]

(each pair i < j is counted twice, and the N + 1 pairs with i == j all contribute to k == 0, since B[i] ^ B[i] == 0). This means it is enough for us to calculate the simpler f[k].
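This relation is easy to check numerically; here is a small sketch of my own (class and method names are hypothetical) that verifies it on the example above:

```java
// Checks the relation f[k] == 2 * frequencies[k] + (N + 1) * [k == 0],
// where f[k] counts ordered pairs (i, j), i == j allowed, with B[i]^B[j] == k.
public class PairCount {
    // f[k]: all ordered pairs, i == j allowed.
    static long[] orderedPairs(int[] b, int m) {
        long[] r = new long[m];
        for (int i = 0; i < b.length; i++)
            for (int j = 0; j < b.length; j++)
                ++r[b[i] ^ b[j]];
        return r;
    }

    // frequencies[k]: pairs with i < j, as in the naive solution.
    static long[] frequencies(int[] b, int m) {
        long[] r = new long[m];
        for (int i = 0; i < b.length; i++)
            for (int j = i + 1; j < b.length; j++)
                ++r[b[i] ^ b[j]];
        return r;
    }

    public static void main(String[] args) {
        int[] b = {0, 1, 3, 0}; // B for A = [1, 2, 3]; N + 1 == b.length == 4
        long[] f = orderedPairs(b, 4), freq = frequencies(b, 4);
        for (int k = 0; k < 4; k++)
            if (f[k] != 2 * freq[k] + (k == 0 ? b.length : 0))
                throw new AssertionError();
    }
}
```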
What if some of the B[i] coincide? In general this seems a perfectly fair assumption. If we knew in advance that B[i_1] == B[i_2] == ... == B[i_n], then counting all the pairs B[i_k] ^ B[j] would be trivial: for each j there would be exactly n of them. The calculation can therefore be reorganized as follows:

for all i such that the B[i] are pairwise distinct, and all j with k == B[i] ^ B[j], we sum count(B[i]) to obtain f[k].

It is similar to what we had before, but now it is not ones that are summed, and not all i are taken into account.
The phrase "B[i] is unique" is not entirely rigorous, but I hope it gives an intuitive understanding of what is happening. What if we perform the same grouping over the indices j? We get the following:

for all i such that the B[i] are pairwise distinct, and all j such that the B[j] are pairwise distinct, with k == B[i] ^ B[j], we sum count(B[i]) * count(B[j]) to obtain f[k].
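This grouping can also be verified mechanically; a sketch of my own (names are hypothetical) comparing the two ways of counting:

```java
// Checks that grouping equal values of B and summing count(p) * count(q) over
// values p, q with p ^ q == k gives the same result as counting all ordered
// index pairs (i, j) one by one.
public class Grouping {
    static long[] viaValueCounts(int[] b, int m) {
        long[] count = new long[m];
        for (int x : b) ++count[x];
        long[] r = new long[m];
        for (int p = 0; p < m; p++)
            for (int q = 0; q < m; q++)
                r[p ^ q] += count[p] * count[q];
        return r;
    }

    static long[] viaIndexPairs(int[] b, int m) {
        long[] r = new long[m];
        for (int i = 0; i < b.length; i++)
            for (int j = 0; j < b.length; j++)
                ++r[b[i] ^ b[j]];
        return r;
    }

    public static void main(String[] args) {
        int[] b = {0, 1, 3, 0}; // B for A = [1, 2, 3]
        if (!java.util.Arrays.equals(viaValueCounts(b, 4), viaIndexPairs(b, 4)))
            throw new AssertionError();
    }
}
```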
If you look closely at the solution code, you will see that the array C computes exactly this count, i.e. C[B[i]] == count(B[i]). Next, i and j can be thrown out of the discussion altogether with a little trick.
If some value x does not occur in the array B, then C[x] == 0. If a value x does occur in B, then it is counted exactly C[x] times. This is a good property. With it in mind, let's see what happens if we compute the following sum:

f[k] = sum over all p, q with p ^ q == k of C[p] * C[q]

which can be rewritten a little differently by getting rid of the letter q:

f[k] = sum over all p of C[p] * C[p ^ k]
If p or q does not occur in the array B, then the corresponding count is zero and the term is ignored. Otherwise p and q correspond to distinct values in B. In other words, this is literally the same sum as before.
We can write the following code, which should be equivalent to the line convolute(C):
long[] tmp = new long[65536];
for (int p = 0; p < 65536; p++) {
for (int q = 0; q < 65536; q++) {
tmp[p ^ q] += C[p] * C[q];
}
}
C = tmp;
It is precisely this sum of products that is called the convolution of the array C with itself.

It is also now clear where C[0] -= N + 1 and the subsequent division by 2 come from: from the relation f[k] = 2 * frequencies[k] + (N + 1) * [k == 0].
The convolution will be calculated using the fast Walsh-Hadamard transform (FWHT), which works in O(M*log(M)) (in our case M = 65536), plus one more magical action, which will be described a little later.
Most of the article will be devoted to explaining what this transformation is, what convolution is in the general case, how they are related, and what discrete Fourier transforms are in general.
Discrete Fourier transform
Any discrete Fourier transform (DFT) is quite simple. The input is a vector x of length M, the output is a vector X of length M, and in between is a matrix-vector multiplication. Or vector-matrix, it doesn't matter. I'll start with the classic discrete Fourier transform and move on to the Walsh-Hadamard transform (WHT) later.

For the most common case, vectors of dimension M, the matrix W is used:

W[j][k] = w^(j*k)

where w = e^(-2*pi*i/M) is an M-th root of 1. This is literally the complete definition of the discrete Fourier transform: X = W x. In any case, it is equivalent to all the other definitions.

There is also an inverse transformation, with whose help the original vector x can be recovered from the resulting vector X; it is expressed by the inverse matrix

W^(-1)[j][k] = (1/M) * w^(-j*k)

where 1/M is a normalizing coefficient. The point here is that the matrix W is invertible: its determinant is nonzero. Fully written out, the matrix elements look like this:

W[j][k] = e^(-2*pi*i*j*k/M)
Coordinate system transformation
It is often written about discrete Fourier transforms that they decompose a signal into harmonics, or something like that. None of these interpretations interest me within the framework of this article. Viewed purely algebraically, a discrete Fourier transform is simply a translation of a vector from one basis to another (a linear change of coordinates). It's just that the basis is chosen in such a way that the translation into it acquires some magical properties.
Now there will be a short digression that explains how this translation works in practice.
For the DFT, the basis is the system of vectors u_0, ..., u_(M-1) with components

u_k[j] = w^(-j*k)

and the expansion of x over this system reads

x = (1/M) * (X_0*u_0 + X_1*u_1 + ... + X_(M-1)*u_(M-1))

The X_k are just the coefficients in the expansion of the vector x over the vectors u_k (up to the factor 1/M). Moreover, what is written here is literally the inverse DFT formula in expanded form. Now let's see why the X_k computed by the direct DFT satisfy this equality. To do this, first of all, it is worth noting that

w^(j*k) = conj(w^(-j*k)) = conj(u_k[j])

Here conj denotes complex conjugation (usually written as a bar over the expression), which turns w into w^(-1). Thanks to this conjugation, the k-th row of the matrix W is exactly the componentwise conjugate of the vector u_k, and the matrix W itself is unitary up to the normalizing coefficient. If the matrix is normalized (divided by sqrt(M)), it becomes truly unitary. Let's write down the direct transformation:

X_k = sum over j of x[j] * w^(j*k) = sum over j of x[j] * conj(u_k[j]) = <x, u_k>

Here <a, b> is the scalar product of two complex vectors, equal to the sum over j of a[j] * conj(b[j]): one of the factors in each term must be replaced by its complex conjugate, that is how the scalar product is defined. If you put everything together, you get the following:

x = sum over k of (<x, u_k> / <u_k, u_k>) * u_k = (1/M) * sum over k of X_k * u_k

And this is simply the decomposition of a vector over an orthogonal basis (in our case <u_k, u_k> == M). Checking the orthogonality is not difficult: you just need to show that <u_k, u_l> = sum over j of w^(j*(l-k)) equals zero for k != l, and for this it is enough to know the formula for the sum of a geometric progression.
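The orthogonality claim is easy to confirm numerically; a small sketch of my own (names are hypothetical), working with real and imaginary parts directly:

```java
// Checks that sum over j of w^(j*d) with w = e^(-2*pi*i/M) equals M for
// d == 0 and 0 otherwise, which is exactly the orthogonality of the DFT basis.
public class Orthogonality {
    // Real and imaginary parts of sum over j of e^(-2*pi*i*j*d/M).
    static double[] geometricSum(int d, int m) {
        double re = 0, im = 0;
        for (int j = 0; j < m; j++) {
            double phi = -2 * Math.PI * j * d / m;
            re += Math.cos(phi);
            im += Math.sin(phi);
        }
        return new double[]{re, im};
    }

    public static void main(String[] args) {
        int m = 8;
        for (int k = 0; k < m; k++)
            for (int l = 0; l < m; l++) {
                double[] s = geometricSum(k - l, m);
                double expectedRe = (k == l) ? m : 0;
                if (Math.abs(s[0] - expectedRe) > 1e-9 || Math.abs(s[1]) > 1e-9)
                    throw new AssertionError();
            }
    }
}
```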
Convolution theorem
Convolution is an operation that produces a third vector from 2 given ones. It is written with the symbol ∗:

(a ∗ b)[k] = sum over j of a[j] * b[k - j] (the index k - j is taken modulo M)
As with the DFT itself, my goal is not to explain what convolution is needed for. I am only interested in its specific properties. One of them is how convolution is related to the DFT: the discrete Fourier transform turns convolution into component-wise vector multiplication, also known as the Hadamard product:

DFT(a ∗ b) = DFT(a) ∘ DFT(b)
In practice, this means that convolution can be calculated using forward DFT, multiplying the results, and applying inverse DFT. And it is faster than directly calculating the convolution using the formula from the definition, since there is an effective algorithm for DFT called the fast Fourier transform (FFT).
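The theorem can be verified numerically on a small example. Here is a sketch of my own (names are hypothetical) that uses a naive O(M^2) DFT over separate real/imaginary arrays:

```java
// Verifies DFT(a conv b) == DFT(a) .* DFT(b) for a small real-valued example.
public class ConvolutionTheorem {
    // Naive DFT of a real vector: X[k] = sum over j of x[j] * e^(-2*pi*i*j*k/M).
    static double[][] dft(double[] x) {
        int m = x.length;
        double[][] out = new double[2][m]; // out[0] = re, out[1] = im
        for (int k = 0; k < m; k++)
            for (int j = 0; j < m; j++) {
                double phi = -2 * Math.PI * j * k / m;
                out[0][k] += x[j] * Math.cos(phi);
                out[1][k] += x[j] * Math.sin(phi);
            }
        return out;
    }

    // Cyclic convolution: c[k] = sum over i + j == k (mod M) of a[i] * b[j].
    static double[] convolve(double[] a, double[] b) {
        int m = a.length;
        double[] c = new double[m];
        for (int i = 0; i < m; i++)
            for (int j = 0; j < m; j++)
                c[(i + j) % m] += a[i] * b[j];
        return c;
    }

    public static void main(String[] args) {
        double[] a = {1, 2, 0, 3}, b = {4, 0, 1, 5};
        double[][] lhs = dft(convolve(a, b));
        double[][] fa = dft(a), fb = dft(b);
        for (int k = 0; k < a.length; k++) {
            // Hadamard (component-wise) product of complex numbers.
            double re = fa[0][k] * fb[0][k] - fa[1][k] * fb[1][k];
            double im = fa[0][k] * fb[1][k] + fa[1][k] * fb[0][k];
            if (Math.abs(lhs[0][k] - re) > 1e-6 || Math.abs(lhs[1][k] - im) > 1e-6)
                throw new AssertionError();
        }
    }
}
```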
I propose to look at why this property works; after that we will finally move on to the promised Walsh-Hadamard transform. First of all, let's rewrite the convolution formula in a slightly different form:

(a ∗ b)[k] = sum over all i, j with i + j == k (mod M) of a[i] * b[j]

I have always liked this way of writing it more than the classic one. More symmetrical, perhaps. Let's act on this vector with the discrete Fourier transform:

DFT(a ∗ b)[k] = W_k · (a ∗ b)

Here W_k is the k-th row of the matrix W (or column, it doesn't matter: the matrix is symmetric). Each component of the resulting vector looks like this:

DFT(a ∗ b)[k] = sum over m of w^(k*m) * (sum over i + j == m of a[i] * b[j])
              = sum over all i, j of w^(k*(i+j)) * a[i] * b[j]

In the expression above I merged the summation indices (the reduction of i + j modulo M is harmless, since w^M == 1). I hope this does not raise any big questions.
What does the Hadamard product on the right-hand side of the convolution theorem look like?

(DFT(a) ∘ DFT(b))[k] = (sum over i of w^(k*i) * a[i]) * (sum over j of w^(k*j) * b[j])
                     = sum over all i, j of w^(k*i) * w^(k*j) * a[i] * b[j]

Very similar. Now if only w^(k*i) * w^(k*j) equaled w^(k*(i+j))…

But wait, it does: w is a power of e, and the identity w^(k*i) * w^(k*j) == w^(k*(i+j)) follows directly from the properties of the exponential function.
This is the magical property of the matrix W that makes the convolution theorem work! And it means that if we take another matrix with the same property, the convolution theorem will hold for it as well.

There are a couple of interesting consequences: the symmetric formulas with the indices rearranged are also correct, and the matrix for the DFT is completely determined by this property of its elements; without it, the convolution theorem simply will not hold.
Walsh-Hadamard transform
WHT is a special case of the Fourier transform on groups, which, in turn, is a generalization of the discrete Fourier transform.
The Walsh-Hadamard transform is also a translation of a vector of dimension M from one basis to another, and is also expressed as a matrix-vector multiplication, but with an additional restriction on M:

M = 2^m

Instead of M, it is customary to use the exponent m itself in the formulas. The transformation matrix, called the Hadamard matrix, is defined recursively, starting from the case m = 1 (m = 0 I'll ignore, too boring). For m = 1 we get exactly the matrix of the ordinary DFT of size 2. Matrices of higher dimensions are built from matrices of lower dimensions:

H_1 = [[1,  1],
       [1, -1]]

H_m = [[H_(m-1),  H_(m-1)],
       [H_(m-1), -H_(m-1)]]

For example:

H_2 = [[1,  1,  1,  1],
       [1, -1,  1, -1],
       [1,  1, -1, -1],
       [1, -1, -1,  1]]
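A little sketch of my own (the class name is hypothetical) that builds H_m by literally following the recursive definition:

```java
// Builds the 2^m x 2^m Hadamard matrix: each step doubles the current block,
// copying it into three quadrants and negating the lower-right one.
public class HadamardMatrix {
    static int[][] hadamard(int m) {
        int size = 1 << m;
        int[][] h = new int[size][size];
        h[0][0] = 1;
        for (int half = 1; half < size; half *= 2)
            for (int i = 0; i < half; i++)
                for (int j = 0; j < half; j++) {
                    h[i][j + half] = h[i][j];
                    h[i + half][j] = h[i][j];
                    h[i + half][j + half] = -h[i][j];
                }
        return h;
    }

    public static void main(String[] args) {
        for (int[] row : hadamard(2)) // prints the 4 rows of H_2
            System.out.println(java.util.Arrays.toString(row));
    }
}
```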
Let's start with the similarities to the discrete Fourier transform. The matrix H_m is orthogonal, i.e. it also describes a transition to an orthogonal basis; this is easy to check yourself. The expression for the inverse transformation is identical to the one for the DFT:

H_m^(-1) = (1/M) * H_m*

The * sign can simply be dropped from the formula above, since the transformation matrix is symmetric and contains only real numbers: H_m^(-1) = (1/M) * H_m.
Now the differences. The DFT convolution theorem does not hold for the WHT as-is, because H_m[k][i] * H_m[k][j] is not equal to H_m[k][i + j]. But since the WHT is a variant of the generalized DFT, its convolution theorem must also be a variant of the generalized convolution theorem for the DFT. Let's try to arrive at this generalization.
If you look at the matrix "recursively", you can see that an element changes sign each time it lands in the lower-right submatrix during the "recursive descent". By counting how many times the sign changes, it is easy to obtain the value of a matrix element in closed form. For a 2 by 2 matrix this is trivial:

H_1[i][j] = (-1)^(i*j)

As for larger matrices: all row indices in the lower half of the matrix are greater than or equal to 2^(m-1), and similarly all column indices in the right half are greater than or equal to 2^(m-1). Such indices have a one bit in position m - 1, which means that during the recursive descent the value of the element gets multiplied by (-1)^(i_(m-1) * j_(m-1)), where i_(m-1) and j_(m-1) are the bits of i and j in position m - 1 (just as for the 2 by 2 matrix). Multiplying these expressions over all bits of the numbers i and j, we get the following:

H_m[i][j] = (-1)^(i_0*j_0 + i_1*j_1 + ... + i_(m-1)*j_(m-1)) = (-1)^⟨i & j⟩
The main thing here is not to get confused. In programming, & is conjunction and ^ is exclusive or. In mathematics, ∧ denotes conjunction and ⊕ denotes exclusive or. By the expression i & j I mean the bitwise conjunction, and by ⟨i & j⟩ the number of one bits in it. The sum i_0*j_0 + ... + i_(m-1)*j_(m-1) is actually equal to the scalar product of i and j, if you think of them as bit vectors, and modulo 2 it equals ⟨i & j⟩.

We again have an exponential function in the expression for the matrix elements, which means we can fully count on properties similar to those of the matrix W.
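The closed form can be checked against the recursive construction; here is a sketch of my own (names are hypothetical), unrolled for the 4 by 4 case:

```java
// Checks that H[i][j] == (-1)^popcount(i & j) against the recursively built
// matrix H_2 (written out explicitly here).
public class ClosedForm {
    static int[][] h2 = {
        { 1,  1,  1,  1},
        { 1, -1,  1, -1},
        { 1,  1, -1, -1},
        { 1, -1, -1,  1},
    };

    static int closedForm(int i, int j) {
        return Integer.bitCount(i & j) % 2 == 0 ? 1 : -1;
    }

    public static void main(String[] args) {
        for (int i = 0; i < 4; i++)
            for (int j = 0; j < 4; j++)
                if (h2[i][j] != closedForm(i, j)) throw new AssertionError();
    }
}
```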
Convolution for WHT
The conjunction of bit vectors is their bitwise product, and the exclusive or of bit vectors is their bitwise addition modulo 2. These 2 operations behave very much like ordinary multiplication and addition. In particular, distributivity holds:

k & (i ^ j) == (k & i) ^ (k & j)

It is also important to note that, for individual bits a and b,

(-1)^(a ⊕ b) == (-1)^a * (-1)^b

Given this, we can once again use the properties of exponential functions and conclude the following:

H_m[k][i] * H_m[k][j] == (-1)^⟨k & i⟩ * (-1)^⟨k & j⟩ == (-1)^⟨k & (i ^ j)⟩ == H_m[k][i ^ j]
As we remember, for the DFT we had a similar formula, only with addition instead:

W[k][i] * W[k][j] == W[k][i + j]

It is exactly this property that was used to prove the convolution theorem, where the convolution was defined by

(a ∗ b)[k] = sum over i + j == k of a[i] * b[j]

By analogy, we obtain the convolution theorem for the Walsh-Hadamard transform:

WHT(a ⊛ b) = WHT(a) ∘ WHT(b), where (a ⊛ b)[k] = sum over i ^ j == k of a[i] * b[j]

The convolution theorem still holds; it is just that the indices are combined with xor instead of addition. And this is exactly the formula that the method convolute is supposed to compute.
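The xor-convolution theorem can also be verified directly on a small example; a self-contained sketch of mine (names are hypothetical):

```java
// Verifies that the xor-convolution computed directly matches the one
// computed through the Walsh-Hadamard transform.
public class XorConvolution {
    static void wht(long[] c) {
        for (int h = 1; h < c.length; h *= 2)
            for (int i = 0; i < c.length; i += h * 2)
                for (int j = 0; j < h; j++) {
                    long x = c[i + j], y = c[i + j + h];
                    c[i + j] = x + y;
                    c[i + j + h] = x - y;
                }
    }

    static long[] direct(long[] a, long[] b) {
        long[] c = new long[a.length];
        for (int i = 0; i < a.length; i++)
            for (int j = 0; j < b.length; j++)
                c[i ^ j] += a[i] * b[j];
        return c;
    }

    static long[] viaWht(long[] a, long[] b) {
        long[] fa = a.clone(), fb = b.clone();
        wht(fa);
        wht(fb);
        long[] c = new long[a.length];
        for (int i = 0; i < c.length; i++) c[i] = fa[i] * fb[i];
        wht(c); // the WHT is its own inverse up to the factor 1/M
        for (int i = 0; i < c.length; i++) c[i] /= c.length;
        return c;
    }

    public static void main(String[] args) {
        long[] a = {1, 2, 0, 3}, b = {4, 0, 1, 5};
        if (!java.util.Arrays.equals(direct(a, b), viaWht(a, b)))
            throw new AssertionError();
    }
}
```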
Using other operations instead of addition or addition modulo 2, you can obtain other variants of discrete Fourier transforms. There can, of course, be a great many of them.
Fast Walsh-Hadamard transform
To calculate the convolution of two vectors using the convolution theorem, you need to perform a WHT on the arguments, calculate the Hadamard product, and perform the inverse transformation. In our particular case, this will lead to code like this:
static final int M = 65536;
static void convolute(long[] C) {
wht(C);
for (int i = 0; i < M; i++) {
C[i] *= C[i];
}
wht(C);
for (int i = 0; i < M; i++) {
C[i] /= M;
}
}
Remember that H_m^(-1) = (1/M) * H_m. Hence the division at the end.
All we have left is to implement the method wht
:
static void wht(long[] C) {
for (int h = 1; h < M; h *= 2) {
for (int i = 0; i < M; i += (h * 2)) {
for (int j = 0; j < h; j++) {
long x = C[i + j],
y = C[i + j + h];
C[i + j] = x + y;
C[i + j + h] = x - y;
}
}
}
}
This algorithm makes full use of the recursive structure of Hadamard matrices, literally repeating the recurrent definition of matrices during calculation. The outer loop is logarithmic, and the 2 inner loops together make one pass through the array, just out of order. A detailed analysis of this implementation is beyond the scope of this article.
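One consequence of H_m^(-1) = (1/M) * H_m is easy to check with this implementation: applying the fast transform twice must return the original vector multiplied by M. A sketch of mine (names are hypothetical):

```java
// Applying wht twice multiplies every component by M (here M = 4).
public class WhtInvolution {
    static void wht(long[] c) {
        for (int h = 1; h < c.length; h *= 2)
            for (int i = 0; i < c.length; i += h * 2)
                for (int j = 0; j < h; j++) {
                    long x = c[i + j], y = c[i + j + h];
                    c[i + j] = x + y;
                    c[i + j + h] = x - y;
                }
    }

    static long[] twice(long[] v) {
        long[] c = v.clone();
        wht(c);
        wht(c);
        return c;
    }

    public static void main(String[] args) {
        long[] v = {5, -1, 7, 2};
        long[] t = twice(v);
        for (int i = 0; i < v.length; i++)
            if (t[i] != 4 * v[i]) throw new AssertionError();
    }
}
```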
Conclusion
A good problem. A hard one. And do you know what is most interesting about it? I deceived you. Not on purpose, of course.
Closer to the beginning of the article I said that the double loop over the array B performs approximately 5*10^9 iterations, and that this is too many. That is true. But how many iterations would a double loop over the array C perform if we computed the convolution directly? Approximately 2*10^9, if we iterate with j >= i: almost two and a half times fewer. And for Java the time limit is 4 seconds. Maybe it will squeak through?
public static void main(String[] args) {
int N = readUnsignedInt();
int B[] = new int[N + 1];
for (int i = 0; i < N; i++) {
B[i + 1] = B[i] ^ readUnsignedInt();
}
long C[] = new long[M]; // long, since the products tmp[i] * tmp[j] can overflow int
for (int i = 0; i <= N; i++) {
++C[B[i]];
}
long tmp[] = C;
C = new long[M];
for (int i = 0; i < M; i++) {
for (int j = i; j < M; j++) {
C[i ^ j] += tmp[i] * tmp[j];
}
}
C[0] = (C[0] - N - 1) / 2;
long max = Long.MIN_VALUE;
int imax = 0;
for (int i = 0; i < M; i++) {
if (C[i] > max) {
max = C[i];
imax = i;
}
}
System.out.print(imax);
System.out.print(' ');
System.out.println(max);
}
It passes!
It turns out the problem can be solved without the Walsh-Hadamard transform at all. That is, the fact that I initially failed to solve this problem and peeked at the answer has nothing to do with my not knowing the WHT. It is due to the fact that I was too lazy to sit with the problem a while longer, and admitting that now is a little embarrassing.

The author of the problem could have ruled out such a solution by using, for example, 17-bit numbers instead of 16-bit ones. Together with the nonsense written in the Editorial, this leads me to conclude that the author of the problem is not actually its author. He almost certainly took it from another site, perhaps without even fully understanding the necessary mathematics and without adjusting the constraints to the time limits of the new site. I no longer have the desire to hunt for this problem in other sources, so these thoughts will remain just a hypothesis.
In any case, I'm glad that I figured out the proposed solution, and the discrete Fourier transform actually turned out to be much simpler than I previously thought.