Accelerating browser calculations on the knee using WebAssembly using the example of noise generation
Introduction
I recently worked on developing a 3D browser game as another pet project using the engine BabylonJS. And at some point the question arose about the need for procedural terrain generation – I’m sure everyone who had to deal with the creation of natural locations in games had such a problem. The question is to select a suitable noise generation algorithm, which can then be used as a depth map, normals and/or something else. Yes, even the generation of the movement of water, wind, and so on is usually carried out by the same algorithms!
It is worth saying that most often one of several popular algorithms is used, like Noise Perlin, Simplextheir custom modifications and so on. BabylonJS V examples with terrain generation suggests using the implementation Noise Perlinwritten in JavaScript. Knowing that this language is not the best candidate for such low-level CPU-bound operations, I decided to try to implement Perlin Noise using WebAssembly and then determine whether it was possible to speed up the generation process and, if so, by how much?
A little about WebAssembly
Official website positions the technology as follows:
WebAssembly (abbreviated Wasm) is a binary instruction format for a stack-based virtual machine. Wasm is designed as a cross-platform compilation target for programming languages that allows you to deploy client and server applications on the Internet.
At its core, this format is at the same time an already compiled set of instructions in binary form, which allows you to bypass the stages of code preparation, interpretation and JIT compilationwhich produce browser JS engines such as V8 And SpiderMonkey (you can read more about this, for example, Here).
Wasm did not appear yesterday or a year ago, so today there are game engines, platforms and editors that allow you to compile a project created in them under Wasm for porting it to browsers. But we will try to write a program and compile it for Wasm manually, without fathers, mothers and ready-made engines. This is not so much more complicated, but cleaner and more visual in terms of experiment.
IN this readme provides a list of programming languages that can be compiled into Wasm. For convenience, they are divided into those that are already actively used in production, stable for use, unstable, in development and unsupported.
The language chosen was the simplest programming language in the world Rust. It is type and memory safe and does not require Runtime garbage like G.C.although because our goal is to speed up the process as much as possible, then, with due confidence that the code will not crash and will execute correctly without UByou can get the same result with C/C++/Zig.
This article is not about the intricacies of implementing a specific algorithm in a specific language. So let's skip the review of each line of code and what happens in them and go straight to the most interesting thing – the development cycle:
So let's get started
Without further ado, it was decided to simply rewrite the algorithm from the very version offered by BabylonJS in the above example. Moreover, in this way we can get a better comparison than if one implementation was based on one algorithm, and the other on a slightly modified or completely different one, which would clearly affect the results by distorting them!
The example offered both 2D and 3D Perlin Noise. And, for the sake of more revealing results, both algorithms were rewritten.
The original implementation of the algorithm was made with appropriate top-level functions for generating noise so that each of them takes as input the coordinates of a point (x, y for 2D and x, y, z for the 3D version) and returns a floating point number in the case of 2D, and a vector (array) of such numbers for the 3D version.
The first pancake is lumpy
Having rewritten both algorithms exactly, I began to measure their execution time. To ensure complete convergence of the results, they were both uploaded to a web page and tested under identical conditions: the 2D version was tested by nesting the algorithm in a double loop, and the 3D version in a triple loop, so that they received all possible coordinates of points in the plane/space. The benchmark file itself looked like this:
import init, { Perlin } from './wasm-perlin-pkg/wasm_perlin.js'
await init()
const seed = Math.random()
noise.seed(seed)
const noiseWasm = new Perlin(seed)
const experiments = [
['perlinjs', noise],
['webperlin', noiseWasm],
]
experiments.forEach(([id, noise]) => {
const totalStart = Date.now()
for (let x = 0; x < 1e3; x++) {
for (let y = 0; y < 1e3; y++) {
noise.perlin2(x / 100, y / 100)
}
}
const totalEnd = Date.now()
console.log(`[${id}] Rendered in ` + (totalEnd - totalStart) + ' ms')
})
It is worth noting that it is recommended to use console.time instead of Date.now in case of attempts at profiling! In our case, we don’t just need to output the result to the console, and there is no difference in performance
The code is written, all that remains is to open the web page and look at the console. And what do we see?
Looks like a mistake! The difference is 3.5 times and is not in favor of the Wasm version! You need to run the algorithm, say, a hundred times to get statistical accuracy. Let's slightly change our experiment launch code:
experiments.forEach(([id, noise]) => {
const result = [...Array(100).keys()].map((i) => {
const totalStart = Date.now()
for (let x = 0; x < 1e3; x++) {
for (let y = 0; y < 1e3; y++) {
noise.perlin2(x / 100, y / 100)
}
}
const totalEnd = Date.now()
return totalEnd - totalStart
})
console.log(id, result)
})
And we get the results:
And if you didn’t look closely at the numbers and didn’t have a heart attack, then here’s the same data, but in the form of a graph:
Average values: 3.55 and 25.16, a difference of 7 times!
It turns out there is no error. Everything is clear, the results speak for themselves! JS outperforms Wasm even in CPU-bound tasks, it couldn’t be any other way!
Or is it still possible?
Reflection
Let's put aside for a moment the idea that we are comparing 2 implementations of an algorithm and try to analyze the experiment itself. It seems to us that we are comparing them in their purest form. However, is this really so?
I’m sure the smartest ones noticed the catch even at the time of the code examples of the experiment itself, but let’s still look at the reason for such disastrous results:
So we have a JS version and a Wasm version of the algorithm written in Rust, so the build of the Wasm file should not include any runtime garbage like garbage collector, some platform utilities, etc. Both there and there are purely algorithmic code. Let's move on.
It is worth returning to the fact that these implementations are, at the level of their interface, a function that accepts coordinates and returns a number. We test both implementations in the JS environment. It turns out that when you run the JS version of the algorithm in the JS environment, the function receives JS values, works with them and returns them. And the Wasm version, in addition to the algorithm itself, must first receive data through WebAssembly ABIand then send the output data through it. There is also a call to the Wasm function, which, presumably, is also not very free.
In general, it’s similar to finding out who is stronger: a bear or a clown fish, if we make the comparison in the thickness of sea water, in which the bear not only will not be able to move or breathe normally, but will most likely die immediately due to under enormous pressure.
Unfortunately, I was not able to find enough information on the Internet to add links to the article explaining such downsides when working with Wasm. I hope that among the readers there will be those who understand this issue and can clarify this point in the comments, I will be very grateful.
Take two or attempt at a great comeback
Well, our guess is that the code is slow due to the fact that we call the Wasm function, shove a couple of numbers into it and get one number out of it. And so many, many times. And the drawdown occurs precisely at the moment the + function is launched, when forwarding input data and receiving output data.
Then, as an option, you can simply pass into the function the coordinates from which to start generating our matrix (in the case of a 2D algorithm) or 3D matrix (in the case of a 3D algorithm), generate it on the Wasm side and return ready-made data arrays. And all this at the same time!
Well, to do this, let's add a function for generating a noise matrix to our implementation. He said that there would be no Rust code examples, but here:
/// 2D Perlin Noise Matrix
#[wasm_bindgen(js_name = "perlin2Matrix")]
pub fn perlin2_matrix(&self, x: Float, y: Float, scale: Float) -> js_sys::Array {
js_sys::Array::from(&JsValue::from(
(0..(x as usize))
.map(|x| {
JsValue::from(js_sys::Float64Array::from(
&(0..(y as usize))
.map(|y| self.perlin2((x as Float) / scale, (y as Float) / scale))
.collect::<Vec<_>>()[..],
))
})
.collect::<Vec<_>>(),
))
}
And, of course, we bring the use of the original algorithm to the same form:
const jsNoise = (x, y, scale) =>
[...Array(x).keys()].map((x) =>
[...Array(y).keys()].map((y) => noise.perlin2(x / scale, y / scale))
)
I think it makes no sense to show the implementation of the 3D version – there is more code, but the idea is the same. But if you are a connoisseur of the aesthetics of monstrous functional one-liners, like me, then keep link.
As a result, when we run the experiment, we get the following results:
That's another matter! The Wasm version of 2D noise processes 1.78 times faster on average.
And 3D version:
Oddly enough, we get the same difference of 1.78 times. Doubtful, well Okay.
I’m sure that if you don’t use nested vectors in the output data, you can achieve a several-fold increase in performance.
Noise visualization
Finally something interesting, and not these pieces of code and some graphs of yours
For the 2D version, the resulting value was normalized and displayed as a shade of gray (where 0 is black and 1 is white).
In the 3D version, let me remind you, we get not a number, but a vector of numbers, so I decided to do it in an interesting way: divide it into 3 parts, get the average value of each, normalize, and just like with the 2D version, get the hue value , only red, green and blue respectively for each of the 3 parts to get a colored dot.
You can see the result in your browser if you go follow the link (yes, the author cannot do the layout).
Conclusion
I would like to summarize this adventure by saying that Wasm, as a lower-level technology, will certainly be faster. The main thing is to apply it correctly, otherwise you can get a similar result and stop somewhere on the first pancake.