JavaScript memory – no leaks

image

How you create and access your data can affect the performance of your application. Let’s see how.

Introduction

JavaScript is a very high level language. Thus, most developers do not think about how the data is stored in memory. In this article, we will look at how data is stored in memory, how it affects the processor and memory, and how the way you distribute data to JS and access it affects performance.

Love triangle

When a computer needs to perform some calculations, a processor (CPU) needs data to process. Thus, in accordance with the task, he sends to the memory a request for data sampling via the bus.

It looks like this:

image

So this is our romantic triangle – Processor -> Bus -> Memory

Maintaining a healthy relationship in the trio is difficult

The processor is much faster than memory. Thus, this process Processor -> Bus -> Memory -> Bus -> The processor “spends” time on calculations. While the memory is being scanned, the processor is idle.

To prevent downtime, a cache has been added to the system. We will not go into details about what a cache is or what types of cache are, but suffice it to say that the cache is the internal memory of the processor.

When the CPU receives a command to execute, it first looks for data in the cache, and if there is no data there, it sends a request via the bus.

The bus, in turn, transfers the requested data plus the conditional part of the memory and stores it in the cache for quick reference.

Thus, the CPU will have less dissatisfaction with how slow the memory is and, therefore, the CPU will have less downtime.

Quarrels are part of any relationship.

The problem that may arise – especially when dealing with processing huge amounts of data – is a phenomenon called cache miss.

A cache miss means that during the calculation, the CPU detects that it does not have the necessary data in the cache, and it needs to request this data through regular channels – you remember that there is a lot of slow memory.

image

Cash slip illustration. Data from the array is processed, but data outside the cache limit is also requested for calculation – and this creates a “cache miss”.

Ok, but I’m a JavaScript developer, why should I care?

Good question. In most cases, you don’t think about it. However, today, as more and more data passes through servers written in Node.js, you are more likely to encounter a cache miss problem when sorting through large data sets.

I will believe it when I see it !!!

Fair. Let’s look at an example.

Here is a class called Boom.

class Boom {
  constructor(id) {
    this.id = id;
  }
  setPosition(x, y) {
    this.x = x;
    this.y = y;
  }
}

This class (Boom) has only 3 properties – id, x and y.

Now let’s create a method that populates x and y.

Let’s set the data:

const ROWS = 1000;
const COLS = 1000;
const repeats = 100;
const arr = new Array(ROWS * COLS).fill(0).map((a, i) => new Boom(i));

Now we will use this data in the method:

function localAccess() {
  for (let i = 0; i < ROWS; i++) {
    for (let j = 0; j < COLS; j++) {
      arr[i * ROWS + j].x = 0;
    }
  }
}

What localAccess does is it runs linearly through the array and sets x to 0.

If we repeat this function 100 times (look at the repeats constant), we can measure how long it takes to execute:

function repeat(cb, type) {
  console.log(`%c Started data ${type}`, 'color: red');
  const start = performance.now();
  for (let i = 0; i < repeats; i++) {
    cb();
  }
  const end = performance.now();
  console.log('Finished data locality test run in ', ((end - start) / 1000).toFixed(4), ' seconds');
  return end - start;
}
repeat(localAccess, 'Local');

Log output:

image

Price for cache miss

Now, according to what we learned above, we can consider the cache miss problem if we process data that is far away during the iteration. Deleted data is data that is not in the adjacent index. Here is the function for this:

function farAccess() {
  for (let i = 0; i < COLS; i++) {
    for (let j = 0; j < ROWS; j++) {
      arr[j * ROWS + i].x = 0;
    }
  }
}

What happens here is that, at each iteration, we turn to the index, which is located at a ROWS distance from the last iteration. Therefore, if the ROWS is 1000 (as in our case), we get the following iteration: [0, 1000, 2000,…, 1, 1001, 2001,…].

Let's add this to our speed test:

repeat(localAccess, 'Local');
setTimeout(() => {
  repeat(farAccess, 'Non Local');
}, 2000);

And here is the end result:

image

Nonlocal iteration was almost 4 times slower. This difference will increase with increasing data. This is because the processor is idle due to a cache miss.

So what price do you pay? It all depends on the size of your data.

Okay, I swear I'll never do that!

Perhaps you don’t think about it, but ... there are cases when you want to access an array with some logic that is not linear (for example, 1,2,3,4,5) or is not conditional (for example, for ( let i = 0; i

Source data in memory vs sorted data in memory. Numbers indicate the indices of the objects in the source array.

Looking at the image above, we see the data in the form in which it is stored in memory (upper gray bar). Below we see the array that was created when the data came from the server. Finally, we see a sorted array that contains references to objects stored at various positions in memory.

Thus, iterating over a sorted array can lead to multiple misses in the cache, as shown in the example above.

Note that this example is for a small array. Cache misses are relevant for much larger data.

In a world where you need smooth animations on the frontend and where you can load every millisecond of processor time on the backend, this can become critical.

Oh no! Everything is lost!!!

No, not at all.

There are various solutions to this problem, but now that you know the reason for this performance drop, you can probably come up with a solution on your own. You just need to store data that is processed as close as possible in memory.

This technique is called a pattern. Data Locality Design.

Let's continue our example. Assume that in our application the most common process is data processing using the logic shown in the farAccess function. We would like to optimize the data so that it runs quickly in the most common for loop.

We organize the following data:

const diffArr = new Array(ROWS * COLS).fill(0);
for (let col = 0; col < COLS; col++) {
  for (let row = 0; row < ROWS; row++) {
    diffArr[row * ROWS + col] = arr[col * COLS + row];
  }
}

So now, in diffArr, the objects that are in the indices [0,1,2,…] in the original array are now set as follows [0,1000,2000,…, 1, 1001, 2001,…, 2, 1002, 2002, ...]. The numbers indicate the index of the object. This simulates the sorting of an array, which is one way of implementing the Data Locality design pattern.

To easily check this, we will slightly modify our farAccess function to get a custom array:

function farAccess(array) {
  let data = arr;
  if (array) {
    data = array;
  }
  for (let i = 0; i < COLS; i++) {
    for (let j = 0; j < ROWS; j++) {
      data[j * ROWS + i].x = 0;
    }
  }
}

Now add the script to our test:

repeat(localAccess, 'Local');
setTimeout(() => {
  repeat(farAccess, 'Non Local')
  setTimeout(() => {
    repeat(() => farAccess(diffArr), 'Non Local Sorted')
  }, 2000);
}, 2000);

We run this and we get:

image

And voila! “We optimized our data to fit a more common approach.”

Full example is given here..

Output

In this article, we demonstrated the Data Locality design pattern. In fact, we have shown that accessing a data structure in a way for which it is not optimized can reduce performance. Then we optimized the data according to our way of processing it - and saw how it improved performance.

The Data Locality design pattern is common in the field of game development, where you often have to deal with many entities that are repeated many times. In the above results, we see that data locality still matters even in high-level languages ​​such as JavaScript, and not just in games.

Today, given the amount of data transferred between servers or downloaded to browsers, application developers must take into account design patterns that used to be everyday for game developers — both on the server side and on the client side.

image

Learn the details of how to get a sought-after profession from scratch or Level Up in skills and salary by taking paid SkillFactory online courses:


Read more

  • The coolest Data Scientist does not waste time on statistics
  • How to Become a Data Scientist Without Online Courses
  • Sorting cheat sheet for Data Science
  • Data Science for the Humanities: What is Data
  • Steroid Data Scenario: Introducing Decision Intelligence

Similar Posts

Leave a Reply

Your email address will not be published. Required fields are marked *