Efficient String Manipulation in JavaScript

Everything that the browser displays except for pictures and videos is strings, so competent work with them can significantly increase the speed of web applications both on the client side and on the server side.

What do you need to know about strings in terms of their efficiency? First, strings are primitive data types. Secondly, the values ​​of primitive (simple) data types, unlike composite ones, such as arrays and structures, are not mutable. This means that if you assign a value to a variable of a string type once, then this string cannot be changed later. However, such a statement may be surprising. What does this mean in practice? If, for example, you run this code:

let hello = "Hello";
hello += " world";
console.log(hello);

then it will definitely appear in the console hello world, that is, the string variable hello has changed its value. How can a string variable be immutable and change at the same time?

The fact is that the language interpreter in the case of strings does not add one string to another directly. Instead, it creates a third string in memory, then copies both “Hello” and “world” to this new string, and redirects the “hello” variable to point to this new string. Accordingly, the value of a new string is set once, while the values ​​of the first two do not change, and thus the rule of immutability of strings is fulfilled.

Here’s what the string concatenation process looks like in full:

What do you think is wrong with this process? This is a highly inefficient algorithm. It performs more than necessary and uses twice as much memory to store the same string. Of course, this is not a problem if you just need to concatenate two strings. Problems may arise when it is necessary to build large strings. For example, if you need to dynamically create the HTML text of a page from an array of data coming from an external source, using a loop through this array. In this case, the entire created string will be completely copied into a new one at each iteration of the loop. Consider a simple example of such a loop:

let str = "Hello";

console.log("START",new Date().toUTCString());

for (let index=0;index<100000000;index++) {
    str += "!";
}

console.log("END",new Date().toUTCString());
console.log(str.length);

This code creates the string “Hello” and then appends the string “!” one hundred million times. In real applications, instead of “!” there may be real data from an external array. Also, this code prints the current time before and after the loop. This way you can find out how long it takes to execute this code. Finally, it outputs the length of the resulting string. When I ran this in Google Chrome I got the following console output:

This operation completed in 1 minute 26 seconds and returned the correct string length. However, when I ran this on another computer, this code killed the current tab in the browser and output this:

After looking at how string concatenation works, it’s easy to see why the browser might crash. This algorithm is completely inefficient. This loop creates new strings ranging in size from one character to one hundred million characters each iteration of the loop one hundred million times. In this situation, it is even difficult to immediately imagine how much memory this may require. In one case, the operation may succeed, in the other case it may not. It depends on the amount of memory available and how the garbage collector works in the particular implementation of the JavaScript engine on which this code is running, that is, how quickly it manages to clean up temporarily created lines during the loop.

Seeing all this, there is a desire to correct the situation and add line to line directly. In other programming languages, such as Java or Go, there are StringBuilder or StringBuffer helper objects that do just that. They allow strings to be constructed through mutable data types such as arrays. However, this is not in JavaScript, but the idea is easy to implement on your own, which will be done next.

Let’s go back to the beginning and write the string “hello” like this:

let hello = ["Hello"];

The hello variable is not a string, but an array with a string. Arrays are mutable and if you run:

hello.push(" world");

then exactly what is written and nothing else will happen: the string “world” will be added to the array after the string “Hello” and the array will contain the following:

["Hello"," world"]

This way you can add any number of lines and Javascript will only perform one operation for each addition. However, at the end, in order to get a string, you will have to join the array using the join operation:

hello = hello.join("");
console.log(hello);

This code concatenated the array into a string and printed “Hello world” to the console. Of course, at the time of the “join” operation, the same thing happens as when concatenating strings: a new string is created, all elements of the array are copied into it, and then this string is assigned to the “hello” variable. However, this only happens once, not every time a new row is added.

This method allows you to significantly speed up the construction of strings in a loop. Let’s rewrite the example with a loop through an array:

let str = ["Hello"];

console.log("START",new Date().toUTCString());

for (let index=0;index<100000000;index++) {
    str.push("!");
}

str = str.join("");

console.log("END",new Date().toUTCString());
console.log(str.length);

Got one more line. When I ran this code on the same computer where the browser crashed, I got the following output:

The result was achieved in 8 seconds, which is 10 times faster than conventional string concatenation.

This is an example of the fact that sometimes changing three lines of code can significantly increase the performance of data processing. In real life, I came across a situation where the owner of the website, due to the slow loading of the line, first cached the data on CloudFlare, and then, in all seriousness, planned to switch to AWS to increase throughput and load balancing. However, it was just necessary to conduct a code review for the frontend.

You can use this method to construct strings from external data arrays of unknown size. Just add strings to an array, and at the very end, concatenate this array into a string.

What has been done can be thought of as a basic implementation of a Javascript StringBuilder with only one function – adding a substring. For homework, you can wrap this up as a class with various functions for working with substrings, such as “add”, “modify”, “remove” and “convert to string”.

When adding elements to an array, it is important to keep in mind the existing limits on the number of array elements, because if you do not take them into account, you may encounter the “RangeError: invalid array range” error. More information about restrictions can be found here: https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Errors/Invalid_array_length . Therefore, if the number of lines to be added in the loop exceeds these limits, then you will have to periodically flush the array into temporary string buffers and then merge these buffers.

The Javascript string concatenation algorithm given at the beginning does not claim to be academically accurate for some reason. Different implementations of Javascript engines may use different string handling optimizations and memory handling mechanisms may differ. However, you should not count on the fact that your code will always run in such engines. For example, in the latest version of Google Chrome at the time of this writing, string concatenation worked as shown in the screenshots above. Therefore, the purpose of this article is to show how to work with strings more efficiently, regardless of how it is implemented by default.

There are more efficient algorithms for working with strings, based not only on arrays. The fastest one is based on the Rope data structure. It is used to speed up the insertion and deletion of substrings in huge strings. You can read more about the structure itself on Wikipedia: https://en.wikipedia.org/wiki/Rope_(data_structure) . I also think it will not be difficult to find descriptions in Russian. This is somewhat more difficult to understand and use than the method described in this article, however, you can use one of the ready-made libraries that implement Rope in JavaScript:

https://github.com/component/rope
https://github.com/josephg/jumprope

Thank you, I hope this helps you in your work. If you have questions or additions, write in the comments.

Similar Posts

Leave a Reply Cancel reply