Deep JS. In memory of types and data
Level: Senior, Senior+
We were all taught that JavaScript has primitive and reference data types. Comprehensive information is available in the MDN documentation at https://developer.mozilla.org/ru/docs/Web/JavaScript/Data_structures, and the Internet is full of articles on the subject.
Theory is theory, but JS code runs in practice, not in theory. More precisely, it is compiled and executed by a JS engine. There are several such engines, developed by different people for different purposes, and it would be naive to assume they are all identical. So it is time to figure out how specific data is actually stored in a specific JS engine. As a test subject, let’s take one of the most common engines today: V8 from Google.
But, before we begin the analysis, let’s remember the main theoretical points.
Primitive data types are immutable values stored in memory and represented by low-level language structures. In JavaScript, every type except Object is considered primitive, namely: Boolean, Null, Undefined, Number, BigInt, String and Symbol.
Reference data types, also known as objects, are memory areas of indefinite size that are accessed by an identifier (a reference to the memory area). In JavaScript there is only one such type: Object. Besides Object there is also a separate structure, Function, which is in fact also an Object. Object is the only mutable data type in JavaScript. A variable does not store the object itself, only a reference to it. Any manipulation of an object changes the value directly in its memory area, while the reference to that area stays the same until we reassign it, explicitly or implicitly. An object stays in memory as long as there is an active reference to it. Once the reference is removed or is no longer reachable from the script, the object will soon be destroyed by the garbage collector, but more on that next time.
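To make the distinction concrete, here is a minimal sketch in plain JavaScript (the variable names are illustrative only): copying a primitive produces an independent value, while copying an object variable copies only the reference.

```javascript
// Primitive: the value itself is copied, so the two variables are independent.
let original = 1;
let copy = original;
copy += 1;

// Object: only the reference is copied, both names point to one memory area.
const user = { visits: 1 };
const alias = user;
alias.visits += 1;

console.log(original, copy);  // 1 2
console.log(user.visits);     // 2
console.log(user === alias);  // true
```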
So, we have refreshed the theory. Let’s now see whether everything is as clear in practice. We will run the experiments on the latest version of the V8 engine at the time of the study: 12.1.138, dated November 15, 2023.
Let’s start the analysis with the most understandable type. For digital systems, there is nothing more natural than numbers.
Number
According to the documentation, the Number type in JavaScript is a 64-bit double-precision floating-point number as defined by the IEEE 754 standard:
const number = 1;
// value one might naively expect in memory (64 bits);
// note that the actual IEEE 754 encoding of 1 is 0x3FF0000000000000
//
// 0000 0000 0000 0000 0000 0000 0000 0000
// 0000 0000 0000 0000 0000 0000 0000 0001
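For comparison, the real IEEE 754 bit pattern of a Number can be read from plain JavaScript. The sketch below uses DataView; the helper name doubleToBits is made up for this illustration.

```javascript
// Returns the 64 bits of a double as a string, most significant bit first.
function doubleToBits(value) {
  const view = new DataView(new ArrayBuffer(8));
  view.setFloat64(0, value); // DataView writes big-endian by default
  let bits = "";
  for (let i = 0; i < 8; i++) {
    bits += view.getUint8(i).toString(2).padStart(8, "0");
  }
  return bits;
}

const bitsOfOne = doubleToBits(1);
console.log(bitsOfOne.slice(0, 1));   // sign (1 bit)
console.log(bitsOfOne.slice(1, 12));  // exponent (11 bits)
console.log(bitsOfOne.slice(12));     // mantissa (52 bits)
```

For the value 1 this gives sign 0, exponent 01111111111 (the bias, 1023) and an all-zero mantissa, rather than a single trailing 1 bit.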
Let’s look at this number in V8 in debug mode. To do this, we will use the engine’s runtime helper %DebugPrint (d8 has to be started with the --allow-natives-syntax flag for it to be available):
d8> const number = 1; %DebugPrint(number);
DebugPrint: Smi: 0x1 (1)
1
This looks as expected. We see the simple value 0x1, but with a type called Smi. Shouldn’t there be a Number type, as the ECMAScript specification says? Unfortunately, the official documentation of the engine does not answer such questions, so let’s turn directly to the source code.
Smi
// Smi represents integer Numbers that can be stored in 31 bits.
// Smis are immediate which means they are NOT allocated in the heap.
// The ptr_ value has the following format: [31 bit signed int] 0
// For long smis it has the following format:
// [32 bit signed int] [31 bits zero padding] 0
// Smi stands for small integer.
class Smi : public AllStatic {
Thus, Smi (Small Integer) is a 31-bit signed integer. The maximum value of such a number is +(2**30 - 1); the minimum of a 31-bit signed value is -(2**30). The two checks below stay inside this range:
d8> %DebugPrint(2**30 - 1)
DebugPrint: Smi: 0x3fffffff (1073741823)
1073741823
d8> %DebugPrint(-(2**30 - 1))
DebugPrint: Smi: 0xc0000001 (-1073741823)
-1073741823
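The format from the source comment, [31 bit signed int] 0, can be mimicked with ordinary 32-bit bitwise operators. This is only a sketch of the tagging idea for the 31-bit case; the names fitsInSmi, tagSmi and untagSmi are hypothetical and are not V8 APIs.

```javascript
const SMI_MIN = -(2 ** 30);   // smallest 31-bit signed integer
const SMI_MAX = 2 ** 30 - 1;  // largest 31-bit signed integer

// Predicts whether a Number could be stored as a 31-bit Smi.
// -0 is excluded because a plain integer encoding cannot represent it.
function fitsInSmi(value) {
  return Number.isInteger(value) &&
    !Object.is(value, -0) &&
    value >= SMI_MIN && value <= SMI_MAX;
}

// Shift the payload left by one bit: the lowest bit (the tag) becomes 0.
function tagSmi(value) {
  if (!fitsInSmi(value)) throw new RangeError("not a Smi: " + value);
  return value << 1;
}

// An arithmetic shift right restores the payload, including its sign.
function untagSmi(word) {
  return word >> 1;
}

console.log(fitsInSmi(2 ** 30 - 1), fitsInSmi(2 ** 30), fitsInSmi(0.1));
console.log(tagSmi(1), untagSmi(tagSmi(-5)));
```

The first line prints true false false, the second prints 2 -5. Whether the engine actually picks the Smi representation for a given value is its own decision; the predicate only describes the range.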
Okay, but the specification says that the Number type holds 64-bit values, whereas Smi can only hold 31-bit integers. What about the rest? Let’s see.
HeapNumber
Let’s take a number that is 1 greater than the maximum Smi:
d8> %DebugPrint(2**30)
DebugPrint: 0x36ac0011c291: [HeapNumber] in OldSpace
- map: 0x36ac00000789 <Map[12](HEAP_NUMBER_TYPE)>
- value: 1073741824.0
0x36ac00000789: [Map] in ReadOnlySpace
- map: 0x36ac000004c5 <MetaMap (0x36ac0000007d <null>)>
- type: HEAP_NUMBER_TYPE
- instance size: 12
- elements kind: HOLEY_ELEMENTS
- enum length: invalid
- stable_map
- back pointer: 0x36ac00000061 <undefined>
- prototype_validity cell: 0
- instance descriptors (own) #0: 0x36ac000006d9 <DescriptorArray[0]>
- prototype: 0x36ac0000007d <null>
- constructor: 0x36ac0000007d <null>
- dependent code: 0x36ac000006b5 <Other heap object (WEAK_ARRAY_LIST_TYPE)>
- construction counter: 0
1073741824
It turns out that a number that does not fit into a Smi is represented in V8 as an object of the HeapNumber type. Such numbers are stored as double-precision values, which according to the IEEE 754 standard consist of several parts: sign (1 bit), exponent (11 bits) and mantissa (52 bits). In memory this structure occupies two 32-bit words: one holds the low part of the mantissa, the other holds the sign, the exponent and the remaining part of the mantissa. Since 64 bits cannot be packed into a 31-bit immediate value, V8 describes a dedicated class for such numbers and allocates them on the heap; the instance size of 12 in the output above is consistent with a 4-byte map pointer followed by the 8-byte value.
The same picture is observed with fractional numbers:
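The split into two 32-bit words can be reproduced in plain JavaScript. The sketch below is an illustration only: decomposeDouble is a hypothetical helper, and it fixes big-endian order so that the word with the sign and exponent is read first (the order in real memory depends on the platform).

```javascript
// Splits a double into its two 32-bit words and the IEEE 754 fields.
function decomposeDouble(value) {
  const view = new DataView(new ArrayBuffer(8));
  view.setFloat64(0, value);       // big-endian by default
  const high = view.getUint32(0);  // sign, exponent, top 20 mantissa bits
  const low = view.getUint32(4);   // low 32 mantissa bits
  return {
    high,
    low,
    sign: high >>> 31,
    exponent: (high >>> 20) & 0x7ff, // biased by 1023
    mantissaHigh: high & 0xfffff,
  };
}

const parts = decomposeDouble(2 ** 30);
console.log(parts.high.toString(16), parts.low.toString(16));
console.log(parts.sign, parts.exponent - 1023, parts.mantissaHigh);
```

For 2**30 the unbiased exponent is 30 and the mantissa is empty, so all the information sits in the high word.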
d8> %DebugPrint(0.1)
DebugPrint: 0x36ac0011c605: [HeapNumber] in OldSpace
- map: 0x36ac00000789 <Map[12](HEAP_NUMBER_TYPE)>
- value: 0.1
0x36ac00000789: [Map] in ReadOnlySpace
- map: 0x36ac000004c5 <MetaMap (0x36ac0000007d <null>)>
- type: HEAP_NUMBER_TYPE
- instance size: 12
- elements kind: HOLEY_ELEMENTS
- enum length: invalid
- stable_map
- back pointer: 0x36ac00000061 <undefined>
- prototype_validity cell: 0
- instance descriptors (own) #0: 0x36ac000006d9 <DescriptorArray[0]>
- prototype: 0x36ac0000007d <null>
- constructor: 0x36ac0000007d <null>
- dependent code: 0x36ac000006b5 <Other heap object (WEAK_ARRAY_LIST_TYPE)>
- construction counter: 0
0.1
The difference between Smi and HeapNumber can be seen visually by taking a Heap Snapshot in a runtime environment. To do this, let’s create a small script that stores two numbers in memory.
/* We will keep the values in the context of a function */
function V8Snapshot() {
  this.number1 = 1; // Smi
  this.number2 = 2**30; // HeapNumber
}
// Next, create two instances of the same class,
// so that we have 4 references to 2 values
const v8Snapshot1 = new V8Snapshot();
const v8Snapshot2 = new V8Snapshot();
Let’s use the standard browser tools, Chrome DevTools -> Memory, and take a Heap Snapshot.
In the snapshot we see two instances of the class V8Snapshot; both hold the numbers number1 and number2.
What is noteworthy here is that in both instances number1 is shown with the same address, @233347, whereas number2 has a different address in each instance, so at this moment two identical number2 values are stored in memory. This is the fundamental difference between Smi and HeapNumber. As the source comment above says, Smis are immediate values that are not allocated in the heap: the value is encoded in the field itself, so equal small numbers are never duplicated as separate objects and the snapshot shows them as one and the same value. HeapNumber, on the other hand, is a dynamically allocated structure; to reuse a previously stored value the engine would first have to search for it, which would negate all the benefits of reuse.
Conclusion
The V8 engine does not actually have a single Number type; instead, it has two other types:
Smi – integers in the range -(2**30) ... +(2**30 - 1), represented in memory as a 31-bit value
HeapNumber – integers outside the Smi range and fractional numbers, represented in memory as an internal specialized object
The numbers seem to be clear. What about the other types?
String
// The String abstract class captures JavaScript string values:
//
// Ecma-262:
// 4.3.16 String Value
// A string value is a member of the type String and is a finite
// ordered sequence of zero or more 16-bit unsigned integer values.
//
// All string values have a length field.
class String : public TorqueGeneratedString<String, Name> {
Let’s see what happens in practice
d8> %DebugPrint("")
DebugPrint: 0x25800000099: [String] in ReadOnlySpace: #
0x258000003d5: [Map] in ReadOnlySpace
- map: 0x0258000004c5 <MetaMap (0x02580000007d <null>)>
- type: INTERNALIZED_ONE_BYTE_STRING_TYPE
- instance size: variable
- elements kind: HOLEY_ELEMENTS
- enum length: invalid
- stable_map
- non-extensible
- back pointer: 0x025800000061 <undefined>
- prototype_validity cell: 0
- instance descriptors (own) #0: 0x0258000006d9 <DescriptorArray[0]>
- prototype: 0x02580000007d <null>
- constructor: 0x02580000007d <null>
- dependent code: 0x0258000006b5 <Other heap object (WEAK_ARRAY_LIST_TYPE)>
- construction counter: 0
""
Here we see a heap object of type String with a variable instance size. According to the specification, a string value is an ordered sequence of 16-bit unsigned integers, and V8 has to keep that sequence somewhere, so it stores it in an object on the heap. Although the specification lists String among the primitive types, inside the engine it is a full-fledged object with all the attributes inherent in objects, with the exception of mutability. The engine developers deliberately made the String object immutable, as the specification requires.
As with numbers, let’s look at the heap snapshot.
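Both properties mentioned here, the 16-bit sequence and the immutability, are observable from ordinary JavaScript. A small sketch with illustrative variable names:

```javascript
const text = "JavaScript";

// A string is a sequence of 16-bit code units; length counts those units.
const codeUnits = [];
for (let i = 0; i < text.length; i++) {
  codeUnits.push(text.charCodeAt(i));
}

// A character outside the Basic Multilingual Plane takes two code units.
const astral = "\u{1F600}";

// Writing to an index never changes the string: sloppy mode ignores the
// assignment, strict mode throws a TypeError.
try {
  text[0] = "j";
} catch (error) {
  console.log(error instanceof TypeError);
}

console.log(codeUnits.every((unit) => unit >= 0 && unit <= 0xffff)); // true
console.log(astral.length); // 2
console.log(text);          // JavaScript
```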
/* For a clean experiment, take both an empty and a non-empty string */
function V8Snapshot() {
  this.emptyString = '';
  this.string = 'JavaScript';
}
const v8Snapshot1 = new V8Snapshot();
const v8Snapshot2 = new V8Snapshot();
Here we see that both instances use the same string pointers. Moreover, if we run the script several times, we will see the same addresses each time. This is achieved through the so-called String Pool concept, used in many programming languages. In simple terms, a string is a sequence of characters, and from this sequence it is easy to compute a hash of the whole string. The hash then serves as the key under which the string instance is stored in a hash table. Thus, when the engine receives a string, it computes its hash and checks whether the pool already contains a string with that hash and the same contents; if it does, the engine returns a pointer to it. Otherwise, it adds the new string to the pool.
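The lookup described above can be sketched in a few lines. This is an illustration of the pooling idea only, not V8's actual string table; the class StringPool, its hash function and the frozen entry objects standing in for heap strings are all hypothetical.

```javascript
class StringPool {
  constructor() {
    this.buckets = new Map(); // hash -> array of pooled entries
    this.size = 0;
  }

  // A simple 32-bit polynomial hash over the UTF-16 code units.
  static hash(content) {
    let h = 0;
    for (let i = 0; i < content.length; i++) {
      h = (Math.imul(h, 31) + content.charCodeAt(i)) | 0;
    }
    return h;
  }

  // Returns the pooled entry for the content, creating it on first use.
  intern(content) {
    const h = StringPool.hash(content);
    let bucket = this.buckets.get(h);
    if (bucket === undefined) {
      bucket = [];
      this.buckets.set(h, bucket);
    }
    // Equal hashes do not guarantee equal contents, so compare them too.
    let entry = bucket.find((candidate) => candidate.content === content);
    if (entry === undefined) {
      entry = Object.freeze({ content });
      bucket.push(entry);
      this.size += 1;
    }
    return entry;
  }
}

const pool = new StringPool();
const first = pool.intern("JavaScript");
const second = pool.intern("JavaScript");
const empty = pool.intern("");
console.log(first === second, first === empty, pool.size); // true false 2
```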
Boolean, Null, Undefined
In theory, Boolean can only take two values, true or false. As a rule, 1 bit is enough for this, where 0 = false and 1 = true. Let’s see whether this holds in V8.
Boolean
d8> %DebugPrint(true)
DebugPrint: 0x36ac000000c1: [Oddball] in ReadOnlySpace: #true
0x36ac0000053d: [Map] in ReadOnlySpace
- map: 0x36ac000004c5 <MetaMap (0x36ac0000007d <null>)>
- type: ODDBALL_TYPE
- instance size: 28
- elements kind: HOLEY_ELEMENTS
- enum length: invalid
- stable_map
- non-extensible
- back pointer: 0x36ac00000061 <undefined>
- prototype_validity cell: 0
- instance descriptors (own) #0: 0x36ac000006d9 <DescriptorArray[0]>
- prototype: 0x36ac0000007d <null>
- constructor: 0x36ac0000007d <null>
- dependent code: 0x36ac000006b5 <Other heap object (WEAK_ARRAY_LIST_TYPE)>
- construction counter: 0
true
An unexpected turn. It turns out that a Boolean inside V8 is also an object, much like HeapNumber, only of the Oddball type. We will look at what Oddball is a little later; for now, note that the same structure can be observed in other simple types.
Null
d8> %DebugPrint(null)
DebugPrint: 0x36ac0000007d: [Oddball] in ReadOnlySpace: #null
0x36ac00000515: [Map] in ReadOnlySpace
- map: 0x36ac000004c5 <MetaMap (0x36ac0000007d <null>)>
- type: ODDBALL_TYPE
- instance size: 28
- elements kind: HOLEY_ELEMENTS
- enum length: invalid
- stable_map
- undetectable
- non-extensible
- back pointer: 0x36ac00000061 <undefined>
- prototype_validity cell: 0
- instance descriptors (own) #0: 0x36ac000006d9 <DescriptorArray[0]>
- prototype: 0x36ac0000007d <null>
- constructor: 0x36ac0000007d <null>
- dependent code: 0x36ac000006b5 <Other heap object (WEAK_ARRAY_LIST_TYPE)>
- construction counter: 0
null
Undefined
d8> %DebugPrint(undefined)
DebugPrint: 0x25800000061: [Oddball] in ReadOnlySpace: #undefined
0x258000004ed: [Map] in ReadOnlySpace
- map: 0x0258000004c5 <MetaMap (0x02580000007d <null>)>
- type: ODDBALL_TYPE
- instance size: 28
- elements kind: HOLEY_ELEMENTS
- enum length: invalid
- stable_map
- undetectable
- non-extensible
- back pointer: 0x025800000061 <undefined>
- prototype_validity cell: 0
- instance descriptors (own) #0: 0x0258000006d9 <DescriptorArray[0]>
- prototype: 0x02580000007d <null>
- constructor: 0x02580000007d <null>
- dependent code: 0x0258000006b5 <Other heap object (WEAK_ARRAY_LIST_TYPE)>
- construction counter: 0
undefined
Oddball
// The Oddball describes objects null, undefined, true, and false.
class Oddball : public PrimitiveHeapObject {
As you can see, Oddball is a class that inherits from PrimitiveHeapObject, just like, for example, HeapNumber, which we discussed a little earlier. PrimitiveHeapObject is extended by those structures that implement the data types that are primitive according to the specification.
static const uint8_t kFalse = 0;
static const uint8_t kTrue = 1;
static const uint8_t kNotBooleanMask = static_cast<uint8_t>(~1);
static const uint8_t kNull = 3;
static const uint8_t kUndefined = 4;
From the comment and the structure it is clear that this object describes 4 possible values: null, undefined, true and false. But these values are almost indecently simple. Why are such complications necessary?
It is really a matter of optimization and performance. These 4 values are, in fact, constants, and they may occur thousands of times during script execution. It would be extremely wasteful to allocate a new memory area for every variable holding one of them. Therefore, V8 reserves these 4 values in advance, before the script even starts executing. From then on, whenever the engine encounters one of them, it can simply use a pointer to the preallocated immutable object.
Let’s look at the heap snapshot.
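The idea of preallocated constants can be modelled in JavaScript itself. The sketch below reuses the kind values quoted above; everything else (ODDBALLS, oddballFor, isBooleanKind) is hypothetical, and the use of the mask to tell booleans apart is an assumption based on its name.

```javascript
// Kind values copied from the constants quoted above.
const K_FALSE = 0;
const K_TRUE = 1;
const K_NULL = 3;
const K_UNDEFINED = 4;
const K_NOT_BOOLEAN_MASK = ~1 & 0xff; // uint8 view of ~1, i.e. 0xFE

// Four immutable objects created once, before any "user code" runs.
const ODDBALLS = Object.freeze({
  false: Object.freeze({ kind: K_FALSE }),
  true: Object.freeze({ kind: K_TRUE }),
  null: Object.freeze({ kind: K_NULL }),
  undefined: Object.freeze({ kind: K_UNDEFINED }),
});

// Every occurrence of one of the four values maps to the same object.
function oddballFor(value) {
  if (value === true) return ODDBALLS.true;
  if (value === false) return ODDBALLS.false;
  if (value === null) return ODDBALLS.null;
  if (value === undefined) return ODDBALLS.undefined;
  throw new TypeError("not an oddball value");
}

// Assumption: a kind is boolean when no bit of the mask is set.
function isBooleanKind(kind) {
  return (kind & K_NOT_BOOLEAN_MASK) === 0;
}

console.log(oddballFor(true) === oddballFor(true));                // true
console.log(isBooleanKind(K_TRUE), isBooleanKind(K_NULL));         // true false
```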
function V8Snapshot() {
  this.true = true;
  this.false = false;
  this.null = null;
  this.undefined = undefined;
}
const v8Snapshot1 = new V8Snapshot();
const v8Snapshot2 = new V8Snapshot();
Here we see that all 4 values are Oddball and have permanent system addresses defined before the script is launched.
Bottom line
So, we looked under the hood of the V8 engine and saw how the main data types are arranged in it. The study showed that the practical implementation does not always match the theoretical model behind it. This does not mean, of course, that the ECMAScript specification is incorrect or that the engine developers did not follow it. It is important to understand that a specification is an abstract logical layer that defines general concepts and principles. The actual development of an engine according to the specification is a lower-level story. In addition to implementing the basic requirements, developers must take care of many issues related to performance and optimization and, at the same time, take into account the features of different architectures and operating systems.
As we can see, almost all data types in the V8 engine, except Smi, are heap objects, and variables are pointers to them.
In general, the concepts of “primitive” and “object” in JavaScript were and remain as they were laid down in the specification. But when working with data types, you should understand that these concepts are more logical than physical. The physical implementation of a particular type at the engine level may differ and have individual characteristics.