Deep JS. In memory of types and data

Level: Senior, Senior+

We were all taught that JavaScript has primitive and reference data types. Comprehensive information is available in the official documentation, https://developer.mozilla.org/ru/docs/Web/JavaScript/Data_structures, and the Internet is full of articles on the subject.

Theory is theory, but JS code is executed not in theory but in practice. More precisely, it is compiled and executed by a JS engine. There are several such engines, developed by different people for different purposes, and it would be naive to assume that they are all identical. So it is time to figure out how specific data is actually stored in a specific JS engine. As our test subject, let's take one of the most common engines today: V8 from Google.

But, before we begin the analysis, let’s remember the main theoretical points.

Primitive data types are immutable (unchangeable) values stored in memory and represented by low-level language structures. In fact, in JavaScript every type except Object is considered primitive, namely:

  • Boolean
  • Null
  • Undefined
  • Number
  • BigInt
  • String
  • Symbol

Reference data types, also known as objects, are memory areas of indefinite size, accessible by an identifier (a reference to that memory area). In JavaScript, there is only one such type: Object. Besides Object, there is also a separate structure, Function, which is in fact also an Object. Object is the only mutable (changeable) data type in JavaScript. This means that a variable does not store the object itself, only a reference to it. Any manipulation of an object changes the value directly in its memory area, while the reference to that area stays the same until we reassign it explicitly or implicitly. An object remains in memory as long as there is an active reference to it. If the reference is removed or is no longer used by the script, the object will soon be destroyed by the garbage collector, but more on that next time.
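A minimal illustration of this difference in plain JavaScript (the variable names are arbitrary). It shows only the observable semantics the specification guarantees, not how any particular engine lays the values out in memory.

```javascript
// Primitives: the variable holds the value itself; "changing" it rebinds the variable.
let greeting = 'text';
let copy = greeting;
copy += '!';
console.log(greeting, copy); // text text!

// Objects: both variables hold a reference to the same memory area.
const original = { count: 0 };
const alias = original;
alias.count++;
console.log(original.count, original === alias); // 1 true

// Reassigning a variable replaces the reference, not the object it pointed to.
let other = original;
other = { count: 100 };
console.log(original.count); // 1

// A function is an object too: it can carry its own properties.
function fn() {}
fn.extra = 42;
console.log(typeof fn, fn instanceof Object, fn.extra); // function true 42
```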

So, we have refreshed the theory; now let's see whether everything is as clear in practice. We will run the experiments on the latest version of the V8 engine at the time of writing: 12.1.138, dated November 15, 2023.

Let’s start the analysis with the most understandable type. For digital systems, there is nothing more natural than numbers.

Number

According to the documentation, the Number type in JavaScript is a 64-bit double-precision floating-point number as defined by the IEEE 754 standard.

const number = 1;

// expected value in memory (if 1 were stored as a plain 64-bit integer)
// 
// 0000 0000 0000 0000 0000 0000 0000 0000
// 0000 0000 0000 0000 0000 0000 0000 0001

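Let's look at this number in V8 in debug mode. To do this, we will use the engine's internal helper %DebugPrint (such natives syntax is only available when d8 is started with the --allow-natives-syntax flag).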

d8> const number = 1; %DebugPrint(number);
DebugPrint: Smi: 0x1 (1)

1

It looks quite expected: we see the plain value 0x1 with the type Smi. But shouldn't the type be Number, as the ECMAScript specification says? Unfortunately, the engine's official documentation does not answer such questions, so let's turn directly to the source code.

Smi

/src/objects/smi.h

// Smi represents integer Numbers that can be stored in 31 bits.
// Smis are immediate which means they are NOT allocated in the heap.
// The ptr_ value has the following format: [31 bit signed int] 0
// For long smis it has the following format:
//     [32 bit signed int] [31 bits zero padding] 0
// Smi stands for small integer.
class Smi : public AllStatic {

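Thus, Smi (Small Integer) is a 31-bit signed integer. Its maximum value is 2**30 - 1 and its minimum is -(2**30). This applies to builds with pointer compression, as used in Chrome; on 64-bit platforms without pointer compression, Smis hold 32-bit values, which is the "long smi" format described in the comment above.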
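The Smi encoding can be modelled in a few lines of JavaScript. This is a simplified sketch, not V8's code: it assumes 31-bit Smis (pointer compression) and the layout from the smi.h comment, where the payload is shifted left by one and the lowest bit 0 marks the word as a Smi rather than a heap pointer. A 31-bit signed integer ranges from -(2**30) to 2**30 - 1. The helper names are hypothetical. The model also rejects -0, since an integer field cannot encode the sign of zero (V8 likewise keeps -0 in a HeapNumber).

```javascript
// Simplified model of 31-bit Smi tagging; all names are hypothetical.
const SMI_BITS = 31;
const SMI_MAX = 2 ** (SMI_BITS - 1) - 1; //  1073741823
const SMI_MIN = -(2 ** (SMI_BITS - 1));  // -1073741824

// A value can be a Smi only if it is an in-range integer and not -0.
function fitsInSmi(value) {
  return Number.isInteger(value) &&
    value >= SMI_MIN && value <= SMI_MAX &&
    !Object.is(value, -0);
}

// Tagging: shift the payload left by one so the lowest bit is 0.
function tagSmi(value) {
  if (!fitsInSmi(value)) throw new RangeError(`${value} does not fit in a Smi`);
  return (value << 1) >>> 0; // unsigned 32-bit view of the tagged word
}

// Untagging: an arithmetic right shift restores the signed payload.
function untagSmi(word) {
  return (word | 0) >> 1;
}

for (const n of [1, SMI_MAX, SMI_MIN, 2 ** 30, 0.1, -0]) {
  console.log(n, fitsInSmi(n) ? '0x' + tagSmi(n).toString(16) : 'HeapNumber');
}
```

Note that 2 ** 30 shifted left by one would overflow the 32-bit word, which is exactly why the first number past SMI_MAX needs a different representation.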

d8> %DebugPrint(2**30 - 1)
DebugPrint: Smi: 0x3fffffff (1073741823)

1073741823

d8> %DebugPrint(-(2**30 - 1))
DebugPrint: Smi: 0xc0000001 (-1073741823)

-1073741823

Okay, but the specification says that the Number type holds 64-bit double-precision values, whereas a Smi can only hold 31-bit integers. What about the rest? Let's see.

HeapNumber

/src/objects/heap-number.h

Let's take a number one greater than the maximum Smi:

d8> %DebugPrint(2**30)
DebugPrint: 0x36ac0011c291: [HeapNumber] in OldSpace
- map: 0x36ac00000789 <Map[12](HEAP_NUMBER_TYPE)>
- value: 1073741824.0
0x36ac00000789: [Map] in ReadOnlySpace
- map: 0x36ac000004c5 <MetaMap (0x36ac0000007d <null>)>
- type: HEAP_NUMBER_TYPE
- instance size: 12
- elements kind: HOLEY_ELEMENTS
- enum length: invalid
- stable_map
- back pointer: 0x36ac00000061 <undefined>
- prototype_validity cell: 0
- instance descriptors (own) #0: 0x36ac000006d9 <DescriptorArray[0]>
- prototype: 0x36ac0000007d <null>
- constructor: 0x36ac0000007d <null>
- dependent code: 0x36ac000006b5 <Other heap object (WEAK_ARRAY_LIST_TYPE)>
- construction counter: 0

1073741824

It turns out that a number outside the Smi range is represented in V8 as an object of type HeapNumber. Such numbers are double-precision values which, according to IEEE 754, consist of three parts: a sign (1 bit), an exponent (11 bits) and a mantissa (52 bits). In memory, this structure occupies two 32-bit words: on a little-endian machine, the first word holds the low part of the mantissa, and the second holds the sign, the exponent and the rest of the mantissa. A 64-bit double cannot fit into a tagged 31-bit Smi slot, so V8 boxes it in a separate heap object and stores a pointer to that object instead. This is consistent with the instance size of 12 reported above, presumably a compressed 4-byte map pointer plus the 8-byte value.
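These fields can be inspected directly: a DataView can write a double into an 8-byte buffer and read it back as two 32-bit words. A minimal sketch follows (inspectDouble is a hypothetical helper). It reports the words in the big-endian order DataView uses by default; on a little-endian machine the low word comes first in memory. It also shows that, as a double, 1 is 0x3ff00000 00000000 rather than a bare 1 in the lowest bits.

```javascript
// Split a double into its IEEE 754 fields and into two 32-bit words.
function inspectDouble(value) {
  const view = new DataView(new ArrayBuffer(8));
  view.setFloat64(0, value); // DataView defaults to big-endian: the sign bit is in byte 0
  const high = view.getUint32(0); // sign (1 bit) + exponent (11) + top 20 mantissa bits
  const low = view.getUint32(4);  // low 32 mantissa bits
  return {
    sign: high >>> 31,
    exponent: (high >>> 20) & 0x7ff, // biased: the actual power of two is exponent - 1023
    mantissaHigh: high & 0xfffff,
    mantissaLow: low,
    words: [high, low].map((w) => '0x' + w.toString(16).padStart(8, '0')),
  };
}

for (const n of [1, 2 ** 30, 0.1, -2]) {
  const d = inspectDouble(n);
  console.log(n, d.words.join(' '), 'sign', d.sign, 'exp', d.exponent - 1023);
}
```

For 2 ** 30 the mantissa is all zeros: the value is an exact integer as a double, so it becomes a HeapNumber purely because of its range, not because of any loss of precision.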

A similar picture will obviously be observed with floating point numbers.

d8> %DebugPrint(0.1)
DebugPrint: 0x36ac0011c605: [HeapNumber] in OldSpace
- map: 0x36ac00000789 <Map[12](HEAP_NUMBER_TYPE)>
- value: 0.1
0x36ac00000789: [Map] in ReadOnlySpace
- map: 0x36ac000004c5 <MetaMap (0x36ac0000007d <null>)>
- type: HEAP_NUMBER_TYPE
- instance size: 12
- elements kind: HOLEY_ELEMENTS
- enum length: invalid
- stable_map
- back pointer: 0x36ac00000061 <undefined>
- prototype_validity cell: 0
- instance descriptors (own) #0: 0x36ac000006d9 <DescriptorArray[0]>
- prototype: 0x36ac0000007d <null>
- constructor: 0x36ac0000007d <null>
- dependent code: 0x36ac000006b5 <Other heap object (WEAK_ARRAY_LIST_TYPE)>
- construction counter: 0

0.1

The difference between Smi and HeapNumber can also be seen visually by taking a heap snapshot in a running environment. To do this, let's create a small script that stores two numbers in memory.

/* We'll keep the values in the context of the function instance */
function V8Snapshot() {
  this.number1 = 1;     // Smi
  this.number2 = 2**30; // HeapNumber
}

// Next, create two instances of the same class;
// this gives us 4 references to 2 values
const v8Snapshot1 = new V8Snapshot();
const v8Snapshot2 = new V8Snapshot();

Let's use the standard browser tools, Chrome DevTools -> Memory, and take a heap snapshot.

In the snapshot we see two instances of the V8Snapshot class; both store pointers to the numbers number1 and number2.

What is noteworthy here is that in both instances number1 points to the same node with address @233347, whereas number2 has a different address in each instance; that is, two identical number2 values are currently stored in memory. This is the fundamental difference between Smi and HeapNumber. A Smi is an immediate value: as the comment in smi.h says, it is not allocated on the heap but encoded directly in the field, so there is nothing to duplicate, and the snapshot shows every occurrence of the same small number as one node. A HeapNumber, on the other hand, is a separately allocated heap object. To reuse an existing HeapNumber with the same value, the engine would first have to find it, which would negate the benefits of reuse, so each instance gets its own copy.

Conclusion

Internally, V8 has no single Number type; instead, it uses two other types:

  • Smi – integers in the range -(2**30) ... 2**30 - 1, stored directly in the tagged value as a 31-bit integer, without heap allocation

  • HeapNumber – integers outside the Smi range and floating-point numbers (including NaN, Infinity and -0), stored in memory as a specialized internal heap object

The numbers seem to be clear. What about the other types?

String

/src/objects/string.h

// The String abstract class captures JavaScript string values:
//
// Ecma-262:
//  4.3.16 String Value
//    A string value is a member of the type String and is a finite
//    ordered sequence of zero or more 16-bit unsigned integer values.
//
// All string values have a length field.
class String : public TorqueGeneratedString<String, Name> {

Let’s see what happens in practice

d8> %DebugPrint("")
DebugPrint: 0x25800000099: [String] in ReadOnlySpace: #
0x258000003d5: [Map] in ReadOnlySpace
- map: 0x0258000004c5 <MetaMap (0x02580000007d <null>)>
- type: INTERNALIZED_ONE_BYTE_STRING_TYPE
- instance size: variable
- elements kind: HOLEY_ELEMENTS
- enum length: invalid
- stable_map
- non-extensible
- back pointer: 0x025800000061 <undefined>
- prototype_validity cell: 0
- instance descriptors (own) #0: 0x0258000006d9 <DescriptorArray[0]>
- prototype: 0x02580000007d <null>
- constructor: 0x02580000007d <null>
- dependent code: 0x0258000006b5 <Other heap object (WEAK_ARRAY_LIST_TYPE)>
- construction counter: 0

""

Everything is fairly clear here. We see an object of type String with a variable instance size. According to the specification, a String value is a finite ordered sequence of 16-bit unsigned integers, and V8 stores that sequence as a heap object. So although the specification classifies String as a primitive type, in V8 it is a full-fledged heap object with all the attributes of objects, except mutability. The engine developers deliberately made String objects immutable, as the specification requires. Note also the type INTERNALIZED_ONE_BYTE_STRING_TYPE: a string whose characters all fit into one byte is stored with one byte per character rather than the 16-bit units the specification describes.
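Both properties can be observed from JavaScript itself. A short sketch, assuming only standard string behaviour: fitsInOneByte is a hypothetical helper that checks the condition under which a one-byte (Latin-1) representation is possible, not a way to query which representation V8 actually chose; tryMutate shows that strict-mode writes to a string index throw.

```javascript
// Hypothetical helper: a string can be stored one byte per character only if
// every 16-bit code unit fits into 8 bits (the Latin-1 range 0x00-0xFF).
function fitsInOneByte(str) {
  for (let i = 0; i < str.length; i++) {
    if (str.charCodeAt(i) > 0xff) return false;
  }
  return true;
}

// Strings are immutable: in strict mode, writing to an index throws a TypeError.
function tryMutate(str) {
  'use strict';
  try {
    str[0] = 'x';
    return false;
  } catch (e) {
    return e instanceof TypeError;
  }
}

console.log('JavaScript'.length, fitsInOneByte('JavaScript')); // 10 true
console.log('Привет'.length, fitsInOneByte('Привет'));         // 6 false
console.log('😀'.length); // 2: one emoji is a surrogate pair of two code units
console.log(tryMutate('JavaScript')); // true
```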

As with numbers, let's look at a heap snapshot.

/* For a clean experiment, take an empty string and a non-empty one */
function V8Snapshot() {
  this.emptyString = '';
  this.string = 'JavaScript';
}

const v8Snapshot1 = new V8Snapshot();
const v8Snapshot2 = new V8Snapshot();

Here we see that both instances use the same string pointers. Moreover, if we run the script several times, we see the same addresses each time. This is achieved through string interning, the so-called String Pool concept used in many programming languages (V8 calls its pool the string table, hence INTERNALIZED in the type name above). In simple terms, a string is a sequence of characters, and from that sequence it is easy to compute a hash of the whole string. The hash then serves as the key for looking the string up in a hash table. So when the engine encounters a string, it computes its hash and checks whether the pool already holds an equal string; if it does, it returns a pointer to the existing string. Otherwise, it adds the new string to the pool.
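The lookup described above can be sketched as a toy interning table. This is an illustration under stated assumptions, not V8's string table: StringPool is a hypothetical class, the polynomial hash is chosen for simplicity rather than being the one V8 uses, and strings are wrapped in objects so that identity (===) becomes observable from JavaScript.

```javascript
// Toy string pool: hash -> bucket -> compare contents -> reuse or insert.
class StringPool {
  constructor(bucketCount = 64) {
    this.buckets = Array.from({ length: bucketCount }, () => []);
  }

  // Simple polynomial hash over 16-bit code units (illustrative only).
  static hash(str) {
    let h = 0;
    for (let i = 0; i < str.length; i++) {
      h = (Math.imul(h, 31) + str.charCodeAt(i)) | 0;
    }
    return h >>> 0;
  }

  intern(str) {
    const bucket = this.buckets[StringPool.hash(str) % this.buckets.length];
    // Equal hashes do not guarantee equal strings, so the contents are compared too.
    const existing = bucket.find((entry) => entry.value === str);
    if (existing) return existing;
    const entry = { value: str };
    bucket.push(entry);
    return entry;
  }
}

const pool = new StringPool();
const a = pool.intern('JavaScript');
const b = pool.intern('Java' + 'Script');
console.log(a === b); // true: the second call returns the existing entry
console.log(a === pool.intern('TypeScript')); // false
```

The hash only narrows the search to one bucket; the content comparison is what guarantees that two different strings with colliding hashes are never merged.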

Boolean, Null, Undefined

In theory, a Boolean can take only two values, true or false. As a rule, 1 bit is enough for this, where 0 = false and 1 = true. Let's see whether this holds in V8.

Boolean

d8> %DebugPrint(true)
DebugPrint: 0x36ac000000c1: [Oddball] in ReadOnlySpace: #true
0x36ac0000053d: [Map] in ReadOnlySpace
- map: 0x36ac000004c5 <MetaMap (0x36ac0000007d <null>)>
- type: ODDBALL_TYPE
- instance size: 28
- elements kind: HOLEY_ELEMENTS
- enum length: invalid
- stable_map
- non-extensible
- back pointer: 0x36ac00000061 <undefined>
- prototype_validity cell: 0
- instance descriptors (own) #0: 0x36ac000006d9 <DescriptorArray[0]>
- prototype: 0x36ac0000007d <null>
- constructor: 0x36ac0000007d <null>
- dependent code: 0x36ac000006b5 <Other heap object (WEAK_ARRAY_LIST_TYPE)>
- construction counter: 0

true

An unexpected turn. It turns out that inside V8 a Boolean is also an object, much like HeapNumber, only of type Oddball. We'll look at what Oddball is a little later; for now, note that a similar structure can be observed in other simple types.

Null

d8> %DebugPrint(null)
DebugPrint: 0x36ac0000007d: [Oddball] in ReadOnlySpace: #null
0x36ac00000515: [Map] in ReadOnlySpace
- map: 0x36ac000004c5 <MetaMap (0x36ac0000007d <null>)>
- type: ODDBALL_TYPE
- instance size: 28
- elements kind: HOLEY_ELEMENTS
- enum length: invalid
- stable_map
- undetectable
- non-extensible
- back pointer: 0x36ac00000061 <undefined>
- prototype_validity cell: 0
- instance descriptors (own) #0: 0x36ac000006d9 <DescriptorArray[0]>
- prototype: 0x36ac0000007d <null>
- constructor: 0x36ac0000007d <null>
- dependent code: 0x36ac000006b5 <Other heap object (WEAK_ARRAY_LIST_TYPE)>
- construction counter: 0

null

Undefined

d8> %DebugPrint(undefined)
DebugPrint: 0x25800000061: [Oddball] in ReadOnlySpace: #undefined
0x258000004ed: [Map] in ReadOnlySpace
- map: 0x0258000004c5 <MetaMap (0x02580000007d <null>)>
- type: ODDBALL_TYPE
- instance size: 28
- elements kind: HOLEY_ELEMENTS
- enum length: invalid
- stable_map
- undetectable
- non-extensible
- back pointer: 0x025800000061 <undefined>
- prototype_validity cell: 0
- instance descriptors (own) #0: 0x0258000006d9 <DescriptorArray[0]>
- prototype: 0x02580000007d <null>
- constructor: 0x02580000007d <null>
- dependent code: 0x0258000006b5 <Other heap object (WEAK_ARRAY_LIST_TYPE)>
- construction counter: 0

undefined

Oddball

/src/objects/oddball.h

// The Oddball describes objects null, undefined, true, and false.
class Oddball : public PrimitiveHeapObject {

As you can see, Oddball is an object that extends the abstract class PrimitiveHeapObject, just like HeapNumber, which we discussed a little earlier. PrimitiveHeapObject is the base for the structures that implement the data types the specification defines as primitive.

static const uint8_t kFalse = 0;
static const uint8_t kTrue = 1;
static const uint8_t kNotBooleanMask = static_cast<uint8_t>(~1);
static const uint8_t kNull = 3;
static const uint8_t kUndefined = 4;

From the comment and the constants it is clear that this object describes 4 possible values: null, undefined, true and false. But these values are strikingly simple. Why is such complexity necessary?

It is really a matter of optimization and performance. These 4 values are, in effect, constants, and they may occur thousands of times during script execution. It would be extremely wasteful to allocate a new memory area every time one of them is assigned. Therefore, V8 reserves these 4 values in advance, before the script starts executing. Afterwards, whenever the engine encounters one of them, it can work with a simple pointer to the preloaded immutable object.
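The idea of pre-allocated shared constants can be sketched as a toy model: four frozen objects created up front and a lookup that returns the same object every time, so storing a value only copies a reference. The names OddballKind, ROOTS and toInternal are hypothetical; only the kind codes come from the oddball.h constants quoted above, and real V8 roots live in the read-only heap rather than in a JavaScript object.

```javascript
// Toy model of pre-allocated Oddball values; all names here are hypothetical.
// The kind codes mirror the constants quoted from oddball.h.
const OddballKind = Object.freeze({ False: 0, True: 1, Null: 3, Undefined: 4 });

function makeOddball(name, kind) {
  return Object.freeze({ type: 'Oddball', name, kind });
}

// Created once, before any user code runs, and frozen (compare ReadOnlySpace).
const ROOTS = Object.freeze({
  false: makeOddball('false', OddballKind.False),
  true: makeOddball('true', OddballKind.True),
  null: makeOddball('null', OddballKind.Null),
  undefined: makeOddball('undefined', OddballKind.Undefined),
});

// Map a JavaScript value to its shared internal object: no allocation, just a lookup.
function toInternal(value) {
  if (value === true) return ROOTS.true;
  if (value === false) return ROOTS.false;
  if (value === null) return ROOTS.null;
  if (value === undefined) return ROOTS.undefined;
  throw new TypeError('not an Oddball value in this model');
}

// Two "instances" with the same fields end up sharing the same four objects.
const snapshot1 = [true, false, null, undefined].map(toInternal);
const snapshot2 = [true, false, null, undefined].map(toInternal);
console.log(snapshot1.every((o, i) => o === snapshot2[i])); // true
console.log(snapshot1.map((o) => o.kind).join(', ')); // 1, 0, 3, 4
```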

Let's look at a heap snapshot.

function V8Snapshot() {
  this.true = true;
  this.false = false;
  this.null = null;
  this.undefined = undefined;
}

const v8Snapshot1 = new V8Snapshot();
const v8Snapshot2 = new V8Snapshot();

Here we see that all 4 values are Oddballs with permanent system addresses, defined before the script is launched.

Bottom line

So, we have looked under the hood of the V8 engine and seen how its main data types are arranged. The study showed that the practical implementation does not always mirror the theory it is based on. This does not mean, of course, that the ECMAScript specification is wrong or that the engine developers failed to follow it. It is important to understand that a specification is an abstract logical layer that defines general concepts and principles. Actually implementing an engine according to the specification is a lower-level story: beyond meeting the basic requirements, developers must take care of many issues related to performance and optimization, while also accounting for the features of different architectures and operating systems.

As we have seen, in V8 almost all values except Smis are heap objects, and variables hold pointers to them.

In general, the concepts of “primitive” and “object” in JavaScript were and remain as they were laid down in the specification. But when working with data types, you should understand that these concepts are more logical than physical. The physical implementation of a particular type at the engine level may differ and have individual characteristics.
