JavaScript optimization. Inline Caches

I think it’s no secret that all popular JavaScript engines have a similar code execution pipeline. It looks something like this: The interpreter quickly compiles JS code into bytes on the fly. The resulting byte code begins to be executed and is processed in parallel by the optimizer. The optimizer needs time for such processing, but the result can be highly optimized code that will work many times faster. In the V8 engine, the role of the interpreter is performed by Ignition, and the optimizer is Turbofan. In the Chakra engine, which is used in Edge, instead of one optimizer there are two, SimpleJIT and FullJIT. And in JSC (Safari), there are already three Baseline, DFG and FTL. The specific implementation in different engines may differ, but the concept is generally the same.

Today we will talk about one of the many optimization mechanisms called Inline Caches.

So, let's take a very common function and try to understand how it will work in “runtime”.

function getX(obj) {
  return obj.x;
}

Our function is quite primitive. A simple getter that takes an object as an argument and returns the property value as output x this object. From the point of view of a JS developer, the function looks absolutely atomic. However, let's take a look under the hood of the engine and see what it looks like in compiled form.

For experiments, as usual, we will use the most popular JS engine V8 (at the time of writing, version 12.6.72). For a full-fledged debug, in addition to the body of the function itself, we need to call it with a real argument. Also, let's add output of information about the object, we will need it a little lower.

function getX(obj) {
  %DebugPrint(obj);
  return obj.x;
}

getX({ x: 1 });

Let me remind you, %DebugPrint is a V8 built-in function, to use it in code you need to run d8 runtime with the flag --allow-natives-syntax. For a long time, we will output the bytecode of the executable script to the console (flag --print-bytecode).

%> d8 --print-bytecode --allow-natives-syntax index.js 
...
// байткод функции getX
[generated bytecode for function: getX (0x0b75000d9c29 <SharedFunctionInfo getX>)]
Bytecode length: 10
Parameter count 2
Register count 0
Frame size 0
         0x28c500002194 @    0 : 65 af 01 03 01    CallRuntime [DebugPrint], a0-a0
         0x28c500002199 @    5 : 2d 03 00 00       GetNamedProperty a0, [0], [0]
         0x28c50000219d @    9 : ab                Return
Constant pool (size = 1)
0xb75000d9dcd: [FixedArray] in OldSpace
 - map: 0x0b7500000565 <Map(FIXED_ARRAY_TYPE)>
 - length: 1
           0: 0x0b7500002b91 <String[1]: #x>
Handler Table (size = 0)
Source Position Table (size = 0)
// Далее информация о переданном объекте
DebugPrint: 0xb75001c9475: [JS_OBJECT_TYPE]
 - map: 0x0b75000d9d81 <Map[16](HOLEY_ELEMENTS)> [FastProperties]
 - prototype: 0x0b75000c4b11 <Object map = 0xb75000c414d>
 - elements: 0x0b75000006cd <FixedArray[0]> [HOLEY_ELEMENTS]
 - properties: 0x0b75000006cd <FixedArray[0]>
 - All own properties (excluding elements): {
    0xb7500002b91: [String] in ReadOnlySpace: #x: 1 (const data field 0), location: in-object
 }
0xb75000d9d81: [Map] in OldSpace
 - map: 0x0b75000c3c29 <MetaMap (0x0b75000c3c79 <NativeContext[285]>)>
 - type: JS_OBJECT_TYPE
 - instance size: 16
 - inobject properties: 1
 - unused property fields: 0
 - elements kind: HOLEY_ELEMENTS
 - enum length: invalid
 - stable_map
 - back pointer: 0x0b75000d9d59 <Map[16](HOLEY_ELEMENTS)>
 - prototype_validity cell: 0x0b7500000a31 <Cell value= 1>
 - instance descriptors (own) #1: 0x0b75001c9485 <DescriptorArray[1]>
 - prototype: 0x0b75000c4b11 <Object map = 0xb75000c414d>
 - constructor: 0x0b75000c4655 <JSFunction Object (sfi = 0xb7500335385)>
 - dependent code: 0x0b75000006dd <Other heap object (WEAK_ARRAY_LIST_TYPE)>
 - construction counter: 0

So, in essence, the compiled bytecode of the function getX as follows.

0x28c500002194 @    0 : 65 af 01 03 01    CallRuntime [DebugPrint], a0-a0
0x28c500002199 @    5 : 2d 03 00 00       GetNamedProperty a0, [0], [0]
0x28c50000219d @    9 : ab                Return

The first line is a function call %DebugPrint. It is purely auxiliary in nature and does not affect the executable code; we can safely omit it. Next instructions GetNamedProperty. Its task is to obtain the value of the specified property from the passed object. The instruction receives three parameters as input. The first is a reference to an object. In our case, the reference to the object is stored in a0 – first argument of the function getX. The second and third parameters are the address of the hidden class and offset you need properties in the descriptor array.

Object Shape

I described in detail what hidden classes are, an array of descriptors, and how objects in JavaScript are structured in general in the article Object structure in JavaScript engines. I will not retell the entire article, I will only outline the facts we need. Each object has one or more hidden classes (Hidden Classes). Hidden classes are exclusively an internal mechanism of the engine and are not available to the JS developer, well, at least, without using special built-in engine methods. The hidden class represents the so-called shape of the object and all the necessary information about the properties of the object. The object itself stores only property values ​​and a link to the hidden class. If two objects have the same shape, they will have a reference to the same hidden class.

const obj1 = { x: 1 }
const obj2 = { x: 2 }

//          {}     <- empty map
//          |
//        { x }    <- common map
//       /    \
// <obj1>      <obj2>

As you add properties to an object, its hidden class tree grows.

const obj1 = {}
obj1.x = 1;
obj1.y = 2;
obj1.z = 3;

// {} -> { x } -> { x, y } -> { x, y, z } -> <obj1>

Let's go back to the function we started with. Suppose we passed it an object of the following type.

function getX(obj) {
  return obj.x;
}

getX({ y: 1, x: 2, z: 3 });

What to get the property value x, the interpreter must know: a) the address of the last hidden class of the object; b) offset of this property in the descriptor array.

d8> const obj = { y: 1, x: 2, z: 3};
d8>
d8> %DebugPrint(obj);
DebugPrint: 0x2034001c9435: [JS_OBJECT_TYPE]
 - map: 0x2034000d9cf9 <Map[24](HOLEY_ELEMENTS)> [FastProperties]
 - prototype: 0x2034000c4b11 <Object map = 0x2034000c414d>
 - elements: 0x2034000006cd <FixedArray[0]> [HOLEY_ELEMENTS]
 - properties: 0x2034000006cd <FixedArray[0]>
 - All own properties (excluding elements): {
    0x203400002ba1: [String] in ReadOnlySpace: #y: 1 (const data field 0), location: in-object
    0x203400002b91: [String] in ReadOnlySpace: #x: 2 (const data field 1), location: in-object
    0x203400002bb1: [String] in ReadOnlySpace: #z: 3 (const data field 2), location: in-object
 }
0x2034000d9cf9: [Map] in OldSpace
 - map: 0x2034000c3c29 <MetaMap (0x2034000c3c79 <NativeContext[285]>)>
 - type: JS_OBJECT_TYPE
 - instance size: 24
 - inobject properties: 3
 - unused property fields: 0
 - elements kind: HOLEY_ELEMENTS
 - enum length: invalid
 - stable_map
 - back pointer: 0x2034000d9cd1 <Map[24](HOLEY_ELEMENTS)>
 - prototype_validity cell: 0x203400000a31 <Cell value= 1>
 - instance descriptors (own) #3: 0x2034001c9491 <DescriptorArray[3]>
 - prototype: 0x2034000c4b11 <Object map = 0x2034000c414d>
 - constructor: 0x2034000c4655 <JSFunction Object (sfi = 0x203400335385)>
 - dependent code: 0x2034000006dd <Other heap object (WEAK_ARRAY_LIST_TYPE)>
 - construction counter: 0

In the example above, the hidden class is located at 0x2034000d9cd1 (back pointer).

d8> %DebugPrintPtr(0x2034000d9cd1)
DebugPrint: 0x2034000d9cd1: [Map] in OldSpace
 - map: 0x2034000c3c29 <MetaMap (0x2034000c3c79 <NativeContext[285]>)>
 - type: JS_OBJECT_TYPE
 - instance size: 24
 - inobject properties: 3
 - unused property fields: 1
 - elements kind: HOLEY_ELEMENTS
 - enum length: invalid
 - back pointer: 0x2034000d9c89 <Map[24](HOLEY_ELEMENTS)>
 - prototype_validity cell: 0x203400000a31 <Cell value= 1>
 - instance descriptors #2: 0x2034001c9491 <DescriptorArray[3]>
 - transitions #1: 0x2034000d9cf9 <Map[24](HOLEY_ELEMENTS)>
     0x203400002bb1: [String] in ReadOnlySpace: #z: (transition to (const data field, attrs: [WEC]) @ Any) -> 0x2034000d9cf9 <Map[24](HOLEY_ELEMENTS)>
 - prototype: 0x2034000c4b11 <Object map = 0x2034000c414d>
 - constructor: 0x2034000c4655 <JSFunction Object (sfi = 0x203400335385)>
 - dependent code: 0x2034000006dd <Other heap object (WEAK_ARRAY_LIST_TYPE)>
 - construction counter: 0
0x2034000c3c29: [MetaMap] in OldSpace
 - map: 0x2034000c3c29 <MetaMap (0x2034000c3c79 <NativeContext[285]>)>
 - type: MAP_TYPE
 - instance size: 40
 - native_context: 0x2034000c3c79 <NativeContext[285]>

From which we can get an array of descriptors at 0x2034001c9491 (instance descriptors).

d8> %DebugPrintPtr(0x2034001c9491)
DebugPrint: 0x2034001c9491: [DescriptorArray]
 - map: 0x20340000062d <Map(DESCRIPTOR_ARRAY_TYPE)>
 - enum_cache: 3
   - keys: 0x2034000daaa9 <FixedArray[3]>
   - indices: 0x2034000daabd <FixedArray[3]>
 - nof slack descriptors: 0
 - nof descriptors: 3
 - raw gc state: mc epoch 0, marked 0, delta 0
  [0]: 0x203400002ba1: [String] in ReadOnlySpace: #y (const data field 0:s, p: 2, attrs: [WEC]) @ Any
  [1]: 0x203400002b91: [String] in ReadOnlySpace: #x (const data field 1:s, p: 1, attrs: [WEC]) @ Any
  [2]: 0x203400002bb1: [String] in ReadOnlySpace: #z (const data field 2:s, p: 0, attrs: [WEC]) @ Any
0x20340000062d: [Map] in ReadOnlySpace
 - map: 0x2034000004c5 <MetaMap (0x20340000007d <null>)>
 - type: DESCRIPTOR_ARRAY_TYPE
 - instance size: variable
 - elements kind: HOLEY_ELEMENTS
 - enum length: invalid
 - stable_map
 - non-extensible
 - back pointer: 0x203400000061 <undefined>
 - prototype_validity cell: 0
 - instance descriptors (own) #0: 0x203400000701 <DescriptorArray[0]>
 - prototype: 0x20340000007d <null>
 - constructor: 0x20340000007d <null>
 - dependent code: 0x2034000006dd <Other heap object (WEAK_ARRAY_LIST_TYPE)>
 - construction counter: 0

Where the interpreter can find the desired property by its name and get offsetin our case equal 1.

So the interpreter path will look something like this.

Our case is quite simple. Here, property values ​​are stored inside the object itself (in-object), which makes accessing them quite fast. But, nevertheless, determining the position of a property in the descriptor array will still be proportional to the number of properties themselves. Moreover, in the article Object Structure in JavaScript Engines, I said that values ​​are not always stored in such a “fast” way. In some cases, the storage type can be changed to a slow “dictionary”.

Inline Cache

Of course, in isolated cases for a JS engine this does not cause any great difficulties, and the time it takes to access property values ​​can hardly be seen with the naked eye. But what if our case is not isolated? Suppose we need to execute a function hundreds or thousands of times in a loop? The total operating time here becomes critically important. Given that the function actually performs the same operations, the JavaScript developers suggested optimizing the property search process and not performing it every time. All that is required is to store the address of the hidden object class and offset the desired property.

Let's go back to the very beginning of the article and to the instructions GetNamedProperty, which has three parameters. We have already found out that the first parameter is a reference to an object. The second parameter is the address of the hidden class. The third parameter is found offset properties. Having defined these parameters once, the function can remember them and not perform the value search procedure again the next time it is launched. This caching is called Inline Cache (IC).

TurboFan

However, it is worth considering that memoizing parameters also requires memory and some CPU time. This makes the mechanism effective only when the function is intensively performed (the so-called hot-function). How intensively a function is executed depends on the number and frequency of its calls. The optimizer evaluates the intensity. In the case of V8, TurboFan acts as an optimizer. First of all, the optimizer collects a graph from AST and bytecode and sends it to the “inlining” phase, where metrics for calls, loads and cache stores are collected. Next, if a function is suitable for inline caching, it is queued for optimization. After the queue is full, the optimizer must determine the “hottest” one and cache it. Then take the next one and so on. Unlike, by the way, its predecessor Crankshaft, which took functions one by one, starting with the first. In general, this topic is quite large and deserves a separate article; now it makes no sense to consider all the details and features of the TurboFan heuristics. Let's move on to examples.

IC types

To analyze the operation of the IC, you need to enable logging in the runtime environment. In the case of d8, this can be done by specifying the flag --log-ic.

%> d8 --log-ic

function getX(obj) {
  return obj.x;
}

for (let i = 0; i < 10; i++) {
  getX({ x: i });
}

As I already said, TurboFan starts caching object properties only in hot functions. To make our function work like this, we will need to run it in a loop several times in a row. My simple d8 script required at least 10 iterations. In practice, under other conditions and with other features, this number may vary and is likely to be higher. The resulting log can now be run through System Analyzer.

Grouped by type: 1#
>100.00%	1	LoadIC
Grouped by category: 1#
>100.00%	1	Load
Grouped by functionName: 1#
>100.00%	1	~getX
Grouped by script: 1#
>100.00%	1	Script(3): index.js
Grouped by sourcePosition: 1#
>100.00%	1	index.js:3:14
Grouped by code: 1#
>100.00%	1	Code(JS~)
Grouped by state: 1#
>100.00%	1	0 → 1
Grouped by key: 1#
>100.00%	1	x
Grouped by map: 1#
>100.00%	1	Map(0x3cd2000d9e61)
Grouped by reason: 1#
>100.00%	1	

In the IC List we see a notation like LoadIC, which talks about accessing an object's property from the cache. The function itself is indicated here functionName: ~getXhidden class address Map(0x3cd2000d9e61) and property name key: x. But of greatest interest is state: 0 -> 1.

There are several types of IC. The complete list is as follows:

/src/common/globals.h#1481

// State for inline cache call sites. Aliased as IC::State.
enum class InlineCacheState {
  // No feedback will be collected.
  NO_FEEDBACK,
  // Has never been executed.
  UNINITIALIZED,
  // Has been executed and only one receiver type has been seen.
  MONOMORPHIC,
  // Check failed due to prototype (or map deprecation).
  RECOMPUTE_HANDLER,
  // Multiple receiver types have been seen.
  POLYMORPHIC,
  // Many DOM receiver types have been seen for the same accessor.
  MEGADOM,
  // Many receiver types have been seen.
  MEGAMORPHIC,
  // A generic handler is installed and no extra typefeedback is recorded.
  GENERIC,
};

In the system analyzer, these types are represented by the symbols

0: UNINITIALIZED
X: NO_FEEDBACK
1: MONOMORPHIC
^: RECOMPUTE_HANDLER
P: POLYMORPHIC
N: MEGAMORPHIC
D: MEGADOM
G: GENERIC

For example, type X (NO_FEEDBACK) indicates that the optimizer has not yet collected sufficient statistics to set the function for optimization. In our case we see state: 0 -> 1 means that the function has entered the “monomorphic IC” state.

Monomorphic

I already said that in practice, the same function can take an object as an argument. The form of this object on a specific function call may differ from the previous one. Which makes life a little more difficult for the optimizer. The least problems arise when we pass only objects of the same shape to the function, as in our last example.

function getX(obj) {
  return obj.x; // Monomorphic IC
}

for (let i = 0; i < 10; i++) {
  getX({ x: i }); // все объекты имеют одинаковую форму
}

In this case, the optimizer just needs to remember the address of the hidden class and offset properties. This type of IC is called monomorphic.

Polymorphic

Let's now try to add a function call with objects of different shapes.

function getX(obj) {
  return obj.x; // Polymorphic IC
}

for (let i = 0; i < 10; i++) {
  getX({ x: i, y: 0 });
  getX({ y: 0, x: i });
}

In the “IC List” we now see:

50.00%        1    0 → 1
50.00%        1    1 → P

The function received several different object forms at runtime. Property access x Each form will be different. Each has its own hidden class address and its own offset this property. In this case, the optimizer allocates two slots for each object shape, where it saves the necessary parameters. Such a structure can be conventionally represented as an array of parameter sets.

[
  [Map1, offset1],
  [Map2, offset2],
  ...
]

This type of IC is called polymorphic. Polymorphic IC has a limit on the number of forms allowed.

/src/wasm/wasm-constants.h#192

// Maximum number of call targets tracked per call.
constexpr int kMaxPolymorphism = 4;

By default, a polymorphic type can have 2 to 4 forms per function. But this parameter can be adjusted with the flag --max-valid-polymorphic-map-count.

Megamorphic

If there are more forms of the object than Polymorphic can digest, the type changes to megamorphic.

function getX(obj) {
  return obj.x; // Megamorphic IC
}

for (let i = 0; i < 10; i++) {
  getX({ x: i });
  getX({ prop1: 0, x: i });
  getX({ prop1: 0, prop2: 1, x: i });
  getX({ prop1: 0, prop2: 1, prop3: 2, x: i });
  getX({ prop1: 0, prop2: 1, prop3: 2, prop4: 3, x: i });
}

IC List Result:

40.00%        2    P → P
20.00%        1    0 → 1
20.00%        1    1 → P
20.00%        1    P → N

Searching for the required set of parameters in this case eliminates the savings in CPU time and, therefore, does not make any sense. Therefore, the optimizer simply stores the symbol in cache MegamorphicSymbol. For subsequent function calls, this will mean that there are no cached parameters here; they will have to be taken in the usual way. Likewise, there is no point in further optimizing the function and collecting its metrics.

Conclusion

You probably noticed that the list of IC types also includes MEGADOM. This type is used for caching DOM tree nodes. The fact is that the inline caching mechanism is not limited to functions and objects. It is actively used in many other places, including outside V8. We physically cannot cover the entire volume of information about caching at one time. And since we are talking about JavaScript objects today, we will not consider the MegaDom type in this article.

Let's run a couple of tests and see how effectively Turbofan optimization works in V8. We will conduct the experiment in the environment Node.js (latest stable version at the time of writing v20.12.2).

Experimental code:

const N = 1000;

//===

function getXMonomorphic(obj) {
  let sum = 0;

  for (let i = 0; i < N; i++) {
    sum += obj.x;
  }

  return sum;
}

console.time('Monomorphic');

for (let i = 0; i < N; i++) {
  getXMonomorphic({ x: i });
  getXMonomorphic({ x: i });
  getXMonomorphic({ x: i });
  getXMonomorphic({ x: i });
  getXMonomorphic({ x: i });
}

console.timeLog('Monomorphic');

//===

function getXPolymorphic(obj) {
  let sum = 0;

  for (let i = 0; i < N; i++) {
    sum += obj.x;
  }

  return sum;
}

console.time('Polymorphic');

for (let i = 0; i < N; i++) {
  getXPolymorphic({ x: i, y: 0 });
  getXPolymorphic({ y: 0, x: i });
  getXPolymorphic({ x: i, y: 0 });
  getXPolymorphic({ y: 0, x: i });
  getXPolymorphic({ x: i, y: 0 });
}

console.timeEnd('Polymorphic');

//===

function getXMegamorphic(obj) {
  let sum = 0;

  for (let i = 0; i < N; i++) {
    sum += obj.x;
  }

  return sum;
}

//===

console.time('Megamorphic');

for (let i = 0; i < N; i++) {
  getXMegamorphic({ x: i });
  getXMegamorphic({ prop1: 0, x: i });
  getXMegamorphic({ prop1: 0, prop2: 1, x: i });
  getXMegamorphic({ prop1: 0, prop2: 1, prop3: 2, x: i });
  getXMegamorphic({ prop1: 0, prop2: 1, prop3: 2, prop4: 3, x: i });
}

console.timeLog('Megamorphic');

First, let's run the script with optimization disabled and see the “pure” speed of the functions.

%> node --no-opt test.js
Monomorphic: 68.55ms
Polymorphic: 69.939ms
Megamorphic: 85.045ms

For the purity of the experiment, I repeated the tests several times. The speed of operation of monomorphic and polymorphic objects is approximately the same. Polymorphic ones can sometimes be even faster. This is no longer connected so much with the operation of V8 itself, but with system resources. But the speed of megamorphic objects is somewhat lower due to the fact that at this stage a tree of hidden classes is formed and access to the properties of these objects is a priori more difficult than in the first two cases.

Now let's run the same script with optimization enabled:

%> node test.js
Monomorphic: 9.313ms
Polymorphic: 9.673ms
Megamorphic: 29.104ms

The speed of monomorphic and polymorphic functions has increased approximately 7 times. As in the previous case, the difference between these two types is insignificant and with repeated tests the polymorphic ones are sometimes even faster. But the speed of the megamorphic function increased only 3 times. In general, based on the theory described above, megamorphic functions should not have shown an increase in speed during optimization. However, not all so simple. First, they still have the side effect of optimization, since such functions are excluded from further metrics collection. Although this is not a big one, it is still an advantage over other types. Secondly, JS optimization is not limited to inline caching of access to object properties. There are a number of other types of optimizations that we did not consider in this article.

Moreover, in 2023, Google released the “Chromium M117” release, which included a new optimizer Maglev. It was built in as an intermediate between Ignition (interpreter) and Turbofan (optimizer). Now the optimization process has acquired a three-stage architecture and looks like Ignition -> Maglev -> Turbofan. Maglev contributes to function optimization, in particular, it works with function arguments. We'll talk more about this another time. For now, we can conclude that megamorphic functions are approximately two times slower than monomorphic and polymorphic ones.


Read this and my other articles in my channel:

EN: https://t.me/frontend_almanac_ru
EN: https://t.me/frontend_almanac

Similar Posts

Leave a Reply

Your email address will not be published. Required fields are marked *