How is data stored in V8 JS engine memory?

Introduction

After working for a few years on embedded systems and industrial PCs, focusing on low-level software development on Linux kernels, RTOS and WinCE, I decided to switch to high-level software development and joined Dashlane as a web developer. While learning JavaScript, I often wondered how it works internally and how the code I was writing could be translated to bytes that my computer could understand. While talking with fellow JavaScript developers, I realized that many didn’t have a strong understanding of the internal workings of the JavaScript engine, and I decided to search the web for answers. Because I didn’t find many resources on the topic, I decided to write this article.

Before we dive in and open the hood of the V8 JS engine, here is a little refresher on a few key concepts.

What is JavaScript?

JavaScript is a prototype-based programming language.

Here are some definitions in JavaScript:

  • An object is a collection of zero or more properties. The properties have attributes that determine how each property can be used.
  • A property is a container that holds other objects, primitive values, or functions.
  • A primitive value is a member of one of the following built-in types:
    • Undefined
    • Null
    • Boolean
    • Number
    • String
    • Symbol
  • An object is an instance of the built-in type Object.
  • A function is a callable object. A function that is associated with an object via a property is called a method.

From that, we can differentiate two data types: primitive types and objects.

How does the V8 JS Engine handle data types?

The ECMAScript (“JavaScript”) specification defines the following language types:

  • Undefined
  • Null
  • Boolean
  • String
  • Symbol
  • Number
  • Object

However, the specification says nothing about how these types are represented in memory, which is what tells a compiler or interpreter how the programmer intends to use the data. Some common machine-level data types include:

  • integers
  • booleans
  • characters
  • floating-point numbers
  • alphanumeric strings

For example, in C, an int takes four bytes of memory and a char one byte (with gcc on x86). This is critical information for compiling efficient code.

Because compilation happens while a JavaScript program is being executed, it’s difficult to do expensive reasoning about types at compilation time. Every cycle spent compiling a script is a cycle that delays the actual work.

To solve this issue and run JavaScript faster, V8 uses hidden classes.

Note: To visualise how the different types are stored in memory, I compiled a debug version of V8 on my local machine and wrote some tests (I followed the documentation here and here). 

Hidden Classes

V8 internally creates hidden classes for objects at runtime, storing meta information about the object (number of properties, reference to the object’s prototype, etc).

Hidden classes are conceptually similar to classes in typical object-oriented programming languages. Because of the ability to add or remove properties from an object after its instantiation in a prototype-based language such as JavaScript, it is generally not possible to know classes up front. Hidden classes serve as an identifier for the shape of an object and are thus an important ingredient for V8’s optimising compiler and inline caches.

Hidden classes are based on the assumption that objects with the same structure (the same named properties in the same order) share the same hidden class. That way, objects with the same hidden class can use the same optimised generated code.

Let’s look at an example:

let obj1 = {};        // (1)
obj1.a = 'hello';     // (2)
obj1.b = 'i am';      // (3)
obj1.c = 'john';      // (4)

let obj2 = {};        // (5)
obj2.b = 'i am';      // (6)
obj2.a = 'hello';     // (7)
obj2.c = 'john';      // (8)

(1) V8 creates a hidden class C0 for ‘obj1’ defining an empty object. We will later see what information this hidden class holds.

(2) V8 creates a hidden class C1 based on C0. C1 describes the location in memory where the property ‘a’ can be found. C0 is updated with a “class transition” which states that if a property “a” is added to an empty object, the hidden class should switch from C0 to C1. The hidden class of `obj1` is now C1.

(3) V8 creates a hidden class C2 based on C1 the same way as before. C1 is updated with a “class transition” which states that if a property “b” is added to an object whose hidden class is C1, then the hidden class should be switched to C2. The hidden class of `obj1` is now C2.

(4) Same as (3)

Now we have four hidden classes, linked by transitions: C0 → C1 → C2 → C3.

If we do the same with obj2:

(5) V8 can use the hidden class C0 to define ‘obj2’ as an empty object.

(6) V8 creates a hidden class C4 based on C0. C4 describes the location in memory where the property ‘b’ can be found. C0 is updated with a “class transition” which states that if a property “b” is added to an empty object, the hidden class should switch from C0 to C4. The hidden class of `obj2` is now C4.

(7) and (8) And so on: each new property adds a further hidden class and transition.

We will eventually get the following hidden classes and their transitions.

`obj1` and `obj2` do not have the same hidden class because their properties are not declared in the same order. By doing this, different hidden classes are created and you are therefore precluding some of the optimisations V8 could otherwise provide.

Note: it’s much better to initialise dynamic properties in the same order so that hidden classes can be reused.

The series of transitions that lead to a hidden class is called a ‘Transition tree’ and is stored by V8. 
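To make the transition mechanism more concrete, here is a minimal sketch in plain JavaScript. It is a toy model, not V8's real implementation: the `HiddenClass` class, the `shapeOf` helper and the `C0`, `C1`, ... ids are names made up for the illustration, and the numbering simply follows creation order as in the walkthrough above.

```javascript
// Toy model of hidden classes and transitions (NOT V8's real implementation).
let nextId = 0;

class HiddenClass {
  constructor(offsets = new Map()) {
    this.id = `C${nextId++}`;      // C0, C1, ... in creation order
    this.offsets = offsets;        // property name -> slot index
    this.transitions = new Map();  // property name -> next HiddenClass
  }

  // Follow an existing transition, or create it the first time it is needed.
  addProperty(name) {
    if (!this.transitions.has(name)) {
      const offsets = new Map(this.offsets);
      offsets.set(name, offsets.size);
      this.transitions.set(name, new HiddenClass(offsets));
    }
    return this.transitions.get(name);
  }
}

const root = new HiddenClass(); // C0: the empty object

// Returns the hidden class reached after adding the keys in the given order.
function shapeOf(keys) {
  return keys.reduce((hiddenClass, key) => hiddenClass.addProperty(key), root);
}

const shape1 = shapeOf(['a', 'b', 'c']); // same order as obj1
const shape2 = shapeOf(['b', 'a', 'c']); // same order as obj2
const shape3 = shapeOf(['a', 'b', 'c']); // a third object built like obj1

console.log(shape1.id, shape2.id, shape3.id); // C3 C6 C3
console.log(shape1 === shape3); // true: same order, the hidden class is reused
console.log(shape1 === shape2); // false: different order, different hidden class
```

In this model, the root ends up with two outgoing transitions (“a” and “b”), which is exactly the branching that makes the structure a tree.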

Now let’s dig even deeper and find out how V8 actually represents objects in memory.

Core data representation types/efficiently representing values

On 32-bit architectures, the V8 engine passes around 32-bit numbers to represent all values, for improved efficiency. To be able to use the same 32 bits to represent both primitives and objects, V8 uses a technique called tagging. This technique is based on the observation that, on many architectures, allocated data must be aligned on a 4-byte boundary. Data is aligned in such a way that the least significant bit will be zero. Tagging uses this bottom bit to differentiate the two types of data:

If the bit is clear, it's a small integer (or SMI)        =>           | 31-bit-signed integer (Smi) | 0 |
If the bit is set, it's a HeapObject.                     =>           | Heap Object                 | 1 |

Thanks to this technique, the same code path can handle both objects and integers.
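The tagging scheme can be sketched with JavaScript's own 32-bit bitwise operators. This is only an illustration of the bit layout described above: the helper names (`tagSmi`, `tagPointer`, and so on) are hypothetical, and the "addresses" are plain numbers rather than real pointers.

```javascript
// Sketch of 32-bit value tagging. JavaScript bitwise operators work on
// 32-bit integers, which makes them convenient for this illustration.
const tagSmi = (n) => n << 1;                 // shift left: the low bit becomes 0
const untagSmi = (word) => word >> 1;         // arithmetic shift keeps the sign
const tagPointer = (address) => address | 1;  // aligned address with the low bit set
const untagPointer = (word) => word & ~1;     // clear the tag to get the address back
const isSmi = (word) => (word & 1) === 0;

const five = tagSmi(5);
const minusThree = tagSmi(-3);
const pointer = tagPointer(0x1000);           // 0x1000 is 4-byte aligned

console.log(five, isSmi(five), untagSmi(five));   // 10 true 5
console.log(untagSmi(minusThree));                // -3
console.log(isSmi(pointer));                      // false
console.log(untagPointer(pointer).toString(16));  // 1000
```

A single test on the low bit is enough to tell the two kinds of values apart, which is why the same code path can handle both.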

ℹ️ Code path: A set of specific instructions that are actually executed during a single run of a program or program fragment.

SMI

The SMI is a 31-bit signed integer: it ranges from -2^30 to 2^30 - 1 (max: 0x3FFFFFFF, which is stored as the tagged 32-bit word 0x7FFFFFFE).

If you want to pass around a numeric value that is bigger than 31 signed bits, it doesn’t fit in a SMI and V8 has to create a box: the number is turned into a double, an object is created and the double is put inside of it.

Note: Because of the computation time required to create the box and access its value, it is preferable to use 31-bit signed integers for critical calculations. Optimisations exist in V8 to handle types other than signed integers correctly, but there are cases where this process can cause memory allocation (which degrades performance).
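As a rough aid, the check below tells whether a number could be stored as a 31-bit signed integer. It is a sketch, not something V8 exposes: `fitsInSmi31` and the two constants are hypothetical names, and -0 is excluded on the assumption that a plain integer cannot encode the sign of zero.

```javascript
// A 31-bit signed integer covers -2^30 .. 2^30 - 1.
const SMI_MIN_31 = -(2 ** 30);
const SMI_MAX_31 = 2 ** 30 - 1;

// Hypothetical helper: true when the value could be stored without a box.
function fitsInSmi31(value) {
  return Number.isInteger(value) &&
    !Object.is(value, -0) &&
    value >= SMI_MIN_31 &&
    value <= SMI_MAX_31;
}

console.log(fitsInSmi31(1));               // true
console.log(fitsInSmi31(SMI_MAX_31));      // true
console.log(fitsInSmi31(SMI_MAX_31 + 1));  // false: needs a box
console.log(fitsInSmi31(1.2));             // false: not an integer
```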

HeapObject

A HeapObject is a pointer that points to memory in the managed heap. It’s a superclass for everything allocated in the heap. Because the last bit is set to 1, before using the pointer, the bit needs to be cleared.

“In computer science, a pointer is a programming language object that stores the memory address of another value located in computer memory. A pointer references a location in memory, and obtaining the value stored at that location is known as dereferencing the pointer.”

(Wikipedia)

The size of a pointer depends on many factors including the CPU architecture, compiler and Operating System. Usually the size is equal to the word size of the OS. So, for a 32-bit OS, the pointer size will be 4 bytes (even if the processor is 64-bit) whereas the pointer size will be 8 bytes for a 64-bit OS.

64-bit architectures

On 64-bit architectures, the V8 engine passes around 64-bit numbers. It’s a bit different, but the tagging technique is similar:

an SMI is a 32-bit signed integer and the lower 32 bits are set to 0  =>   | 32-bit signed integer |  32 * 0       |
a pointer is 64 bits and the last bit is set to 1                     =>   | Heap Object                       | 1 |
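The 64-bit layout can be sketched the same way using BigInt, since JavaScript's regular bitwise operators are limited to 32 bits. The helper names are hypothetical and the words are shown as unsigned 64-bit values.

```javascript
// Sketch of the 64-bit layout: the integer lives in the upper 32 bits
// and the lower 32 bits are zero.
const tagSmi64 = (n) => BigInt.asUintN(64, BigInt(n) << 32n);
const untagSmi64 = (word) => Number(BigInt.asIntN(64, word) >> 32n);
const tagPointer64 = (address) => address | 1n;
const isSmi64 = (word) => (word & 1n) === 0n;

const one = tagSmi64(1);
const minusOne = tagSmi64(-1);

console.log(one.toString(16));       // 100000000
console.log(minusOne.toString(16));  // ffffffff00000000
console.log(untagSmi64(minusOne));   // -1
console.log(isSmi64(one), isSmi64(tagPointer64(0x1000n))); // true false
```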

Now that we know the two different types of representation of data in memory, let’s see how it is applied to JavaScript types.

As most of our machines are now 64-bit, we will stick to 64-bit numbers.

We saw earlier that JavaScript types are divided into two groups: primitives and objects. Let’s see how they are represented in memory.

Objects

Structure

An object is a collection of properties: key-value pairs.

When an object ‘obj’ is created, V8 creates a new JS Object and allocates memory for it. The value of ‘obj’ is the pointer to this JS Object.

A JS Object is composed of:

  • Map: a pointer to the hidden class the object belongs to.
  • Properties: a pointer to an object containing named properties. Properties added after initialization of the object are added to the Properties store.
  • Elements: a pointer to an object containing numbered properties.
  • In-Object Properties/Fast properties: pointers to named properties defined at object initialization. The number of in-object properties depends on the object.

From that observation, we can see that V8 will allocate (8 + 8 + 8 + 8*N) bytes for this object, where N is the number of in-object properties.
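The formula is easy to turn into a small helper, along with its inverse for reading a shallow size back into a number of in-object slots. The function and constant names are made up for this sketch, and it assumes 8-byte fields as in the formula above.

```javascript
const FIELD_SIZE = 8;     // bytes per field, as assumed by the formula
const HEADER_FIELDS = 3;  // Map, Properties, Elements

// Size in bytes of a JS Object with the given number of in-object properties.
function jsObjectSize(inObjectProperties) {
  return FIELD_SIZE * (HEADER_FIELDS + inObjectProperties);
}

// Inverse: how many in-object slots a given shallow size leaves room for.
function inObjectSlots(shallowSize) {
  return shallowSize / FIELD_SIZE - HEADER_FIELDS;
}

console.log(jsObjectSize(1));    // 32
console.log(jsObjectSize(9));    // 96
console.log(inObjectSlots(96));  // 9
```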

Example:

Let’s check this on Chrome by following the steps below:

  • Open up DevTools on Chrome
  • Run the following code on the Console

function Person(name) {
    this.firstName = name;
}
var john = new Person('John');

  • Take a ‘Heap Snapshot’
  • Search for ‘Person’ in the Memory tab

You should see something like this:

We can see the object ‘Person’ has been created and contains one In-Object Property ‘firstName’. Its shallow size is 96 bytes (Shallow size vs retained size). From this and the aforementioned formula for calculating the size of an object, we can tell that V8 has allocated enough space for nine In-Object properties.

Note: The '__proto__' tag is a reference to the prototype of the Object. It's not actually stored in the object but in the Hidden class. It is given here as a reference by the DevTools. (*)

How many in-object properties does V8 reserve for an object? How much memory is thus allocated?

We don’t want to reallocate objects every time a new property is added, nor do we want to allocate a big chunk of memory for tiny objects. To determine the appropriate size for objects, V8 uses something called “in-object slack tracking”.

The idea is that, for a given constructor, V8 initially allocates a generous amount of memory, enough for storing its properties as in-object properties (up to a maximum that we will see later). After allocating a certain number of objects from the same constructor, V8 takes a look at the transition tree of the objects and checks the maximum size of the objects. New objects will be allocated with exactly enough memory to store the maximum number of properties.

“The initial objects are also resized using a clever trick. When the initial objects are first allocated, their fields are initialised such that they appear to the garbage collector to be free space. The garbage collector doesn’t actually treat them as free space, since the maps specify the size of the objects. However, when the slack tracking process ends, the new instance size is written to maps in the transition tree, so objects with those maps effectively become smaller. Since the unused fields already look like free space, the initial objects don’t need to be modified.”

(A tour of V8: object representation)

Let’s do a test on Chrome:

Run the following code on the Chrome console and take a ‘Heap Snapshot’

function Person(name) {
    this.firstName = name;
}
const people = [];
for (let i = 0; i < 10; i++) {
    people.push(new Person());
}

We can see that 10 instances of ‘Person’ have been created and their shallow size is only 32 bytes (instead of 96 bytes like before): 8 bytes for the pointer to the hidden class, 8 bytes for the pointer to the “Properties” store, 8 bytes for the pointer to the “Elements” object and 8 bytes for the In-Object property ‘firstName’.

But what happens if a new property is added after in-object slack tracking is complete? In that case, the new property will be added to the “Properties” store or the “Elements” store. The stores for “Properties” and “Elements” can always be reallocated with a larger size as new properties are added.

Example:

Add the following code to the previous example:

people[0].lastName = 'Doe';

The shallow size of the object is still 32 bytes and the property ‘lastName’ has been added to the “Properties” store.

Hidden class

Every object points to a hidden class, which contains the memory offset for each property. When a property is created, deleted or changed dynamically, the object moves to a different hidden class, which is created if it doesn’t exist yet. The new hidden class keeps the information on the existing properties and adds the memory offset of the new property.

A hidden class knows which hidden class to switch to when a property is changed because it keeps transition information: if an object gets a new property, the transitions of the object’s hidden class are checked to find the matching hidden class, and a new one is created if no existing transition matches the change.

If we look back at our example above on hidden classes:

  • C0:
    • doesn’t contain property offset values as it refers to an empty object
    • contains the transition information that if the property ‘a’ is added to the object, the hidden class should be changed to C1
  • C1:
    • contains memory offset value of the ‘a’ property
    • contains the transition information that if the property ‘b’ is added to the object, the hidden class should be changed to C2
  • C2: 
    • contains memory offset value of the ‘b’ property

A hidden class is a Map object. The V8 engine allocates 80 bytes for each Map object.

A Map is a key data structure in V8, containing information such as:

  • the dynamic type of the object
  • the size of the object in bytes
  • the properties of the object and where they are stored
  • the type of the array elements, e.g. unboxed doubles or tagged pointers
  • the prototype of the object, if any

Note that this Map is an internal V8 structure, not the JavaScript Map collection. All heap objects have a Map that describes their structure.

A hidden class is basically a table of descriptors, with one entry for each property. It contains other information as well, like the size of the object and pointers to constructors and prototypes. The transition information is stored in a special descriptor.

In our previous example, we would have:

  • C0:
    • object size: <size>
    • prototype: <prototype>
    • “a”: TRANSITION to C1 at offset A
  • C1:
    • object size: <size>
    • prototype: <prototype>
    • “a”: FIELD at offset A
    • “b”: TRANSITION to C2 at offset B
  • C2:
    • object size: <size>
    • prototype: <prototype>
    • “a”: FIELD at offset A
    • “b”: FIELD at offset B
    • “c”: TRANSITION to C3 at offset C

In a nutshell, a hidden class is composed of:

  • object size: size of the object
  • prototype: a pointer to the object’s prototype
  • descriptors: a table describing the properties, with one entry for each property. 
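The split between "what the hidden class knows" and "what the object stores" can be sketched as follows. This is a toy model with made-up names (`hiddenClassC2`, `makeObject`, `getProperty`); it ignores transitions and the prototype chain and only shows how a descriptor table turns a property name into a slot offset.

```javascript
// Toy model: the hidden class holds the layout, the object only holds values.
const hiddenClassC2 = {
  objectSize: 2,                                // number of slots in this sketch
  prototype: Object.prototype,
  descriptors: new Map([['a', 0], ['b', 1]]),   // property name -> offset
};

function makeObject(hiddenClass, slots) {
  return { map: hiddenClass, slots };
}

// Look the name up in the descriptors, then read the slot at that offset.
function getProperty(object, name) {
  const offset = object.map.descriptors.get(name);
  return offset === undefined ? undefined : object.slots[offset];
}

const first = makeObject(hiddenClassC2, ['hello', 'i am']);
const second = makeObject(hiddenClassC2, ['bonjour', 'je suis']);

console.log(getProperty(first, 'a'));   // hello
console.log(getProperty(second, 'b'));  // je suis
console.log(first.map === second.map);  // true: the layout is stored only once
```

Because both objects share one descriptor table, code that has learned the offset of “a” for one of them can reuse it for the other.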

Example:

From the previous code (adding a property ‘lastName’ to people[0]), we should have the following transition tree:

DevTools help us see that by displaying the transition descriptor separately, as well as another element called back_pointer that points to the previous hidden class in the transition tree.

Indeed, we can see in DevTools that the hidden class of instance 0 of constructor Person() has a back_pointer that points to the hidden class of instance 1 of constructor Person(), and the hidden class of instance 1 of constructor Person() has a transition that points to the hidden class of instance 0 of constructor Person(). 

The way the object is stored in memory can be different from what is shown by DevTools. For example, if we follow the explanations given by Google, the hidden class C0 should have only one property in its descriptors object (firstName).

We can see it better when using the V8 engine I compiled on my machine. Let’s take the following example:

function Person(name) {
    this.firstName = name;
}
const people = [];
for (let i = 0; i < 10; i++) {
    people.push(new Person());
}
people[0].lastName = 'Doe';
people[1].age = 18;

In V8, let’s display the objects people[0], people[1] and people[2]. We can notice the different properties and the transition tree from people[2].

Properties

JavaScript objects can have arbitrary properties associated with them. The names of object properties (or keys) can contain any character and are always strings or Symbols. Any other value used as a property key is converted to a string. Thus, obj["1"] and obj[1] refer to the same property.
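This key conversion is observable from plain JavaScript. The short example below uses a number and a boolean as keys and shows that both end up as string keys.

```javascript
const obj = {};
obj[1] = 'one';          // the number 1 becomes the key '1'
obj[true] = 'boolean';   // the boolean true becomes the key 'true'

console.log(obj['1']);            // one
console.log(obj[1] === obj['1']); // true: same property
console.log(Object.keys(obj));    // [ '1', 'true' ]
console.log(typeof Object.keys(obj)[0]); // string
```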

However, depending on the key of the property we can differentiate two types of properties:

  • numbered (or indexed) properties
  • named properties

Elements: numbered properties

If the property key is a non-negative integer (0, 1, 2, etc), the property will be stored in the “Elements” object. These properties are called elements.

V8 stores them separately from non-numeric properties for optimisation purposes.

The most common of these objects are those generated by the Array constructor. Conceptually, arrays are just objects with a named property length, and the elements of the array are properties with non-negative integer keys. After an element is added, length is at least the largest integer key plus one. For example:

const a = new Array();
a[100] = "foo";
a.length; // returns 101

Elements kind

V8 keeps track of what kind of elements are contained in the array to be able to optimise any operations specifically for this type of element. Let’s take the following array:

const a = [1, 2, 3];

At the JavaScript level, the ‘typeof’ operator only tells you that each element of the array is a number. However, the V8 engine makes a more precise distinction: the elements kind of the array ‘a’ is PACKED_SMI_ELEMENTS.

When adding a floating-point number to the same array, V8 changes its elements kind to PACKED_DOUBLE_ELEMENTS.

When adding a string literal to the same array, V8 again changes its elements kind, this time to PACKED_ELEMENTS.

const a = [1, 2, 3];    // elements kind: PACKED_SMI_ELEMENTS
a.push(4.5);            // elements kind: PACKED_DOUBLE_ELEMENTS
a.push('a');            // elements kind: PACKED_ELEMENTS

In V8:

So far, we have three distinct kinds of elements:

  • Small integers (SMI)
  • Doubles, for floating-point numbers and integers that cannot be represented as a SMI
  • Regular elements, for values that cannot be represented as SMI or Doubles

Note: the conversion of elements can only go in one direction: from specific (ex: PACKED_SMI_ELEMENTS) to more general (ex: PACKED_ELEMENTS). The inverse is not possible!

We can differentiate between packed (or dense) arrays and sparse arrays (with holes in them). When you create holes in an array, the elements kind is converted to the ‘HOLEY’ variant.

const a = [1, 2, 3, 4.5, 'a'];  // elements kind: PACKED_ELEMENTS
a[10] = 1;                      // elements kind: HOLEY_ELEMENTS

In V8:

Note: the elements kind conversion can go from PACKED to its HOLEY counterpart. The inverse is not possible!
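The one-way nature of these conversions can be modelled with a small tracker. This is a toy model, not V8's logic: `KindTracker`, `rankOf` and the rank table are hypothetical names, the Smi check assumes the 31-bit range, and only the three kinds discussed above (plus their HOLEY variants) are covered.

```javascript
// Toy model of elements-kind transitions: kinds only ever become more general.
const RANK = { SMI: 0, DOUBLE: 1, GENERIC: 2 };
const SUFFIX = ['SMI_ELEMENTS', 'DOUBLE_ELEMENTS', 'ELEMENTS'];

// Most specific kind able to hold a single value (31-bit Smi range assumed).
function rankOf(value) {
  if (typeof value !== 'number') return RANK.GENERIC;
  const isSmi = Number.isInteger(value) && !Object.is(value, -0) &&
    value >= -(2 ** 30) && value <= 2 ** 30 - 1;
  return isSmi ? RANK.SMI : RANK.DOUBLE;
}

class KindTracker {
  constructor() {
    this.rank = RANK.SMI;
    this.holey = false;
    this.length = 0;
  }

  write(index, value) {
    this.rank = Math.max(this.rank, rankOf(value)); // never goes back down
    if (index > this.length) this.holey = true;     // skipped indices leave holes
    this.length = Math.max(this.length, index + 1);
    return this.kind;
  }

  push(value) {
    return this.write(this.length, value);
  }

  get kind() {
    return `${this.holey ? 'HOLEY' : 'PACKED'}_${SUFFIX[this.rank]}`;
  }
}

const tracker = new KindTracker();
[1, 2, 3].forEach((value) => tracker.push(value));
const afterInts = tracker.kind;            // PACKED_SMI_ELEMENTS
const afterDouble = tracker.push(4.5);     // PACKED_DOUBLE_ELEMENTS
const afterString = tracker.push('a');     // PACKED_ELEMENTS
const afterHole = tracker.write(10, 1);    // HOLEY_ELEMENTS
const afterSmiAgain = tracker.push(1);     // still HOLEY_ELEMENTS

console.log(afterInts, afterDouble, afterString, afterHole, afterSmiAgain);
```

Writing a small integer at the end does not bring the tracker back to a more specific kind, which mirrors the two notes above.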

V8 currently distinguishes between 21 different elements kinds.

Note: Many performance tips are given on arrays here! It’s worth a read.

Note: Adding array-indexed properties does not create new HiddenClasses (while adding named properties does).

Fast or slow elements

We can distinguish two representations of the Elements store: contiguous (fast) and dictionary-based (slow).

In fast representation, the Elements store is an array of values arranged contiguously in memory where the property index maps to the offset of the item in the array. That means that even empty slots in the array occupy space in memory.

This simple representation is wasteful for very large sparse (holey) arrays where only a few entries are occupied. In that case, V8 uses a dictionary-based representation to save memory at the cost of slightly slower access.

For example:

const sparseArray = [];

sparseArray[9999] = 'foo';

Allocating a full array with 10k entries would be wasteful here. Instead, V8 creates a dictionary where key-value-descriptor triplets are stored. The key in this case would be '9999' and the value 'foo' and the default descriptor is used. 
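The holes themselves are visible from JavaScript, even though the storage strategy is not. The first half of this example only uses standard language behaviour; the `dictionaryElements` Map in the second half is a hypothetical stand-in for the dictionary store, with the default attributes of an assigned property playing the role of the descriptor.

```javascript
const sparseArray = [];
sparseArray[9999] = 'foo';

console.log(sparseArray.length);        // 10000
console.log(Object.keys(sparseArray));  // [ '9999' ]
console.log(5000 in sparseArray);       // false: index 5000 is a hole

// Hypothetical dictionary-style store: one key-value-descriptor triplet
// instead of 10,000 contiguous slots.
const dictionaryElements = new Map();
dictionaryElements.set('9999', {
  value: 'foo',
  descriptor: { writable: true, enumerable: true, configurable: true },
});

console.log(dictionaryElements.size);   // 1
```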

In V8:

Note: Array functions perform considerably slower on objects with slow elements!

Named properties

If the property key is not a non-negative integer, the property will be stored as an In-Object Property or in the “Properties” object.

In-Object Properties are, as seen before, stored directly in the JS Object structure. They are the fastest properties available in V8 as they are accessible without any indirection. The number of in-object properties is predetermined by the initial size of the object. If more properties get added than there is space in the object, they are stored in the Properties store.

The Properties store is an object that can be either a Fixed Array or a Dictionary.

Fast properties

When the number of properties is low, the Properties store is defined as an Array by V8. 

The properties are simply accessed by index in the properties store. To get from the name of the property to the actual position in the properties store, we have to consult the descriptor array on the hidden class. 

These are called “Fast properties.” 

Slow properties

However, if many properties get added and deleted from an object, it can result in significant time and memory overhead to maintain the descriptor array and hidden classes. 

Hence, V8 also supports so-called slow properties. An object with slow properties has a self-contained dictionary as a Properties store. All the properties meta information is no longer stored in the descriptors table in the hidden class but directly in the properties dictionary. Hence, properties can be added and removed without updating the hidden class.

Since inline caches don’t work with dictionary properties, the latter are typically slower than fast properties.

Primitive Types

Now that we have seen how objects are stored in the V8 memory, let’s talk about how primitives are stored.

Numbers

As seen before, we can distinguish two types of numbers: the ones that can be represented by an SMI, and the others.

Let’s take a variable ‘a’ and assign the number 1 to it (a=1), and see what is displayed in the V8 environment:

We can notice the variable ‘a’ is directly stored in memory as an SMI.

Let’s now take a variable ‘b’ and assign the floating-point number 1.2 to it (b=1.2):

We can notice now that the variable ‘b’ is a pointer that points to a Map with the type *_NUMBER_TYPE.

Strings

A string variable points to a Map with the type *_STRING_TYPE.

Boolean

A boolean variable points to a Map with the type ODDBALL_TYPE.

Symbols

A symbol variable points to a Symbol structure.

Undefined

An undefined variable points to a Map with type ODDBALL_TYPE.

Null

A null variable points to a map with type ODDBALL_TYPE.

I won’t go further into the data structures of primitives. You can find more information online, such as how to work with primitives like with objects (with methods) here.

Now that we have seen how data is represented in the V8 engine memory, let’s see where data is stored.

Where is data stored?

Dynamic allocation

Whatever the language, the memory life cycle is:

  1. allocate memory needed
  2. use the allocated memory
  3. free the allocated memory

In C, for example, memory management is done by the developer, who allocates and frees memory dynamically with standard library functions such as malloc() and free().

In JavaScript, memory management is done by V8 and its garbage collector. Many resources are available online which describe how the garbage collector works, so I won’t talk about it here.

Memory spaces in V8

“Variables in JavaScript (and most other programming languages) are stored in two places: stack and heap. A stack is usually a continuous region of memory allocating local context for each executing function. Heap is a much larger region storing everything allocated dynamically. This separation is useful to make the execution safer from corruption (stack is more protected) and faster (no need for dynamic garbage collection of the stack frames, fast new frame allocation).

Only primitive types passed by value (Number, Boolean, references to objects) are stored on the stack. Everything else is allocated dynamically from the shared pool of memory called heap. In JavaScript you do not have to worry about deallocating objects inside the heap, the garbage collector frees them whenever no one is referencing them. Of course, creating large number of objects takes its performance toll (someone needs to keep all the bookkeeping) plus memory fragmentation.”

Source: Gleb Bahmutov
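Where exactly the engine places each value is not observable from JavaScript, but the difference between copying a primitive and sharing a reference is. The short example below only illustrates that observable behaviour; the function names are made up for the illustration.

```javascript
// A primitive argument is copied: the caller's variable is untouched.
function increment(n) {
  n += 1;
  return n;
}

// An object argument is a reference: the caller sees the mutation.
function rename(person) {
  person.firstName = 'Jane';
}

let count = 1;
const incremented = increment(count);

const john = { firstName: 'John' };
const alias = john;  // copies the reference, not the object
rename(alias);

console.log(count, incremented);  // 1 2
console.log(john.firstName);      // Jane
console.log(alias === john);      // true: both names refer to the same object
```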

Conclusion

I hope you enjoyed this dive into the heart of the V8 JS Engine. While JavaScript was written so that developers don’t have to worry about memory management, data types, or optimization, I find it important to understand how the tools I use on a daily basis work.

Happy coding!