Critical rendering stages, hardware acceleration and optimizations

When you develop a web application, the browser's behind-the-scenes work usually stays a mysterious shadow behind your code: its processes run without your direct participation. But as your front-end skills grow, and especially when you need to optimize high-load applications where even micro-optimizations can bring real business and user value, understanding how the browser works becomes an important step toward being able to find and eliminate bottlenecks. Let's dive deep into the browser's work, in accessible language: we will discuss the processes happening there and look for opportunities to optimize them.

After the user enters the site name in the address bar, the browser completes a series of navigation steps, including a sequence of requests that resolves the site name to the server's IP address, and then begins downloading the HTML file. For more on domain name resolution, I recommend Danila Fetisov's story about DNS resolution.

The browser starts parsing the HTML without waiting for it to fully load. In other words, as the HTML/CSS code arrives, the browser converts it into structures it can work with. After receiving the first chunk of data, the browser checks the doctype to determine which parsing algorithms the parser will use. Parsing is performed by the browser engine.

HTML processing resembles ordinary language compilation, although it has significant differences of its own. It also begins with tokenization (splitting the code into lexemes), and the tokens are then used to incrementally build the DOM, a tree of tags that describes the contents of the HTML file. The parser processes the markup sequentially, from top to bottom.

Next, the browser receives the CSS styles and starts parsing them. The goal is to build the CSSOM, an object model representing the styles associated with the DOM. It is a tree much like the DOM, but it holds the appropriate styles for each node of the document.
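To make the tokenization and tree-building steps for HTML more concrete, here is a minimal sketch. It is not how a real browser parser works (the real algorithm handles attributes, comments, the doctype, malformed markup and much more); the `Token` and `DomNode` types and the `tokenize` and `buildDom` functions are names invented for this illustration, and the input is assumed to be well-formed tags without attributes.

```typescript
// Hypothetical, heavily simplified types for illustration only.
type Token =
  | { kind: "open"; tag: string }
  | { kind: "close"; tag: string }
  | { kind: "text"; value: string };

interface DomNode {
  tag: string;
  text: string;
  children: DomNode[];
}

// Tokenization: split markup into opening tags, closing tags and text runs.
// Assumption: well-formed input, no attributes, comments or doctype.
function tokenize(html: string): Token[] {
  const tokens: Token[] = [];
  const pattern = /<(\/?)([a-zA-Z][a-zA-Z0-9]*)>|([^<]+)/g;
  let match: RegExpExecArray | null;
  while ((match = pattern.exec(html)) !== null) {
    const tagName = match[2];
    const textRun = match[3];
    if (tagName !== undefined) {
      const tag = tagName.toLowerCase();
      tokens.push(match[1] === "/" ? { kind: "close", tag } : { kind: "open", tag });
    } else if (textRun !== undefined && textRun.trim() !== "") {
      tokens.push({ kind: "text", value: textRun.trim() });
    }
  }
  return tokens;
}

// Tree construction: consume tokens one by one, top to bottom, and grow
// the tree incrementally while remembering the chain of open ancestors.
function buildDom(tokens: Token[]): DomNode {
  const root: DomNode = { tag: "#document", text: "", children: [] };
  const ancestors: DomNode[] = [];
  let current = root;
  for (const token of tokens) {
    if (token.kind === "open") {
      const node: DomNode = { tag: token.tag, text: "", children: [] };
      current.children.push(node);
      ancestors.push(current);
      current = node;
    } else if (token.kind === "text") {
      current.text += token.value;
    } else {
      // Closing tag: step back up to the parent, if there is one.
      const parent = ancestors.pop();
      if (parent !== undefined) current = parent;
    }
  }
  return root;
}
```

Because the tree is grown token by token, a partial tree already exists after the first chunk of markup, which is the property that lets the browser start working before the whole file has arrived.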

Parsing in parallel does not in itself block anything, but CSS is a render-blocking resource: until the CSSOM is built, the browser will not start the third stage, building the Render tree.

After the DOM and CSSOM are built, the Render tree is built. This is a visual representation of the document. The Render tree leaves out invisible elements (for example, those with display: none) and adds ones that are not in the DOM, such as the pseudo-elements ::after and ::before. Starting from the root DOM element, the browser goes through each node and finds the matching entry in the CSSOM. Eventually, the browser creates the visible nodes with their content and computed styles. The nodes of the Render tree are called Renderers (they may also be named Render Object or Render Node, depending on the browser and specification).
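The traversal described above can be sketched roughly as follows. This is an illustration under simplifying assumptions, not browser code: `StyledNode` stands in for a DOM node that already has its CSSOM match attached, and the `before` and `after` fields are a made-up way to say that the element has generated content for those pseudo-elements.

```typescript
// Hypothetical stand-in for a DOM node with its matched styles attached.
interface StyledNode {
  tag: string;
  style: { display: string; before?: string; after?: string };
  children: StyledNode[];
}

// Hypothetical, minimal Renderer: just a name and child Renderers.
interface Renderer {
  name: string;
  children: Renderer[];
}

// Walk the styled tree from the root and keep only what will be visible.
function buildRenderTree(node: StyledNode): Renderer | null {
  // display: none removes the element and its whole subtree.
  if (node.style.display === "none") return null;

  const children: Renderer[] = [];
  // Pseudo-elements are not in the DOM but do get Renderers.
  if (node.style.before !== undefined) {
    children.push({ name: node.tag + "::before", children: [] });
  }
  for (const child of node.children) {
    const renderer = buildRenderTree(child);
    if (renderer !== null) children.push(renderer);
  }
  if (node.style.after !== undefined) {
    children.push({ name: node.tag + "::after", children: [] });
  }
  return { name: node.tag, children };
}
```

Note that a hidden parent hides all of its descendants here, even ones whose own display value is not none, because the recursion never reaches them.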

At this stage, the browser initiates key rendering steps.

The first of these is layout.

Layout

Layout is a recursive process. The browser calls the layout function on each Renderer, starting with the root Renderer, which corresponds to the <html> element, and continues recursively through the entire Render tree hierarchy, calculating geometric information for each node. The goal is for each Renderer to fill in its internal width and height fields and to know its X and Y coordinates relative to the root Renderer.

Initially, the browser only knows that an element is a child of its parent, and that is not enough to display it. What the browser does know is the position of the root Renderer, (0, 0), and its dimensions: the viewport, the visible part of the browser window. Each Renderer has its own layout method, which calls the layout method of its children. By passing coordinates down, the parent lets each child resolve its position.

When the call reaches the most deeply nested element, every element along the way has been positioned from the coordinates it received, and that innermost element records in its Renderer the width and height in pixels assigned to it by its styles and the placement rules. Now the element's closest parent, knowing the sizes of its children, can work out and record its own size, and so the information is filled in upward. In other words, the layout of the children must be known before the layout of any Renderer can be completed.

You can mentally relate this to how events travel along the DOM tree, capturing and bubbling: coordinates go down the Renderers, sizes come back up.
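Here is a small sketch of the "coordinates down, sizes up" idea. It assumes a drastically simplified model in which every box takes the full available width and children are stacked vertically; the `LayoutBox` type, its `contentHeight` field and the `box` helper are invented for this example and do not correspond to any real engine API.

```typescript
// Hypothetical, simplified Renderer with geometry fields to fill in.
interface LayoutBox {
  name: string;
  contentHeight: number; // assumed intrinsic height, used for leaves only
  children: LayoutBox[];
  x: number;
  y: number;
  width: number;
  height: number;
}

// Helper to create a box whose geometry is not yet known.
function box(name: string, contentHeight: number, children: LayoutBox[] = []): LayoutBox {
  return { name, contentHeight, children, x: 0, y: 0, width: 0, height: 0 };
}

// Recursive layout: the parent passes coordinates down, and each child
// reports its size back up once its own subtree has been laid out.
function layout(node: LayoutBox, x: number, y: number, availableWidth: number): void {
  // Going down: the position comes from the parent.
  node.x = x;
  node.y = y;
  node.width = availableWidth;

  let cursorY = y;
  for (const child of node.children) {
    layout(child, x, cursorY, availableWidth);
    cursorY += child.height; // the child's size is known only after its layout
  }

  // Coming back up: the height depends on the children, or on content for leaves.
  node.height = node.children.length > 0 ? cursorY - y : node.contentHeight;
}

// Usage: the root starts at (0, 0) with the viewport width (800 is an arbitrary value).
const page = box("html", 0, [
  box("header", 50),
  box("main", 0, [box("p", 20), box("p", 30)]),
]);
layout(page, 0, 0, 800);
```

The parent cannot finish its own layout until every child has finished, which is why a change deep in the tree can force sizes to be recalculated all the way up.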

Paint

Knowing the DOM, styles, element positions, and sizes is still not enough for the browser to render the page. Let's say you're trying to reproduce a painting. You know the size, shape, and location of the elements, but you still have to decide in what order to draw them. For example, some elements might have a z-index, in which case drawing the elements in the order they appear in the HTML will produce an incorrect result.

During the paint phase, the browser creates Paint Records that determine the order in which elements are drawn, along with other details. The browser's rendering engine does not draw directly from the Render tree; it relies on low-level drawing commands to produce the image. Paint Records are those low-level directives, which are later used to draw pixels on the screen. Some visual components take more effort than others: for example, a complex gradient background takes more time to record and execute than a simple solid-color background.
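As a rough illustration of why paint order matters, the sketch below turns a flat list of elements into an ordered list of drawing commands. It deliberately ignores real stacking-context rules and only sorts by z-index, keeping document order for ties; `Paintable`, `PaintRecord` and `buildPaintRecords` are hypothetical names, and the single `fillRect` operation is a placeholder for the much richer command set a real engine records.

```typescript
// Hypothetical input: an element with a z-index and a solid background.
interface Paintable {
  name: string;
  zIndex: number;
  background: string;
}

// Hypothetical paint record: one low-level drawing command.
interface PaintRecord {
  op: "fillRect";
  target: string;
  color: string;
}

// Order elements by z-index (lower first, so higher ones are drawn on top),
// falling back to document order when z-index values are equal.
function buildPaintRecords(elements: Paintable[]): PaintRecord[] {
  return elements
    .map((el, order) => ({ el, order }))
    .sort((a, b) => a.el.zIndex - b.el.zIndex || a.order - b.order)
    .map(({ el }) => ({ op: "fillRect" as const, target: el.name, color: el.background }));
}
```

If the records were emitted in plain document order instead, an element with a high z-index that appears early in the HTML would be painted first and then covered by everything that follows it.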

Compositing

The next task is to display the intended image on the screen. The simplest way to do this is to rasterize the parts of the page inside the viewport. When the user scrolls, the rasterized frame moves and the missing parts are filled in by rasterizing more.

Figure: Rasterization only

However, browsers have not worked this way for a long time. Modern browsers run a more complex process called compositing. When sections of a document are drawn on different layers, and one layer sits above or overlaps another, compositing becomes necessary. This step lets the browser ensure that each layer is drawn on the screen in the correct order and that the content is displayed correctly.

Figure: Compositing

Here a logical question arises: where do so many layers come from? In general, the need for a layered structure arose with the advent of hardware acceleration. The central processor (CPU) sits on the computer's motherboard and performs the bulk of the general-purpose work. The graphics processor (GPU) sits on the computer's graphics card and is responsible for processing and rendering graphics. The GPU is designed specifically for the complex mathematical and geometric calculations that rendering requires. Therefore, shifting some operations to the GPU can significantly increase performance and reduce CPU load, especially on mobile devices. The main gain comes from the fact that such elements are subsequently processed as video card textures.

Think of a texture as a bitmap that is moved from main memory (RAM) to video memory and lives there as a single unit. Animation can then be achieved by moving layers and composing a new frame, and scrolling becomes lightweight, which is especially important on mobile devices. For example, setting transform, opacity, or filter to a non-default value creates a layer or stacking context for the element it is applied to. Changes to these properties are applied during compositing and do not cause layout to be recalculated.
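One way to picture the difference is a lookup from a CSS property to the first pipeline stage its change forces the browser to rerun. The table below is a simplified assumption for illustration: it covers only a handful of properties, actual behavior differs between engines and versions, and the `Stage` type and `stagesToRerun` function are names made up for this sketch.

```typescript
type Stage = "layout" | "paint" | "composite";

// Stages in pipeline order: rerunning one implies rerunning those after it.
const STAGE_ORDER: Stage[] = ["layout", "paint", "composite"];

// Simplified, assumed mapping from property to the first affected stage.
const FIRST_STAGE = new Map<string, Stage>([
  ["width", "layout"],
  ["height", "layout"],
  ["top", "layout"],
  ["left", "layout"],
  ["color", "paint"],
  ["background-color", "paint"],
  ["transform", "composite"],
  ["opacity", "composite"],
  ["filter", "composite"],
]);

// Return every stage that has to run again after the property changes.
function stagesToRerun(property: string): Stage[] {
  // Unknown properties are treated pessimistically as layout-affecting.
  const first = FIRST_STAGE.get(property) ?? "layout";
  return STAGE_ORDER.slice(STAGE_ORDER.indexOf(first));
}
```

Under these assumptions, animating left reruns layout, paint and compositing on every frame, while animating transform only needs the compositing step.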

FLIP

Another interesting optimization method is FLIP (First, Last, Invert, Play). It is used to avoid layout recalculation during an animation and to perform the animation only at the compositing stage. To achieve this, it relies on the knowledge described above, namely that changes to transform, opacity and filter are handled at the GPU level. To implement the method:

  1. Get the initial coordinates of the element

  2. Take the final coordinates where the object should end up after the animation, place the object in that location

  3. To create the illusion that the object is at the starting point – use transform to move the object to this fake start

  4. Animate the movement from the fake starting position to the final one

The result is that the element sits in its final place from the very beginning and is pulled back to its visual origin with the transform property. Throughout the process, layout is performed only at the very beginning, when the positions are measured, and not during the animation itself. Avoiding layout gives a significant performance gain.
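The four steps can be sketched like this. The code is a minimal illustration rather than a complete FLIP helper: it handles only position (not size changes), and `Rect`, `FlipTarget`, `invertTransform` and `flip` are names chosen for this example. `FlipTarget` describes just the two methods the sketch needs; in a browser, a regular DOM element provides both `getBoundingClientRect` and `animate` (the Web Animations API).

```typescript
interface Rect {
  left: number;
  top: number;
  width: number;
  height: number;
}

// The minimal surface of an element that this sketch relies on.
interface FlipTarget {
  getBoundingClientRect(): Rect;
  animate(
    keyframes: { transform: string }[],
    options: { duration: number; easing: string }
  ): unknown;
}

// Invert: the transform that makes an element already placed at `last`
// look as if it were still at `first`.
function invertTransform(first: Rect, last: Rect): string {
  const dx = first.left - last.left;
  const dy = first.top - last.top;
  return `translate(${dx}px, ${dy}px)`;
}

// applyFinalState is a caller-supplied callback that moves the element to
// its final place, for example by toggling a class.
function flip(el: FlipTarget, applyFinalState: () => void, duration = 300): void {
  const first = el.getBoundingClientRect(); // First: initial coordinates
  applyFinalState();                         // Last: put the element in its final place
  const last = el.getBoundingClientRect();   // ...and measure it there

  // Invert + Play: start from the fake origin and animate the transform away.
  el.animate(
    [{ transform: invertTransform(first, last) }, { transform: "none" }],
    { duration, easing: "ease-out" }
  );
}
```

In a page this might be called as `flip(card, () => card.classList.add("expanded"))`, where `card` and the `expanded` class are placeholders. Only transform changes between frames, so the animation itself does not need layout.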

Besides this technique, the platform offers other optimizations. For example, the CSS property will-change hints to the browser that an element is about to change, which typically results in the element being placed on a separate layer.

Conclusion

Modern browsers perform a complex multi-step process to transform HTML, CSS, and JavaScript into the visual representation we see on the screen. These steps include parsing, DOM and CSSOM construction, Render tree creation, layout, paint, compositing, and hardware-accelerated optimization. We also looked at a practical example of how knowledge of the rendering stages lets you optimize client applications and speed them up.

That's all for now, thank you for being here.

✌️ Always happy to receive suggestions and feedback – bronnikovmb@gmail.com
