Improving rendering performance with CSS content-visibility

Introduction

I recently got an interesting bug report for emoji-picker-element:

I'm running on a fedi instance with 19k custom emojis […] and when I open the emoji picker […] the page freezes for at least a full second, and overall performance stutters for a while afterward.

If you're not familiar with Mastodon or the Fediverse: different servers can have their own custom emoji, similar to Slack, Discord, etc. Having 19k of them (actually closer to 20k in this case) is highly unusual, but not unheard of.

So I loaded up their example, and good grief, it was slow:

Several things were going wrong here:

  • 20,000 custom emoji meant 40,000 DOM elements, since each one used both a <button> and an <img>.

  • No virtualization was used, so all these elements were simply shoved into the DOM.

To my credit, I used <img loading="lazy">, so those 20,000 images didn't all load at once. But no matter what, rendering 40,000 elements is going to be painfully slow – Lighthouse recommends no more than 1,400!

My first thought, of course, was, "Who on earth has 20,000 custom emoji?" My second thought was, "Sigh, looks like I'll have to do virtualization."

I had studiously avoided virtualization in emoji-picker-element, mostly because 1) it's complicated, 2) I didn't think I needed it, and 3) it has implications for accessibility.

I've been down this road before: Pinafore is, in effect, one big virtual list. I used the ARIA feed role, did all the calculations myself, and added an option to disable "infinite scroll," since some people don't like it. So this isn't my first rodeo! I was just dreading how much code I'd have to write, and wondering how it would affect the size of my "tiny" ~12kB emoji picker.

After a few days, though, a thought occurred to me: what about CSS content-visibility? I could see that a lot of time was being spent in layout and paint, and it might also help with the "stuttering." It could be a much simpler solution than full virtualization.

If you're unfamiliar with it, content-visibility is a new CSS feature that allows you to "hide" certain parts of the DOM from the perspective of layout and paint. It largely doesn't affect the accessibility tree (since the DOM nodes are still there), it doesn't affect find-in-page (⌘+F/Ctrl+F), and it doesn't require virtualization. All it asks for is an estimated size for the off-screen elements, so that the browser can reserve space for them.

Luckily for me, I had a good atomic unit for sizing: the emoji categories. Custom emoji on the Fediverse tend to be grouped into smallish categories: "blobs," "cats," etc.

For each category, I already knew the emoji size and the number of rows and columns, so the expected size could be calculated with CSS custom properties:

.category {
  content-visibility: auto;
  contain-intrinsic-size:
    /* width */
    calc(var(--num-columns) * var(--total-emoji-size))
    /* height */
    calc(var(--num-rows) * var(--total-emoji-size));
}

These placeholders take up exactly as much space as the finished product, so nothing jumps around when you scroll.
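
As an illustration, here's a minimal sketch of how those custom properties might be wired up from JavaScript. The function and variable names (setCategorySizeHints, emojiCount, and so on) are hypothetical, not the actual emoji-picker-element code:

// Hypothetical sketch: give each category the sizing hints that the
// calc() expressions above use to estimate its footprint.
function setCategorySizeHints(categoryEl, emojiCount, numColumns) {
  const numRows = Math.ceil(emojiCount / numColumns);
  categoryEl.style.setProperty('--num-columns', String(numColumns));
  categoryEl.style.setProperty('--num-rows', String(numRows));
  // --total-emoji-size (the emoji size plus padding) is assumed to be
  // defined once on an ancestor element.
}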

The next thing I did was write a Tachometer benchmark to track my progress. (I love Tachometer.) This helped me confirm that I was actually improving performance, and by how much.

My first benchmark was easy enough to write, and the performance gains were obvious… they were just a bit disappointing.

For the initial load, I was getting roughly a 15% improvement in Chrome and 5% in Firefox. (In Safari, content-visibility is only in Technology Preview, so I couldn't test it in Tachometer.) That's nothing to sneeze at, but I knew a virtual list could do much better!

So I dug a little deeper. The layout costs had nearly disappeared, but there were other costs I couldn't explain. For instance, what's that big unattributed chunk in the Chrome trace?

Whenever I feel like Chrome is "hiding" some performance information from me, I do one of two things: I open chrome:tracing, or (more recently) I enable the experimental "show all events" option in DevTools.

This gives you a bit more low-level information than a standard Chrome trace, without having to fiddle with a completely different UI. I find it a good compromise between the Performance panel and chrome:tracing.

And in this case, I immediately saw something that got the gears turning in my head:

What's up with all that ResourceFetcher::requestResource? Even without digging into the Chromium source, I had a hunch – could it be all those <img>s? Surely not… I'm using <img loading="lazy">!

Well, I followed my gut and simply commented out the src on each <img>, and what do you know – all those mysterious costs disappeared!

I tested in Firefox as well, and it was a significant improvement there too. This led me to conclude that loading="lazy" isn't as cheap as I'd assumed.

At this point, I decided that if I was going to get rid of loading="lazy", I might as well go all the way and turn those 40k DOM elements into 20k. After all, if I don't need the <img>, then I can just use CSS to set a background-image on an ::after pseudo-element of the <button>, roughly halving the cost of creating those elements.

.onscreen .custom-emoji::after {
  background-image: var(--custom-emoji-background);
}

From there, it was just a simple IntersectionObserver to add the onscreen class when a category scrolled into view, and I had my own loading="lazy" that was much more performant. This time, Tachometer showed an improvement of ~40% in Chrome and ~35% in Firefox. Now that's more like it!
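
Here's a rough sketch of what that observer could look like. The onscreen class matches the CSS above, but the margins and overall structure are assumptions rather than the picker's actual code:

// Hypothetical sketch: add the `onscreen` class once a category nears
// the viewport, so its ::after background-images start downloading.
const observer = new IntersectionObserver((entries) => {
  for (const entry of entries) {
    if (entry.isIntersecting) {
      entry.target.classList.add('onscreen');
      observer.unobserve(entry.target); // images only need to load once
    }
  }
}, { rootMargin: '100px' }); // assumed margin, to start loading slightly early

for (const category of document.querySelectorAll('.category')) {
  observer.observe(category);
}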

Note: I could have used the contentvisibilityautostatechange event instead of IntersectionObserver, but I found cross-browser differences, and it would have also penalized Safari by making it load all the images. Once browser support improves, though, I'd like to use it!
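
For the curious, here's a usage sketch of that event, based on its skipped property; this is illustrative, not code from the actual picker:

// Hypothetical sketch: contentvisibilityautostatechange fires on elements
// with `content-visibility: auto` when the browser starts or stops
// skipping their rendering.
for (const category of document.querySelectorAll('.category')) {
  category.addEventListener('contentvisibilityautostatechange', (event) => {
    if (!event.skipped) {
      // No longer skipped, i.e. near or in the viewport – load the images.
      category.classList.add('onscreen');
    }
  });
}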

I was happy with this solution, so I shipped it. Overall, the benchmark showed a ~45% improvement in both Chrome and Firefox, and the original repro went from ~3 seconds to ~1.3 seconds. The person who reported the bug even thanked me and said the emoji picker was much more usable now.

Still, something about this bugs me. Looking at the trace, I can see that rendering 20,000 DOM nodes is simply never going to be as fast as a virtualized list. And if I want to support even bigger Fediverse instances with even more emoji, this solution won't scale.

That said, I'm impressed by how much you get for "free" with content-visibility. The fact that I didn't have to change my ARIA strategy or worry about find-in-page was a godsend. But the perfectionist in me still bristles at the knowledge that, for maximum performance, a virtual list is the way to go.

Maybe eventually the web platform will get a real virtual list as a built-in primitive? There were attempts to do this a few years ago, but they seem to have fizzled out.

I look forward to that day, but for now, I'll admit that content-visibility is a good rough-and-ready alternative to a virtual list. It's easy to implement, gives a decent performance boost, and has virtually no accessibility downsides. Just don't ask me to support 100,000 custom emoji!
