Speeding Up Android Apps with Baseline Profiles

Hello, my name is Danil Gatiatullin, and I am an engineer in the Performance Unit at Avito. Our team is responsible for the performance of the Avito application: we monitor the speed of application startup and screen rendering, track scroll quality and network errors, and perform optimizations.

In this article, I will tell you what Baseline Profiles are, how they speed up program startup, and which applications will benefit most from them. As an example, I will take our experiment, which accelerated the application startup time by 15%. I will also tell you how we automated adding profiles to each release.

This article is based on my talk at Avito Android meetup #2.

What compilation problems did developers face before, and what do Baseline Profiles have to do with it?

Let's dig a little into the history of bytecode compilation in Android – this will give a clear understanding of the relevance of Baseline Profiles.

In the first version of Android, all of the code in an app was interpreted. This caused performance issues, both at runtime and during a cold start, which required executing a lot of code. The second version added Just-In-Time (JIT) compilation, which saved instructions after interpretation so that they could be executed faster the next time. This helped with performance at runtime, but cold starts were still slow, since all of the code needed to run was interpreted. The big performance gains came later.

The speed of application launch increased significantly with the release of Android 5.0, when all application code began to undergo ahead-of-time compilation before the first launch.

With the introduction of AOT compilation, two new problems appeared:

  • compiling the code of large applications took minutes, delaying the first launch after installation;

  • applications began to take up too much disk space.

Both problems were solved in Android 7.0 and Android 9.0:

  • Android 7.0 introduced profile-guided compilation: the runtime records the sections of code used at startup into special profiles, which are then compiled and stored locally on the device;

  • Android 9.0 introduced Cloud Profiles, a system that uploads local profiles from user devices to the cloud, converts averaged launch data into a single profile, and then distributes this profile to all devices.

The evolution of Android development. Each version shown added something new to speed up compilation

In newer versions applications started to launch even faster, but a new drawback appeared: it takes time for the system to collect profiles covering the important sections of code and distribute them to most devices. An application only reached its full launch speed after about a week.

Graph of the application launch speed by day. Over time, the speed reached a plateau, and the user received a fast application. But this moment had to be waited for

This is a big problem for applications with frequent updates: with each new release, profiles are rebuilt, which means the user will constantly have to deal with long launches after an update.

An example of the described situation: the user just received a fast-working application, and a few days later an update comes out, and again it is necessary to wait for the speed of operation

The problem with rebuilding profiles is solved by Baseline Profiles.

What are Baseline Profiles and why do we need them?

Baseline Profiles are collected not by users in production (unlike Cloud Profiles), but locally during development. To generate one, we run tests that cover the user's critical path. During the tests, the system writes the declarations of the application methods and classes used to a file, and at build time this file is embedded in the APK.

This way, the listed code is compiled ahead of time and ships together with the application, reducing startup time and improving performance.
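For reference, the file embedded in the APK (baseline-prof.txt) is a plain-text list of rules in the ART profile format: a line starting with `L` preloads a class, while `H`/`S`/`P` flags mark a method as hot, used at startup, or used post-startup. A minimal hypothetical fragment (the com.example names are placeholders):

```
Lcom/example/app/MainActivity;
HSPLcom/example/app/MainActivity;->onCreate(Landroid/os/Bundle;)V
PLcom/example/app/feed/FeedAdapter;->onBindViewHolder(Lcom/example/app/feed/FeedViewHolder;I)V
```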

We decided to integrate Baseline Profiles into our project for three reasons:

  • there was too much code. We had done a lot of work to make our app start quickly, but the amount of code needed at startup had grown so large that it affected startup speed. There were no easy ways left to speed up cold start;

  • frequent updates make it difficult to rely on Cloud Profiles. As discussed above, Cloud Profiles are not very effective for applications with frequent updates. This is exactly our case – weekly releases limit their effectiveness;

  • we started using other UI frameworks. We began testing Jetpack Compose and our own backend-driven UI. The problem with UI frameworks is that they need many classes at once to display the first frame.

Baseline Profiles solve all of these problems at once. Besides cold start, they speed up first-frame and full content rendering, and also improve scroll smoothness in the first seconds of use. This is especially important during the first launches of an application, when the user decides whether they will keep using it.

Cold starts account for 40% of launches. In absolute terms there are also many of them – about 100 million per month.

How We Tested Baseline Profiles

We built the profile locally and tested its impact directly on a laptop. This takes four steps:

  1. We create a separate APK flavor suitable for benchmarks.

  2. We create a module with macrobenchmark tests.

  3. We add a profile generation test that walks through the critical application screens.

  4. We add a macrobenchmark test to measure the speedup from the profile.
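Steps 3 and 4 can be sketched roughly as follows, assuming a recent androidx.benchmark version (in older versions the method was called collectBaselineProfile); the package name and the journey inside the block are placeholders:

```kotlin
import androidx.benchmark.macro.junit4.BaselineProfileRule
import org.junit.Rule
import org.junit.Test

// Sketch of a profile generation test (step 3); all names are placeholders.
class BaselineProfileGenerator {
    @get:Rule
    val baselineProfileRule = BaselineProfileRule()

    @Test
    fun generateProfile() = baselineProfileRule.collect(
        packageName = "com.example.app", // hypothetical application id
    ) {
        pressHome()
        startActivityAndWait() // cold start of the main screen
        // ...walk through the other critical screens here...
    }
}
```

On a successful run, the rule reports the generated profile file as an additional test output of the instrumentation run.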

We recorded the performance test from the last step on the main page of the application, to check how much a “cold” screen would speed up. The result was pleasing: a 36% speedup in first-frame rendering and a 25% speedup in full content rendering.

After that, we started thinking about how to evaluate the speedup in production, because local tests do not give the complete picture. Two options came to mind: an A/B test, or shipping two releases to production – one with Baseline Profiles and one without. Neither option suited us.

In the case of A/B tests, we could not control the installation of Baseline Profiles: to run the test, profiles would have to be enabled in the test group and disabled in the control group. But this is impossible – in most cases profiles are installed by the system before the application itself is installed, while A/B group membership only becomes known at runtime.

The ideal option for testing would be to release two builds simultaneously that differ only in the presence of a Baseline Profile. We decided against it: even at first glance it was clear there could be problems with automated deployment and with tracking release behavior (crashes, reviews) – the system is not designed for deploying two releases at once for comparison, and some steps would have to be done manually.

In the end, we came up with a scheme with three releases in a row. The first release is normal, without any integrations. The second release is with Baseline Profiles integration. The third is also normal, without Baseline Profiles. By comparing the first and second releases, we were convinced that we had achieved acceleration, and by comparing the first and third, we were convinced that this acceleration was entirely due to the influence of Baseline Profiles, and not some other change.

We tested this scheme in production and got a great result – first-frame rendering in the release with profiles became 20% faster.

Why the local testing result differs from the production result

Local performance tests gave us 36% acceleration of first frame rendering, while in production we got only 20%. We analyzed the results and came to the following conclusions.

There are three modes available in performance tests:

  • CompilationMode.Full — full ahead-of-time compilation of the entire application code (not used on modern Android versions and not of interest to us);

  • CompilationMode.None — no AOT compilation, JIT only;

  • CompilationMode.Partial — AOT compilation that warms up the code listed in the Baseline Profile file.

We tested using Partial versus None, which is basically comparing the Baseline Profile with no AOT compilation at all.
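As a sketch of the comparison we ran, a macrobenchmark measuring cold startup under the Partial and None modes might look like this (written against the androidx.benchmark.macro API; the package name is a placeholder):

```kotlin
import androidx.benchmark.macro.CompilationMode
import androidx.benchmark.macro.StartupMode
import androidx.benchmark.macro.StartupTimingMetric
import androidx.benchmark.macro.junit4.MacrobenchmarkRule
import org.junit.Rule
import org.junit.Test

class StartupBenchmark {
    @get:Rule
    val benchmarkRule = MacrobenchmarkRule()

    // Baseline Profile applied via AOT
    @Test
    fun startupPartial() = startup(CompilationMode.Partial())

    // No AOT at all, JIT only - the baseline we compared against
    @Test
    fun startupNone() = startup(CompilationMode.None())

    private fun startup(mode: CompilationMode) = benchmarkRule.measureRepeated(
        packageName = "com.example.app", // hypothetical application id
        metrics = listOf(StartupTimingMetric()),
        compilationMode = mode,
        startupMode = StartupMode.COLD,
        iterations = 10,
    ) {
        pressHome()
        startActivityAndWait()
    }
}
```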

In production, we compare the version with a Baseline Profile against the version with Cloud Profiles. Over a longer period – three weeks and more – the difference between Baseline Profiles alone and the Cloud Profiles + Baseline Profiles combination is not that large: Cloud Profiles really do start to help.

In the tests, profile generation and profile verification exercise exactly the same code, so every class is ideally “warmed up”.

In production there is more variability: users hit different code paths, some of which may not be “warmed up” by the profile.

Performance tests also run on a single fast device, while in production the application runs on a wide range of devices, which further affects the resulting speedup.

Performance testing is testing in sterile conditions. Keep in mind that the result will be different in production

How we automated profile assembly and what came out of it

Once we had completed all the tests and realized that we would integrate Baseline Profiles into our application, the question of automation arose. We had to decide at what stage we would collect the profile.

There were several ideas, but in the end we added profile collection to the task of rolling out a new release version. Previously, this task consisted of three stages: collecting release APKs, running regression tests, and uploading artifacts to stores.

Now a new stage has been added to the very beginning – preparing a profile and adding it to the project.

This is what the update release task looks like now

Avito uses a custom runner to run tests. It can shard tests, generate reports, and rerun flaky tests, but it could not work with macrobenchmark tests or collect additional artifacts.

To fix this, we have made a number of improvements to the runner:

  • taught it to work with the com.android.test plugin instead of com.android.application. That is, we began passing the runner the build directory with the release APK and its package name, so that it could install the artifact and launch the application. For regular, non-macrobenchmark tests, this data comes from the com.android.application plugin;

Instrumentation Runner Configuration for Profile Test

  • taught it to parse the am instrument output specific to macrobenchmark tests and extract the path to the generated baseline profile from it;

  • taught it to copy this profile file from the device into the build artifacts.
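The parsing step is plain string handling. The exact status key emitted for additional test outputs can differ between library versions, so treat the additionalTestOutputFile_ prefix below as an assumption to verify against your own am instrument output:

```kotlin
// Sketch: extract the generated profile path from `am instrument -r` output.
// The "additionalTestOutputFile_" status key is an assumption - confirm it
// against the actual output of your macrobenchmark run.
val keyPrefix = "INSTRUMENTATION_STATUS: additionalTestOutputFile_"

fun extractProfilePath(amOutput: String): String? =
    amOutput.lineSequence()
        .map(String::trim)
        .firstOrNull { it.startsWith(keyPrefix) } // ..._baseline-prof.txt=/sdcard/...
        ?.substringAfter('=')
        ?.takeIf(String::isNotEmpty)
```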

After that, we decided that the generated profile should be stored in VCS: first, so that it remains in the change history; second, so that if the build is restarted, profile generation does not have to be performed a second time when the profile has already been built.

The final profile assembly pipeline consists of five steps:

  1. Checking the last commit – if it is already a profile commit, all remaining steps are skipped.

  2. Collecting APK for profiling.

  3. Running an instrumentation test to generate the profile.

  4. Copying a profile from a device to a build agent.

  5. Copying the profile to the project sources, after which we do git commit/push.

After that, the profile is ready and the release APK build is launched, which will include the profile optimizations. We wrapped this pipeline in a Gradle plugin; its code is available on our infrastructure GitHub, and its configuration looks something like this:
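Since the screenshot of the configuration is not reproduced here, a hypothetical shape of such a plugin configuration might be the following – every extension and property name below is invented for illustration, the real plugin lives in our infrastructure repository:

```kotlin
// Hypothetical Gradle (Kotlin DSL) configuration of an internal
// profile-preparation plugin; all names are invented for illustration.
baselineProfilePreparation {
    benchmarkModule.set(":macrobenchmarks")  // module with the generator test
    generatorTestClass.set("com.example.BaselineProfileGenerator")
    profileOutput.set(layout.projectDirectory.file("src/main/baseline-prof.txt"))
    commitMessage.set("chore: update baseline profile")
    skipIfLastCommitIsProfile.set(true)      // step 1 of the pipeline
}
```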

Final release build pipeline. Plugin configuration for profile preparation

What results did Baseline Profiles bring?

Integrating Baseline Profiles into our application reduced the share of slow cold starts from 8.3% to 5.4% — by almost 3 percentage points. “Slow cold starts” is a Google Play metric: a launch is considered slow if it takes more than 5 seconds.

The size of the cold-start speedup depended on the power of the user's device. The application began to open:

  • 16% faster for users with average devices (50th percentile);

  • 20% faster on slow devices (90th percentile);

  • 12% faster on the slowest devices (95th percentile).

It is not entirely clear why the slowest devices had such a small acceleration percentage, we expected more. In the future, we will try to find out the reason for this result.

What are we going to do next?

“Warm up” more screen states — for authorized and unauthorized users, for different states of feature toggles and A/B tests.

“Warm up” more screens — right now we “warm up” only the first three screens, the largest and most important ones. We want to extend the speedup to the 10–15 most popular screens.

Try startup profiles. This is another type of optimization: we again create tests for startup scenarios to determine which code is needed to quickly render the first frame. Then, when the dex files are built, the code for rendering the first frame is packed into the first few dex files rather than distributed evenly across all of them – and therefore loads faster.

We tried integrating startup profiles a year ago, but they didn't speed anything up. We will try again.
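For reference, with the androidx.baselineprofile Gradle plugin, startup-profile-driven dex layout is controlled by a single flag; treat the property name below as an assumption to verify against the documentation of the plugin version you use:

```kotlin
// Sketch: consumer-side configuration of the androidx.baselineprofile
// Gradle plugin; verify the property name against your plugin version.
baselineProfile {
    // Pack the code recorded in the startup profile into the first dex files
    dexLayoutOptimization = true
}
```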

Add Baseline Profiles monitoring. It is hard to call this monitoring in the full sense of the word, because profiles only let you check their installation status and give no details about errors.

Bottom Line: Do You Need Baseline Profiles and What I Recommend You Do

Google promises up to 30% faster code execution with Baseline Profiles, and our local test bore this out: we measured a 36% speedup in first-frame rendering. But that is a comparison against the absence of AOT compilation; the real speedup in production turned out to be significantly lower in our case – about 12–20% depending on the percentile.

Another nice thing is that Baseline Profiles and macrobenchmark tests have become more stable. When I tried them a year ago, I had to struggle with library versions and strange test crashes, but the situation has since improved considerably.

Two pieces of advice for those wondering whether to integrate Baseline Profiles into their application:

  • if your application is updated no more than once a month, then the effect of Baseline Profiles may not be so noticeable compared to the cost of their integration. Cloud Profiles can be used here. If updates are frequent (like ours), then it is worth considering Baseline Profiles;

  • profile your cold start scenario and see what can be improved. If you are not sure that all the simple speedups related to the code are already done, there is no point in going for solutions like Baseline Profiles – they are much more expensive to develop and maintain.

If you decide to spend time and effort on integrating Baseline Profiles, remember three things:

  • test the impact of Baseline Profiles locally first. Build a profile on a laptop, write a profile generation test and a performance test to check the speedup, and see what the results are for your specific scenario;

  • the result in production will be worse than in local tests – on average 1.5–2 times, though in our case the speedup was still impressive. I explained why this happens above;

  • automation is expensive. There are many pitfalls in automation that will inevitably surface once you take on this task. So before you spend resources on automation, sync with your infrastructure team and groom the proposed task together.

Thank you for your time! Do you encounter problems with slow app startup? How do you solve them? Do you have any experience using Baseline Profiles? Tell us about it in the comments! I will be happy to answer any questions there.

By the way, my colleagues recently told us about an experiment on writing 5000 integration tests in a couple of hours and building a generator for testing: how we came to this and what it gave us. And in another article we talked about how we test in microservices, what TaaS is and why we need five quality gateways.

Subscribe to the AvitoTech channel in Telegram, where we talk more about the professional experience of our engineers, our projects and work at Avito, and also announce meetups and articles.
