Effective and free build cache

Context

This note is a budget-friendly guide to setting up GitLab caches for Gradle in Android projects without Gradle Enterprise and a remote build cache. There is little written material on how to set up Gradle caches on CI, and even less on how to do it correctly. On top of that, when someone asks the right questions in the Gradle Slack, on the Gradle forum, or on StackOverflow, the Gradle evangelists come running and recommend not digging in that direction at all, but simply buying Gradle Enterprise (or whatever it's called now).

I have experience building the build infrastructure of a large commercial Android project, and I want to share some life hacks with those who are just starting to optimize CI builds in their own project.

Problem

The problem is that Gradle is shit
– Jason Statham

Every self-respecting project has at least a basic CI/CD setup, most often in GitLab. As the project grows, so does the waiting time for builds, tests, linter runs, and so on. At some point the wait for your MR to be checked exceeds all reasonable limits, and you want to do something about it.

Solution

The solution I propose works on projects up to 500 modules. If you have more modules, you probably know better than me how to optimize builds.

When I say “500 modules”, I mean 500 modules as counted in the wonderful report by Stepan Goncharov.

Checking that the basic things are done correctly

Separation of pipelines by purpose

Generate cache in some pipelines, use it in others.

First of all, you need to understand that loading and saving the cache in GitLab is not free in terms of time. The larger the cache, the longer it takes to pack it into an archive and upload it to the cloud. Even the Gradle Remote Build Cache is not free, especially if you use the standard Remote Build Cache plugin and the official Docker cache node. Therefore, in the basic setup, not all pipelines should generate a cache.

The scheme I propose is as follows:

  • The project has a main branch, let's call it master.

  • There are feature branches that, at the end of their life, are merged into master.

  • When a branch is merged into master, the pipeline that generates the build cache is launched. If you don't want to run it on every merge, run it on a schedule.

  • Pipelines launched on feature branches use the cache generated on the master branch.

Different pipeline purposes

Separating GitLab caches by purpose

Build cache, dependencies and Gradle Wrapper should be cached using separate keys

In a project, product code changes most often, dependencies change somewhat less often, and the Gradle version changes even less often. I believe each of these should have its own cache. The total size of an archive holding one shared cache would quickly exceed the limits allowed for uploading via s3. Besides, jobs that don't need every type of cache will run faster if they don't download anything extra.

Different types of Gradle cache

Everything that needs to be cached on CI is stored by Gradle in the $GRADLE_USER_HOME directory. You can override this environment variable for jobs:

some job:
  variables:
    GRADLE_USER_HOME: $CI_PROJECT_DIR/.gradle

In all further examples I will assume that $GRADLE_USER_HOME is set exactly like this.

$GRADLE_USER_HOME/
├── caches/
│   ├── build-cache-1/ <- build cache
│   └── modules-2/ <- dependency cache
├── notifications/ <- wrapper junk
└── wrapper/ <- wrapper distributions

Separate GitLab Cache for Gradle Wrapper

Gradle distributions are stored in $GRADLE_USER_HOME/wrapper/, and the auxiliary junk that comes with them lives in $GRADLE_USER_HOME/notifications/. It's worth allocating separate GitLab cache keys for these:

# cache.yml

.pull-wrapper-cache:
  - key: cache-wrapper
    policy: pull
    unprotect: true
    paths:
      - .gradle/wrapper/
      - .gradle/notifications/

.pull-push-wrapper-cache:
  - key: cache-wrapper
    policy: pull-push
    unprotect: true
    paths:
      - .gradle/wrapper/
      - .gradle/notifications/

Then in the jobs that generate the build cache we can specify:

cache build:
  ...
  cache:
    - !reference [ .pull-push-wrapper-cache ]

The Gradle documentation says that unused versions of distributions are automatically removed after some time. Let's take their word for it. The distribution cache shouldn't grow too big.

In jobs that only consume build cache, you can specify:

lint:
  ...
  cache:
    - !reference [ .pull-wrapper-cache ]

Gradle distributions do not weigh much, so archives with them will be loaded into the job quickly.

Separate GitLab cache for dependencies

Dependencies are the AAR and JAR artifacts of all the libraries and plugins used in the project. Gradle stores them in $GRADLE_USER_HOME/caches/modules-2/. Let's define extensions for working with the dependency cache:

# cache.yml

.pull-deps-cache:
  - key: cache-deps
    policy: pull
    unprotect: true
    paths:
      - .gradle/caches/modules-2/

.push-deps-cache:
  - key: cache-deps
    policy: push
    unprotect: true
    paths:
      - .gradle/caches/modules-2/

Then in the jobs that generate the build cache we can specify:

cache build:
  ...
  cache:
    - !reference [ .push-deps-cache ]

As you can see, the cache generation job does not receive any dependencies at startup and always downloads them from scratch. I'll explain why below.

In jobs that only consume build cache, you can specify:

lint:
  ...
  cache:
    - !reference [ .pull-deps-cache ]

Project dependencies are usually larger than the Gradle distributions, but smaller than the build cache.

Separate GitLab cache for Gradle build cache

The heaviest part of the cache is the build cache.

The build cache is stored in $GRADLE_USER_HOME/caches/build-cache-1/. There are no nested directories inside, just a huge flat pile of binary files. We create extensions for working with the build cache:

# cache.yml

.pull-build-cache:
  - key: cache-build
    policy: pull
    unprotect: true
    paths:
      - .gradle/caches/build-cache-1/

.push-build-cache:
  - key: cache-build
    policy: push
    unprotect: true
    paths:
      - .gradle/caches/build-cache-1/

The cache generation job looks like this:

cache build:
  ...
  cache:
    - !reference [ .push-build-cache ]

And again, the cache generation job does not receive the previous version of the cache. I'll explain why below.

Jobs consuming the build cache get the following entry:

test:
  ...
  cache:
    - !reference [ .pull-build-cache ]

Putting it all together

If you put it all together, you might get something like this:

.base:
  variables:
    GRADLE_USER_HOME: $CI_PROJECT_DIR/.gradle
  # Some other base settings that I've omitted
  before_script:
    - ...
  after_script:
    - ...

# This job runs in an MR
build:
  stage: check
  extends: .base
  script:
    - ./gradlew :app:assembleDebug
  cache:
    - !reference [ .pull-wrapper-cache ]
    - !reference [ .pull-deps-cache ]
    - !reference [ .pull-build-cache ]
  rules:
    - if: $CI_PIPELINE_SOURCE == "merge_request_event"

# This job runs after an MR is merged
cache build:
  stage: post-check
  extends: .base
  script:
    - ./gradlew :app:assembleDebug
  cache:
    - !reference [ .pull-push-wrapper-cache ]
    - !reference [ .push-deps-cache ]
    - !reference [ .push-build-cache ]
  rules:
    - if: $CI_PIPELINE_SOURCE == "push" && $CI_COMMIT_BRANCH == "master"

At this stage, builds in MR pipelines will already be noticeably faster. But, as trendy IT folks like to say, “there are still growth points here.”

Build cache is not reused

As you can see, the cache build job is not given the cache from its previous runs, so the build and the dependency downloads in this job happen from scratch every time. The reason is that Gradle does not clear unused cache entries and dependencies on its own. If we reused the results of previous runs to generate a new cache, the cache would grow at tremendous speed. It would quickly pass the critical 5-gigabyte mark, after which you won't even be able to upload it to s3 storage. In short, this is simply a way to protect against uncontrolled growth of the GitLab cache.
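If you want to see how quickly build-cache-1 creeps toward that limit, a tiny standalone Kotlin sketch like this one (just an illustration, not part of the plugin we build later in this note) can be pointed at a local $GRADLE_USER_HOME:

import java.io.File

// Hypothetical helper: prints the size of build-cache-1 and warns as it approaches
// the ~5 GB archive limit mentioned above.
fun main() {
    val gradleUserHome = System.getenv("GRADLE_USER_HOME")
        ?: (System.getProperty("user.home") + "/.gradle")
    val buildCacheDir = File(gradleUserHome, "caches/build-cache-1")

    // build-cache-1 is flat, so listFiles() is enough -- no recursion needed
    val entries = buildCacheDir.listFiles().orEmpty()
    val totalGb = entries.sumOf { it.length() } / 1_000_000_000.0

    println("build-cache-1: ${"%.2f".format(totalGb)} GB across ${entries.size} entries")
    if (totalGb > 4.0) {
        println("Warning: approaching the 5 GB archive limit")
    }
}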

The complete absence of a build cache leads to long runs. So long, in fact, that it is worth considering whether it makes sense to generate the cache on every merge into master, or whether it is better to do it on a schedule.

If only we could clean up the unnecessary cache entries in the $GRADLE_USER_HOME/caches/build-cache-1/ directory ourselves, it would be super convenient, mmm?

Oops, buy Gradle Enterprise

The functionality we need already exists in Gradle Enterprise, but you will be asked to pay for it. And if you work in a certain God-protected country, they won't even sell it to you. You can manually inspect the Gradle cache keys used in a build with the --scan option, but automating the collection and parsing of this data on CI is problematic. I will write a separate note about how to crack the Gradle Enterprise plugin and force it to share build scans. For this note, I found a simpler way to solve the problem.

POV: You dig down into Gradle

Gradle's business model is built on hatred of people; we condemn this, so we will not give them money. Let's extract cache keys from CI builds without the Enterprise version.

We extract cache keys

To get the list of cache keys used in the latest build, we will have to resort to Gradle's internal API. A good explanation of what we do next can be found in Tinkoff's talk at Mobius Spring 2024.

Writing a BuildService

We implement our own service that subscribes to all build operations and, at the end of the build, dumps the list of cache keys into a file:

internal abstract class CacheKeysHandlerService :
    BuildService<CacheKeysHandlerService.Params>,
    BuildOperationListener,
    AutoCloseable {

    interface Params : BuildServiceParameters {
        val cacheKeysFile: RegularFileProperty
    }

    private val cacheKeys: MutableSet<String> = ConcurrentHashMap.newKeySet()

    override fun started(descriptor: BuildOperationDescriptor, event: OperationStartEvent) {
        /* no-op */
    }

    override fun progress(identifier: OperationIdentifier, event: OperationProgressEvent) {
        /* no-op */
    }

    override fun finished(descriptor: BuildOperationDescriptor, event: OperationFinishEvent) {
        when (val details = descriptor.details) {
            // storing to the cache (local and remote)
            is StoreOperationDetails -> cacheKeys += details.cacheKey
            // loading from the cache (local and remote)
            is LoadOperationDetails -> cacheKeys += details.cacheKey
            // packing (serializing) cache entries
            is PackOperationDetails -> cacheKeys += details.cacheKey
            // unpacking (deserializing) cache entries
            is UnpackOperationDetails -> cacheKeys += details.cacheKey
        }
    }

    override fun close() {
        parameters.cacheKeysFile.get().asFile.bufferedWriter().use { writer ->
            for (key in cacheKeys) {
                writer.appendLine(key)
            }
        }
    }
}

Let me explain what's going on:

  1. Any Gradle service that runs in the background during a build must implement the BuildService<*> interface. The Params interface is the “external API” of our service.

  2. We implement BuildOperationListener in order to subscribe to all Gradle build operations. This interface lives in an internal package; unfortunately, the similar “listener” from the public API does not let us look at cache keys.

  3. AutoCloseable is implemented for the sake of the close() method, which is called at the end of the build. This is where we process the data collected over the whole build.

  4. The cacheKeys: MutableSet<String> field is where we accumulate cache keys. It is important that BuildOperationListener is not synchronized for us: we must not block its operation, and we have to handle multi-threaded calls to its methods ourselves. That is why cacheKeys is backed by a collection from java.util.concurrent.

  5. Of all the BuildOperationListener callbacks, we only need to implement finished. That is where the events we care about arrive, including cache key processing events. Cache keys can repeat, which is why we use a Set. For reliability, we watch all possible event types.

We wire up the service we created using a convention plugin:

@Suppress("unused", "UnstableApiUsage")
internal abstract class CacheKeysHandlerPlugin @Inject constructor(
    providers: ProviderFactory,
    layout: BuildLayout,
    private val registryInternal: BuildEventListenerRegistryInternal,
) : Plugin<Settings> {

    /** The plugin is disabled by default; we enable it only on CI */
    private val enabled = providers
        .gradleProperty("com.example.build.cache-keys.enabled")
        .map { it.toBoolean() }
        .getOrElse(false)

    /** A custom path to the output file with cache keys can be specified */
    private val cacheKeysFile = providers
        .gradleProperty("com.example.build.cache-keys.file-name")
        .orElse("cache-keys.txt")
        // When the path is declared via layout rather than via java.io.File,
        // Gradle creates the file itself at the start of the build.
        .map { layout.rootDirectory.file(it) }

    override fun apply(target: Settings): Unit = with(target) {
        if (!enabled) return

        val serviceProvider = gradle.sharedServices.registerIfAbsent(
            "cache-keys-handler-service",
            CacheKeysHandlerService::class.java,
        ) { spec ->
            with(spec) {
                parameters.cacheKeysFile.set(cacheKeysFile)
            }
        }
        registryInternal.onOperationCompletion(serviceProvider)
    }
}

A list of what can be injected into plugin and service constructors: Understanding Services and Service Injection.

We apply this convention plugin in your project's settings.gradle file. This is actually where the hardest part ends.
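For reference, here is a rough sketch of what that wiring might look like in a settings.gradle.kts, assuming the convention plugin sits in an included build and is registered under the hypothetical id com.example.build.cache-keys:

// settings.gradle.kts -- a sketch; the plugin id and the includeBuild path are assumptions
pluginManagement {
    // The convention plugin lives in an included build (e.g. ./build-logic)
    includeBuild("build-logic")
}

plugins {
    // Hypothetical id under which CacheKeysHandlerPlugin is registered
    id("com.example.build.cache-keys")
}

rootProject.name = "my-android-app"
include(":app")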

It is important to note that this method only works 100% reliably if two conditions are met:

  1. There are no UP-TO-DATE tasks. If a task is UP-TO-DATE, Gradle does not go through the build caching mechanism at all, and tracking cache keys becomes an order of magnitude harder.

  2. The Configuration Cache is disabled.

Neither is a problem on CI: builds run in clean containers, and the configuration cache is disabled there (it is disabled, right?).

Locally, when testing the functionality, keep this in mind. Before testing, call ./gradlew clean to delete the build directories in all modules, and also pass the --no-configuration-cache argument.
More about the differences between UP-TO-DATE and FROM-CACHE.
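If you want a safety net against accidentally running the plugin with the configuration cache on, one option is a small guard in apply(). Treat this as a sketch: it assumes a Gradle version (7.6+) where StartParameter.isConfigurationCacheRequested exists:

override fun apply(target: Settings): Unit = with(target) {
    if (!enabled) return

    // Sketch: skip registration when the configuration cache is requested,
    // since cache-key tracking is unreliable in that mode (see above).
    @Suppress("DEPRECATION") // newer Gradle versions expose this via BuildFeatures instead
    if (startParameter.isConfigurationCacheRequested) {
        println("cache-keys plugin: configuration cache is enabled, skipping registration")
        return
    }

    // ...the registerIfAbsent(...) block from the plugin above goes here
}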

We are finalizing the basic solution

Reusing the build cache to generate a new cache

Since we now know which cache keys were used during the build, we can safely throw away all the rest. This speeds up the cache-generation pipeline run several times over:

Intersection between old and new cache key sets

Everything looks nice in the picture; all that remains is to implement it. Let's extend our CacheKeysHandlerService so that it removes unused build cache entries:

internal abstract class CacheKeysHandlerService @Inject constructor(
    gradle: Gradle, // we add this as well
) :
    BuildService<CacheKeysHandlerService.Params>,
    BuildOperationListener,
    AutoCloseable {

    // code written earlier...

    // $GRADLE_USER_HOME/caches/build-cache-1
    private val buildCacheDir: File = gradle.gradleUserHomeDir
        .resolve("caches/build-cache-1")

    override fun close() {
        // code written earlier...

        // Delete all cache keys that were not used in the current build
        val unusedCacheKeys = iterateBuildCache { it !in cacheKeys }
        for (key in unusedCacheKeys) {
            check(buildCacheDir.resolve(key).delete()) {
                "Unable to delete cache key file: $key"
            }
        }
    }

    private fun iterateBuildCache(selector: (String) -> Boolean): Array<out File> =
        buildCacheDir
            .listFiles { file -> selector(file.name) }
            .orEmpty()
}

Now we can tweak our GitLab YAML config a bit. First, add a new cache definition:

# cache.yml

.pull-push-build-cache:
  - key: cache-build
    policy: pull-push
    unprotect: true
    paths:
      - .gradle/caches/build-cache-1/

Second, use this definition in the cache generation job:

...

# This job runs after an MR is merged
cache build:
  stage: post-check
  extends: .base
  script:
    - ./gradlew :app:assembleDebug
  cache:
    - !reference [ .pull-push-wrapper-cache ]
    - !reference [ .push-deps-cache ]
    - !reference [ .pull-push-build-cache ]
  rules:
    - if: $CI_PIPELINE_SOURCE == "push" && $CI_COMMIT_BRANCH == "master"

Now you don't have to worry about updating the cache on every push to master, because it happens quickly. The more often we update the cache, the higher the cache hit rate in our MR pipelines, provided we don't forget to rebase them periodically and keep them up to date with the main branch.

Is it possible to do the same with the dependency cache?

Yes, you can, but that is a topic for a separate note.

There is a life hack called the “Thanos gauntlet”. It works like this: use policy: pull-push for the GitLab dependency cache, and at the start of the cache build job delete a random 50% of the packages in $GRADLE_USER_HOME/caches/modules-2/. A method that works surprisingly well in practice (see the sketch below).

thanos pepe meme
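A rough sketch of what that “snap” could look like if wired into the end of the cache build job. Treat it as an illustration: the 50% figure comes from above, and the assumption that the downloaded artifacts live under modules-2/files-2.1 is mine:

import java.io.File
import kotlin.random.Random

// Sketch of the “Thanos gauntlet”: delete a random half of the downloaded
// artifact groups so the dependency cache cannot grow without bound.
fun snapDependencyCache(gradleUserHome: File, random: Random = Random.Default) {
    val artifactsDir = gradleUserHome.resolve("caches/modules-2/files-2.1")
    val groups = artifactsDir.listFiles { file -> file.isDirectory }.orEmpty()

    // Keep roughly 50% of the groups; anything deleted will simply be re-downloaded
    val doomed = groups.filter { random.nextBoolean() }
    for (dir in doomed) {
        dir.deleteRecursively()
    }
    println("Snapped ${doomed.size} of ${groups.size} dependency groups")
}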

Reusing MR build cache in MR pipelines

This is already a problem with an asterisk.

You can speed up MR pipeline runs even further by reusing, in each new run, the cache generated in the previous one.
To do this, we must transfer as little data as possible between jobs: downloading and uploading the cache should not take longer than the useful work inside the job.

Branch specific cache keys set

Roughly speaking, after each run of the job in an MR we must:

  1. Delete all cache keys that EXIST in the master branch cache, because next time we will simply pull them in from master again.

  2. Delete all cache keys that were NOT USED in the current build, because if we haven't used them now, we probably won't need them next time either.

The result is the following: after such a cleanup, what remains in the caches/build-cache-1 directory is a distillate that is light enough to upload and download while the GitLab runner is being prepared, yet still sufficient to noticeably speed up the next run.

Vacuuming the cache (distillate)

To implement the two-step cleanup scheme described above, we need to extend the close() method in CacheKeysHandlerService:

internal abstract class CacheKeysHandlerService @Inject constructor(
    gradle: Gradle,
) :
    BuildService<CacheKeysHandlerService.Params>,
    BuildOperationListener,
    AutoCloseable {

    // code written earlier...

    override fun close() {
        val cacheKeysFile = parameters.cacheKeysFile.asFile.get()

        // If the key file is not empty, it means we deliberately planted it on CI.
        // It holds the list of keys that arrived from the master branch; delete them.
        cacheKeysFile.useLines { snapshotCacheKeys ->
            for (key in snapshotCacheKeys) {
                check(buildCacheDir.resolve(key).delete()) {
                    "Unable to delete cache key file: $key"
                }
            }
        }

        // Write the new keys to the file; CI will decide what to do with them later
        cacheKeysFile.bufferedWriter().use { writer ->
            for (key in cacheKeys) {
                writer.appendLine(key)
            }
        }

        // Delete all cache keys that were not used in the current build
        val unusedCacheKeys = iterateBuildCache { it !in cacheKeys }
        for (key in unusedCacheKeys) {
            check(buildCacheDir.resolve(key).delete()) {
                "Unable to delete cache key file: $key"
            }
        }
    }
}

We pass the list of cache keys from master to MRs

The job that generates the cache on CI, launched when an MR is merged into master, will now upload an archive containing not only the contents of $GRADLE_USER_HOME/caches/build-cache-1 but also the cache-keys.txt file. Jobs launched in MRs will use this file to clean their cache of unnecessary keys, and whatever remains will be saved for subsequent runs:

Editing the build cache definition in GitLab:

# cache.yml

.pull-build-cache:
  - key: cache-build
    policy: pull
    unprotect: true
    paths:
      - .gradle/caches/build-cache-1/
      - cache-keys.txt  # <-- added

# Remove .push-build-cache
# Add this instead:
.pull-push-build-cache:
  - key: cache-build
    policy: pull-push
    unprotect: true
    paths:
      - .gradle/caches/build-cache-1/
      - cache-keys.txt  # <-- added

# Add a definition for the branch-specific cache
.pull-push-branch-specific-cache:
  - key: "$CI_JOB_NAME-$CI_COMMIT_REF_SLUG"
    policy: pull-push
    unprotect: true
    paths:
      - .gradle/caches/build-cache-1/
      # Note: there is no `cache-keys.txt` here, and that's important

We apply new caches to our jobs:

.base:
  variables:
    GRADLE_USER_HOME: $CI_PROJECT_DIR/.gradle
  # Some other base settings that I've omitted
  before_script:
    # Enable our plugin! Otherwise nothing will work
    - mkdir -p $GRADLE_USER_HOME
    - echo "com.example.build.cache-keys.enabled=true" >> $GRADLE_USER_HOME/gradle.properties
    - ...
  after_script:
    - ...

# This job runs in an MR
build:
  stage: check
  extends: .base
  script:
    - ./gradlew :app:assembleDebug
  cache:
    - !reference [ .pull-wrapper-cache ]
    - !reference [ .pull-deps-cache ]
    - !reference [ .pull-build-cache ]
    - !reference [ .pull-push-branch-specific-cache ]  # <-- added
  rules:
    - if: $CI_PIPELINE_SOURCE == "merge_request_event"

# This job runs after an MR is merged
cache build:
  stage: post-check
  extends: .base
  script:
    - ./gradlew :app:assembleDebug
  cache:
    - !reference [ .pull-push-wrapper-cache ]
    - !reference [ .push-deps-cache ]
    - !reference [ .pull-push-build-cache ]  # <-- changed
  rules:
    - if: $CI_PIPELINE_SOURCE == "push" && $CI_COMMIT_BRANCH == "master"

Let's launch it and make sure everything works.

Now, when an MR is created, the first run of the build job will use the cache from the master branch. The closer your branch is to master, the higher the cache hit rate will be.

On subsequent runs in the MR, the cache from the previous run will be used in addition to the cache from master.

How to debug this?

You shouldn't just take my word for it. If you suddenly have a burning desire not only to copy the code from here but also to make sure it works, you can check the plugin's behavior locally. For local debugging, I recommend pointing the $GRADLE_USER_HOME environment variable at an alternative path. By default, when the variable is not set, Gradle stores its data in $USER_HOME/.gradle, and while debugging you would wipe the build cache of every project on your computer.

I also remind you that during debugging you need to disable the configuration cache and, before each measurement, clean the build directories of the modules with your favourite bash script or with ./gradlew clean.

The preparation before measuring how well the plugin works is as follows:

  1. Enable the plugin via gradle.properties:

    com.example.build.cache-keys.enabled=true
    
  2. Delete the cache-keys.txt file, if it exists.

  3. Build the project with the --scan option.

  4. Delete the cache-keys.txt file.

  5. Clean the build directories.

  6. Delete the cache-keys.txt file again.

After these steps, you will have a build cache from which ALL keys have been removed except those used in the build. We delete the key file so that on each subsequent build our plugin does not drop keys based on it, thinking they are keys from the master branch.

Now, the measurements themselves:

  1. Build the project with the --scan option, using the cache left over after the cleanup from the previous build.

  2. Open the Build Scan and look at the cache hit rate. It should be 100% (or above 95% if there are minor problems with cache misses).

Information about the cache hit percentage can also be collected with a plugin, but that topic has already been covered in Tinkoff's talk at Mobius Spring 2024.
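For the curious, here is a rough sketch of how such a metric could be approximated inside the same service, reusing the operation details from the listener above. Treating every unpack as a cache hit and every pack as a miss is an approximation of local cache behaviour, not something Gradle guarantees:

// Additions to CacheKeysHandlerService -- a sketch, to be merged with the code shown earlier
private val hits = java.util.concurrent.atomic.AtomicInteger()
private val misses = java.util.concurrent.atomic.AtomicInteger()

override fun finished(descriptor: BuildOperationDescriptor, event: OperationFinishEvent) {
    when (descriptor.details) {
        is UnpackOperationDetails -> hits.incrementAndGet()   // an output was restored from the cache
        is PackOperationDetails -> misses.incrementAndGet()   // an output was produced and stored
    }
    // ...plus the cache-key collection shown earlier
}

override fun close() {
    // ...plus the file writing and cleanup shown earlier
    val total = hits.get() + misses.get()
    if (total > 0) {
        println("Approximate local cache hit rate: ${100 * hits.get() / total}%")
    }
}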

Results

What are the downsides of the solution and what can be improved?

Dependency cache clearing not implemented

It is done similarly, but the mechanism for tracking artifacts and cleaning their directories is harder to implement and requires more code. Perhaps I will write a note about this in the future as well.

Important: The current solution is already more than sufficient even for medium-sized projects.

Consecutive Gradle runs break the logic

I wanted to demonstrate the essence of my solution; the additional cleanup logic for this case is easy to add on top.

We accumulate a list of all keys in RAM

This could be a problem if there were many keys, but I took this step deliberately, because most tasks do not support build caching at all. In my observation, the list of cache keys for a 500-module project takes up at most around 200 kilobytes of memory, so there is no point in optimizing this.

It's complicated

And who has it easy these days…

About the good

The solution presented here can be considered one of the easiest ways to significantly reduce CI build times using only free tools and without diving into impact analysis. Overall, great value for the money.
