Bazel, stamping, remote cache (part 2)

Bazel has two extremely useful features:

  • stamping – lets you embed into an artifact information about which commit an equivalent artifact can be built from;

  • remote cache and remote build – let you share a cache between build agents or even build artifacts on a build farm.

Unfortunately, these features used to be mutually exclusive, but since Bazel 7.0 you can combine stamping with remote cache by means of scrubbing. Bazel 7.1, released today, adds the ability to combine stamping with remote build.

I wrote about this problem in more detail in the earlier article "Bazel, stamping, remote cache".

What is stamping?

Stamping lets you record in an artifact the version from which an equivalent artifact can be built.

Not the version the build was started on, but one from which you can get an equivalent result.

For example, take two commits that differ only in the README file. An executable built from either commit can carry the same commit as the revision it can be built from, because the changes between the two revisions do not affect it in any way.

On the one hand, this tells you which revision an equivalent artifact can be built from; on the other hand, the artifact does not have to be rebuilt for every commit.

How does stamping work?

Internally, stamping is implemented simply: the files that carry the version data to be embedded into the artifact (in Bazel's case, bazel-out/volatile-status.txt) are excluded from the caching key.

Thus, the artifact is rebuilt only if some input other than the version data file has changed.
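The idea can be illustrated with a small sketch. This is not Bazel's actual code: the function local_action_key, the file name main.o and the link command are hypothetical, and the hashing scheme is simplified. It only shows a key that skips the version data file.

```python
import hashlib

VOLATILE = "bazel-out/volatile-status.txt"


def local_action_key(command, inputs):
    """Hash the command line and every input except the volatile status file."""
    h = hashlib.sha256()
    for arg in command:
        h.update(arg.encode() + b"\0")
    for path in sorted(inputs):
        if path == VOLATILE:
            continue  # version data must not invalidate the action
        h.update(path.encode() + b"\0" + inputs[path] + b"\0")
    return h.hexdigest()


command = ["link", "-o", "app"]  # hypothetical action
build_a = {"main.o": b"obj-v1", VOLATILE: b"BUILD_TIMESTAMP 1700000000\n"}
build_b = {"main.o": b"obj-v1", VOLATILE: b"BUILD_TIMESTAMP 1700000500\n"}
build_c = {"main.o": b"obj-v2", VOLATILE: b"BUILD_TIMESTAMP 1700000500\n"}

# Only the version data differs: the key stays the same, nothing is rebuilt.
same_key = local_action_key(command, build_a) == local_action_key(command, build_b)
# A real input differs: the key changes and the action is rebuilt.
changed_key = local_action_key(command, build_a) != local_action_key(command, build_c)
print(same_key, changed_key)
```

In this model, a commit that touches only the README corresponds to build_a and build_b: the stamp in the already built artifact stays as it was.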

What's the problem with remote cache?

Bazel has several caches. Bazel's internal cache and the remote cache use different caching keys, while disk cache, remote cache and remote build all share the same key (disk cache is a special case of remote cache).

The problem is that the action caching key for remote build or remote cache is the hash of the build task. This hash covers all input data and knows nothing about the special semantics of stamping files. That is, the stamping files affect the hash of the build task.

Thus, we end up in a situation where every build receives a unique version data file, so the stamped action never hits the cache.

The most unpleasant part is that even forcing a stamped rule to build locally through tags does not fix the situation: we get the same artifact only if we hit the cache of a previous build on the same build agent.
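A rough model of this behavior, as a sketch only: the real remote key is a digest of a structured action message, while here remote_action_digest is a hypothetical stand-in that simply hashes the command and the digests of all inputs. The file names and the command are assumptions as well.

```python
import hashlib


def remote_action_digest(command, inputs):
    """Simplified stand-in for the remote action digest: every input counts."""
    h = hashlib.sha256()
    h.update("\0".join(command).encode())
    for path in sorted(inputs):
        content_digest = hashlib.sha256(inputs[path]).hexdigest()
        h.update(f"\0{path}\0{content_digest}".encode())
    return h.hexdigest()


command = ["link", "-o", "app"]  # hypothetical action
status = "bazel-out/volatile-status.txt"

first = remote_action_digest(
    command, {"main.o": b"obj", status: b"BUILD_TIMESTAMP 1700000000\n"}
)
second = remote_action_digest(
    command, {"main.o": b"obj", status: b"BUILD_TIMESTAMP 1700000500\n"}
)

# The object file is identical, yet the digests differ, so the second build
# cannot reuse the result of the first one.
cache_hit = first == second
print(cache_hit)  # False
```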

What is scrubbing?

Bazel 7.0 introduced scrubbing. It allows you to influence the caching key for remote cache.

For example:

  • add salt when hashing;

  • replace arguments of the build action;

  • exclude input files from the caching key.

For stamping, you can exclude bazel-out/volatile-status.txt from the caching key and get the same behavior with remote cache as with a local build.

In addition, scrubbing covers the case where version information is embedded not from bazel-out/volatile-status.txt itself but from some file derived from it.
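The three kinds of transformation listed above can be sketched as follows. Everything here is an illustrative model, not Bazel's implementation: the Transform class, the scrubbed_key function, the use of partial regex matching and the sample paths are my assumptions, and the exact field names should be checked against the scrubbing proto.

```python
import hashlib
import re
from dataclasses import dataclass, field


@dataclass
class Transform:
    salt: str = ""
    omitted_inputs: list = field(default_factory=list)    # regexes on input paths
    arg_replacements: list = field(default_factory=list)  # (regex, target) pairs


def scrubbed_key(command, inputs, transform):
    """Compute a cache key after applying the scrubbing transform."""
    h = hashlib.sha256(transform.salt.encode())
    for arg in command:
        for source, target in transform.arg_replacements:
            arg = re.sub(source, target, arg)
        h.update(b"\0" + arg.encode())
    for path in sorted(inputs):
        if any(re.search(p, path) for p in transform.omitted_inputs):
            continue  # scrubbed input: does not influence the key
        h.update(b"\0" + path.encode() + b"\0" + inputs[path])
    return h.hexdigest()


status = "bazel-out/volatile-status.txt"
omit_status = Transform(omitted_inputs=[r"^bazel-out/volatile-status\.txt$"])
build_a = {"main.o": b"obj", status: b"BUILD_TIMESTAMP 1700000000\n"}
build_b = {"main.o": b"obj", status: b"BUILD_TIMESTAMP 1700000500\n"}
command = ["link", "-o", "app"]

# Excluding the status file makes both builds share one key.
keys_match = scrubbed_key(command, build_a, omit_status) == scrubbed_key(
    command, build_b, omit_status
)

# A different salt deliberately moves the action to a fresh key.
salted = Transform(salt="v2", omitted_inputs=omit_status.omitted_inputs)
salt_changes_key = scrubbed_key(command, build_a, salted) != scrubbed_key(
    command, build_a, omit_status
)

# Argument replacement hides a machine-specific path (hypothetical paths).
hide_home = Transform(arg_replacements=[(r"/home/[^/]+/", "/home/USER/")])
paths_scrubbed = scrubbed_key(["link", "-o", "/home/alice/app"], {}, hide_home) == (
    scrubbed_key(["link", "-o", "/home/bob/app"], {}, hide_home)
)
print(keys_match, salt_changes_key, paths_scrubbed)
```

A file derived from bazel-out/volatile-status.txt would be handled the same way in this model: one more pattern in omitted_inputs.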

Example of using scrubbing

To use scrubbing you need to create a scrubbing configuration file, for example:

rules {
  matcher {
    kind: "stamping"
    mnemonic: "Example"
  }
  transform {
    omitted_inputs: "^bazel-out/volatile-status\\.txt$"
  }
}

The list of allowed fields can be found here: https://github.com/bazelbuild/bazel/blob/master/src/main/protobuf/remote_scrubbing.proto
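It is easy to get the escaping in omitted_inputs wrong, so it helps to check the pattern against a few paths first. The sketch below assumes that the intended regex, after the configuration file's string escapes are resolved, is ^bazel-out/volatile-status\.txt$ and that partial matching is used. Bazel uses Java regular expressions, but for a pattern this simple Python's re module behaves the same way. All candidate paths except the first are made up for the check.

```python
import re

# The regex as the matching engine should see it (one backslash before the dot).
regex = r"^bazel-out/volatile-status\.txt$"

candidates = [
    "bazel-out/volatile-status.txt",
    "bazel-out/stable-status.txt",
    "bazel-out/volatile-statusXtxt",           # would match if the dot were unescaped
    "external/bazel-out/volatile-status.txt",  # rejected by the ^ anchor
]

omitted = [path for path in candidates if re.search(regex, path)]
print(omitted)  # ['bazel-out/volatile-status.txt']
```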

The transform of the first rule whose matcher fits is applied to the build action.

For the scrubbing configuration to be used during the build, it must be passed via the --experimental_remote_scrubbing_config parameter.

What is the problem with scrubbing and remote build?

In Bazel 7.0, trying to use the --experimental_remote_scrubbing_config parameter together with remote build fails with the error: Cannot combine remote cache key scrubbing with remote execution

Fortunately, the behavior has changed in Bazel 7.1 (https://github.com/bazelbuild/bazel/pull/21384): instead of a global error, the actions that are subject to scrubbing are executed on the local host.

This makes it possible to use stamping together with a build farm, but you need to choose the matchers for your rules very carefully:

  • everything that uses stamping must fall under them, otherwise there will be constant cache misses (regardless of whether this transformation changes the value of the caching key);

  • nothing unnecessary should fall under them, since it will no longer be built on the farm and will be built locally instead (although the remote cache will still be used).
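A candidate set of matchers can be dry-run against a list of actions before it goes into the real configuration. The sketch below models only the behavior described here: an action matched by any rule runs locally, everything else goes to the farm. The Action and Matcher classes, the full-match semantics, the treatment of an empty field as "match anything" and the sample labels are all assumptions of this sketch, not Bazel's API.

```python
import re
from dataclasses import dataclass


@dataclass
class Action:
    label: str
    mnemonic: str
    kind: str


@dataclass
class Matcher:
    label: str = ""
    mnemonic: str = ""
    kind: str = ""

    def matches(self, action):
        # An empty pattern is treated as "match anything" (sketch assumption).
        pairs = (
            (self.label, action.label),
            (self.mnemonic, action.mnemonic),
            (self.kind, action.kind),
        )
        return all(not pattern or re.fullmatch(pattern, value) for pattern, value in pairs)


def execution_site(action, matchers):
    """Return where the action would run under the Bazel 7.1 behavior."""
    return "local" if any(m.matches(action) for m in matchers) else "remote"


stamp = Action("//app:version", "Example", "stamping")    # hypothetical stamped action
compile_ = Action("//app:lib", "CppCompile", "cc_library")

narrow = [Matcher(kind="stamping", mnemonic="Example")]
too_broad = [Matcher(mnemonic=".*")]

print(execution_site(stamp, narrow), execution_site(compile_, narrow))
print(execution_site(compile_, too_broad))
```

With the narrow matcher only the stamped action leaves the farm; with the overly broad one the compile action is pulled to the local host as well, which is exactly the second pitfall from the list.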

Summary

With Bazel 7.1 it is finally possible to use stamping and remote build together, although not without problems.

I hope that after some time the transformation interface for scrubbing will be stabilized and support for it will appear in the remote build protocol. This should remove the local execution restriction and allow all build tasks to run on the farm.
