How Yandex's "Banner Rotator" Handles 700,000 RPS and Selects Ads for You

Why is a barista shown ads for a dredger, and an electrician ads for a percussion massager? Why is it that as soon as you think about a vacation, every banner offers trips to Dagestan? And why, after a single search for BMW turn signals, do ads for used spare parts keep following you around for another month? The “Banner Rotator” is responsible for all of this. The service processes 99% of requests in just 200 milliseconds, uses ML, and saves the company significant resources. Here is how it all works.

Filtering Ads

After selection, relevant ads are sent to the “Banner Rotator”. In the first stage, they all go through filtering. The filters cover cases such as an advertiser restricting display on certain sites, or the ad itself being marked 18+.
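As a rough illustration of this first stage, the sketch below filters a small list of ads by the two conditions mentioned above. The `Ad` class, its fields, the `site_allows_adult` flag, and the site name are all hypothetical; the talk does not describe the actual data model.

```python
from dataclasses import dataclass

@dataclass
class Ad:
    ad_id: int
    adult_only: bool = False                 # ad is marked 18+
    blocked_sites: frozenset = frozenset()   # sites the advertiser excluded

def passes_filters(ad: Ad, site: str, site_allows_adult: bool) -> bool:
    """Return True if the ad may be shown on the given site."""
    if site in ad.blocked_sites:
        return False
    if ad.adult_only and not site_allows_adult:
        return False
    return True

ads = [
    Ad(1),
    Ad(2, adult_only=True),
    Ad(3, blocked_sites=frozenset({"news.example"})),
]
eligible = [
    ad.ad_id
    for ad in ads
    if passes_filters(ad, "news.example", site_allows_adult=False)
]
print(eligible)  # [1]
```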

Document processing

Then the most interesting part begins: document processing. Here the system can be compared to a sculptor who has been handed a huge block of marble and has to cut away everything unnecessary. While the block (our array of ads) is still large, the simplest and “crudest” tools will do.

Clustering and “lazy materialization”

All objects in the array are divided into clusters, with the following hierarchy: client → campaign → group → banner. At this stage, the “Banner Rotator” works with top-level elements. This makes it possible to discard unsuitable clusters right away, which removes a large amount of data at once.
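The saving from top-down pruning can be sketched as follows. This is a minimal illustration, assuming the hierarchy is stored as nested dictionaries and that a single hypothetical `is_allowed` predicate is consulted at every level; the real storage layout is not described in the talk.

```python
def surviving_banners(tree, is_allowed):
    """Walk client -> campaign -> group -> banner, skipping rejected subtrees."""
    result = []
    for client, campaigns in tree.items():
        if not is_allowed("client", client):
            continue  # drops every campaign, group and banner below it
        for campaign, groups in campaigns.items():
            if not is_allowed("campaign", campaign):
                continue
            for group, banners in groups.items():
                if not is_allowed("group", group):
                    continue
                result.extend(b for b in banners if is_allowed("banner", b))
    return result

tree = {
    "client_a": {"camp_1": {"grp_1": ["b1", "b2"], "grp_2": ["b3"]}},
    "client_b": {"camp_2": {"grp_3": ["b4", "b5"]}},
}
rejected = {("client", "client_b"), ("group", "grp_2"), ("banner", "b2")}
checks = []  # records every node that was actually inspected

def is_allowed(level, name):
    checks.append((level, name))
    return (level, name) not in rejected

kept = surviving_banners(tree, is_allowed)
print(kept)         # ['b1']
print(len(checks))  # 7
```

The tree has 12 nodes in total, but only 7 are inspected, because rejecting `client_b` and `grp_2` removes everything beneath them without a per-banner check.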

Then the “Banner Rotator” moves down the hierarchy, removing more and more unsuitable ads from the array until only the most relevant ones remain. This is where so-called “lazy materialization of objects” comes in: all ads are ranked by some value, and the system walks the resulting list in descending order of priority. As a result, less suitable documents never reach the final selection that the “heavier” models work with, which also saves resources.
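One way to picture lazy materialization is a generator that builds the full ad object only when the consumer asks for the next item. The sketch below assumes priorities are available as a plain mapping from ad ID to a number; `load_full_ad` is a hypothetical stand-in for whatever expensive step assembles the complete object.

```python
from itertools import islice

loaded = []  # records which ads were actually materialized

def load_full_ad(ad_id):
    """Hypothetical expensive step: build the complete ad object."""
    loaded.append(ad_id)
    return {"id": ad_id, "title": f"Ad {ad_id}"}

def materialize_lazily(priorities):
    """Yield full ads in descending priority, building each one on demand."""
    ranked = sorted(priorities.items(), key=lambda kv: kv[1], reverse=True)
    for ad_id, _priority in ranked:
        yield load_full_ad(ad_id)

priorities = {101: 0.2, 102: 0.9, 103: 0.5, 104: 0.7, 105: 0.1}
top = list(islice(materialize_lazily(priorities), 2))
print([ad["id"] for ad in top])  # [102, 104]
print(loaded)                    # [102, 104]
```

Only the two highest-priority ads are ever built; the remaining three are ranked by their lightweight priority value but never materialized.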

There is no need to interact with the entire ad object; looking at its attributes is enough. For example, if there are age or domain restrictions, the attribute is simply compared against a whitelist or blacklist.

AI at last

Now that we have a list of the most relevant ads, which is obviously much smaller than the initial one, we can bring in ML. The upper part of the neural network processes the ads, and the lower part processes the requests that the candidates are being matched against. Only then are the two results combined as a scalar (dot) product. The model is trained against its target in this form.
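The "two parts joined by a scalar product" layout can be sketched in a few lines. This is only an illustration of the scoring step: each part is reduced to a single linear layer, the weights and feature values are made up, and training is not shown.

```python
def tower(features, weights):
    """Tiny stand-in for one part of the network: a single linear layer."""
    return [sum(w * x for w, x in zip(row, features)) for row in weights]

def dot(u, v):
    """Scalar (dot) product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))

# Hypothetical weights: 3 ad features -> 2 dimensions, 2 request features -> 2 dimensions.
AD_WEIGHTS = [[1.0, 0.0, 0.5], [0.0, 1.0, 0.5]]
REQUEST_WEIGHTS = [[0.5, 0.5], [1.0, -1.0]]

ad_embedding = tower([1.0, 2.0, 2.0], AD_WEIGHTS)        # [2.0, 3.0]
request_embedding = tower([1.0, 0.0], REQUEST_WEIGHTS)   # [0.5, 1.0]
score = dot(ad_embedding, request_embedding)
print(score)  # 4.0
```

The two sides never see each other's features; they only meet in the final dot product, which is what the paragraph above describes.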

Then the system takes the top of the ranking and drops the candidates at the bottom. At this point, heavy models can be used to run inference on the remaining candidates. This is how the most relevant ads are selected according to the user's interests.
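A minimal sketch of this two-step ranking, assuming a cheap score is already attached to every candidate and a hypothetical `heavy_model` function stands in for the expensive inference:

```python
import heapq

def rerank(candidates, cheap_score, heavy_score, keep):
    """Keep the `keep` best by the cheap score, then order them by the heavy score."""
    shortlist = heapq.nlargest(keep, candidates, key=cheap_score)
    return sorted(shortlist, key=heavy_score, reverse=True)

heavy_calls = []  # records which candidates reached the heavy model

def heavy_model(ad):
    """Hypothetical expensive model; here it just reads a stored value."""
    heavy_calls.append(ad["id"])
    return ad["quality"]

candidates = [
    {"id": "a", "cheap": 0.9, "quality": 0.3},
    {"id": "b", "cheap": 0.8, "quality": 0.7},
    {"id": "c", "cheap": 0.2, "quality": 0.99},
    {"id": "d", "cheap": 0.6, "quality": 0.5},
]
final = rerank(candidates, lambda ad: ad["cheap"], heavy_model, keep=3)
print([ad["id"] for ad in final])  # ['b', 'd', 'a']
print(sorted(heavy_calls))         # ['a', 'b', 'd']
```

Candidate `c` is cut before the heavy model runs, even though that model would have rated it highest. That is the trade-off of cascaded ranking: expensive inference is spent only on the shortlist.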

How the model is accelerated

To speed up the “Banner Rotator” and free up resources, sharding is used. Ads are assigned to shards by the rule shard = Id % NumberOfShards, that is, almost randomly. This makes the distribution of ads across shards more uniform.
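The rule can be tried out directly. In the sketch below, the shard count of 4 and the range of consecutive IDs are assumptions chosen for illustration; how even the split is in practice depends on how the real IDs are distributed.

```python
from collections import Counter

NUMBER_OF_SHARDS = 4  # assumed value, for illustration only

def shard_for(ad_id: int, number_of_shards: int = NUMBER_OF_SHARDS) -> int:
    """Apply the rule shard = Id % NumberOfShards."""
    return ad_id % number_of_shards

counts = Counter(shard_for(ad_id) for ad_id in range(1000, 1100))
print(sorted(counts.items()))  # [(0, 25), (1, 25), (2, 25), (3, 25)]
```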

Yandex developers are building a microservice architecture. To transfer information from one service to another, they use the Protobuf and FlatBuffers serialization formats, which reduces the time spent on serialization and deserialization.

The text is based on a report by Artem Vanshulin, head of ranking development in the Yandex banner system team.