Meet “Unfamiliar”: How We Built a New Mode for My Wave

Hi! My name is Savva Stepurin, and I am a senior developer in the recommendation products group at Yandex Fantech. Today I will tell you how we built “Unfamiliar” for My Wave, a special mode for actively hunting for musical discoveries.

“Unfamiliar” lets My Wave serve you tracks that you have not listened to yet (perhaps you don't even know they exist) but which are very likely to match your musical preferences. If My Wave in its pure form is an ideal balance between your favorite compositions and something new, then “Unfamiliar” helps you break out of your musical filter bubble and hear new tracks.

Below is the technical evolution of “Unfamiliar” from a filter to a separate product, a description of the new ranking model, and much more.

How “Unfamiliar” Came to Be

What is this?

The “Unfamiliar” mode is a recommendation of music the user does not know. Although these are tracks the user has never heard, they are selected the same way as in regular My Wave: personalized, taking individual preferences into account. The mode is designed for actively searching out new artists, tracks, and genres. Its main task is to broaden your horizons, help you escape the musical bubble, and find something completely new that goes straight to the heart.

Where we started

The “Unfamiliar” mode has existed since My Wave launched in November 2021. At first, “Unfamiliar” was a simple filter on top of My Wave, which without additional settings is a balanced recommendation stream, as universal as possible: matched to your tastes for all occasions. The generated stream included tracks you had been listening to for a long time, tracks you liked, and tracks you had played more than once but never liked.

New tracks were mixed into this stream as well: a new song by a familiar artist, or something from an unfamiliar one. So the recommendations conditionally divide into two parts: exploitation, where we serve already familiar tracks, and exploration, where we explore something new. We have special mechanisms to balance the “boldness” of the selection between the two. In other words, the balance is chosen so that you don't run into completely unfamiliar and unusual music, but the stream also doesn't become too boring and predictable.

So when the user wanted to turn on “Unfamiliar”, this is what happened:

  1. The basic My Wave stream was taken, the most universal one that everyone likes.

  2. A filter was applied on top of it that threw out everything the user had already heard (for example, a specific track they had listened to, or an artist they had played more than 5 times).

As a result, after the filter was applied, only completely unfamiliar artists and fundamentally unfamiliar tracks made it into the stream. Everything would have been fine, but it felt a bit boring, purely emotionally. That happened because we took the original My Wave and simply filtered it, discarding the familiar. The stream turned out to be formally unfamiliar but still quite conservative, since it was inherited from My Wave.
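The original filter-on-top approach can be sketched roughly as follows. The function names, data shapes, and the exact "more than 5 plays" threshold interpretation are illustrative assumptions, not Yandex's actual implementation:

```python
# Hypothetical sketch of the original "Unfamiliar" filter: drop tracks
# the user already heard, and artists played more than 5 times.

def is_familiar(track, history):
    """A track counts as 'familiar' if the user already heard it,
    or heard its artist more than 5 times."""
    artist_plays = sum(1 for t in history if t["artist"] == track["artist"])
    heard_ids = {t["id"] for t in history}
    return track["id"] in heard_ids or artist_plays > 5

def unfamiliar_filter(candidates, history):
    """Keep only tracks that pass the familiarity check."""
    return [t for t in candidates if not is_familiar(t, history)]

history = [{"id": 1, "artist": "A"}] * 6 + [{"id": 2, "artist": "B"}]
candidates = [
    {"id": 3, "artist": "A"},  # artist A heard 6 times -> familiar
    {"id": 2, "artist": "B"},  # this exact track was heard -> familiar
    {"id": 4, "artist": "C"},  # brand new -> kept
]
print(unfamiliar_filter(candidates, history))  # [{'id': 4, 'artist': 'C'}]
```

The weakness described above is visible even here: the filter only removes items from a stream that was optimized for a different goal, so what survives is "unfamiliar" but still conservative.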

Developing the idea

We decided to take this further conceptually: make the “Unfamiliar” setting bolder and turn it into a separate, independent entity rather than just an extra filter, giving it unique properties and a riskier character of its own.

We started training a separate ranking model (at Yandex we call it a formula) that activates only when the user makes it clear they want to listen to “Unfamiliar”. The formula's priority is to rank well precisely in the cases involving unfamiliar music: tracks the user does not know, by unfamiliar artists or by ones they have heard only a couple of times. In exchange, the formula is allowed to rank familiar music worse, paying almost no attention to it. What matters is picking tracks well when the “Unfamiliar” mode is on.

In fact, the new formula also learns from familiar music, but we raise the weights on training examples where the user had “Unfamiliar” turned on. For CatBoost, which we use, these examples become crucial, while the other user streams the formula sees (without the “Unfamiliar” setting) get noticeably lower weights.
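The up-weighting idea can be sketched like this. The weight values and field names are invented for illustration; the real production weights are not stated in the text:

```python
# Illustrative sketch of up-weighting "Unfamiliar" sessions in the
# training set. The 5.0 / 1.0 values are assumptions, not production ones.

UNFAMILIAR_WEIGHT = 5.0  # examples from "Unfamiliar" sessions dominate
DEFAULT_WEIGHT = 1.0     # other My Wave streams still contribute a little

def example_weight(example):
    """Per-example sample weight for the ranking model."""
    return UNFAMILIAR_WEIGHT if example["unfamiliar_mode"] else DEFAULT_WEIGHT

train = [
    {"features": [0.1, 0.7], "unfamiliar_mode": True},
    {"features": [0.4, 0.2], "unfamiliar_mode": False},
]
weights = [example_weight(e) for e in train]
# These weights would then be handed to CatBoost, e.g. via
# catboost.Pool(data, label, weight=weights).
print(weights)  # [5.0, 1.0]
```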

But even this was not enough to turn “Unfamiliar” into a full-fledged entity.

To make our next steps clear, we first need to talk about the training pool. We are trying to determine which of two tracks a specific user likes more, so the formula learns from pairs of events. We take consecutive pairs of different user events: Play, Skip, Like, Dislike. The events must be different; a Play-Play pair will not be included in the sample.

So the formula receives pairs of the following type as input: Play-Skip, Skip-Play, Dislike-Like, and so on.

The formula's task is to guess which event is better. For example, in a Play-Skip pair, Play is better than Skip; in a Like-Play pair, Like is better than Play.
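Building the pairwise pool as described above can be sketched in a few lines. This is a minimal illustration of the sampling rule, not the production pipeline:

```python
# Sketch of building the pairwise training pool: take consecutive
# user events and keep only pairs where the two events differ.

def build_pairs(events):
    """events: chronological list of event names (Play, Skip, Like, Dislike)."""
    return [
        (first, second)
        for first, second in zip(events, events[1:])
        if first != second  # a Play-Play pair is not included in the sample
    ]

session = ["Play", "Play", "Skip", "Like", "Dislike"]
print(build_pairs(session))
# [('Play', 'Skip'), ('Skip', 'Like'), ('Like', 'Dislike')]
```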

What exactly counts as better, we describe using special scales that set the formula's value system. We can always tell it that dislikes are very bad and crank up their weight. If we do, the formula becomes very cautious and avoids dislikes at any cost, serving the user only what won't cause irritation. As a result, the recommendations will most likely be boring.

For the “Unfamiliar” setting we re-selected the weights separately, telling the formula that we would reward it for likes, while treating skips and dislikes as quite acceptable events rather than something terrible. Thanks to this prioritization, the formula became bolder, and we managed to increase the number of likes, sometimes even at the cost of extra skips or dislikes.
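A toy version of such a value system might look as follows. The numeric scores are invented purely for illustration; only their relative order matters:

```python
# Hypothetical "value system": a numeric score per event type.
# The numbers are assumptions; the point is the relative ordering.

REGULAR_WAVE = {"Like": 3, "Play": 1, "Skip": -1, "Dislike": -5}
# For "Unfamiliar" the formula is rewarded more for likes, while skips
# and dislikes are treated as acceptable rather than catastrophic.
UNFAMILIAR = {"Like": 6, "Play": 1, "Skip": -0.5, "Dislike": -1}

def pair_label(first, second, scale):
    """1 if the first event of the pair is 'better' under this scale, else 0."""
    return int(scale[first] > scale[second])

print(pair_label("Like", "Play", UNFAMILIAR))  # 1: Like beats Play
print(pair_label("Skip", "Play", UNFAMILIAR))  # 0: Play still beats Skip
```

Note how the dislike penalty shrinks from -5 to -1 between the two scales: that is exactly the "dislikes are acceptable, not terrible" shift described in the text.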

Why can the formula be made so risky in this particular case?

Because we assume that launching “Unfamiliar” is a conscious choice by a user who is ready for new musical discoveries. If we had shipped such a formula for regular My Wave (where a person also wants to hear their favorite, familiar music), users would clearly have been disappointed. But here we work with unfamiliar content, increase the number of likes thanks to the formula's boldness, and at the same time lose none of the metrics tied to how long people listen in the “Unfamiliar” setting, precisely because the user is ready for the system's experimental audacity.

Solution architecture

Our recommendation process is a pipeline that restarts with each new track. For example, you listened to a track and gave feedback on it (like, dislike, skip). This is immediately fed into the recommendations, and the pipeline is rebuilt. The exact rebuild frequency depends on the client, but it happens at least once every 5 tracks. Reacting to user actions almost on the fly and adapting to them is not the most common approach among recommender systems, but we have learned to work exactly this way.

The pipeline itself consists of several stages.

  1. Candidate generation

There are tens of millions of tracks in our catalog. From all of them, we need to select so-called candidates: tracks that may be at least somewhat relevant to the user. For this we have a set of selectors.

Each selector is essentially a mini-search solving a specific task. From tens of millions of tracks, the selectors pick out several hundred thousand candidates. For example, one selector picks tracks by artists you liked. Another carefully picks tracks similar to the last five you listened to, and it doesn't matter whether those tracks are familiar to you or not; what matters is that they are similar to the last five.

We have several such selectors, each with its own task.
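A selector interface along these lines might be sketched as below. The function names, the genre-based notion of "similar," and the dedup-by-id merge are all illustrative assumptions, not the production API:

```python
# Illustrative selectors: each is a mini-search with its own task.

def liked_artists_selector(catalog, user):
    """Tracks by artists the user liked."""
    return [t for t in catalog if t["artist"] in user["liked_artists"]]

def similar_to_recent_selector(catalog, user):
    """Tracks similar to the last five listened (here: same genre),
    whether or not the user already knows them."""
    recent_genres = {t["genre"] for t in user["history"][-5:]}
    return [t for t in catalog if t["genre"] in recent_genres]

def generate_candidates(catalog, user, selectors):
    """Union of all selector outputs, deduplicated by track id."""
    seen, out = set(), []
    for select in selectors:
        for track in select(catalog, user):
            if track["id"] not in seen:
                seen.add(track["id"])
                out.append(track)
    return out

catalog = [
    {"id": 1, "artist": "A", "genre": "rock"},
    {"id": 2, "artist": "B", "genre": "jazz"},
    {"id": 3, "artist": "C", "genre": "rock"},
]
user = {"liked_artists": {"B"}, "history": [{"genre": "rock"}]}
selectors = [liked_artists_selector, similar_to_recent_selector]
print([t["id"] for t in generate_candidates(catalog, user, selectors)])  # [2, 1, 3]
```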

  2. Filtering

So, after the first stage we have a lot of candidates. Now comes filtering, during which we throw out of this list everything we definitely don't want in the user's stream. For example, we drop tracks by artists that were played recently (fewer than 25 tracks ago).

At this stage the “Unfamiliar” filter also kicks in, discarding familiar tracks from the list.
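The filtering stage can be sketched like this. The 25-track window mirrors the example in the text; everything else (names, data shapes) is an illustrative assumption:

```python
# Sketch of the filtering stage: drop artists played within the last
# 25 tracks, and, in "Unfamiliar" mode, drop already-heard tracks too.

RECENT_ARTIST_WINDOW = 25

def filter_candidates(candidates, history, unfamiliar_mode=False):
    recent_artists = {t["artist"] for t in history[-RECENT_ARTIST_WINDOW:]}
    known_ids = {t["id"] for t in history}
    out = []
    for track in candidates:
        if track["artist"] in recent_artists:
            continue  # artist played fewer than 25 tracks ago
        if unfamiliar_mode and track["id"] in known_ids:
            continue  # "Unfamiliar" also discards familiar tracks
        out.append(track)
    return out

history = [{"id": 1, "artist": "A"}]
candidates = [
    {"id": 2, "artist": "A"},  # artist played recently -> dropped
    {"id": 1, "artist": "B"},  # track already heard -> dropped in "Unfamiliar"
    {"id": 3, "artist": "C"},  # passes both checks
]
print([t["id"] for t in filter_candidates(candidates, history, unfamiliar_mode=True)])  # [3]
```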

  3. Ranking

After filtering, we are left with a set of tracks, which we rank using the formula. With CatBoost we score these tens of thousands of tracks, sort them, and put the best ones into the user's stream.

In a very simplified form, the pipeline looks like this: candidate generation → filtering → ranking.
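The three stages can be wired together in a toy end-to-end sketch. The scoring lambda stands in for the CatBoost formula, and all names are illustrative:

```python
# The three pipeline stages in miniature. The `score` function is a
# stub standing in for the CatBoost ranking formula.

def select(catalog, user):
    # Stage 1: candidate generation (here: the whole catalog, for brevity)
    return list(catalog)

def filter_heard(candidates, user):
    # Stage 2: filtering: drop tracks the user already heard
    return [t for t in candidates if t["id"] not in user["heard_ids"]]

def rank(candidates, score):
    # Stage 3: ranking by the formula's score, best first
    return sorted(candidates, key=score, reverse=True)

def build_stream(catalog, user, score, n=2):
    return rank(filter_heard(select(catalog, user), user), score)[:n]

catalog = [{"id": i} for i in range(5)]
user = {"heard_ids": {0, 1}}
stream = build_stream(catalog, user, score=lambda t: t["id"], n=2)
print([t["id"] for t in stream])  # [4, 3]
```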

Where would we be without metrics?

Good recommendations are a highly multi-factorial task. Quality has many aspects, and some of them may even be orthogonal to one another.

Say there is accuracy: this aspect simply evaluates whether a track suits you or not. If you are a hard rock fan, Kirkorov's songs will most likely not suit you. And if the recommendations, knowing your tastes, throw Kirkorov at you, they are so-so recommendations that don't understand what you like.

It's a bit like search results. You have a specific search query, you type it into a search engine, you get a result. Then you evaluate whether it's relevant to you or not – it's a pretty binary evaluation. You just take the number of answers, evaluate the relevance of each of them, and derive an evaluation of the overall results for yourself.

But this is not enough to build a recommendation system, because a stream assembled simply from songs that suit you can be truly awful. We even tested this by taking the same set of tracks and only changing their order in the stream. Using order alone, you can assemble from one and the same set both an excellent stream that gets listened to and a dull wave that quickly becomes tiresome. This is partly tied to the user's momentary mood and emotions, which makes it very interesting to work with.

In general, as you understand, good recommendations cannot be built on accuracy alone.

An additional aspect is diversity: how often we repeat a song. Even if a track fits your tastes and you like it, you may still get tired of it if it plays too often in the stream.

Another great aspect is serendipity: the discovery of something new. This is when a new, unfamiliar song lands in your stream, causes a wow effect, and for a while becomes almost a new favorite. You had never heard it before and don't know the artist; the recommendations simply threw it in.

I have named three fairly important aspects of good recommendations, but in reality there are many more, along with internal metrics that evaluate each aspect. The main thing to understand is that recommendation quality is a multifactorial task, and all the metrics combine into a clear hierarchy.

There are high-level metrics, such as user return rate, with many lower-level metrics subordinate to them. The most interesting are Like prob and Play prob. Play prob is the probability that you will listen to a track we served all the way to the end. Like prob is the probability that the track will get a like from you.

Together these metrics act like communicating vessels: we can trade one for the other. If we serve more unfamiliar tracks, likes may increase, in part because you can't like a track twice, and it's not very likely you'll suddenly like a track you've known for a long time and play often but that has never earned a like from you. So the fewer familiar tracks, the higher the probability of a like per track. Like prob grows, but Play prob may sag, since some unfamiliar tracks will simply be skipped before the end.
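The two metrics can be illustrated with a toy event log. The definitions are paraphrased from the text, and the data is invented; real computation would of course run over production logs:

```python
# Toy computation of the two lower-level metrics described above.

def play_prob(events):
    """Share of started tracks listened to the end (Play vs Skip)."""
    started = [e for e in events if e["action"] in ("Play", "Skip")]
    return sum(e["action"] == "Play" for e in started) / len(started)

def like_prob(events):
    """Share of served tracks that earned a like."""
    return sum(e["liked"] for e in events) / len(events)

events = [
    {"action": "Play", "liked": True},
    {"action": "Play", "liked": False},
    {"action": "Skip", "liked": False},
    {"action": "Skip", "liked": True},
]
print(play_prob(events), like_prob(events))  # 0.5 0.5
```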

You can tip the scales the other way and leave the listener only what they definitely love. For example, you love Queen, you've been listening to them for a long time, and if the band comes up in the stream, you'll probably listen to the end, increasing Play prob. But Like prob will suffer, because you either already have a like on these songs (and can't add a new one), or you haven't liked them in all this time and most likely won't now.

Right now, the most interesting (and hardest) task for us is not simply to trade one metric for another, but to grow both at once, without any drawdowns.

This is the main indicator that the recommendations are working as they should. In the case of the “Unfamiliar” setting, although we accepted a small drop in Play prob, we got a very noticeable increase in Like prob and improved overall user return to this mode.

Now, with the “Unfamiliar” mode, users add tracks by previously unknown artists to their media library 20% more often. If you want to try it, you can do so in the My Wave settings.
