How We Upgraded Cooper's Search Engine with FastText and XGBRanker

Search types in Cooper

We have two types of search.

  • The first is in-store search. The user selects one retailer (for example, METRO) and builds a basket there. This search is useful if the user prefers to order from a specific retailer.

  • The second is inter-retailer search, which looks for products across all stores available in Cooper. It is useful when the items a user needs are spread across different retailers and the user chooses a store to order from based on other parameters, such as delivery time.

Two types of search in the interface of our application


Searching among all stores works like this:

  1. The user enters a query.

  2. We determine all available stores based on the user's geolocation.

  3. In each of the available stores, we search for products according to the user's request.

  4. We rank the stores where we found something and collect search results for products for each store.
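The four steps above can be sketched roughly as follows. This is a minimal illustration, not Cooper's production code: the function names, the `StoreResult` type, the toy catalog and the toy scores are all assumptions made for the example, and the store lookup, in-store search and scoring are passed in as plain callables.

```python
from dataclasses import dataclass


@dataclass
class StoreResult:
    store: str
    products: list
    score: float


def search_all_stores(query, user_location, find_available_stores,
                      search_in_store, score_store):
    """Inter-retailer search: step 1 (the query) arrives as an argument."""
    results = []
    # Step 2: determine the stores available at the user's location.
    for store in find_available_stores(user_location):
        # Step 3: search for products inside each available store.
        products = search_in_store(store, query)
        if products:  # keep only stores where something was found
            results.append(StoreResult(store, products, score_store(store, query)))
    # Step 4: rank the stores that returned something, best score first.
    results.sort(key=lambda r: r.score, reverse=True)
    return results


# Toy data, invented for the example.
catalog = {
    "METRO": ["avocado hass", "milk"],
    "Lenta": ["avocado"],
    "Technopark": ["headphones"],
}
toy_scores = {"METRO": 0.9, "Lenta": 0.7, "Technopark": 0.1}

results = search_all_stores(
    "avocado",
    user_location=(55.75, 37.62),
    find_available_stores=lambda location: list(catalog),
    search_in_store=lambda store, q: [p for p in catalog[store] if q in p],
    score_store=lambda store, q: toy_scores[store],
)
print([r.store for r in results])  # ['METRO', 'Lenta']
```

Everything interesting happens inside `score_store`: the rest of the article is about how that score is produced.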

Let's say I'm making breakfast and I realize I forgot to buy an avocado. I type “Avocado” into Cooper's search engine and this page appears:

The stores that have avocados (METRO, Lenta, Globus) come first, followed by pickup from the selected store.

But how does the application decide that for this particular query METRO should be shown first, and not Lenta? What principles and technology drive the ranking? I will give all the answers below!

A long time ago in a galaxy far, far away…

…Cooper didn't have smart rankings.

Whatever the user actually wanted, the products of the highest-revenue retailer that had any match were displayed first. Most often, this was METRO.

But Cooper offers a wide variety of products: food, electronics, jewelry, medicine, and much more!

The application has more than 20 categories


If a user wanted to buy, say, a new smartphone and entered “Apple” into the search engine, the first thing they might come across would be bourbon from METRO:

What was wrong with that?

  • First, the user had to scroll through the list of stores until the relevant product appeared on the screen. This could be annoying and reduce satisfaction with the application.

  • Second, the user might be confused and conclude that irrelevant products are being shown because the ones they need are not available.

Both situations lead to the user visiting the application less often or even ceasing to be our client.

The first approach to ranked search

When we first sat down to work on the task, we decided to split the ranking process into two stages.

  1. In the first step, we determined whether the search query was related to restaurants. In Cooper, restaurants represent about a quarter of all retailers. In Moscow, a user can have access to about 500 of them at once!

    The classifier was built on the fastText model. It contains pretrained vector representations of words, that is, meanings encoded in a form that a computer can work with.

    We trained a binary classification model whose target variable was whether a product was added from a restaurant or from a store. For example, for the query “khinkali”, the product can be added from both types of retailers, but people most often order khinkali from restaurants, so the model tends to classify this query as restaurant-related. The same holds for many dishes.

  2. In the second step, we ranked restaurants relative to each other and stores relative to each other based on statistics: how many times a product was added from a given restaurant or store for a given search query.
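A small sketch of both stages, under stated assumptions: the event log, its tuple layout and the retailer names are invented for the example. The only library-specific detail is the `__label__` prefix, which is fastText's default way of marking the class in supervised training text.

```python
from collections import Counter

# Hypothetical add-to-cart log: (query, retailer, retailer_type).
events = [
    ("khinkali", "Restaurant A", "restaurant"),
    ("khinkali", "Restaurant A", "restaurant"),
    ("khinkali", "Restaurant B", "restaurant"),
    ("khinkali", "METRO", "store"),
    ("avocado", "METRO", "store"),
    ("avocado", "Lenta", "store"),
    ("avocado", "METRO", "store"),
]


def to_fasttext_lines(events):
    """Stage 1 data: one training line per add event, labelled by retailer type."""
    return [f"__label__{rtype} {query}" for query, _, rtype in events]


def rank_by_additions(events, query, retailer_type):
    """Stage 2: order retailers of one type by how often the query led to an add."""
    counts = Counter(
        retailer
        for q, retailer, rtype in events
        if q == query and rtype == retailer_type
    )
    return [retailer for retailer, _ in counts.most_common()]


print(to_fasttext_lines(events)[0])
print(rank_by_additions(events, "khinkali", "restaurant"))
print(rank_by_additions(events, "avocado", "store"))
```

Written to a text file, lines like these could be passed to `fasttext.train_supervised(input=...)`. Because “khinkali” appears three times with the restaurant label and once with the store label, a classifier trained on such data would lean towards “restaurant” for that query.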

A/B testing showed that users responded much better to search with ranking. This motivated us to keep working and to take additional factors into account when ranking.

Improved model

In the second approach, we kept the fastText-based classifier but replaced the ranking based on statistical data with ranking based on the predictions of an XGBRanker model.

We began to take into account:

  • add-to-cart statistics at several levels of aggregation: per query overall, per query and region, and per query and specific user;

  • cost and speed of delivery;

  • retailer popularity;

  • prices of goods in the retailer.
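One way to picture these factors is as a single feature row per (query, retailer) pair. The field names, units and values below are assumptions made for illustration; the article does not specify how each factor is actually encoded.

```python
from dataclasses import dataclass, astuple


@dataclass
class RetailerFeatures:
    # Add-to-cart statistics at three levels of aggregation.
    adds_for_query: int          # all users, all regions
    adds_for_query_region: int   # same query, user's region
    adds_for_query_user: int     # same query, this particular user
    # Delivery.
    delivery_cost: float         # assumed unit: rubles
    delivery_minutes: float      # assumed unit: minutes
    # Retailer-level signals.
    retailer_popularity: float   # hypothetical popularity score
    median_price: float          # hypothetical summary of the retailer's prices


def to_row(features):
    """Flatten the dataclass into the numeric row a ranking model consumes."""
    return [float(value) for value in astuple(features)]


row = to_row(RetailerFeatures(120, 45, 3, 99.0, 35.0, 0.8, 149.0))
print(row)
```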

To help the XGBRanker model understand how relevant a retailer is to a user, it was trained using the following target values:

  1. The user selected a store, added a product from it and made a purchase.

  2. The user selected a store, added a product, but then deleted it.

  3. The user selected a store, looked at the results or some product, but did not perform any further actions.

  4. The store was in the search results, but the user ignored it.
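The four outcomes above form a graded relevance scale. A minimal sketch of turning logged impressions into training data for a ranker, assuming numeric grades 3 to 0 (the article gives the order of the outcomes, not the numbers) and a made-up log layout with two toy features per row:

```python
# Assumed numeric grades; only the ordering comes from the list above.
RELEVANCE = {
    "purchased": 3,           # added a product and bought it
    "added_then_removed": 2,  # added a product, then deleted it
    "viewed_only": 1,         # looked, but did nothing further
    "ignored": 0,             # shown in results, not opened
}

# Hypothetical log rows: (search_id, retailer, outcome, feature_row).
impressions = [
    (2, "Technopark", "purchased", [40.0, 299.0]),
    (1, "METRO", "purchased", [120.0, 0.0]),
    (1, "Lenta", "viewed_only", [80.0, 99.0]),
    (1, "Globus", "ignored", [15.0, 149.0]),
    (2, "METRO", "added_then_removed", [5.0, 0.0]),
]


def build_training_set(impressions):
    """Return features X, labels y and query ids, with each search kept contiguous."""
    rows = sorted(impressions, key=lambda r: r[0])  # group rows by search_id
    X = [features for _, _, _, features in rows]
    y = [RELEVANCE[outcome] for _, _, outcome, _ in rows]
    qid = [search_id for search_id, _, _, _ in rows]
    return X, y, qid


X, y, qid = build_training_set(impressions)
print(y)    # [3, 1, 0, 3, 2]
print(qid)  # [1, 1, 1, 2, 2]

# With xgboost installed and X, y, qid converted to NumPy arrays, training
# could look like this (hyperparameters omitted, objective is an assumption):
#   ranker = xgboost.XGBRanker(objective="rank:ndcg")
#   ranker.fit(X, y, qid=qid)
```

The query ids matter because a ranker compares retailers only within the same search: METRO losing to Technopark for one query says nothing about their order for another.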

The chances that the user will quickly get to the desired product have become much higher. The relevance of results in inter-retailer search, measured by NDCG, has increased by 5 percentage points.
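For readers unfamiliar with NDCG, here is one common formulation (exponential gain, logarithmic position discount). The article does not say which variant or cutoff was used, so treat this as an illustration; the relevance grades in the examples are hypothetical.

```python
import math


def dcg(relevances):
    """Discounted cumulative gain: high grades near the top count the most."""
    return sum(
        (2 ** rel - 1) / math.log2(position + 2)
        for position, rel in enumerate(relevances)
    )


def ndcg(relevances):
    """DCG of the shown order divided by the DCG of the ideal order."""
    ideal = dcg(sorted(relevances, reverse=True))
    return dcg(relevances) / ideal if ideal > 0 else 0.0


# Grades of the stores in the order they were shown to the user.
print(ndcg([3, 2, 1, 0]))  # 1.0, the order is already ideal
print(ndcg([0, 1, 2, 3]))  # lower: the best store is shown last
```

A value of 1.0 means the stores were shown in the best possible order, and the value drops as relevant stores sink lower in the list.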

Even results for categories not related to food have improved. For example, the search query “headphones” now shows specialized stores like “Technopark” first, instead of grocery stores, which may also sell inexpensive headphones.

Impact on business

After the second approach, we ran A/B testing again, and it again showed that the new solution was better 🙂 All conversion metrics and the average number of products added from inter-retailer search increased. The revenue increase was statistically significant; it's a shame I can't share specific numbers! The share of empty results and the frequency of using inter-retailer search decreased. The latter indicates that users began to find the products they need faster and therefore return to the search less often.

A two-stage model, with a classifier at the first stage and XGBRanker at the second, gave a significant increase in the ranking metrics for stores and restaurants.

What else we tried + our plans

I would like to separately mention the search by names of shops and restaurants.

As I wrote above, our application features hundreds of restaurants. Previously, search returned the desired restaurant only if the query exactly matched its official name: the query “Karavaev Brothers' Culinary Shop” produced a result, but the shorter query “Karavaev Brothers” did not.

We have implemented full-text search over names and their synonyms: the application sends a query to Elasticsearch and boosts to the top the stores and restaurants whose names or synonyms overlap with the query.
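A rough sketch of what such a request body might look like. The field names `name` and `synonyms`, the boost factor and the `and` operator are assumptions for the example; only the `multi_match` query type itself is standard Elasticsearch query DSL.

```python
def build_name_query(user_query, size=10):
    """Build a full-text query over retailer names and their synonyms."""
    return {
        "size": size,
        "query": {
            "multi_match": {
                "query": user_query,
                # "^2" weights matches in the official name above synonym matches.
                "fields": ["name^2", "synonyms"],
                # Every word of the query must occur in the matching field,
                # so a partial name still finds the full one.
                "operator": "and",
            }
        },
    }


body = build_name_query("Karavaev Brothers")
print(body)
```

With an analyzed text field, the partial query “Karavaev Brothers” would match a document named “Karavaev Brothers' Culinary Shop”, because both query words occur in the name.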

However, this feature remains in development because A/B testing results have been mixed.

We are also going to expand the classifier by adding a division not only into restaurants and stores, but also into stores of different types (grocery, electronics, pharmacy, etc.).

I'll be glad if my article is interesting and useful for my colleagues. How else do you think the user experience in Cooper can be improved? Share your ideas in the comments 🙂

Cooper's (ex SberMarket) tech team runs social media channels with news and announcements. If you want to know what's under the hood of high-load e-commerce, follow us on Telegram and YouTube, and listen to the podcast “For tech and these” from our IT managers.
