The best data products are born in the fields

Most of our online orders are assembled from the sales floors of stores, not from warehouses. This leads to discrepancies between what is displayed on the site and what we can actually assemble for an online order.
The high turnover of goods in stores and the complexity of stock management systems generate errors that can be detected automatically. Based on our knowledge of the systems, and with a bit of social engineering, we built a solution that automatically finds problematic goods and adjusts their stock before it is published on the site.


My name is Marina Calabina, and I am a project manager at Leroy Merlin. I joined the company in 2011. For the first five years I opened stores (when I arrived there were 13; now there are 107), then I worked in a store as the head of a trading sector, and for the past year and a half I have been working on a Data Product that helps stores organize their operations.


Since I have been with the company for a long time, my speech is full of specific terms, which I call "Lerulisms". So that we speak the same language, let me define some of them.

  • Stock – the stock of goods in a store.
  • Available-for-sale stock – the quantity of goods free of locks and customer reserves.
  • Expo – a showcase (display) sample.
  • Part numbers – products (articles, SKUs).
  • Operational inventory – a daily recount of 5 items in each department of each store.

Guaranteed Stock

You may not know this, but when you place an order with Leroy Merlin, in 98% of cases it goes to a store and is assembled from the sales floor.

Imagine a huge 8,000 sq. m store, 40,000 items, and the task of assembling an order. What can happen to the articles the picker is looking for? The product may already be in the basket of a customer walking around the sales floor, or it may even have been sold between the moment you ordered it and the moment the picker went to get it. The product shows on the site, but in reality it is either hidden somewhere or already gone; some batteries simply "grow legs." There is also the reverse situation, when the goods are in the store but for some reason are not displayed on the site.

Because of this we cannot assemble the order and we lose sales: our reputation suffers and dissatisfied customers appear.

To deal with various problems, including this one, last year the company launched the Data Accelerator division. Its mission is to instill a data culture, so that decisions made by the company are data-driven. 126 ideas were submitted to the Data Accelerator, 5 of them were selected, and one of those is the Guaranteed Stock product that I will talk about.

The essence of the product is that before publishing the stock of goods on the site, we check whether we can actually assemble this article for the client, whether we can guarantee it. Most often this means publishing a slightly smaller stock quantity on the site.

We had a great team: a Data Scientist, a Data Engineer, a Data Analyst, a Product Owner and a Scrum Master.

The objectives of our product were:

  • reduce the number of unassembled orders, without harming the total number of orders (so that it does not decrease);
  • keep eCom turnover, even though we will show fewer goods on the site.

In general, all other things being equal, it was worth doing.

Bureau of Investigation

When the project started, we went to the stores, to the people who deal with this every day, and went to pick orders ourselves. It turned out that our product was so interesting and necessary for the stores that we were asked to launch not in 3 months, as originally planned, but twice as fast, in 6 weeks. To put it mildly, this was stressful, but nonetheless…

We gathered hypotheses from experts and went looking for what data sources we had at all. That was a quest in itself. In fact, our "bureau of investigation" showed that we have products that must have a showcase sample.

For example, a mixer: such products always have a sample on the floor. Moreover, we are not allowed to sell the expo, because it may already be damaged and the warranty does not apply to it. We found goods that have no showcase sample while the available-for-sale stock shows 1. Most likely, that one unit is the expo itself, which we cannot sell, and yet a client can order it. This is one of the problems.


The next story is the opposite one. We found that sometimes a product has too many showcase samples. Most likely, either the system failed or the human factor intervened. Instead of showing 2,500 installation boxes on the site, we can only show 43 because of a system failure. We taught our algorithms to find such glitches as well.



Having examined the data, we put together Excel spreadsheets and sent them to colleagues in the stores, who then went around with those spreadsheets and checked: should this article have a showcase sample or not, does the store really have this quantity of this article. It was great feedback from our stores: thank you very much that, with all the enormous turnover they have, they took the time to help us validate our hypotheses.

As for the cases where we found too many showcase samples, in almost 60% of them we were right in assuming an error. And when we were looking for an insufficient number of expos, or their absence, we were right in 81% of cases, which is, on the whole, a very good result.

Launching the MVP. First stage

Since we had to meet the 6-week deadline, we launched a proof of concept with a simple linear algorithm that found abnormal values and corrected the stock before publishing it to the site. We ran it in two stores in two different regions so that we could compare the effect.
In addition, we built a dashboard where, on the one hand, we monitored the technical parameters and, on the other, showed our customers, that is, the stores, how our algorithms performed. We compared how things worked before the launch and after, and showed how much money the use of these algorithms earns.
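The post does not show the rules themselves, but the "bureau of investigation" findings above suggest what such a linear, rule-based correction might look like. Here is a minimal sketch; the field names, rules and thresholds are my own illustrative assumptions, not Leroy Merlin's actual logic:

```python
# Sketch of a rule-based stock correction, illustrating the proof-of-concept
# stage. All field names, rules and thresholds are assumptions.

def correct_stock(article):
    """Return the stock quantity that is safe to publish on the site.

    `article` is a dict with illustrative keys:
      stock          - available-for-sale stock in the store
      expo_count     - number of registered showcase samples
      must_have_expo - True for products that always have a display sample
    """
    stock = article["stock"]

    # Rule 1: a product that must have an expo, but none is registered,
    # and stock == 1 -> that single unit is most likely the unsellable expo.
    if article["must_have_expo"] and article["expo_count"] == 0 and stock == 1:
        return 0

    # Rule 2: an abnormally large expo count (e.g. 2500 "showcase" boxes)
    # is most likely a system glitch; never publish more than the remainder.
    if article["expo_count"] > stock:
        return 0
    return stock - article["expo_count"] if article["must_have_expo"] else stock

mixer = {"stock": 1, "expo_count": 0, "must_have_expo": True}
boxes = {"stock": 2543, "expo_count": 2500, "must_have_expo": True}
print(correct_stock(mixer))  # 0 - the only unit is probably the expo
print(correct_stock(boxes))  # 43 - hide the suspicious "showcase" units
```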

The "-1" rule. Second stage

The effect of the product's work quickly became noticeable, and we were asked why we process such a small number of articles: "Let's take the store's entire stock and subtract one piece from each article; maybe that will solve the problem globally." By that time we had already begun working on a machine learning model, and it seemed to us that such "carpet bombing" could do a lot of harm, but we did not want to miss the opportunity for such an experiment. So we ran a test in 4 stores to check this hypothesis.

When we looked at the results a month later, we discovered two important things. First, subtracting one piece most often affects expensive articles, such as fireplaces and heat guns, of which there are few. They could not be sold on the site at all, because this algorithm could completely hide their stock. Second, it turned out that this approach has no effect on products with medium and large stocks. Thus the approach did not justify itself, and we proceeded to implement the machine learning model.
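Both effects are easy to see in a toy sketch (the articles and quantities below are made up):

```python
# Illustration of the "-1" rule: publish (stock - 1) for every article.
# Numbers are invented to show the effect described in the text.

def minus_one(stock):
    return max(stock - 1, 0)

fireplace = 1   # expensive article with a single unit in the store
screws = 500    # commodity article with a large stock

print(minus_one(fireplace))  # 0 -> the article disappears from the site
print(minus_one(screws))     # 499 -> no protection if the real error is > 1
```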

ML model. Third stage

So we built an ML model and launched it in production in 6 stores. What does this model look like?

  • The model is implemented using gradient boosting in CatBoost and predicts the probability that the stock of a given product in a given store is currently incorrect.
  • The model was trained on the results of operational and annual inventories, including data on canceled orders.
  • As indirect signs of a possibly incorrect stock, we used features such as the latest stock movements for the product, sales, returns and orders, available-for-sale stock, the nomenclature, some characteristics of the product, and so on.
  • In total, about 70 features are used in the model.
  • Among all the features, the important ones were selected using various approaches to assessing importance, including permutation importance and the approaches implemented in the CatBoost library.
  • To check quality and select model hyperparameters, the data were split into training and validation samples in an 80/20 ratio.
  • The model was trained on older data and validated on newer data.
  • The final model, which eventually went to production, was trained on the full dataset, using hyperparameters selected on the train/valid split.
  • The model and its training data are versioned with DVC; model and dataset versions are stored in S3.
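One detail worth illustrating is the temporal split: validating on data newer than the training data prevents the model from "seeing the future". A minimal stdlib sketch with an invented record layout (in the real pipeline a CatBoost classifier would then be fit on the training part):

```python
# Sketch of the temporal 80/20 train/validation split described above:
# records are ordered by time, the oldest 80% train the model and the
# newest 20% validate it. The record layout is an illustrative assumption.

def temporal_split(records, valid_share=0.2):
    """Split time-stamped records so the model trains on older data
    and is validated on newer data (no leakage from the future)."""
    ordered = sorted(records, key=lambda r: r["date"])
    cut = int(len(ordered) * (1 - valid_share))
    return ordered[:cut], ordered[cut:]

records = [{"date": d, "features": [0.0], "label": d % 2} for d in range(10)]
train, valid = temporal_split(records)
print(len(train), len(valid))          # 8 2
print(max(r["date"] for r in train))   # 7 - every training date precedes...
print(min(r["date"] for r in valid))   # 8 - ...every validation date
```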

Final metrics of the obtained model on the validation data set:

  • ROC-AUC: 0.68
  • Recall: 0.77
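To make these two numbers concrete, here is how both metrics can be computed from scratch on toy predictions (the labels and scores below are made up, not the real validation data):

```python
# Computing the two reported metrics on toy data. ROC-AUC is the probability
# that a randomly chosen positive is ranked above a randomly chosen negative;
# recall is the share of actual positives the model finds.

def roc_auc(labels, scores):
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def recall(labels, scores, threshold=0.5):
    tp = sum(1 for s, y in zip(scores, labels) if y == 1 and s >= threshold)
    fn = sum(1 for s, y in zip(scores, labels) if y == 1 and s < threshold)
    return tp / (tp + fn)

labels = [1, 1, 0, 1, 0, 0]              # 1 = stock was actually incorrect
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.3]  # model's predicted probabilities
print(round(roc_auc(labels, scores), 3))  # 0.889
print(recall(labels, scores))             # 1.0 at the 0.5 threshold
```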


A little about the architecture and how it is implemented in production. To train the model, we use replicas of the company's operational and product systems, consolidated in a single DataLake on the Greenplum platform. Features are calculated from these replicas and stored in MongoDB, which provides hot access to them. Feature-calculation orchestration and the integration of Greenplum and MongoDB are implemented with an open-source stack: Apache Airflow and Apache NiFi.
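As an illustration of what such precomputed features might look like, here is a sketch with invented field names; in the real pipeline features like these are computed by Airflow jobs over the Greenplum replicas and written to MongoDB:

```python
# Sketch of the kind of features the text describes (recent stock movements,
# sales, returns, orders, available-for-sale stock). All field names and the
# 30-movement window are illustrative assumptions.

def build_features(history, stock_row):
    """history: list of {'kind': 'sale'|'return'|'order', 'qty': int},
    most recent first; stock_row: current stock record for the article."""
    recent = history[:30]  # last 30 movements, an illustrative window
    return {
        "sales_qty": sum(m["qty"] for m in recent if m["kind"] == "sale"),
        "returns_qty": sum(m["qty"] for m in recent if m["kind"] == "return"),
        "orders_qty": sum(m["qty"] for m in recent if m["kind"] == "order"),
        "available_stock": stock_row["available"],
        "days_since_last_move": stock_row["days_since_last_move"],
    }

history = [{"kind": "sale", "qty": 2}, {"kind": "return", "qty": 1},
           {"kind": "sale", "qty": 3}]
features = build_features(history, {"available": 7, "days_since_last_move": 1})
print(features["sales_qty"], features["returns_qty"])  # 5 1
```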

The machine learning model is a containerized Python application deployed to Kubernetes. The application consumes information about the current state of stock, which arrives from various business systems in a distributed Apache Kafka message broker; the model reads this data, adjusts the stock, and sends the result to the company's website over the same Kafka-based bus.
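The processing loop can be sketched as follows. In the real system the events come from and go back to Apache Kafka; here plain in-memory queues stand in for the broker, and a stub replaces the CatBoost model:

```python
# Sketch of the consume -> score -> adjust -> publish loop described above.
# In-memory queues stand in for the Kafka broker; the "model" is a stub.

import json
from queue import Queue

def predict_error_probability(event):
    # Stub for the CatBoost model: pretend low-stock articles are riskier.
    return 0.9 if event["stock"] <= 1 else 0.1

def run_once(incoming, outgoing, threshold=0.5):
    """Process one stock event: if the model thinks the stock is likely
    wrong, publish a reduced (guaranteed) quantity instead."""
    event = json.loads(incoming.get())
    if predict_error_probability(event) >= threshold:
        event["published_stock"] = max(event["stock"] - 1, 0)
    else:
        event["published_stock"] = event["stock"]
    outgoing.put(json.dumps(event))

incoming, outgoing = Queue(), Queue()
incoming.put(json.dumps({"article": "mixer-123", "stock": 1}))
run_once(incoming, outgoing)
print(json.loads(outgoing.get())["published_stock"])  # 0
```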



We ran in 6 stores, and the results showed that, against a planned 15%, we reduced the number of unassembled orders by 12%, while our e-com turnover and the number of orders grew. So we did no harm; we simply improved the quality of order assembly.

At the moment, the model we trained is used not only to correct stock before publishing it on the site, but also to improve operational inventory algorithms: which articles should be counted today in a given department of a given store, the ones customers will come for and which it would be good to check. The model turned out to be multifunctional and is reused by the company in other areas.
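The reuse for operational inventory boils down to ranking articles by the predicted probability of an incorrect stock and counting the top few each day (the glossary above mentions 5 items per department). A sketch with made-up probabilities:

```python
# Sketch of reusing the model for operational inventory: each day, pick the
# articles per department whose stock the model considers most likely wrong.
# The probabilities are invented model outputs.

from heapq import nlargest

def articles_to_count(predictions, per_department=5):
    """predictions: {article_id: predicted probability of incorrect stock}."""
    return nlargest(per_department, predictions, key=predictions.get)

predictions = {"a": 0.91, "b": 0.15, "c": 0.77, "d": 0.42,
               "e": 0.88, "f": 0.05, "g": 0.63}
print(articles_to_count(predictions))  # ['a', 'e', 'c', 'g', 'd']
```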

P.S. This article is based on a talk at an Avito.Tech meetup; you can watch the video at the link.
