The best data products are born in the fields
My name is Marina Calabina, and I'm a project manager at Leroy Merlin. I joined the company in 2011. For the first five years I opened stores (when I arrived there were 13; now there are 107), then I worked in a store as the head of a trading sector, and for the past year and a half I have been doing what I'll describe in this article.
Lerulism
Since I have worked at the company for a long time, my speech is full of company-specific terms, which I call "Lerulisms". So that we speak the same language, here are a few of them.
- Stock – the stock of goods in a store.
- Available-for-sale stock – the quantity of goods free of locks and customer reserves.
- Expo – a display sample on the sales floor.
- Article (part number) – an individual product.
- Operational inventory – a daily recount of 5 articles in each department of each store.
Guaranteed Stock
You may not know this, but when you place an order with Leroy Merlin, in 98% of cases it comes to a store and is picked straight from the sales floor.
Imagine a huge 8,000 m² store with 40,000 articles, and the task of picking an order. What can happen to the articles the picker is looking for? A product may already be in the basket of a customer walking around the sales floor, or it may even have been sold between the moment you ordered it and the moment the picker went after it. The product is shown on the site, but in reality it is either already in someone's basket or already gone.
To deal with various problems, including this one, last year the company launched the Data Accelerator division. Its mission is to instill a data-driven approach to solving the business's problems.
The essence of the product: before publishing an article's stock on the site, we check whether we can actually pick it for the customer, whether we can guarantee it. Most often this means publishing a slightly smaller stock figure on the site than we physically have.
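The idea of publishing slightly less than the physical stock can be sketched as a tiny function. This is a minimal illustration, not the actual production logic; the `safety_buffer` parameter and its default are assumptions, since the article only says the published figure is usually a bit smaller than the physical one.

```python
def publishable_stock(available_for_sale, safety_buffer=1):
    """Return the stock figure to publish on the site.

    safety_buffer is a hypothetical parameter: the number of units held
    back so that an order placed against the published figure can
    still be picked even if a unit disappears from the floor.
    """
    return max(available_for_sale - safety_buffer, 0)
```

For example, an article with 10 sellable units would be published as 9, and an article with 0 units stays at 0 rather than going negative.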
We had a great team: a Data Scientist, a Data Engineer, a Data Analyst, a Product Owner, and me.
The objectives of our product were to:

- reduce the number of unassembled orders without harming the total number of orders (so that it does not decrease);
- preserve eCom turnover, since we would be showing less stock on the site.
In general, other things being equal, it is better to show a little less on the site than to fail to assemble an order.
Bureau of Investigation
When the project started, we went to the stores, to the people who deal with this every day, and picked orders ourselves. It turned out that our product was so interesting and necessary to the stores that we were asked to launch not in 3 months, as originally planned, but twice as fast, in 6 weeks. To put it mildly, this was stressful, but nonetheless…
We gathered hypotheses from experts and went looking for what data sources we had at all. That was a separate quest. In fact, our "bureau of investigation" showed that we carry products that must have a display sample.
For example, a mixer tap: such products always have a sample on the floor. Moreover, we are not allowed to sell the expo, because it may already be damaged and the warranty does not cover it. We found articles with no registered display sample where the available-for-sale stock showed exactly 1. Most likely, that single unit is the expo itself, which we cannot sell, yet a customer can still order it. This is the first problem.
The next story is the opposite. We found that sometimes an article has far too many display samples registered. Most likely, either the system failed or a human made a mistake. Instead of showing 2,500 installation boxes on the site, we could only show 43 because of such a failure. We taught our algorithms to find these flaws as well.
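The two expo problems above boil down to simple rule checks on an article's record. Here is a minimal sketch; the field names, the record shape, and the "10x the typical count" overflow threshold are all illustrative assumptions, not the production rules.

```python
def flag_expo_anomalies(article):
    """Flag suspicious display-sample (expo) situations for one article.

    `article` is a hypothetical record with illustrative fields:
      needs_expo   - this product type always has a display sample
      expo_count   - units currently registered as expo
      available    - available-for-sale stock shown to customers
      typical_expo - how many expo units this product normally has
    """
    flags = []
    # Problem 1: no expo registered, exactly one unit "available".
    # That unit is probably the unsellable display sample itself.
    if article["needs_expo"] and article["expo_count"] == 0 and article["available"] == 1:
        flags.append("hidden_expo")
    # Problem 2: implausibly many expo units, likely a system or
    # data-entry error locking real stock away from the site.
    if article["expo_count"] > 10 * max(article["typical_expo"], 1):
        flags.append("expo_overflow")
    return flags
```

A mixer with zero registered expos but one "available" unit gets flagged as `hidden_expo`; the installation boxes with thousands of registered expos get flagged as `expo_overflow`.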
Validation
Having examined the data, we compiled a list of suspicious articles and went to the stores to check our hypotheses on the spot.
As for the examples above: when we flagged too many display samples, we were right about an error in almost 60% of cases. And when we looked for missing or insufficient expos, we were right 81% of the time, which, in our view, is a good result.
Launching the MVP. First stage
Since we had to meet the 6-week deadline, we launched a proof of concept: a simple linear algorithm that found abnormal values and corrected the stock for them before publishing to the site. We ran it in two stores in two different regions so that we could compare the effect.
In addition, we built a dashboard where, on the one hand, we monitored technical parameters and, on the other, showed our customers, that is, the stores, how the algorithms were performing. We compared how things worked before the launch and after, and showed how much money the use of these algorithms brings in.
The "-1" rule. Second stage
The effect of the product quickly became noticeable, and we were asked why we processed so few articles: "Let's take the store's entire stock, subtract one unit from each article, and maybe that will solve the problem globally." By that point we had already started on a machine learning model, and it seemed to us that such carpet bombing could do real harm, but we did not want to miss the chance to run the experiment. So we tested this hypothesis in 4 stores.
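The "-1" experiment itself is almost trivial to express in code. This sketch is only an illustration of the rule as described; the dict-of-quantities representation is an assumption.

```python
def minus_one_rule(stock):
    """The '-1' experiment: publish every article's stock minus one unit.

    `stock` maps article id -> physical available-for-sale quantity.
    Quantities are clamped at zero so nothing is published as negative.
    """
    return {article: max(qty - 1, 0) for article, qty in stock.items()}
```

Note the side effect that motivated caution: an article with exactly 1 unit disappears from the site entirely, which is exactly the kind of blanket harm a targeted model avoids.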
When we looked at the results a month later, we discovered two important things.
ML model. Third stage
So we built a machine learning model:
- The model is implemented with gradient boosting using CatBoost; it predicts the probability that the stock of a given article in a given store is currently incorrect.
- The model was trained on the results of operational and annual inventories, including data on canceled orders.
- As indirect signals of a possibly incorrect stock, we used features such as the latest stock movements for the product, sales, returns and orders, available-for-sale stock, the nomenclature, some product characteristics, and so on.
- In total, about 70 features are used in the model.
- Important features were selected using several approaches to importance estimation, including Permutation Importance and the approaches implemented in the CatBoost library.
- To check quality and select hyperparameters, the data were split into training and validation samples in an 80/20 ratio.
- The model was trained on older data and validated on newer data.
- The final model that went to production was trained on the full dataset, using the hyperparameters selected on the train/validation split.
- The model and its training data are versioned with DVC; model and dataset versions are stored in S3.
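The chronological split described above (train on older data, validate on newer) can be sketched in a few lines. This is an illustration of the splitting scheme only, not the team's actual pipeline; the record layout and the `date` field name are assumptions, and the CatBoost training itself is omitted.

```python
def chronological_split(rows, train_frac=0.8):
    """80/20 split by time: train on older rows, validate on newer ones.

    `rows` is a hypothetical list of dicts, each with a sortable "date"
    key. Sorting before cutting guarantees the validation set only
    contains data from after the training period, avoiding leakage
    from the future into the past.
    """
    ordered = sorted(rows, key=lambda r: r["date"])
    cut = int(len(ordered) * train_frac)
    return ordered[:cut], ordered[cut:]
```

A random shuffle split would let the model peek at future stock states when predicting past ones, which is why the split here is strictly by time.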
Final metrics of the model on the validation set:

- ROC-AUC: 0.68
- Recall: 0.77
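For readers less familiar with these two metrics, here is a minimal pure-Python sketch of how each is computed; real projects would normally use a library such as scikit-learn for this.

```python
def recall(y_true, y_pred):
    """Share of truly incorrect stocks that the model actually caught."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp / (tp + fn)

def roc_auc(y_true, scores):
    """Probability that a random positive outranks a random negative.

    Computed directly from its pairwise definition: each positive/negative
    pair counts 1 if the positive scores higher, 0.5 on a tie.
    """
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))
```

A recall of 0.77 means the model catches roughly three out of four genuinely incorrect stocks, at the cost of some false alarms reflected in the moderate ROC-AUC.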
Architecture
A little about the architecture and how it runs in production. To train the model, we use replicas of the company's operational and product systems, consolidated in a single DataLake on the GreenPlum platform. Features computed from these replicas are stored in MongoDB, which gives us hot access to them. Orchestration of the feature calculations and the integration between GreenPlum and MongoDB are handled by a separate service.
The machine learning model runs as a containerized service.
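The serving path can be pictured as: look up precomputed hot features, then ask the model container for a probability. The sketch below is purely illustrative: the dict stands in for the MongoDB feature store, `DummyModel` stands in for the real CatBoost container, and all names, fields, and coefficients are invented for the example.

```python
# Hypothetical stand-in for the MongoDB feature store: features are
# precomputed from the GreenPlum replicas and keyed by (store, article).
FEATURE_STORE = {
    ("store_042", "sku_123"): {"recent_moves": 3, "returns_30d": 1, "available": 7},
}

class DummyModel:
    """Stand-in for the containerized model: scores stock-churn signals."""
    def predict_proba(self, features):
        # Toy heuristic in place of gradient boosting: more recent
        # movements and returns mean a higher chance the stock is wrong.
        return min(1.0, 0.2 * features["recent_moves"] + 0.3 * features["returns_30d"])

def score_article(store_id, sku, model):
    """Probability that this article's stock in this store is incorrect."""
    features = FEATURE_STORE.get((store_id, sku))
    if features is None:
        return None  # no hot features computed for this article yet
    return model.predict_proba(features)
```

Keeping the features hot in a document store means the service never has to touch the DataLake at request time, which is what makes per-article checks before site publication feasible.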
Results
We ran it in 6 stores, and the results showed that, against the planned 15%, we managed to reduce the number of unassembled orders by 12%, while our turnover even increased.
At the moment, the model we trained is used not only to correct stock before publishing on the site, but also to improve operational inventory algorithms: the articles to count today, in this department of this store, are chosen as the ones customers will come for and which are worth checking. The model turned out to be multifunctional and is reused by the company in other departments.
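Reusing the model for operational inventory amounts to ranking articles by predicted error probability and counting the riskiest ones first. A minimal sketch, assuming the daily quota of 5 articles per department mentioned in the glossary; the function and field names are illustrative.

```python
def articles_to_count(error_probs, per_department=5):
    """Pick today's operational-inventory articles for one department.

    `error_probs` maps article id -> model-predicted probability that
    its stock is incorrect. The riskiest articles are counted first,
    so the fixed daily quota is spent where errors are most likely.
    """
    return sorted(error_probs, key=error_probs.get, reverse=True)[:per_department]
```

Compared with counting random or round-robin articles, this spends the same daily effort where the model expects the stock to be wrong.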
P.S. This article is based on a talk at an Avito.Tech meetup; you can watch the video at the link.