Why X5 Group made Data Engineering a separate center of excellence

and how it helped speed up product development

When X5 Group began developing its big data capabilities, the company did more than build the DMP platform and BI analytics: it also began actively launching digital products based on big data, complex analytics and machine learning. Examples include products for demand forecasting, managing store assortment matrices, predicting out-of-stock items on shelves, and dynamic pricing.

When and why a product team needs Data Engineering competence

Most of the products were innovative and required active R&D: many different hypotheses and approaches had to be tested before an optimal solution was found. One of the key factors in a successful R&D process is how quickly you can experiment and test hypotheses. To develop products, X5 Group formed autonomous cross-functional teams with a minimum of external dependencies, so that they could move forward as fast as possible. At the same time, product teams regularly faced two types of data-related tasks:

find, among the many data sources and data marts, exactly the data the product needs and make sure it is correct
build a complex pipeline that extracts data from different data marts and sources, applies non-trivial processing (often using ML) and delivers the results to the product team for further work

To solve the first type of task effectively, X5 Group is developing the Data Quality (DQ) competence, and for the second, the Data Engineer (DE) competence. In short, the difference between the two comes down to the following:

DQ goes much deeper into the domain and the business than into programming and distributed systems, while DE goes much deeper into programming and the nuances of distributed processing and storage systems than into the domain and the business.
DQ searches the company's systems for data, understands which data is needed for a specific task, can formulate requirements for processing and cleaning it, and can also assemble a simple data mart or a prototype of one. DE develops production versions of data marts according to the requirements received from DQ, sets up infrastructure, monitoring and logging, and optimizes calculations.

Adding the Data Quality and/or Data Engineer roles to a cross-functional team that builds a product based on big data significantly accelerated product development at the initial stage. At that point the core of the team may consist of a Product Owner and a Data Scientist / Data Analyst who, paired with a Data Quality specialist / Data Engineer, can quickly and efficiently find and prepare the data on which machine learning models are trained and hypotheses are tested.

After the MVP has been developed and a successful pilot has confirmed the product's economic effect, the team faces tasks of a different kind. As a rule, the product needs to be scaled seriously, for example:

start making a forecast not for 100 stores, as in the pilot, but for more than 17 thousand stores throughout Russia
start adhering to strict SLAs for computation time and response time
avoid failed calculations and inaccuracies or inconsistencies in data marts

At this stage, it is critical for the product team to build an engineering solution with the required levels of resiliency and scalability. Bringing in DE competence to develop data marts and pipelines is no longer just desirable but vital for the further development of the product.

Why we separated Data Engineering and develop it independently of other areas

Initially at X5, the Data Engineering competence was concentrated mainly in the division responsible for designing and developing the big data platform (DMP) and the unified enterprise data warehouse (EDW). But DMP and EDW had their own development roadmap, and product teams with their own tasks were forced to adjust to it, which could significantly slow down their development.

As a result, backend developers in product teams sometimes started solving big data processing tasks while mastering a new technology stack, or Data Scientists with a good engineering background began to take on more of the data preparation and processing work and to build complex data marts.

Data Engineers hired into product teams also ended up wherever a place could be found for them: in the DMP development team, in the divisions responsible for classic Software Engineering, or among Data Scientists and Data Analysts. This led to the following problems:

product managers did not know exactly where to go for DE resources or who in the company was responsible for hiring and developing such specialists
DE hiring was often carried out by people competent in related areas, but not particularly versed in Data Engineering
DEs scattered across different departments had no single environment for communication, mentoring and development, which made it harder to spread best practices and to standardize development approaches and tools. As a result, teams were constantly “reinventing the wheel”, unaware that their problem had long been solved elsewhere.

Setting DE up as a separate center of competence in X5 Group ultimately made it possible to make significant progress on the listed problems:

create a single entry point for product managers and allocate resources to product teams faster and more flexibly by forming a single resource pool
simplify hiring and make it more systematic, since it is now done centrally by people who are knowledgeable in Data Engineering
start introducing standard solutions in product teams, standardize development approaches and tools, and build an effective environment for communication, mentoring and development of engineers
accelerate the launch of new products based on big data and let successfully piloted products grow into reliable engineering solutions that scale to more than 17,900 X5 Group stores in 66 regions of Russia

Here are a couple of cases we have already worked through.

Case 1 – scaling a successful pilot

The team developed an MVP of a product that analyzes terabytes of receipt data to predict when goods are missing from store shelves (even though they are in stock) and sends alerts to store personnel so that they check the shelves and, if necessary, replenish them. A pilot in 600 stores showed a confirmed economic effect, and the team faced the task of scaling the solution to more than 16 thousand stores.

All existing calculations had to be analyzed for possible scaling and optimization: the calculations and data preparation had to fit into a strictly fixed time window, from the moment the updated receipt data for the previous day was uploaded to the cluster until the first stores opened the next morning.
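The time-window constraint described above can be sketched as a simple budget check. This is only an illustration, not the team's actual tooling: the stage names, durations, timestamps and the 20% safety margin are all hypothetical, and the sketch assumes the stages run strictly one after another.

```python
from datetime import datetime, timedelta


def batch_window(data_ready: datetime, first_opening: datetime) -> timedelta:
    """Time available between the data upload and the first store opening."""
    if first_opening <= data_ready:
        raise ValueError("first store opening must be later than data upload")
    return first_opening - data_ready


def fits_window(stage_minutes: dict[str, float],
                window: timedelta,
                safety_margin: float = 0.2) -> bool:
    """Check that sequential stages fit the window, keeping a reserve.

    stage_minutes maps a (hypothetical) stage name to its estimated
    duration in minutes; safety_margin is the share of the window
    held back for retries and slow runs.
    """
    total = timedelta(minutes=sum(stage_minutes.values()))
    budget = window * (1 - safety_margin)
    return total <= budget


# Hypothetical figures: data lands at 01:00, first stores open at 07:00.
window = batch_window(datetime(2024, 1, 1, 1, 0), datetime(2024, 1, 1, 7, 0))
stages = {"load": 40, "features": 120, "scoring": 90, "alerts": 20}
print(fits_window(stages, window))
```

Re-running such a check with stage durations measured at the target store count is one cheap way to see early which stages have to be optimized before scaling.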

Having specialists with Data Engineering competence in the team allowed us to test the product pipelines at 17 thousand stores, study query plans and identify bottlenecks, analyze resource utilization during execution, determine which resource configuration would be most appropriate, and estimate how much additional hardware scaling would require.

Case 2 – stabilization of the pipeline under increased load

One of the products implemented a data preparation pipeline for post-analysis of promotions. The team built the pipeline as a set of Hive scripts and successfully brought it into production.

Over time, the pipeline became unstable: execution time could increase significantly, and sometimes the pipeline failed with errors altogether. To diagnose the problems and work out possible solutions, a Data Engineer was brought in, who conducted an in-depth analysis of query execution plans and cluster resource utilization. As a result, some of the queries were optimized, and for others, more efficient implementations of the transformations were developed and tested on Spark, which stabilized the pipeline as a whole and allowed it to stay in production.
