Data Mesh: how to work with data without a monolith
Hello, Habr! We at Dodo Pizza Engineering really love data (and who doesn't these days?). This is a story about how we accumulate all the data of the Dodo Pizza world and give any company employee convenient access to it. The bonus challenge: doing it without burning out the Data Engineering team.
Like true Plyushkins, we hoard every kind of information about how our pizzerias work:
- we remember every customer order;
- we know how long it took to make the very first pizza in Syktyvkar;
- we can see how long a pizza has been cooling on the heat shelf in Voronezh right now;
- we store data on product write-offs;
- and much, much more.
Several teams at Dodo Pizza are currently responsible for working with data; one of them is the Data Engineering team. Now they (that is, we) have a task: give any employee of the company convenient access to this data.
When we began to think about how to do this and started discussing the task, we came across a very interesting approach to data management, Data Mesh (you will find a large, excellent article about it here). Its ideas mapped very well onto our vision of how we want to build our system. The rest of this article is our rethinking of the approach and how we see its implementation at Dodo Pizza Engineering.
What do we mean by "data"
To get started, let's define what we mean by data at Dodo Pizza Engineering:
- events that our services publish (we have a shared bus built on RabbitMQ);
- records in our databases (for us, these are MySQL and CosmosDB);
- the clickstream from our mobile app and website.
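To make the first of those concrete, here is what an event on such a shared bus might look like. The event name and every field below are invented for illustration; this is not the actual Dodo schema, just a minimal sketch of a producer serializing an event and a consumer parsing it back:

```python
import json
from datetime import datetime, timezone

def make_order_accepted_event(order_id: str, pizzeria: str, total: float) -> str:
    """Build a hypothetical OrderAccepted event as it might travel
    over a shared message bus (all names are illustrative)."""
    event = {
        "type": "OrderAccepted",
        "occurred_at": datetime.now(timezone.utc).isoformat(),
        "payload": {
            "order_id": order_id,
            "pizzeria": pizzeria,
            "total": total,
        },
    }
    return json.dumps(event)

# A consumer on the other side of the bus simply parses it back:
raw = make_order_accepted_event("A-1042", "Syktyvkar-1", 25.90)
parsed = json.loads(raw)
print(parsed["type"])                  # OrderAccepted
print(parsed["payload"]["pizzeria"])   # Syktyvkar-1
```

The timestamp baked into the event matters later: it is what lets analytical storage answer "as of" questions without rewriting history.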
For Dodo Pizza's business to use and rely on this data, the following conditions must be met:
- The data must have integrity. We must be sure we do not alter it during processing, storage, and display. If the business cannot trust our data, the data is of no use.
- The data must be time-stamped and never overwritten. At any moment we want to be able to roll back and look at the data as of a given period, for example, to find out how many pizzas were sold on July 8, 2018.
- The data must be reliable. While collecting and storing it, we must preserve not only integrity but also completeness: we cannot afford to lose data or time slices, because with them we would lose the trust of our customers (both external and internal).
- The data must have a stable schema, because we write queries against it. We would hate for the schema to drift so much with application changes and refactoring that our queries stop working. The person writing queries will never know you refactored something until everything breaks, and we would not want to learn about it from our customers.
Given all these requirements, we concluded that data at Dodo is a product, just like a public service API. Accordingly, the team that owns a service should also own its data, and data schema changes must always be backward compatible.
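To make "backward compatible" concrete: adding an optional field with a sensible default is safe, while renaming or removing a field breaks existing consumers and queries. A minimal sketch, with invented field names, of a consumer written against schema v1 that keeps working when v2 adds an optional field:

```python
import json

def parse_order(raw: str) -> dict:
    """A consumer written for schema v1 that tolerates the v2 addition
    of an optional 'delivery_type' field (names are hypothetical)."""
    data = json.loads(raw)
    return {
        "order_id": data["order_id"],
        "total": data["total"],
        # New optional field: absent in v1 events, defaulted here,
        # so old events and old producers keep working unchanged.
        "delivery_type": data.get("delivery_type", "unknown"),
    }

v1_event = '{"order_id": "A-1", "total": 9.5}'
v2_event = '{"order_id": "A-2", "total": 12.0, "delivery_type": "courier"}'
print(parse_order(v1_event)["delivery_type"])  # unknown
print(parse_order(v2_event)["delivery_type"])  # courier
```

Renaming `total` to `amount`, by contrast, would make `parse_order` raise a `KeyError` on every new event, which is exactly the kind of silent break a stable schema contract forbids.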
Traditional Approach – Data Lake
For reliably storing and processing big data there is a traditional approach, adopted by many companies that work with such volumes of information: the Data Lake. In this approach, data engineers collect information from all components of the system and put it into one large store (this could be, for example, Hadoop, Azure Kusto, Apache Cassandra, or even a MySQL replica if the data fits there).
Then these same engineers write queries against that store. Implementing this approach at Dodo Pizza Engineering would mean the Data Engineering team owns the data schema in the analytic store.
In this scenario, the team ends up deeply miserable, and here is why:
- It must keep track of changes in ALL services in the company. And there are many services and many changes (on average we merge ~100 pull requests per week, and many services do not use pull requests at all).
- When a data schema changes, the product manager and the team changing it must wait until Data Engineering writes the code needed to support the change. We moved to feature teams long ago, and a situation where one team waits for another is very rare; we do not want it to become a "normal" part of the development process.
- It must be immersed in ALL of the company's business. A pizzeria chain looks like a simple business, but it only seems that way. It is very hard to gather enough competence in one team to build an adequate data model for the entire company.
- It is a single point of failure. Every time the data a service returns needs to change, or a new query needs to be written, the task lands on the Data Engineering team, and the team's backlog becomes overloaded.
It turns out the team sits at the intersection of a huge number of needs and is unlikely to satisfy them all, while living under constant time pressure and stress. We really do not want that. So we had to think about how to solve these problems while still being able to analyze the data.
Flowing from Data Lake to Data Mesh
Fortunately, we are not the only ones asking this question. In fact, a similar problem has already been solved in the industry (hallelujah!), just in another area: application deployment. Yes, I am talking about the DevOps approach, where the team itself decides how to deploy the product it creates.
A similar approach to the Data Lake problem was proposed by Zhamak Dehghani, a ThoughtWorks consultant. Watching Netflix and Spotify solve such problems, she wrote an excellent article, "How to Move Beyond a Monolithic Data Lake to a Distributed Data Mesh" (linked at the beginning of this article). The main ideas we took away from it:
- Divide the large Data Lake into data domains that are very similar to domain-driven design domains. Each domain is a small bounded context.
- The feature team responsible for a DDD domain is also responsible for the corresponding data domain. The team owns the schema, changes it, and loads data into it. It already knows everything it needs: how to change the data loading without breaking anything when the application changes. The knowledge stays in place, and to publish data the team does not have to go anywhere. The team runs the full cycle, from changing operational data to providing analytical data to third parties. One team owns everything associated with the domain (both the business domain and the data domain).
- Data Engineer is a role within the feature team. It does not have to be a dedicated person, but the team must possess this competency.
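One way to picture this split of responsibilities in code: each feature team registers its own data product with a schema the team itself owns, and validation happens at the team's boundary, not in a central Data Engineering bottleneck. Everything below (class, domain, and field names) is an illustrative sketch, not our actual implementation:

```python
class DataProduct:
    """A data domain owned by one feature team: the team declares the
    schema and validates its own records before publishing (illustrative)."""

    def __init__(self, domain: str, owner_team: str, schema: set):
        self.domain = domain
        self.owner_team = owner_team
        self.schema = schema        # required field names, owned by the team
        self.published = []

    def publish(self, record: dict) -> None:
        missing = self.schema - record.keys()
        if missing:
            # The owning team learns about the break immediately, instead
            # of a central data team discovering it much later downstream.
            raise ValueError(f"{self.domain}: missing fields {missing}")
        self.published.append(record)

# A hypothetical 'tracking' feature team owns its delivery-times
# data domain end to end:
tracking = DataProduct("delivery_times", "tracking-team", {"order_id", "minutes"})
tracking.publish({"order_id": "A-1", "minutes": 42})
print(len(tracking.published))  # 1
```

The point of the sketch is who raises the error: a schema violation fails inside the owning team's own code path, which is exactly where the knowledge to fix it lives.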
Meanwhile, the Data Engineering team …
If you imagine all of this implemented at the snap of a finger, two questions remain:
What will the Data Engineering team do now? Dodo Pizza Engineering already has a platform/SRE team. Its task is to give developers tools for easy deployment of services. The Data Engineering team will play the same role, only for data.
Turning operational data into analytical data is a complex process. Making analytics available to the entire company is even harder. These are exactly the problems the Data Engineering team will be solving.
We are going to give feature teams a convenient set of tools and practices with which they can publish data from their service to the rest of the company. We will also be responsible for the shared infrastructure parts of the data pipeline (queues, reliable storage, clusters for running transformations on data).
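As a toy sketch of what that shared infrastructure does on a team's behalf: take whatever events the service publishes, stamp each one on ingestion, and append it to storage that is never overwritten. The queue here stands in for the real bus, the list for real durable storage; everything is simplified and hypothetical:

```python
import json
from datetime import datetime, timezone
from queue import Queue

def drain_to_store(bus: Queue, store: list) -> int:
    """Move events from a shared bus into an append-only store,
    stamping each record on ingestion; nothing is ever overwritten
    (an illustrative stand-in for the real pipeline)."""
    moved = 0
    while not bus.empty():
        event = bus.get()
        store.append({
            "ingested_at": datetime.now(timezone.utc).isoformat(),
            # The body is stored verbatim, so integrity is preserved
            # between the operational and the analytical side.
            "body": json.dumps(event),
        })
        moved += 1
    return moved

bus = Queue()
bus.put({"type": "PizzaBaked", "pizzeria": "Voronezh-2"})
bus.put({"type": "OrderDelivered", "order_id": "A-7"})
store: list = []
print(drain_to_store(bus, store))  # 2
```

Because records are only ever appended, with their ingestion timestamps intact, the "roll back and look at July 8, 2018" requirement from earlier falls out for free.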
How will Data Engineer skills appear inside feature teams? The feature teams' job is getting harder. Of course, we could try to hire a Data Engineer for each of our teams, but that is hard: finding a person with a good background in data processing and convincing them to work inside a product team is difficult.
The great advantage of Dodo is that we love internal learning. So our plan is this: the Data Engineering team starts publishing the data of a few services itself, crying and pricking itself but continuing to eat the cactus, as the Russian joke goes. As soon as we see that we have a working publication process, we start teaching it to the feature teams.
We have several ways to do this:
- A DevForum talk, in which we will explain what the process we created looks like, what tools are available, and how to use them most effectively.
- Speaking at DevForum will also help us gather feedback from product developers. After that, we can join product teams, help them solve their data publication problems, and organize training for the teams.
So far I have talked a lot about publishing data. But there is also consumption. What about that side?
We have a wonderful BI team that writes very complex reports for the management company. Inside Dodo IS there are many reports for our partners that help them manage pizzerias. In our new model, we think of these as data consumers that have their own data domains, and it is the consumers who will be responsible for those domains. Sometimes a consumer domain can be described with a single query to the analytic store, and that is fine. But we understand this will not always work. That is why we want the platform we build for product teams to be used by data consumers as well (for reports inside Dodo IS, those will be ordinary feature teams).
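When a consumer domain really is "a single query to the analytic store", it can be this small. In the sketch below, an in-memory sqlite3 database stands in for the real analytic store, and the table, data, and report are all invented:

```python
import sqlite3

# sqlite3 stands in for the analytic store; schema and numbers are invented.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (pizzeria TEXT, sold_on TEXT, pizzas INTEGER)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [("Syktyvkar-1", "2018-07-08", 120),
     ("Voronezh-2", "2018-07-08", 95),
     ("Voronezh-2", "2018-07-09", 101)],
)

# The entire 'consumer domain' for this report is one query:
row = conn.execute(
    "SELECT SUM(pizzas) FROM orders WHERE sold_on = ?", ("2018-07-08",)
).fetchone()
print(row[0])  # 215
```

The consumer team owns this query and the report built on it; the producing teams own the tables behind it, which is the whole division of labor in one picture.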
This is how we see working with data at Dodo Pizza Engineering. We would be glad to read your thoughts on it in the comments.