Who is a Data Engineer

A data engineer is a person who works with data. It is a fairly simple definition, but it hides several layers.

Let's figure out together what kind of animal a data engineer is.

The role of a data engineer is determined by the maturity of the company. A data engineer usually appears when the company wants to work with its data and extract value from it.
Over time, many companies come to understand the value of data. It is not for nothing that people say: “Data is the new oil”.

The main concept that companies rely on when hiring a data engineer and defining the role is Data Governance.


I like to explain everything with examples, so let's imagine the company “OOO Roga i Kopyta”. The company is growing: it has a website, customers, a working backend, and so on. It successfully sells horns and hooves.

But the company has a new goal: to develop further and improve its service for selling horns and hooves in order to get ahead of competitors. For that, it needs a data engineer.

I will continue the story of how this could look for the company. Any company could stand in for “OOO Roga i Kopyta”, so perhaps my story will apply to your company as well.


To begin with, I would divide data engineers into “branches”.

The first branch is the core. These data engineers use specific tools that allow them to assemble the “core”. By “core” I mean the basis, architecture, and principles by which data is collected in the company.

The data that the “core” collects is treated as the “standard”; it is usually the “source of truth”. We will not talk about what tools the people from the “core” use. But I will say that it can be anything: open source, paid software, or something built in-house.

The “core” can also create frameworks, interfaces, or other software for colleagues that would make it easier to work with data.

The “core” collects “raw” data and provides it “as is” without changes. Sometimes it can perform normalization and denormalization operations.
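To make denormalization a little more concrete, here is a minimal sketch in plain Python. The `customers` and `orders` tables, their fields, and the `denormalize` helper are all hypothetical names I made up for illustration; a real “core” team would do this with whatever tools it has chosen.

```python
# Hypothetical raw tables, stored normalized: orders point to customers by id.
customers = [
    {"customer_id": 1, "region": "North"},
    {"customer_id": 2, "region": "South"},
]
orders = [
    {"order_id": 10, "customer_id": 1, "item": "horns"},
    {"order_id": 11, "customer_id": 2, "item": "hooves"},
    {"order_id": 12, "customer_id": 1, "item": "hooves"},
]


def denormalize(orders, customers):
    """Copy customer attributes onto every order row, producing one wide table."""
    by_id = {c["customer_id"]: c for c in customers}
    return [{**order, **by_id[order["customer_id"]]} for order in orders]


wide_orders = denormalize(orders, customers)
```

Each row of `wide_orders` now carries the customer's `region` next to the order itself, so consumers do not have to join the two tables on their own. The raw `orders` list is left untouched.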


The second branch is product data engineers. Most often, they use data that has already been collected by colleagues from the “core”. Product data engineers do not connect sources and do not collect the initial data. They help the business grow, and it is not for nothing that I called them “product data engineers”. This does not necessarily mean that they deal only with the product. They can work with finances, expenses, future projects, and other areas that the company is currently targeting. What I want to emphasize is that a product data engineer does not fine-tune the “core” but helps the business grow.


It's also worth noting that product data engineers can be different from each other. For example, our “core” team collected data on horn and hoof sales, as well as a table of users.

Now the business has two tasks:

  • Calculate the ratio of men and women every month in order to make the right promotions that will help the business grow.

  • Calculate the number of horns and hooves that men and women buy each day.

The tasks are fairly simple, but the solutions may vary. It all depends on the “core” team.

Now in more detail:

  1. The “core” team could have written its own framework for working with data, and therefore, in order to complete the assigned tasks, the product data engineer needs to take this framework, use the necessary methods from there, and get the desired result.

  2. It may also be that there is no framework, but there is some publicly available tool for building data pipelines that has been approved by the “core” team and will help us complete these tasks. For example, in such a tool you can create a pipeline without writing a single line of code, just by arranging blocks and connecting them with arrows.

  3. If there is neither one nor the other, then the team of product data engineers decides for themselves how they will perform these tasks. The team can use anything: open source, paid software, or something home-made.
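As a rough illustration of the third option, here is what a home-made solution to both tasks might look like in plain Python. Everything in it is an assumption on my part: the `USERS` and `SALES` structures stand in for the tables collected by the “core”, the function names are invented, and I read “ratio of men and women every month” as the share of each gender among that month's distinct buyers.

```python
from collections import defaultdict
from datetime import date

# Hypothetical "core" tables: user_id -> gender, and one tuple per sale
# (user_id, item, quantity, sale date).
USERS = {1: "F", 2: "M", 3: "M"}
SALES = [
    (1, "horns", 2, date(2024, 1, 10)),
    (2, "hooves", 1, date(2024, 1, 10)),
    (3, "horns", 4, date(2024, 1, 11)),
    (1, "hooves", 3, date(2024, 2, 2)),
    (2, "horns", 1, date(2024, 2, 2)),
]


def monthly_gender_share(users, sales):
    """Task 1: share of each gender among distinct buyers, per (year, month)."""
    buyers = defaultdict(set)
    for user_id, _item, _qty, sold_on in sales:
        buyers[(sold_on.year, sold_on.month)].add(user_id)

    shares = {}
    for month, ids in buyers.items():
        counts = defaultdict(int)
        for user_id in ids:
            counts[users[user_id]] += 1
        shares[month] = {gender: n / len(ids) for gender, n in counts.items()}
    return shares


def daily_units_by_gender(users, sales):
    """Task 2: units sold per (day, gender, item)."""
    totals = defaultdict(int)
    for user_id, item, qty, sold_on in sales:
        totals[(sold_on, users[user_id], item)] += qty
    return dict(totals)


shares = monthly_gender_share(USERS, SALES)
totals = daily_units_by_gender(USERS, SALES)
```

With the “core” team's framework or a visual pipeline tool, the same two aggregations would be expressed differently, but the logic stays the same: join sales to users, then group by month or by day.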


The third branch is optional and depends on the maturity of the company. These are ML data engineers. This branch is special because there is no “standard” for it, and it is impossible to say clearly who an ML data engineer is.

Still, I would describe this role as a data engineer who understands Data Science (DS), knows what data scientists need, and knows how best to optimize tables and data storage for ML tasks.

ML data engineers can use data from both “core” and product data engineers.

All these branches can exist in one company.

But you shouldn't refuse to work in companies if they don't have ML. Maybe they don't need it or they just haven't grown to that level.


The second and third branches grow out of the first.

Most often, data engineers from the second or third branch do not work on the “core”, and they may not know the technologies that the “core” uses.

But at the same time, the “core” often has skills that allow it to replace the second or third branch, since the data engineers from the “core” have stronger and more advanced skills.

It also happens that the skills of the second and third branches do not overlap.

For this reason, all three branches can exist in a company at the same time.


It is also worth noting that both the existence of these branches and their format depend on the maturity of the company.

After all, data engineering as a whole is a set of principles based on the concept of data governance.


That's all I wanted to tell you about data engineers. If I missed something or you would like to go deeper into some area, please write about it in the comments.

Also, if you need a consultation, mentoring, a mock interview, or help with other data engineering questions, you can contact me. All contacts are available via the link.
