Who Is a Data Engineer?

Yes, there is already a lot of material on the Internet about who a Data Engineer (DE) is, including on Habr itself. But I wanted to talk about it myself. I have some experience in this area, though not much (I currently work as a Data Engineer at Sber Education).

THIS ARTICLE IS NOT SUPER TECHNICAL AND NOT STRICTLY SCIENTIFIC. HERE I EXPLAIN THE TOPIC IN SIMPLE LANGUAGE (I can’t do it any other way)

Who is he?

To explain who a data engineer is, I drew the picture below.

What is drawn there:

There are different sources (databases, websites, files, etc.). The data engineer collects all this data and puts it into a database, and then the masters of their craft build charts and AI from this data.

It may seem that the data engineer has the easiest task in this chain, and perhaps that is so, but let's look more closely at what exactly a data engineer does.

ETL

ETL (Extract, Transform, Load) is the process of taking data from different sources, transforming it, and storing it in one place in a form convenient for analysis or use. The “Does Data engineer” arrow in the figure above is this very ETL.

That is, we (as data engineers) need to take data from different sources (databases, files, web services/APIs), transform it, and put it into our database in a convenient form, in the correct format, and “clean”, so that other specialists can then use it. And, of course, all of this should keep working afterwards without our involvement.

Example: take data from an API, parse the JSON, remove the information we don’t need (for example, clients who have some parameter empty), and then, from the resulting table and several others, use SQL to create a data mart (more on them below), so that someday a guy in a suit can look at a chart built on this data mart and make some important decision.
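To make the example a bit more concrete, here is a minimal sketch of such a flow in Python. Everything in it is an assumption made for illustration: the JSON payload is hard-coded instead of coming from a real API, the field names (`id`, `name`, `city`) are made up, and an in-memory SQLite database stands in for the real target database.

```python
import json
import sqlite3

# Hypothetical API response; a real pipeline would fetch this over HTTP.
RAW_RESPONSE = """
[
  {"id": 1, "name": "Anna", "city": "Moscow"},
  {"id": 2, "name": "Boris", "city": null},
  {"id": 3, "name": "Vera", "city": "Kazan"}
]
"""


def extract(raw: str) -> list:
    """Extract: parse the JSON payload into a list of dictionaries."""
    return json.loads(raw)


def transform(clients: list) -> list:
    """Transform: drop clients with an empty city, shape rows for loading."""
    return [(c["id"], c["name"], c["city"]) for c in clients if c.get("city")]


def load(rows: list, conn: sqlite3.Connection) -> None:
    """Load: write the cleaned rows into the target table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS clients "
        "(id INTEGER PRIMARY KEY, name TEXT, city TEXT)"
    )
    # INSERT OR REPLACE keeps the load safe to re-run on the same data.
    conn.executemany("INSERT OR REPLACE INTO clients VALUES (?, ?, ?)", rows)
    conn.commit()


def run_etl(raw: str, conn: sqlite3.Connection) -> int:
    """Run all three steps and return the number of loaded rows."""
    rows = transform(extract(raw))
    load(rows, conn)
    return len(rows)


conn = sqlite3.connect(":memory:")  # stand-in for the real database
loaded = run_etl(RAW_RESPONSE, conn)
```

In real life each step is usually a separate task in a scheduler, so the whole thing runs without our involvement, but the extract, transform, load shape stays the same.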

About data marts

The final product of a data engineer is usually a data mart: a table containing the necessary, processed, transformed data from different sources. Data marts are to a DE what a website is to a front-end developer.
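A rough sketch of what building such a table can look like, assuming two made-up source tables (`clients` and `orders`) and a hypothetical result table `mart_client_revenue`. SQLite is used here only so that the example runs anywhere; the SQL itself is the important part.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # stand-in for the real database
conn.executescript(
    """
    -- Hypothetical source tables, already loaded by ETL.
    CREATE TABLE clients (id INTEGER PRIMARY KEY, name TEXT, city TEXT);
    CREATE TABLE orders (id INTEGER PRIMARY KEY, client_id INTEGER, amount REAL);

    INSERT INTO clients VALUES (1, 'Anna', 'Moscow'), (2, 'Vera', 'Kazan');
    INSERT INTO orders VALUES (1, 1, 100.0), (2, 1, 50.0), (3, 2, 70.0);

    -- The resulting table: one row per client with aggregated order data.
    CREATE TABLE mart_client_revenue AS
    SELECT
        c.id          AS client_id,
        c.name        AS name,
        c.city        AS city,
        COUNT(o.id)   AS orders_count,
        SUM(o.amount) AS total_amount
    FROM clients AS c
    JOIN orders AS o ON o.client_id = c.id
    GROUP BY c.id, c.name, c.city;
    """
)

mart = conn.execute(
    "SELECT name, orders_count, total_amount "
    "FROM mart_client_revenue ORDER BY client_id"
).fetchall()
```

An analyst or a BI tool then reads `mart_client_revenue` directly and does not need to know how the source tables are joined.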

Who should build them, that is, write the SQL script? There are 3 options:

  • The data engineer finds out what is needed and writes the script themselves

  • The data analyst sends the finished SQL to the data engineer

  • The data analyst says what needs to be done (possibly in pseudocode), and the data engineer does it

Different places do it differently, so you need to be prepared for anything

Other tasks

Setting up ETL processes is the main task of a DE and takes up most of the time, but there may be other tasks too:

  • We are the ones putting data into the database, so everything there should be in order: correct schemas, no garbage, good performance, and so on. Often it is the DE who looks after this

  • Describe what data is stored in the database so that other people can understand what is where and where to look. The data engineer is also involved in this process.

  • Nothing in this world is perfect, not even our ETL pipelines, so we also need to monitor them and rush to fix them when an error occurs.

  • Monitor the quality and cleanliness of the data. If the DE was told to take data from somewhere and put it in the database, and then it turned out that the data was garbage, the DE is to blame for that too

  • And, of course, there are a bunch of other things – calls, tea with cookies…
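As an illustration of the data quality point, here is a small sketch of a check that could run before loading. The function name `check_quality`, the field names, and the two rules (no empty required fields, no duplicate ids) are assumptions for the example; real projects often use dedicated tools for this.

```python
def check_quality(rows: list, required: list) -> list:
    """Return a list of human-readable problems found in the rows."""
    problems = []
    seen_ids = set()
    for i, row in enumerate(rows):
        # Rule 1: required fields must not be missing or empty.
        for field in required:
            if row.get(field) in (None, ""):
                problems.append(f"row {i}: empty {field}")
        # Rule 2: ids must be unique.
        if row.get("id") in seen_ids:
            problems.append(f"row {i}: duplicate id {row['id']}")
        seen_ids.add(row.get("id"))
    return problems


# Hypothetical batch: the second row has an empty email and a repeated id.
batch = [
    {"id": 1, "email": "anna@example.com"},
    {"id": 1, "email": ""},
]
problems = check_quality(batch, required=["id", "email"])
```

If `problems` is not empty, the pipeline can stop and alert someone instead of quietly putting garbage into the database.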

What a Data Engineer Should Know

From the text above you may already have understood what skills a data engineer needs, but I have still put them in a separate section

MUST-HAVE:

  • Databases and SQL. A data engineer works with data and therefore must understand what a database is and be able to get the necessary information out of it using SQL. And not just be able to write a select, but also do more complex things – where, join, window functions, procedures and more

  • Python or another programming language. Most often the main DE language is Python, but in some places other languages are also used – Java or Scala. If you are just starting out, knowing only Python will be enough. The level, at least for entry-level positions, does not need to be very high: know the basics, how to work with APIs, and libraries for working with data.
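To give a feel for what "more complex than select" means, here is a small sketch that uses a window function to find each client's largest order. The `orders` table and its data are made up, and the example assumes the SQLite bundled with Python supports window functions (SQLite 3.25 or newer).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    -- Hypothetical table with a few orders per client.
    CREATE TABLE orders (id INTEGER PRIMARY KEY, client_id INTEGER, amount REAL);
    INSERT INTO orders VALUES
        (1, 1, 100.0), (2, 1, 50.0), (3, 2, 70.0), (4, 2, 90.0);
    """
)

# ROW_NUMBER() numbers the orders inside each client's partition,
# largest amount first; the outer query keeps only row number 1.
query = """
SELECT client_id, amount
FROM (
    SELECT
        client_id,
        amount,
        ROW_NUMBER() OVER (
            PARTITION BY client_id
            ORDER BY amount DESC
        ) AS rn
    FROM orders
) AS ranked
WHERE rn = 1
ORDER BY client_id
"""

top_orders = conn.execute(query).fetchall()
```

This particular question could also be answered with `GROUP BY` and `MAX`, but the window version returns whole rows and extends easily to "top 3 orders per client".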

RARELY REQUIRED OR NEEDED AT AN ADVANCED LEVEL:

  • Better SQL proficiency to write more complex queries, and better knowledge of a programming language to write slightly more complex things

  • Know not just general things about databases, but also understand how one database differs from another and which one is needed where

  • Big data – Hadoop, Spark and all that

  • Message brokers such as Kafka

  • BI systems – a DE is unlikely to build dashboards, but early in a career anything is possible

  • Networks, infrastructure…

  • Understanding of the business – after all, a good programmer doesn’t just close tasks but solves business problems

You also need to know some things common to most programmers – Git, the command line

And in general, just open any job site and look at the requirements

Advantages and disadvantages

When asked which direction to choose, I usually say: try things, and then pick what you like best. But I am writing a useful article here, after all, so I will list the pros and cons of the specialty

PROS:

  • Money. Not millions, of course, but among all IT specialties, people who work with data are in the better-paid half

  • Easy to come, easy to go. The main skills of a data engineer are Python and SQL. This knowledge is also needed in many other areas, so it is not very hard to come here from another area or, the other way around, to move somewhere else if you get tired

CONS:

  • Might be boring. Other programmers make websites, applications, AI and other things that you can touch or say wow about, and the result of your work is a table in a database. For those who want to see the result of their work in a more obvious form, the DE profession may seem boring

  • The job title says one thing, but the reality is something else. Perhaps this is true in all areas, but for a data engineer it is definitely relevant. “You work with data, which means you will be responsible for everything that is somehow connected with data,” and this can cover a lot of things, including the responsibilities of a data scientist and a data analyst, building a report in Excel, or setting up your own infrastructure yourself

  • Remote work is not always available. Working remotely is commonplace for IT people, but those who work with data do not always have that option. The reason is security

Briefly

What kind of education is needed? – I don’t have a specialized education, but it’s probably better to have something related to programming

Career growth? – You can become a CDO, the chief data officer of a company.

Which companies need data engineers? – They are needed everywhere

Useful links

  • Article on Habr with useful links

  • Library information systems engineering links

  • There are a bunch of courses (mostly free) on Python and SQL on Stepik

If you liked the article

If you liked the article, you can subscribe to my Telegram channel https://t.me/datamisha, where I write about my work as a data engineer
