Why does every Data Scientist need a Data Engineer?

In this post I want to share my translation of a curious article on Medium on the topic “who’s who in IT, and how a business can get the most out of each specialist.”

The translation was prepared with the support of the DataLearn analytics course community and the Telegram channel Data Engineering.

In the photo: Eunice Lituanas.

Estimated reading time: 3 minutes 32 seconds.

Data Scientist has been called “the sexiest profession of the 21st century.” The Harvard Business Review explains this by noting that such a “hybrid of a hacker, analyst, negotiator and valuable advisor” is a very rare combination of skills, and a highly paid one.

Is it too good to be true? According to Forbes, yes. It turns out that data scientists spend most of their time (up to 79%) doing work they hate.

Data Scientist Demand

Thousands of companies are hiring data scientists from various fields as secret weapons for their business, imitating the Wall Street quants of the 1980s and 1990s, who had a unique ability to understand and interpret data. Just like in the film The Big Short.

Considering that there are about 11 thousand data scientists on the market, and the demand for them is growing sharply, the competition among employers for these specialists is very fierce.

The United States Bureau of Labor Statistics believes that in 2018 demand will exceed supply by 50-60%. And according to McKinsey forecasts, in 2018 the US alone will lack 1.5 million analysts and managers who know how to work with data and make decisions based on it.

Companies that don’t hire data scientists now simply won’t be able to find them later.

Translator’s note:

The article dates from 2017, so its specific forecast figures are no longer particularly relevant. According to more recent estimates, however, the overall trend has not changed: over the next few years the demand for data scientists will continue to grow, and qualified personnel will remain in short supply.

The Role of a Data Scientist

So the company hires a data scientist, and then what? How does it improve the working environment to make the most of the specialist’s skills and convince them to stay?

First, let’s look at what a typical data scientist’s working day consists of:

  • building datasets for training models (3% of the time)

  • data cleaning and preparation (60%)

  • assembling datasets (19%)

And here we see how unsexy this work is: the overwhelming majority of specialists say that the most disliked part of their job is assembling, preparing and cleaning datasets. Moreover, preparing and cleaning data has nothing to do with finding insights; it is just transforming the data into the desired form. Yes, this requires serious skills, but not in the field of data science.

Companies could free up as much as 79% of their data scientists’ time (the 60% spent on cleaning and preparation plus the 19% spent on assembling datasets) by shifting the responsibility for preparing data to someone else. The companies would benefit because their specialists could devote more time to searching for insights, and the specialists would get to do what they really love.

Data preparation, then, should be handed over to a dedicated specialist: the data engineer.

The Role of the Data Engineer

The demand for data engineers is also growing. In the article The Rise of the Data Engineer, Maxime Beauchemin, a data engineer at Airbnb, talks about how he joined Facebook as a BI developer in 2011 and left the company two years later as a data engineer. The need for more sophisticated code-based ETL and changing data models is fueling demand for data engineers, he said.

So what is the job of a data engineer? It involves extracting, processing, loading and cleaning data, and/or automating data analysis. Beauchemin describes it this way: “A data engineer creates tools, infrastructure, frameworks, and services. In small companies, where there is no infrastructure team yet, the data engineer’s job can also include building and maintaining the company’s data infrastructure.”

In other words, the data engineer doesn’t look for insights, but prepares reliable data. For whom? For data scientists and data analysts.
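To make “preparing reliable data” a little more concrete, here is a minimal sketch of a cleaning step in plain Python. Everything in it is an assumption made for illustration and none of it comes from the original article: the record fields (`email`, `name`, `age`), the cleaning rules, and the function names `clean_record` and `prepare_dataset` are all hypothetical.

```python
from typing import Optional


def clean_record(raw: dict) -> Optional[dict]:
    """Normalize one raw record; return None if it cannot be used."""
    # Hypothetical rule: a record is unusable without a plausible email.
    email = (raw.get("email") or "").strip().lower()
    if "@" not in email:
        return None
    # Collapse repeated whitespace and normalize capitalization.
    name = " ".join((raw.get("name") or "").split()).title()
    # Keep the record even when age is unparseable, but mark it as missing.
    try:
        age = int(str(raw.get("age", "")).strip())
    except ValueError:
        age = None
    return {"email": email, "name": name, "age": age}


def prepare_dataset(raw_records: list) -> list:
    """Clean every record and drop duplicates by email, keeping the first."""
    seen = set()
    cleaned = []
    for raw in raw_records:
        record = clean_record(raw)
        if record is None or record["email"] in seen:
            continue
        seen.add(record["email"])
        cleaned.append(record)
    return cleaned


# Made-up sample input: one duplicate, one invalid email, one bad age.
raw_records = [
    {"email": " Ann@Example.com ", "name": "ann   lee", "age": "34"},
    {"email": "ann@example.com", "name": "Ann Lee", "age": "34"},
    {"email": "not-an-email", "name": "Bob", "age": "41"},
    {"email": "carl@example.com", "name": "carl  jones", "age": "n/a"},
]
dataset = prepare_dataset(raw_records)
```

With this sample input, `dataset` holds two records: the duplicate and the row with an invalid email are dropped, and the unparseable age is kept as a missing value. In a real pipeline, rules like these would typically live in a scheduled, tested job rather than in an analyst’s notebook, which is the division of labor the article argues for.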

Bringing Sexy Back

We couldn’t resist.

If data engineers took on cleaning, preparing and assembling datasets, data scientists could focus on finding dependencies, improving algorithms and the other sexy parts of their work. Well, you get the idea.

In simple terms, the collaboration between a data engineer and a data scientist can be described as follows:

To build a system in which already pre-prepared data gets to analysts, companies need to take two steps:

  1. Introduce a new position, Data Engineer, and create a culture of data engineering and data openness.

  2. Introduce new data processing technologies (Airflow, Kafka, Spark, Mesos, etc.) that allow you to quickly work with large amounts of information.

Those companies that succeed will definitely become more attractive to the best data scientists. And as a result, they will get more value from the available data.
