Data Office in a large company: what to plan for

Let's continue the topic of the Data Office in a new article. Many companies are now building their own data departments. Centralizing data processes saves time and effort, eliminates out-of-sync data, and over time increases the profitability of the entire business. But this holds only if all the processes are set up correctly. Where do you start, and where do you go, when creating a Data Office? What should the specialists who take this on keep in mind? We'll share our own experience.


Marina Kormshchikova

Product Manager of “Neurogateway” at PJSC Rostelecom


Boris Emelyanov

Technical Director of the Data Platform of PJSC Rostelecom

The Data Office at Rostelecom began to take shape in 2017-2018, after a single technological data repository appeared. Once the first projects on the centralized platform succeeded, we started thinking about centralizing work with data in the same way. Before that, each department handled data separately.

When you have one data warehouse, you spend N money; when you have two, the costs increase tenfold, because on top of the software you now have maintenance staff, and internal processes become more expensive. Then people stop understanding where to go for data, everyone gets different results, and conflicts begin.

How we built the Data Office

Step No. 1. We took inventory of all projects and created a single data loop

Any modern company has data management processes, even if there is no uniform policy. So the first step toward a Data Office is to analyze all projects that involve working with data, information, and analytics: understand which of them are strategic, which should be combined, and so on, and then design the architecture of all these interconnected solutions.

Each individual business unit already had its own data warehouse, and before switching everything to a single system we needed to understand all the internal processes tied to these warehouses (sales, technical support, customer service).

It is important to think through the technical architecture of the centralized loop on which reporting will develop, and not only for immediate tasks but also for the future: build in room for scaling, and understand which analytical cases can be deployed on it. Even if today you have a real need only for analytical and regular reporting, batch delivery and processing, tomorrow you will need to bring in teams that prepare data for ML models, for real-time reporting, and so on, using the same data.
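The idea above can be sketched in miniature: one canonical, centrally prepared dataset serving both today's batch reporting and a later ML team, so nobody re-extracts data from the sources. All names and figures below are invented for illustration.

```python
# A minimal sketch of a "single loop": one canonical store of prepared
# data, and several kinds of consumers reading from it consistently.
# Dataset name, fields, and values here are hypothetical.

from typing import Callable, Dict, List

# The single loop: one place that owns the prepared data.
CANONICAL_STORE: Dict[str, List[dict]] = {
    "subscribers": [
        {"id": 1, "region": "NW", "monthly_fee": 450.0},
        {"id": 2, "region": "NW", "monthly_fee": 600.0},
        {"id": 3, "region": "SB", "monthly_fee": 300.0},
    ]
}

Consumer = Callable[[List[dict]], object]

def batch_report(rows: List[dict]) -> dict:
    """Regular reporting: revenue per region from the canonical data."""
    out: dict = {}
    for r in rows:
        out[r["region"]] = out.get(r["region"], 0.0) + r["monthly_fee"]
    return out

def ml_features(rows: List[dict]) -> List[list]:
    """A later ML team reuses the same data as feature vectors."""
    return [[r["id"], r["monthly_fee"]] for r in rows]

def serve(dataset: str, consumer: Consumer) -> object:
    # Every consumer reads from the one store, so results stay in sync.
    return consumer(CANONICAL_STORE[dataset])
```

Because both consumers go through `serve`, a change to the canonical data propagates to reporting and ML alike, which is the point of the single loop.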

Step No. 2. We explained the advantages of the transition to departments and teams

Replacing familiar work processes with something new naturally causes discomfort and dissatisfaction. Moreover, the advantages of centralization are not visible right away: first you have to go through the complex process of creating and debugging a single model, and only after a while do you see both a qualitative and a quantitative effect.

In addition, employees took on extra workload. In the B2C segment, during the transition to centralized storage, colleagues had to not only analyze the movement of the client base but also set up all the data flows and verify the methodology for calculating indicators.

We found the most loyal business partners and implemented projects with them using the new approach, strengthening its authority through more effective results. We abandoned manual operations: now, in the business segments, data generated with a unified methodology is loaded directly into management reporting. The business began to deepen its analytics, set targeted tasks, and generally control the process better. Total costs across the company also decreased, partly thanks to dropping vendor licenses and moving to our own data management platform.

Step No. 3. We decided on the technology stack and trained specialists

We understood that the number of systems in the company's landscape would only increase over time, and we must be able to quickly retrieve data through any integration methods. Therefore, when choosing an architecture, versatility and rapid scalability were important factors.

Building the platform on the classic DWH model, with large vendor appliances, would not have solved the problem. We began to look toward software-based and open-source solutions. Hadoop and Greenplum were chosen as the core of the system: flexible, highly scalable tools that are mutually compatible and have a large ecosystem of related technologies.
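As a hedged illustration of how this pairing can work: Greenplum can read data stored in Hadoop through external tables (in current versions, via the PXF extension). The sketch below only assembles the DDL string; the table name, HDFS path, and columns are made-up examples, not our actual schema.

```python
# A hedged sketch: build Greenplum DDL for a readable external table
# over HDFS text files via PXF. All identifiers are hypothetical.

def pxf_external_table(table: str, hdfs_path: str, columns: dict) -> str:
    """Build CREATE EXTERNAL TABLE DDL for a PXF/HDFS text source."""
    cols = ",\n    ".join(f"{name} {ctype}" for name, ctype in columns.items())
    return (
        f"CREATE EXTERNAL TABLE {table} (\n    {cols}\n)\n"
        f"LOCATION ('pxf://{hdfs_path}?PROFILE=hdfs:text')\n"
        f"FORMAT 'TEXT' (DELIMITER ',');"
    )

ddl = pxf_external_table(
    "ext_calls",          # hypothetical table name
    "data/cdr/2024",      # hypothetical HDFS directory
    {"caller": "text", "duration_sec": "int"},
)
print(ddl)
```

Running the generated DDL against a real cluster would of course require a configured PXF service; the point here is only how the two systems meet.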

We did not forget about the TCO of such products: the low cost of licenses often hides a high cost of support. To use such solutions effectively, we brought in an experienced expert team.

Step No. 4. We provided for solving current problems with data

The main problem is that customers will not wait while we craft a beautiful landscape, build warehouses, and figure out how to work with new technologies. They need data here and now. So it is important to set up processes so that the business gets ready-made analytical solutions while, in parallel, an ideal Data Office develops internally.

So we had two directions: an analytical reporting factory, where a dedicated front manager was responsible for each business segment, and a project direction, which covered technical debt and developed the technical component. The latter built the centralized warehouse, data flows, and reference directories, and worked on data quality and protection: on the one hand it helped the “fronts” deliver reporting here and now, and on the other it thought through the rules that would help us move from a startup state to an industrial one.

A well-coordinated Data Office should be universal and easily scalable, with simple hardware requirements, built mainly on well-known open-source solutions. This approach provides flexibility and stable support at a relatively low TCO.

Step No. 5. We formed a data-driven culture in the company

We devote a lot of time and effort to teaching our employees to work with data and developing a data-driven mindset. We run “soft” programs, where we analyze how Amazon and Google use data analysis to increase the profitability of their businesses, and “hard” programs, where we teach how to work with BI dashboards, databases, Hadoop, and other technical tools. For example, after the program on working with neural networks, 600 new users registered on our portal.

There is also an external story: in 2019 we created a free public project, DataTalks. Initially we planned to train students to work with data, but in the end practicing data specialists, and even the CDOs of several companies, came to the program.

When establishing a Data Office, it is important to develop and establish the rules of the game: determine what working with data means, what roles and processes it consists of, and what responsibilities fall on all participants in the collection, analysis, and interpretation of data within the overall strategy. In essence, this is a legislative body that regulates work with data for all the numerous departments and projects, current and future, and minimizes the risk of getting “out of sync.” At Rostelecom we called it the data management policy.

How to determine that everything worked out

We use these metrics to determine that the division has taken shape and the Data Office has become an effective business function.

The number of requests is growing. Our internal data portal has an average of six thousand users per month.


Business units are not trying to build their own solutions “under the table.” Even when a business unit cannot find ready-made templates for reports and analytics, it comes and solves its problems using our centralized infrastructure.

There is no staff turnover. If people don't understand the goals and objectives, and don't see how they benefit the business, they quickly quit. In our case, the core team has not changed since 2019: the people who started the Data Office have stayed and continue to grow within the division.

The product is being updated. We are improving system availability, the speed of preparing regular data, and a number of other technical characteristics: we started with monthly reports, and now almost all data products have moved to daily updates.

The load management pipeline has become more stable. We followed these principles to speed up work with products:

  • automated processes by creating data load templates and defining template selection logic based on metadata;
  • divided processes into atomic stages, describing in the repository the steps for loading, checking, synchronizing, collecting statistics, and so on;
  • defined task execution order by the FIFO principle, so that tasks are first registered in a queue rather than executed immediately;
  • set up discrete-continuous loads.
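The first three principles above can be sketched in a few lines (all names here are hypothetical, not our actual repository): a load template is chosen from source metadata, each load is split into atomic stages, and tasks are registered in a FIFO queue before being executed in registration order.

```python
# A minimal sketch: metadata-driven template selection, atomic stages,
# and FIFO registration of load tasks. All names are invented.

from collections import deque

# Template selection logic driven by metadata, not hand-written per source.
TEMPLATES = {
    "jdbc": ["extract", "check", "load", "collect_stats"],
    "file": ["copy", "check", "load", "collect_stats"],
}

def choose_template(meta: dict) -> list:
    return TEMPLATES[meta["source_type"]]

class LoadQueue:
    def __init__(self):
        self._q = deque()          # FIFO: register now, execute later
        self.log = []

    def register(self, name: str, meta: dict) -> None:
        self._q.append((name, choose_template(meta)))

    def run(self) -> None:
        while self._q:
            name, stages = self._q.popleft()
            for stage in stages:   # atomic stages, described separately
                self.log.append(f"{name}:{stage}")

q = LoadQueue()
q.register("crm_clients", {"source_type": "jdbc"})
q.register("billing_cdr", {"source_type": "file"})
q.run()
```

Registering before running is what makes the queue observable: a task can be inspected, reordered by policy, or retried without touching the stage definitions.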

We established systematic work with sources: we trained a support team for this, increased infrastructure capacity, and aligned the schedules of regular processes.

We also built clear scheduling and resource management for the warehouse: we make maximum use of night hours to load data and daytime hours to consume it. We take locking into account and rely on the built-in logic of table dependencies.
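The table-dependency logic mentioned above is, at its core, a topological ordering of loads: each table is loaded only after everything it depends on. A minimal sketch with invented table names:

```python
# A sketch of dependency-aware load ordering using the standard library.
# Table names and dependencies below are hypothetical.

from graphlib import TopologicalSorter  # Python 3.9+

# table -> set of tables it must be loaded after
deps = {
    "ods_clients": set(),
    "ods_payments": set(),
    "dds_clients": {"ods_clients"},
    "mart_revenue": {"dds_clients", "ods_payments"},
}

order = list(TopologicalSorter(deps).static_order())
print(order)  # every table appears after all of its dependencies
```

In a real scheduler the same graph also tells you which loads are independent and can run in parallel inside the night window.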

Now we run 300 thousand processes and 1.5 million operations per month. About 85% of loads from sources into the ODS layer use ready-made processes. Developers spend only a few minutes building new load processes for the DDS layer and data marts, devoting most of their time to implementing and debugging business logic.

There is a financial effect. We measure every data product and service in terms of return on investment. The simplest example is from Data Science: DS models allow us to sell more services to clients or improve the service.

What is important to remember when creating a Data Office?

  • When forming new rules, you should not sharply break existing ones: only help and supplement.
  • Focus not only on technical solutions to problems, but also on business implications. By building a solution strictly to the specification, we get something that works but does not deliver the business effect it could, or is completely useless. We need to collaborate more with the business to understand the future customer's use case.
  • Meaningful work with tasks will help the employees of the new division not to burn out: this way they will understand that “they are not just laying bricks, but building a temple.”
  • It is important to explain to all participants in the process that working with data is not only the responsibility of one small department, but also the work of the entire company. Customers must understand what data is needed for, what insights can be obtained from it – and work to ensure that the company has this quality data.

It seems to me that, from a business point of view, the ideal Data Office is a department you may not even know exists, yet you can solve any of your analytical problems and get an answer to any request.

A good Data Office is a place where professionals work who understand what they are doing and do it with an eye to business results. They also proactively offer solutions to the business: they not only answer requests about revenue and the like, but themselves suggest what data is needed and how best to use it.

Do you still have questions about building a Data Office in your company? We will be happy to answer them in the comments.
