How a systems analyst can develop data culture

Every company generates, processes and stores data. Small companies use Excel spreadsheets; larger ones rely on huge data warehouses and the teams that maintain them.

“Data is the new oil.” Clive Humby

I have not been spared data either: I gather quantitative justification for a new task, choose a solution based on statistics, and check how a feature performs after release. In my 3 years at Kontur, I have written countless SQL queries, built several dashboards in Redash, described dozens of front-end and back-end metrics in task specifications, and racked my brain over how to extract data from Cassandra.

Working with data is great, but if the data is of poor quality, so is the result, and getting to that result takes a long time. It turns out that many of these problems can be solved with data governance.

In this article I will explain what data governance is, which problems it can help solve, and how to apply it in practice.

Let's start with the definition

Data governance (DataGov) is a data management system that ensures high quality, availability, integrity and security of data in an organization.

I would like to point out right away that data governance is a broad, comprehensive discipline that lets you build processes and approaches to working with data at different levels. That is why I believe every analyst can influence the data culture too, by applying DataGov approaches within their own team and their own tasks.

Before learning how to use DataGov, let's look at some of the pain points an analyst runs into when working with data.

What problems can there be with data?

1. There is no data or there is insufficient data

We rolled out a feature but did not write logs. As a result, we cannot calculate metrics or tell whether the feature took off, and reconstructing what happened after the fact is hard.

2. There is data, but it is of poor quality
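One way to avoid this situation is to emit a structured usage event whenever the feature is used. The sketch below is a minimal illustration using only the Python standard library; the logger name `feature_usage`, the event fields, and the function names are assumptions made for the example, not a description of any particular team's logging setup.

```python
import json
import logging
from datetime import datetime, timezone

# Hypothetical logger name; in a real service it would follow local conventions.
logger = logging.getLogger("feature_usage")


def build_usage_event(feature: str, user_id: str, action: str) -> str:
    """Serialize one feature-usage event as a single JSON line."""
    event = {
        "ts": datetime.now(timezone.utc).isoformat(),  # when it happened (UTC)
        "feature": feature,                            # which feature was used
        "user_id": user_id,                            # who used it
        "action": action,                              # what exactly they did
    }
    return json.dumps(event, ensure_ascii=False)


def log_usage(feature: str, user_id: str, action: str) -> None:
    """Write the event to the log so metrics can be calculated later."""
    logger.info(build_usage_event(feature, user_id, action))
```

With events like these in place from the first release, the question "did the feature take off?" becomes a matter of counting log lines rather than guessing.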

We agreed to store a JSON-formatted string in a DB table field. The developer accidentally added an extra quotation mark, and invalid JSON started flowing into the DB. The data is there, but parsing it and working with it is problematic.
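This kind of problem is usually cheaper to prevent at write time than to repair later. As a minimal sketch, assuming the writing code is in Python, the string can be produced by a serializer instead of being assembled by hand, and checked before it is saved. The function names `to_json_field` and `is_valid_json` are hypothetical.

```python
import json
from typing import Any


def to_json_field(payload: Any) -> str:
    """Build the field value with a serializer so quotes are escaped correctly."""
    return json.dumps(payload, ensure_ascii=False)


def is_valid_json(text: str) -> bool:
    """Return True if the text parses as JSON, False otherwise."""
    try:
        json.loads(text)
    except (TypeError, ValueError):  # json.JSONDecodeError is a ValueError
        return False
    return True
```

A check like `is_valid_json` can also be run over rows that are already in the table to estimate how much of the stored data is affected.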

3. The data is there, it is high quality, but it is difficult to obtain

The team is in the process of migrating from MS SQL to PostgreSQL. The analyst (me) needs to compare data from the two DBMSs. I have no idea how to do it “beautifully”. All that comes to mind is to put the query results from both sources into one Excel sheet and do some magic. Large volumes of data cannot be processed this way.
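One alternative to the Excel approach is to compare the two result sets in code, keyed by primary key. The sketch below is an illustration only: `compare_sources` is a hypothetical helper, and it assumes the rows have already been fetched from each DBMS (for example with `cursor.fetchall()` from the respective Python driver) with the same column order and the key in the first column.

```python
from typing import Dict, Iterable, List, Sequence, Tuple


def index_by_key(rows: Iterable[Sequence], key_index: int = 0) -> Dict[object, Tuple]:
    """Map each row's key column to the full row as a tuple."""
    return {row[key_index]: tuple(row) for row in rows}


def compare_sources(
    old_rows: Iterable[Sequence],
    new_rows: Iterable[Sequence],
    key_index: int = 0,
) -> Dict[str, List]:
    """Report keys missing on either side and keys whose rows differ."""
    old = index_by_key(old_rows, key_index)
    new = index_by_key(new_rows, key_index)
    return {
        "missing_in_new": sorted(old.keys() - new.keys()),
        "missing_in_old": sorted(new.keys() - old.keys()),
        "changed": sorted(k for k in old.keys() & new.keys() if old[k] != new[k]),
    }


# Usage with made-up rows standing in for the two query results.
mssql_rows = [(1, "a"), (2, "b"), (3, "c")]
postgres_rows = [(2, "b"), (3, "x"), (4, "d")]
result = compare_sources(mssql_rows, postgres_rows)
# result == {"missing_in_new": [1], "missing_in_old": [4], "changed": [3]}
```

Two caveats apply. Drivers may return the same value as different Python types (for instance a decimal on one side and a float on the other), so values may need normalizing first. The comparison also holds both result sets in memory, so very large tables would have to be compared in key ranges or through per-range aggregates.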

4. Discrepancies in data/lack of a unified approach

In our company, as in many others, there is no consensus on many basic definitions: client, lead, balance, etc. As a result, the data in different reports, dashboards and other artifacts may diverge, and I start any conversation with someone from another team by establishing what each of us means by the same words.

5. Long data search

An analyst can spend 30-50% of their working time searching for data. In other words, highly paid specialists may spend as little as half of their working time on their actual tasks. This is unprofitable for the business.

Yes, analysts most likely know the storage and tables of their own product. But in my work, cross-product integration tasks come up more and more often, and then the analyst has to venture into someone else's data garden. It is unclear where to look, so you have to take the long way: ask a knowledgeable colleague, write to the product's Mattermost channel (describing in detail what you need and why), and then wait until the person on duty gets to your question.

These are not all the problems that an insufficient level of data culture brings. There are also data leaks, hardware shortages, and others.

About data governance

Having experienced all the pain that working with data can cause, I would like to extend the Clive Humby quote above:

“Data is the new oil. It can be just as toxic if not managed.”

Therefore, it is important to learn how to work with data consciously. Many companies have set themselves this goal and tried different approaches. Over time, the industry has arrived at a common approach to data governance that aggregates the accumulated experience.

Data governance consists of 10 areas in which a company or product can develop. All of them are important and necessary, but that does not mean everything has to be improved at once. It is more effective to:

  1. Understand what each area is and why it is needed

  2. Assess the current level of each area

  3. Prioritize the areas by labor cost and expected benefit

  4. Act!
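Steps 2 and 3 can be made concrete with a simple scoring table. The sketch below is only an illustration: the area names, the 1 to 5 scales, and the benefit-to-effort ratio used for ranking are assumptions chosen for the example, not part of any DataGov standard.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass(frozen=True)
class Area:
    name: str
    current_level: int  # assessed maturity, 1 (low) to 5 (high)
    effort: int         # estimated labor cost, 1 (cheap) to 5 (expensive)
    benefit: int        # expected profit, 1 (small) to 5 (large)


def prioritize(areas: Iterable[Area]) -> List[Area]:
    """Rank areas by benefit per unit of effort; ties go to the less mature area."""
    return sorted(areas, key=lambda a: (-a.benefit / a.effort, a.current_level))


# Hypothetical assessment for one team.
areas = [
    Area("metadata", current_level=1, effort=4, benefit=4),
    Area("data quality", current_level=2, effort=2, benefit=5),
    Area("data security", current_level=4, effort=5, benefit=3),
]
ranked = [a.name for a in prioritize(areas)]
# ranked == ["data quality", "metadata", "data security"]
```

The exact formula matters less than having the assessment written down, so that the team can argue about the numbers rather than about impressions.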

Of course, such work, even at the team level, is labor-intensive and requires competence and interest from everyone on the team.
The good news is that some of the ideas and approaches from DataGov can be applied right now and will help to alleviate some of the problems when working with data.

Useful tips

  1. It is better to collect extra data than to suffer from not having enough.

  2. It is good practice to think at the analysis stage about what data you want to collect as part of the task, and why.

  3. So that you do not forget to think about it, you can embed a “Metrics” block into the task specification template. That is exactly what we did in my team.

  1. It is useful to have rules for describing data/processes/metrics. It is important that they are clear and concise.

  2. It's helpful to follow the rules and describe the data/processes/workarounds on a wiki or other artifacts your team creates.

  3. It is useful to keep names consistent. This simplifies search. For example, our team has an article on how to name metrics, which encourages analysts to rein in their creativity and sets out the requirements for a name.
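The team's actual naming rules are not reproduced here. Purely as an illustration of how such rules can be made checkable, the sketch below validates names against a made-up convention: lowercase, only letters, digits and underscores, at least three parts in the form `<product>_<object>_<action>`, and a length limit. Every rule and name in it is hypothetical.

```python
import re
from typing import List

MAX_LENGTH = 64  # hypothetical limit
ALLOWED_CHARS = re.compile(r"[A-Za-z0-9_]+")


def check_metric_name(name: str) -> List[str]:
    """Return a list of rule violations; an empty list means the name is fine."""
    problems = []
    if name != name.lower():
        problems.append("use lowercase only")
    if not ALLOWED_CHARS.fullmatch(name):
        problems.append("use only letters, digits and underscores")
    parts = [p for p in name.split("_") if p]
    if len(parts) < 3:
        problems.append("expected at least <product>_<object>_<action>")
    if len(name) > MAX_LENGTH:
        problems.append("name is longer than %d characters" % MAX_LENGTH)
    return problems
```

For example, `check_metric_name("billing_invoice_created")` returns an empty list, while `check_metric_name("MyCoolMetric")` reports both the casing and the missing parts. A check like this can run in code review or in a test, so the convention does not depend on anyone's memory.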

  1. You can describe tables and their fields directly in the DB; many DBMSs support this. It will help you, new team members, and people from other teams navigate your tables. As a rule, a description is added with an SQL statement; PostgreSQL, for example, provides COMMENT ON TABLE and COMMENT ON COLUMN for this.

  2. It is useful to record how a particular indicator was calculated so that another person can reproduce it. After some time, it will also help you remember where the numbers came from.

  3. You can create an automatic reference book that contains all the metrics/data. This speeds up the search. Our team implemented it as a dashboard in Redash that displays all the metrics we write. The dashboard supports search by name, shows the dates of the first and last record for each metric, and lets you view a sample record for any metric.
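The dashboard itself is not shown here, but the aggregation behind such a reference book is simple. The sketch below is an assumption about how it could be computed, not the team's actual Redash query: it takes raw metric records as `(name, ISO date, payload)` tuples, a hypothetical shape chosen for the example, and produces one catalog entry per metric with the first and last date and a sample record.

```python
from typing import Dict, Iterable, List, Tuple

Record = Tuple[str, str, str]  # (metric name, ISO date "YYYY-MM-DD", raw payload)


def build_catalog(records: Iterable[Record]) -> List[Dict[str, str]]:
    """Collapse raw records into one entry per metric, sorted by name."""
    catalog: Dict[str, Dict[str, str]] = {}
    for name, day, payload in records:
        entry = catalog.get(name)
        if entry is None:
            catalog[name] = {
                "name": name, "first_seen": day, "last_seen": day, "example": payload,
            }
            continue
        # ISO dates compare correctly as plain strings.
        if day < entry["first_seen"]:
            entry["first_seen"] = day
        if day > entry["last_seen"]:
            entry["last_seen"] = day
            entry["example"] = payload  # keep the most recent record as the sample
    return sorted(catalog.values(), key=lambda e: e["name"])


def search(catalog: List[Dict[str, str]], query: str) -> List[Dict[str, str]]:
    """Case-insensitive substring search by metric name."""
    q = query.lower()
    return [e for e in catalog if q in e["name"].lower()]
```

In a dashboard tool the same result would more likely come from a grouped SQL query over the metrics table; the Python version only makes the logic explicit.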

Let's sum it up

For data to work for us, we need to manage it. This can be done company-wide or locally, within your team and product. Systems analysts touch data in their work, which means they can influence the maturity of the data culture.

I opened the door to the wonderful world of data governance and stuck my curious nose in. Now I look at the data management processes in my work differently: I understand their importance, I am aware of the problems, and I know how to approach solving them.

You can start with small steps. By following these simple tips, we can make working with data more convenient for ourselves and for our colleagues.
