How to compile a report in Yandex DataLens quickly and almost simply
Google Data Studio is good, of course, but we have our own worthy analogue: DataLens, a similar tool from Yandex, one of the leading IT companies in Russia. Today, in a couple of paragraphs, we will try to quickly assemble a report similar to the one from the Data Studio article. Spoiler: it is not as easy as it seems at first glance. But let’s figure it out.
First and most importantly: I genuinely respect Yandex and follow everything that happens in their open channels with interest. But when it comes to building interfaces and reference documentation, I sometimes feel overwhelmed. The design is fine, no complaints, and even the first few screens of working with anything are simple and clear. But as soon as you start digging into the tool’s functions, more and more questions pile up. Even the help search does not work quite the way I would like – I found answers to some questions while reading the docs, but search itself did not find them. Then again, maybe the answer was there, but who ever looks at the second screen / second page of search results?
The choice of data sources here is somewhat more modest than in Data Studio, but still more than sufficient for most projects.
In our case the database is the same MySQL with the same settings; this time, though, we will work only with normalized data, so there is no need to make any assumptions about its cleanliness.
After setting up the connection I experienced some cognitive dissonance: getting back to the main menu turned out to be not so easy. The interface does not seem that overloaded with buttons, so I did not expect to run into such a problem. Looking ahead: I also had to export the MySQL view to CSV and add it as a separate source. Why – later in the text.
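The export step itself is trivial. Here is a minimal sketch of “dump a view to CSV”; I use sqlite3 in memory as a stand-in for MySQL (an assumption for the sake of a self-contained example – with real MySQL you would swap in a driver such as mysql.connector and a real connection string), but the CSV-writing part is the same either way:

```python
# Sketch: export the contents of a database view to CSV.
# sqlite3 stands in for MySQL here so the example runs anywhere;
# the table/view names are made up for illustration.
import csv
import io
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (city TEXT, total REAL)")
conn.executemany("INSERT INTO orders VALUES (?, ?)",
                 [("Moscow", 500), ("Kazan", 300)])
conn.execute("CREATE VIEW orders_view AS SELECT city, total FROM orders")

cur = conn.execute("SELECT * FROM orders_view")
buf = io.StringIO()            # in a real script: open("orders.csv", "w")
writer = csv.writer(buf)
writer.writerow([d[0] for d in cur.description])  # header from cursor metadata
writer.writerows(cur.fetchall())

print(buf.getvalue().strip())
```

The resulting file is what gets uploaded to DataLens as a second source.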
But the difficulties, in my picture of the world, did not end there. Say I want to put together a new sheet of information. In Data Studio the concept of a “report” was the most understandable one for me. There seems to be a dashboard here too, but then what are charts and datasets responsible for?
The “Datasets” section is a transformation layer (albeit a very simplified one) in ETL terms. I did not find exactly this mechanism in Data Studio. That is, you can perform some manipulations with fields there, but only inside the report, which does not quite fit the ETL picture. Yandex took a much more logical (for me personally) path, while Google forces you to prepare high-quality data marts for its BI solution. Okay, that is also an approach, but DataLens definitely earns karma points for the flexibility of its solution.
What data types are there
This layer lets us assemble what is effectively a new data mart from the raw data, which can be an aggregate of the initial data. So what options do we have? First, let’s deal with the data types that DataLens accepts:
“Geopolygon”. An internal story about maps; I can’t tell you the details yet.
“Geopoint”. The same as above, only a specific coordinate in space.
“Date” and “Date and Time”. The differences are obvious, but the parser could not automatically understand that 06/17/2015 is a date. Strange and not obvious, but acceptable. You can change the type manually, but I did not manage to tell Lens centrally that the date comes in such-and-such a format.
“Fractional number”. Here you need to be careful, because an id, a year, and even a TIN can end up classified as fractional.
“Integer”. Here, I think, everything is clear – and yet it would be worth making it the default numeric type. If I see numbers, I assume they are whole.
“Boolean”, i.e. logical true/false.
“String”. The second most popular data type after fractional, according to DataLens. Well, fair enough, though it would be better if a TIN (which, by the way, has a fairly standardized format) were also immediately recognized as a string.
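Why the “everything numeric is fractional” default is risky can be shown in a few lines of plain Python (this is an illustration of the pitfall, not DataLens code – the parsing heuristic here is my own made-up stand-in):

```python
# A toy parser that, like an over-eager type detector, prefers
# fractional numbers for anything that looks numeric.
def naive_infer(value: str):
    """Return a float if the value parses as a number, else keep the string."""
    try:
        return float(value)
    except ValueError:
        return value

tin = "007706107510"   # a TIN-like identifier with a leading zero
year = "2015"

print(naive_infer(year))   # 2015.0 – a year is hardly a fraction
print(naive_infer(tin))    # leading zero is gone after the round-trip
# The original identifier cannot be recovered from the float:
print(str(int(naive_infer(tin))) == tin)  # False
```

This is exactly why identifiers and codes should be declared as strings up front.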
What can be done
For a string you can compute the total count of a field or the number of unique values – in short, count and count distinct. For numbers it is the same plus maximum, minimum, average and sum. Dates are similar, except of course there is no sum. For boolean data we can only calculate the count, which is also quite logical.
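The per-type aggregation menu can be sketched in plain Python (again, just an analogy for what the dataset layer offers, with made-up sample columns):

```python
# Plain-Python equivalents of the dataset-layer aggregations:
# count / count distinct for strings, plus min / max / avg / sum for numbers.
from statistics import mean

orders = [120, 80, 120, 45]              # a numeric column
cities = ["Moscow", "Kazan", "Moscow"]   # a string column

print(len(cities))               # count          -> 3
print(len(set(cities)))          # count distinct -> 2
print(min(orders), max(orders))  # min / max      -> 45 120
print(mean(orders))              # average        -> 91.25
print(sum(orders))               # sum            -> 365
```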
But if you want to create a new field, a large number of functions becomes available. Their names are specific – similar to something, yet not quite like anything. Perhaps ClickHouse has similar syntax somewhere, or one of the SQL dialects. There are not that many functions; the reference is small and painfully reminds me of the QlikView / Sense one.
Without these functions we cannot build anything, since creating a new field is the only way to explain to the parser what format the date is in (if it did not get it right away). Moreover, the DATETIME_PARSE function used for this will happily determine the date format without your participation – the main thing is to call it at all. Why it is not applied by default to every “date” field remains a mystery to me.
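Conceptually, a “guess the format” parser looks something like the sketch below. This is my assumption about the general approach, not the actual DATETIME_PARSE implementation; the pattern list is illustrative:

```python
# A rough stand-in for a format-guessing date parser:
# try a list of known patterns until one fits.
from datetime import datetime

PATTERNS = ["%Y-%m-%d", "%m/%d/%Y", "%d.%m.%Y"]

def parse_date(raw: str) -> datetime:
    for fmt in PATTERNS:
        try:
            return datetime.strptime(raw, fmt)
        except ValueError:
            continue
    raise ValueError(f"no known pattern matches {raw!r}")

print(parse_date("06/17/2015").date())  # 2015-06-17
print(parse_date("2015-06-17").date())  # 2015-06-17
```

If a function can do this on demand, running the same guess at import time seems like a small step – hence my puzzlement.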
Although I was too hasty in assuming this function would solve all our problems. The thing is, not all functions work with all sources in the dataset setup. How does that play out? If the data lives in ClickHouse, everything is fine and you have nothing to worry about. If the data is in some other DBMS, you have to open the documentation and check whether a given function works with your source. At first I did not understand what was going on, and then it dawned on me: DataLens applies all transformations literally on the fly. Still, how the same field value differs between DBMSs – and, more importantly, why everything works fine with CSV – I do not understand to this day. Moreover, for fields where the date is written in a more or less traditional form, YYYY-MM-DD, the field type was picked up automatically. But getting a bare year accepted as a date type does not work at all.
Why do we need charts at all, it would seem, if all visualizations can be built right in dashboards? Most likely it is about being able to place one chart on several dashboards at once, though in practice I rarely run into that need: I usually end up customizing something anyway.
But not everything is so simple and unambiguous, because besides charts in the classical sense there are so-called QL charts. These are objects built directly from the source (in this case, the database). I plan to work with them next time.
Chart “Bar chart”
In order to work correctly with charts, you need to create calculated fields – indicators. This is done on the left side of the screen by clicking on the “+” icon:
Indicators also work with formulas, roughly the same way fields do when building a dataset. I managed to use calculations directly while building the chart, without creating separate indicators, but that is a crutch that only works via drag-and-drop.
An interesting point concerns displaying TOP-n positions. To do this, you create an indicator with the formula “rank(Field)”, which ranks the field, then put it into the filters and select “less than or equal to” (or simply “less than”) the required number. Not the most transparent scheme, is it?
In the case of a table everything is quite simple: as columns we specify all the fields we are interested in (four of them in our case), plus filters and other niceties to your taste.
At the same time, the table settings are rather meager:
Here, too, everything is quite simple, since there is only a bare minimum of settings. It is not entirely clear to me what stopped the guys from adding a proper font-size picker instead of a T-shirt size grid (XL, L, M – you get the idea).
You can skip creating a separate calculated field and compute everything inside the chart – most likely that will be more convenient for your purposes.
Building a Dashboard
To be honest, as soon as I entered this section I was a little dumbfounded. It is somewhat strange and unusual to see a completely blank white sheet, without even control buttons. On the other hand, to someone that blank sheet may, on the contrary, seem attractive.
Okay, it is clear there is an “add” button, but the UI still looks unfinished compared to the wizards or the same Data Studio. Fine, let’s try to add an element, maybe something will become clearer… No. There are four options to choose from: chart, selector, text and title. Not that I could immediately name something I am missing… Oh, wait, I could – images, for example.
We try to add a title and discover a few peculiarities: you can resize a block only by dragging its lower-right corner, text size comes in just four steps from large to xsmall, and you cannot control text indents at all. There are some usability issues here. What about pixel perfect?
The settings for adding charts are quite simple and self-explanatory:
Why exactly “auto-height” is not entirely clear – why is there no auto-scaling at all, then? – but let it be.
Adding filters – which are not filters at all, but “selectors” – is also simple and logical:
But the placement of objects is a different story. Objects snap only to certain areas, and, for example, I did not manage to place the filters to the left of the “total records” indicator. Perhaps a bit later it is worth digging into the grid, at least to understand for myself why objects behave this way.
Putting it all together, we get the following picture:
From this longread you may get the impression that I was unhappy with DataLens or did not really figure it out. That is not true. The guys from Yandex even run a good Telegram chat dedicated to this product, where experts quickly resolve one issue or another. Except, of course, the ones that simply cannot be solved at this stage.
The product is well suited for basic analytics. You can create quite complex formulas and interesting visualizations in it – but not always in a logical way, that is, not the way it has been done for years in similar solutions. At the same time, some of the truncated source capabilities are simply bewildering: why should I migrate to Yandex.Cloud just so that dates become dates?
I have not fully figured out the visual customization options either: they are either practically absent or so few that there is no point discussing them. Then again, are there really that many in Google Data Studio?!
I hope the project develops by leaps and bounds, more and more people learn about it, and Yandex itself listens to users and implements everything that is so lacking now. I managed to do what I set out to do – I got a result similar to the Google service. In some places it was even easier – but only in some places.
Thank you if you have read this far. The next milestone is another import-substitution BI, or something self-written in Python. After all, we are just making tools for analysts.