use a pivot table to work with BigQuery, Snowflake and ClickHouse

Hi all! My name is Sergey Konkov – I work as an architect at CloudReports. Today I will tell you how we created a product that helps users work with data and, to some extent, connects the two worlds of analytics: Excel and cloud data storage.

A task

BigQuery and other analytical warehouses, combined with modern BI tools, have revolutionized the work with data in recent years. The ability to process terabytes of information in seconds, interactive dashboards in DataStudio and PowerBI made the work very comfortable.

However, if you look deeper, you can see that the main beneficiaries of these changes are professionals who own SQL and Python and business users in leadership positions for which dashboards are developed.
But what about hundreds of millions of employees for whom Microsoft Excel has been and remains the main analysis tool? In a sense, they were left out of the new changes. These are sales managers, small business owners, heads of small departments. They don’t have time to master PowerBI. All that remains for them is to export data from reports to their favorite Excel and continue working there, but this is not very convenient, it takes time and there are limitations on the amount of data.

We often see how our clients using Google BigQuery load data into Excel using various connectors, bumping into limitations. And the idea was born: if Excel does not lose popularity, and the data goes to the clouds, then let’s come up with a way to help the user work from Excel with the cloud.

Remembering OLAP

Yes, Excel is still the most popular information tool in the world today. A PivotTable is what millions of users use every day. And there were even more before. If you worked with data in a large company ten years ago, you probably heard about OLAP cube technology from Microsoft and other vendors that are created on top of relational SQL databases and allow you to get the results of processing millions of rows of data in seconds. The most popular way to work with OLAP cubes was and is the Excel pivot table. By the way, OLAP is still very common in the corporate world, it is still part of Microsoft SQL Server, but it has a number of limitations in terms of volume and processing speed and is increasingly yielding to the market for cloud analytical storage.

So OLAP will help us in solving this problem. As I wrote above, Excel has a ready-made client for working with OLAP, we will use it.

When Microsoft introduced this technology to the market, an open protocol for working with OLAP databases – XMLA (XML for analytics) was published. It is this protocol that Excel uses when it connects to the OLAP server. Everything works like this:

Solution

The idea is simple – instead of an OLAP server, we will make a Python application that will do the following:

  • accept XMLA queries from Excel

  • convert XMLA query logic to SQL code

  • send SQL query to BigQiery

  • Convert response received from BigQuery to XMLA and send back to Excel

This application (App) can be published to the cloud, since Excel has the ability to send XMLA requests over HTTPS. Everything will work like this:

Usage

Once we have developed and published the application, the BigQuery administrator simply needs to create a table and define aggregation types for the corresponding fields (sum, min, max, etc.) to get started. Next, the user in Excel using the connection to analytics services (OLAP) connects to our service:

After that, we access the BigQuery table directly from the pivot table. And we can easily “play” with the data.

In addition, we implemented a data caching layer in this service to speed up queries and save BigQuery costs.

What’s next

Now we are actively testing the service on our clients and are thinking about adding new functionality.

For example, SQL queries, along with BigQuery, also support other cloud data stores. By adding one class to our application, we have implemented a similar mechanism for clickhouse. Version for snowflake and Amazon Redshift.

We will be glad to hear questions and opinions of colleagues in the comments.

Similar Posts

Leave a Reply

Your email address will not be published. Required fields are marked *