Charts and shareable URLs

Today I want to tell you about new features in hippotable, my open-source project for analyzing data in the browser: charts and shareable dashboard URLs. I have already announced the project on Habr; in a nutshell:

  1. Import of fairly large CSV files – tested up to 100 MB.

  2. Cool tabular presentation – the entire dataset can be scrolled even on your phone.

  3. Basic no-code operations for data analysis: sorting, filters, aggregation.

  4. Everything works right in the browser, so sensitive data never leaves your computer.

For the last six months we've been testing hippotable for ad-hoc analytics at work: we looked at logs and data dumps on conversions and user activity. Overall everyone liked the tool, but some problems emerged – not that unexpected, since I deliberately planned not to shoot for the moon, but to launch fast with a minimal feature set and then see what's missing most. Here's what was missing:

  1. Some trends are hard to spot by eye in a table without charts. You can export the processed data into Excel and build charts there, but solving the problem on the spot is much cooler.

  2. The analytical pipeline cannot be saved – when the page is reloaded it has to be rebuilt from scratch, and the result can only be shared through an export or a screenshot.

Today's release is in DevLog format. I’ll tell you in detail how we solved these problems:

  1. How I chose a charting library, which one, and which small nuances we missed.

  2. How to put state in a URL without screwing it up.

For today's examples we're working with an interesting dataset, World Development Indicators – numerical data (population, economy, education and other fun stuff) for every country in the world since 1960. It's going to be interesting (if you're as boring as I am), let's go!

Selecting a visualization library

Goal number one is to build charts from tables. We're not ready to spend a lot of time on this, so we need a library that builds basic charts out of the box – scatter plot, line / bar / pie chart. The library should quickly render fairly large datasets (~10K points). A nice API is not that important to us, because we'll wrap it in a UI configurator anyway. A large variety of settings and the ability to assemble arbitrary visualizations aren't necessary either, because all that power couldn't be exposed through a no-code UI anyway.

Now let's choose the library itself: open the relevant GitHub topics (charts, visualization) and look at every library with >1000 stars (or whatever cutoff you like; remember, we're looking for something popular). We need to narrow the field somehow, so let's immediately split the libraries by rendering technology:

  • SVG/HTML. Rendering charts and adding interactivity is very easy – it's just the DOM. But there are limits on the amount of data: around 10K DOM nodes the page starts to choke. Since we're looking at a high-level library, we don't care how simple it is inside, but we don't want hard limits on dataset size. Into this bucket go ApexCharts, Plotly.js, X6 and most d3 wrappers (billboard, visx). Rejected.

  • Canvas. Instead of building DOM elements from the data and making the browser render them, you can paint the pixels yourself on a canvas – it's faster. But under the hood everything is more complicated: for tooltips in SVG it's enough to attach a listener to an element, while on canvas you have to track cursor coordinates yourself and hit-test them against the points. Great that someone has already done this for us! This category includes ECharts, G2 and Chart.js.
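To make the tooltip point concrete, here's a minimal sketch of the hit-testing a canvas library has to do for us – find the data point nearest to the cursor within some radius. The names (`findHoveredPoint`, the point shape) are purely illustrative, not any library's real API:

```javascript
// Given pixel-space points and a cursor position, return the nearest point
// within `radius` pixels, or null if the cursor isn't hovering anything.
function findHoveredPoint(points, cursor, radius = 8) {
  let best = null;
  let bestDist = Infinity;
  for (const p of points) {
    const dist = Math.hypot(p.x - cursor.x, p.y - cursor.y);
    if (dist <= radius && dist < bestDist) {
      best = p;
      bestDist = dist;
    }
  }
  return best;
}

const points = [
  { x: 10, y: 10, label: 'A' },
  { x: 100, y: 40, label: 'B' },
];
findHoveredPoint(points, { x: 12, y: 11 }); // → the 'A' point
findHoveredPoint(points, { x: 50, y: 50 }); // → null
```

With SVG the browser does this dispatch for free; on canvas every library reimplements some version of it.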

  • WebGL. Technically still a canvas, but some of the data transformation and rasterization moves from JS on the main thread to the GPU – even more data, but even more complexity under the hood. WebGL shines in geographic visualization (MapboxGL, L7), but I couldn't find any simple charting libraries here. Well, another time.

A small digression: it's funny that the most famous visualization library, d3, is in fact not a visualization library. It's a core that binds data to elements (an abstraction level between jQuery and React / Vue) plus all sorts of useful data helpers like d3-scale. What d3 doesn't have is charts out of the box, so N wrappers have grown on top of it with different combinations of basic charts and convenient APIs for UI frameworks (observable plot, recharts, visx, nivo, victory). In principle, you can push canvas into d3 (officially stated: "Bring data to life with SVG, Canvas and HTML."), but it ends up as one big kludge.

As a result, we're choosing among the canvas libraries: ECharts, G2, Chart.js. G2's API is more oriented toward use from code (the other two work with config objects, which are easier to assemble from a configurator). At first glance Chart.js and ECharts are at parity, so I randomly (actually, based on the less monstrous documentation) chose Chart.js – we'll do the first iteration on it, learn more about the limitations, and maybe with that new input we'll try other libraries.
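The "config objects" point is what makes Chart.js a good fit for a no-code configurator: a whole chart is one plain object that a UI can assemble field by field. A minimal sketch (the labels and numbers here are made up):

```javascript
// A Chart.js chart is described entirely by a plain config object,
// which is easy to generate from a no-code configurator UI.
const config = {
  type: 'line',
  data: {
    labels: [1960, 1961, 1962],
    datasets: [{ label: 'population', data: [3.0e9, 3.1e9, 3.2e9] }],
  },
  options: { responsive: true },
};

// In the browser: new Chart(document.querySelector('canvas'), config);
```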

Building charts

Now I'll show you what charts you can build.

First, line charts. Let's see how the world's population has changed over the years: group by year, sum the population across all countries, plot the result: demo. Fun facts: the world's population has been growing approximately linearly over the past 60 years, and rural population growth slowed sharply around 2000. Charts definitely help.
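The aggregation behind that chart can be sketched in a few lines. This is illustrative code, not hippotable's actual implementation, and the row shape is made up:

```javascript
// Group rows by year, sum population across countries, and return
// a Chart.js-friendly shape: sorted labels plus one data array.
function sumByYear(rows) {
  const totals = new Map();
  for (const { year, population } of rows) {
    totals.set(year, (totals.get(year) ?? 0) + population);
  }
  const labels = [...totals.keys()].sort((a, b) => a - b);
  return { labels, data: labels.map((y) => totals.get(y)) };
}

const rows = [
  { year: 1960, population: 100 },
  { year: 1961, population: 120 },
  { year: 1960, population: 50 },
];
sumByYear(rows); // → { labels: [1960, 1961], data: [150, 120] }
```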

Of course, there are other chart types (haha, any that I can whip up in half an hour by going through the Chart.js options) – for example, a scatter plot for unordered data. Let's look at countries' imports and exports in 2014. Fun facts:

  • 3 countries (Hong Kong, although I'm not sure it's a country, Luxembourg and Singapore) do basically nothing but turn imports around into exports.

  • Kiribati does nothing at all except import everything.

  • Overall there seems to be a correlation, but we'll measure it properly in one of the following episodes.
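For the scatter plot, the only data-shaping step is mapping each row to an `{x, y}` pair, which is the point format Chart.js scatter datasets expect. A sketch with made-up numbers (WDI reports imports/exports as % of GDP; these values are illustrative only):

```javascript
// Shape table rows into Chart.js scatter points: one {x, y} per country.
const rows = [
  { country: 'Luxembourg', imports: 194, exports: 230 },
  { country: 'Kiribati', imports: 102, exports: 11 },
];
const points = rows.map((r) => ({ x: r.imports, y: r.exports }));

// Feed into: { type: 'scatter', data: { datasets: [{ data: points }] } }
```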

There's also a chart type frowned upon by the modern school of visualization – here's a pie chart of population by country. Here it's used for its intended purpose: drawing attention to the share of a few categories in a total. I knew, of course, that a lot of people live in India and China, but I didn't realize it was exactly a third of all people in the world, and that every other country is a dwarf by comparison.

I'll be honest, the charts turned out pretty lame. It's definitely better with them than without, but the current version is more suitable for testing the architecture and learning about the weak points. Three big problems emerged:

  1. Graphs need titles, labels for axes and points, and rules for defining text in the tooltip.

  2. For most charts the data needs a bit more preprocessing – for example, for a pie chart you first need to sum the data by category, otherwise several segments are generated for one category value, which looks silly.

  3. It would be nice to combine several indicators on one chart (rural and urban population; one line per categorical column value, to build a family of charts with the population of each country in one action).

The main limitation here is my UX imagination and time.
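Problem 3 above (one line per categorical value) boils down to a group-by that emits one dataset per distinct value. A sketch under made-up names – this is not hippotable's actual code:

```javascript
// Split rows into one Chart.js-style dataset per distinct value of a
// categorical column, e.g. one population line per country.
function seriesPerCategory(rows, catCol, xCol, yCol) {
  const byCat = new Map();
  for (const row of rows) {
    const key = row[catCol];
    if (!byCat.has(key)) byCat.set(key, []);
    byCat.get(key).push({ x: row[xCol], y: row[yCol] });
  }
  return [...byCat].map(([label, data]) => ({ label, data }));
}

const rows = [
  { country: 'A', year: 1960, population: 10 },
  { country: 'B', year: 1960, population: 20 },
  { country: 'A', year: 1961, population: 11 },
];
seriesPerCategory(rows, 'country', 'year', 'population');
// → two datasets: 'A' with 2 points, 'B' with 1 point
```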

Shareable URLs

The second big problem I wanted to solve was saving the analysis (sorting, filters, aggregation): both to avoid losing local state when restarting the browser, and to let colleagues play with a pre-configured pipeline. Architecturally we separated data from operations and made pipeline descriptions serializable from the start, so this won't be hard. Other features can be layered on top later, the hottest being swapping the dataset while keeping the pipeline: for example, we can prepare a flow that builds a pie chart of log errors broken down by device type, and push different logs through it.

We immediately dismiss the industrial-grade option of saving pipelines in a backend database: the project's main design goal is simple deployment on your own infrastructure, and requiring a backend certainly doesn't help there. Besides, I have no desire to run a backend for the hosted version myself and then either lock it behind authorization or risk getting 1K RPS of pipeline creation from some prankster.

We also reject local storage options (localStorage / IndexedDB), because they only solve half of the problem (you can't send your localStorage to a colleague). All that remains is storing the state in the URL.

The analytical pipeline config is a sprawling structure like this:

[
  {
    "mode": "aggregate",
    "key": [
      "Year"
    ],
    "columns": [
      {
        "name": "pop_urban",
        "sourceCol": "population_urban",
        "fn": "sum"
      }
    ]
  },
  {
    "mode": "filter",
    "filters": [
      {
        "name": "pop_urban",
        "value": 1000000,
        "condition": "gt"
      }
    ]
  }
]
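To show how such a config drives the analysis, here's a toy interpreter for pipeline configs of this shape. It handles only `filter` (with `eq`/`gt` conditions) and an `order` step; the real hippotable pipeline supports more, so treat this purely as a sketch:

```javascript
// Apply pipeline steps to rows, left to right. Unknown steps pass through.
function runPipeline(rows, steps) {
  return steps.reduce((data, step) => {
    if (step.mode === 'filter') {
      return data.filter((row) =>
        step.filters.every(({ name, value, condition }) =>
          condition === 'gt' ? row[name] > value : row[name] === value
        )
      );
    }
    if (step.mode === 'order') {
      const sign = step.dir === 'desc' ? -1 : 1;
      return [...data].sort((a, b) => sign * (a[step.col] - b[step.col]));
    }
    return data;
  }, rows);
}

const rows = [
  { Year: 2014, population: 5 },
  { Year: 2014, population: 9 },
  { Year: 2013, population: 7 },
];
runPipeline(rows, [
  { mode: 'filter', filters: [{ name: 'Year', value: 2014, condition: 'eq' }] },
  { mode: 'order', col: 'population', dir: 'desc' },
]);
// → [{ Year: 2014, population: 9 }, { Year: 2014, population: 5 }]
```

Because the whole config is plain data, the same object can be rendered into the UI, executed, and – as we'll see next – serialized into a URL.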

So we'll store it as JSON in searchParams. But let's not forget that URL length is limited – in our case, GitHub Pages returns a 414 URI Too Long error for URLs over 8221 characters. A large pipeline still fits into 8 KB, but I'd like to use this space well and fine-tune the serialization format. Most JSON syntax is not URL-safe, so naive JSON triggers a lot of percent-encoding and a lot of bloat:

JSON.stringify(state)
// '[{"mode":"filter","filters":[{"name":"Year","value":2014,"condition":"eq"}]},{"mode":"order","col":"population","dir":"desc"}]'
JSON.stringify(state).length
// 126
encodeURIComponent(JSON.stringify(state))
// "%5B%7B%22mode%22%3A%22filter%22%2C%22filters%22%3A%5B%7B%22name%22%3A%22Year%22%2C%22value%22%3A2014%2C%22condition%22%3A%22eq%22%7D%5D%7D%2C%7B%22mode%22%3A%22order%22%2C%22col%22%3A%22population%22%2C%22dir%22%3A%22desc%22%7D%5D"
encodeURIComponent(JSON.stringify(state)).length
// 230 - 2x bloat

base64 to the rescue! Yes, it inflates the string too. Yes, it's not fully URL-safe either. But the result is much better:

encodeURIComponent(btoa(JSON.stringify(state)))
// "W3sibW9kZSI6ImZpbHRlciIsImZpbHRlcnMiOlt7Im5hbWUiOiJZZWFyIiwidmFsdWUiOjIwMTQsImNvbmRpdGlvbiI6ImVxIn1dfSx7Im1vZGUiOiJvcmRlciIsImNvbCI6InBvcHVsYXRpb24iLCJkaXIiOiJkZXNjIn1d"
encodeURIComponent(btoa(JSON.stringify(state))).length
// 168 

In fact, with base64 the URL fits almost 1.5x more pipeline. Wonderful. Yes, to the naked eye the URL now looks like a pile of garbage, and it can't be edited by hand – but let's be honest, working with %%2F%5%%% isn't super convenient either, and hand-editing isn't the primary scenario anyway.
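The full round trip is two one-liners (the function names are mine, not hippotable's): JSON → base64 → percent-encoding into the URL, and the same steps in reverse on the way out:

```javascript
// Serialize pipeline state into a URL-safe string and back.
const encodeState = (state) => encodeURIComponent(btoa(JSON.stringify(state)));
const decodeState = (param) => JSON.parse(atob(decodeURIComponent(param)));

const state = [{ mode: 'order', col: 'population', dir: 'desc' }];
const param = encodeState(state);
decodeState(param); // → deep-equal to the original state
```

One caveat: `btoa` throws on characters outside Latin-1, so non-ASCII column names would need an extra encoding step before base64.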

We wrap all this serialization in a beautiful signal (like a hook, but for solidjs) that syncs the application state with the URL, and we're done! All the examples from the charts section above already worked through shareable URLs, but here's one more for you – the full history of indicators for Russia. You can refresh the page with this URL, restart the browser, send it to friends – free of charge and no SMS required!


Today I told you why I chose Chart.js for visualization, showed what charts it can build, and shared a fascinating story about URL serialization, JSON and base64. I hope you found it interesting! The best way to help the project is to upvote this article, star it on GitHub and, most importantly, use hippotable at work or at home and share your feedback.
