How to show a million buildings on the map – and not break the browser

At 2GIS, we accumulate a huge amount of geodata that millions of users interact with daily. By analyzing it, we can extract valuable insights and find important ideas for urban development. This data is also useful for organizations.

To help businesses and municipal organizations, we created 2GIS PRO, a GPU-analytics tool that can visualize huge amounts of data on the map in the form of charts and graphs.

In this post, we’ll explain how we get such a picture, how it all works under the hood, and what your browser is capable of when it has to display hundreds of thousands of objects at the same time.

At the start, we had only 2 main requirements from future users:

  • “I want to filter all objects by all attributes that interest me and see aggregated information.”

  • “I want to see all the buildings of Moscow and the region as hexagons on the map and color them depending on the number of floors.”

And now let’s talk about this in more detail.

How to filter and aggregate everything

This is a relatively simple task: you just need to choose the right database, index everything the right way, and write a couple of queries.

We chose Elasticsearch for this purpose; it had almost everything we needed out of the box:

  • Ability to store millions of documents

  • Fast indexing by arbitrary attributes, including geosearch

  • Good data aggregation

All that remained was to lay out a JSON description of the aggregates: a general description of how the data for the selected object type should be presented on the client.

As a result, we get this picture:

We can change this representation simply through the database: select various graphs and diagrams, and build aggregates over whichever attributes we need. Plus, we can do it on the fly.

Here is a piece of the configuration that produces the result above:

[
    {
      "caption": "Total count of buildings",
      "agg_type": "value_count",
      "group_id": "building_counts",
      "placement": "header"
    },
    {
      "filter": [
        {
          "tag": "purpose_group",
          "type": "number",
          "value": "1000001"
        }
      ],
      "caption": "Administrative and Commercial",
      "agg_type": "value_count",
      "group_id": "building_counts",
      "placement": "chart"
    }
]
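
To make this more concrete, here is a minimal sketch of how an entry like the one above could be turned into an Elasticsearch value_count aggregation. This is not our production code: the official @elastic/elasticsearch client, the index name, and the values.* field layout are all assumptions for illustration.

import { Client } from '@elastic/elasticsearch';

const client = new Client({ node: 'http://localhost:9200' });

interface AggConfig {
  caption: string;
  agg_type: 'value_count';
  group_id: string;
  placement: 'header' | 'chart';
  filter?: Array<{ tag: string; type: string; value: string }>;
}

async function runAggregation(cfg: AggConfig) {
  // Optional filter from the config becomes a bool/term query
  const query = cfg.filter
    ? {
        bool: {
          filter: cfg.filter.map((f) => ({
            term: { [`values.${f.tag}`]: f.value }, // assumed field layout
          })),
        },
      }
    : { match_all: {} };

  const result = await client.search({
    index: 'buildings', // invented index name
    size: 0,            // only the aggregation is needed, not the documents
    query,
    aggs: {
      [cfg.group_id]: {
        value_count: { field: 'values.building_id' }, // assumed field
      },
    },
  });

  return result.aggregations;
}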

How to show a million hexagons

Let’s analyze the second requirement for the product: customers want to see all the buildings in Moscow and the surrounding region as hexagons, colored by the number of floors. Wait, but we are talking about millions of objects!

To display the base map, we have a cool MapGL engine that can show all our geo objects quite quickly. But it has a number of constraints: the data is fairly static, it is cut into tiles, and only a relatively small set of tile-packed data is loaded onto the client and gets into the viewport. If you want to show rarely-changing data and have time for preliminary preparation and slicing, this engine is the right tool.

But if you need dynamic data and flexible customization of visualizations in the form of hexagons, grids, and heatmaps, another mechanism is required. To solve this problem, we took the deck.gl library, a very cool open-source tool that gave us almost everything we needed out of the box.

There was one problem: we still have millions of objects. To draw those hexagons nicely, you need to load all the data and calculate the minimums and maximums, heights, palette, and all the other parameters that depend on the data attributes. And then draw all this wealth on top of our base map. Ideally, this should work on ordinary hardware, with a sane level of memory consumption, and without stuttering, so users can actually do their analysis.

Data preparation

Wait a minute: paging, aggregation, simplification, tiling, and many other buzzwords were invented for a reason! They all exist to reduce the number of objects to some sane amount, usually a few dozen, a few hundred at most. You can’t hand a million objects to the frontend; that simply won’t work.

Or will it?

Even if it does, loading and processing the data will take minutes (or hours, if you’re unlucky). Everything will slow to a crawl, and users will be unhappy. In a word, it’s impossible.

Or is it?

Our first experiments showed that it is possible. But there are nuances:

  • This amount of data needs to be pulled out of Elasticsearch. And in our case, the data also has to be filtered by the specified criteria. In other words, the selection differs from request to request, so you can’t simply cache it.

  • You can’t just serve a JSON of 2 million objects. It takes very long, from tens of seconds to several minutes, even in a streaming read-compress-send mode.

A simple and obvious solution: we prepare the data for sending in the background, and in the meantime send the client a clustered result of several hundred or even a few thousand objects. There is no rocket science here: if you want a lot of data, you have to wait.

The data preparation itself is simply a task that selects data from storage, packs it into a gzip file, and uploads that file to file storage (S3 in our case). On the next request, the client receives the ready-made file of a few kilobytes or megabytes.

This happens relatively quickly: the data is already prepared, there is no need to query the database or compress anything on the fly; we just stream the file to the output and that’s it.
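
Schematically, the preparation task boils down to a few steps. Below is a sketch, not our production code: it assumes the AWS SDK v3 client for S3, a hypothetical fetchFilteredObjects helper for the Elasticsearch selection, and invented bucket and key names.

import { gzipSync } from 'node:zlib';
import { S3Client, PutObjectCommand } from '@aws-sdk/client-s3';

// Hypothetical helper that pulls the filtered selection from Elasticsearch.
declare function fetchFilteredObjects(
  datasetId: string,
  filters: unknown,
): Promise<object[]>;

const s3 = new S3Client({});

async function prepareDataset(datasetId: string, filters: unknown): Promise<void> {
  // 1. Select the data from storage.
  const objects = await fetchFilteredObjects(datasetId, filters);

  // 2. Compress the whole selection once, ahead of time.
  const body = gzipSync(Buffer.from(JSON.stringify(objects)));

  // 3. Upload the ready-made file to file storage.
  await s3.send(
    new PutObjectCommand({
      Bucket: 'pro-datasets',      // invented bucket name
      Key: `${datasetId}.json.gz`, // invented key scheme
      Body: body,
      ContentType: 'application/json',
      ContentEncoding: 'gzip',
    }),
  );
}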

And then comes the frontend magic.

Preparing data for working in the browser

To get a feel for the volumes we work with, let’s take a dataset with all the buildings in Moscow. That is, mind you, 170 thousand objects that look like this:

{
    "id": "70030076129543595",
    "point": {
        "lon": 37.874927,
        "lat": 55.739908
    },
    "values": {
        "area": 9,
        "building_id": "70030076129543595",
        "floors_count": 1
    }
}

The number of keys in values varies from 2 to N fields, depending on what information we have for each building: number of floors, year of construction, number of entrances, building type, number of people living or working there, etc.
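
In TypeScript terms, the shape of such an object can be described roughly like this (a sketch; the type name and the comment examples are illustrative):

// A made-up type describing the objects shown above.
interface BuildingFeature {
  id: string;
  point: { lon: number; lat: number };
  // 2..N attribute fields, depending on what we know about the building:
  // floors_count, year of construction, number of entrances, etc.
  values: Record<string, number | string>;
}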

This layer is useful as an underlay that marks out the area under analysis, since the buildings naturally form the desired contour. On top of it, you can overlay a few more layers with analyzed data, for example demand, which adds tens of thousands more objects. All in all, several hundred thousand objects in one project is not an anomaly but a perfectly standard user scenario.

How to work with such volumes of data and not block the interface

Modern browsers can move some of the work to a separate background thread via the Web Workers API. Having studied the possibilities, we realized that we could quite painlessly move all the work of fetching and preparing data into this layer.

For convenient work with the Web Worker, we use the Comlink library from the Google Chrome team.

The worker’s interface looks like this:

type PromisifyFn<T extends (...args: any[]) => any> = (
  ...args: Parameters<T>
) => Promise<ReturnType<T>>;

const worker = {
   requestItemValues: async (assetId: string, services: Services) => {
       // In this method we:
       // - request the data via Axios (an HTTP request library)
       // - put it into the data store (client-side caching)
   },
   getDeckData: (layerId: string) => {
       // Here we prepare the data for use in deck.gl
   },
   // ... other helper methods
};

type ProWebWorker = typeof worker;

type ProWorker = {
   requestItemValues: PromisifyFn<ProWebWorker['requestItemValues']>,
   getDeckData: PromisifyFn<ProWebWorker['getDeckData']>
}

As you can see, there are two main methods: one requests the data, the other returns it in the desired format.
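
For reference, the Comlink wiring itself is small (a sketch; file names and the bundler-style Worker construction are assumptions). On the worker side, we expose the object defined above:

import * as Comlink from 'comlink';

// worker.ts: make the `worker` object callable from the main thread
Comlink.expose(worker);

And on the main thread, we wrap the worker so that every method call returns a Promise, matching the ProWorker type:

import * as Comlink from 'comlink';

declare const assetId: string; // illustrative values
declare const services: Services;
declare const layerId: string;

// main.ts: spawn the worker and get a typed proxy to it
const instance = new Worker(new URL('./worker.ts', import.meta.url), {
  type: 'module',
});
const proWorker = Comlink.wrap<ProWebWorker>(instance);

// Each call is forwarded to the background thread
await proWorker.requestItemValues(assetId, services);
const deckData = await proWorker.getDeckData(layerId);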

The logic for requesting data from the server is not much different from a regular application, so let’s move on to the more interesting part: data preparation.

Binary data

At the first stage of exploring background data processing, we only requested the data and parsed it with JSON.parse, while the rest of the operations stayed in the main thread.

Soon we got a very large demand dataset and the application started blocking the main thread again. The guys from the WebGL map team said they had solved a similar problem by switching to binary data.

It turned out that deck.gl has a friendly interface for binary data, which lets us get the most out of the Web Worker: passing typed arrays from a background thread is much more efficient than passing data in any other format, and no additional transformation is needed in the main thread.

Also, when passing binary data, it is important to use Transferable Objects so as not to waste memory on copies. The Comlink library has a special transfer method for this.
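
Roughly, it can look like this (a sketch): instead of letting structured cloning copy the buffers, we hand ownership of them over to the main thread:

import * as Comlink from 'comlink';

// Inside the worker: return typed arrays as Transferable Objects so the
// underlying ArrayBuffers are moved to the main thread instead of copied.
function getDeckDataTransferred(layerId: string) {
  const positions = new Float32Array([37.575541, 55.724986, 36.575541, 54.724986]);
  const colors = new Uint8ClampedArray([255, 255, 255, 255, 255, 255, 255, 255]);

  // Comlink.transfer marks the listed buffers as transferable;
  // after this call the worker side can no longer use them.
  return Comlink.transfer({ positions, colors }, [positions.buffer, colors.buffer]);
}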

Let’s see what the binary data format looks like, using the Grid visualization as an example:

export function getGridData(data: GridHexLayerData) {
  return {
    length: data.positions.length / 2,
    attributes: {
      // The value is a Float32Array of coordinates, e.g.
      // [37.575541, 55.724986, 36.575541, 54.724986]
      // Size tells us how many array elements to take
      // to get the coordinates of a single object
      getPosition: { value: data.positions, size: 2 },
      // The colors are a Uint8ClampedArray, e.g.
      // [255, 255, 255, 255, 255, 255, 255, 255]
      // Here we take 4 elements and turn them into an RGBA color
      getFillColor: { value: data.colors, size: 4 },
    },
  };
}

It’s easy to see that this format is rather hard to read. But it is precisely what gives us the high rendering speed on top of the map. And a developer can always access the raw data inside the Web Worker through the familiar JSON interface.
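
For context, here is roughly how such a binary payload plugs into a deck.gl layer (a sketch: the specific layer and the constant radius are illustrative; dataFromWorker stands for the object received from the Web Worker):

import { ScatterplotLayer } from '@deck.gl/layers';

declare const dataFromWorker: GridHexLayerData; // received from the Web Worker

// deck.gl accepts {length, attributes} in place of an array of objects,
// so no per-object accessor runs on the main thread: the typed arrays
// are handed to the GPU almost as-is.
const layer = new ScatterplotLayer({
  id: 'grid-cells',
  data: getGridData(dataFromWorker),
  getRadius: 30, // constant radius in meters (illustrative)
  radiusUnits: 'meters',
});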

Data aggregation

Some visualizations (grid, hex, h3) require pre-aggregation. The initial data arrives as complete sets, without any transformations on the server; this way we don’t have to re-request data from the server every time the rendering method or its parameters change.

During aggregation, we turn our thousands of objects into a collection of cells to be displayed on the map, and also collect various statistics, for example the minimum and maximum of the user-selected color attribute across all points. We need this data to build the value legend.

In addition, we store the values of various attributes for quick access from tooltips and object cards.
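
As a rough illustration (a sketch, not our production code), the statistics pass over the user-selected attribute can look like this:

// Find the minimum and maximum of the attribute the user picked as the
// color base; the legend is built from this range.
function getValueRange(
  items: Array<{ values: Record<string, number | string> }>,
  attribute: string,
): { min: number; max: number } {
  let min = Infinity;
  let max = -Infinity;
  for (const item of items) {
    const value = item.values[attribute];
    if (typeof value !== 'number') continue;
    if (value < min) min = value;
    if (value > max) max = value;
  }
  return { min, max };
}

// Usage: getValueRange(buildings, 'floors_count')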

Performance

At the beginning of the Web Worker story, I gave an example of two layers consisting of about 200,000 objects in total. How fast can we fetch, prepare, and display this data?

As you can see in the demo, loading takes no more than 3 seconds, and the heat map layer (demand data) appears by the time the map is displayed. Across all the real projects we studied, the maximum load time on an average office laptop was 12 seconds, with a data volume of several million points.

No unnecessary work on the server, fast background loading and data preparation on the client, a binary format, and the efficient GPU engine in deck.gl let us display an almost “limitless” amount of data!

What else can you try

We are quite happy with the current solution. It seems we managed to find an elegant way out of a situation that looked hopeless.

Going forward, I would like to try “stretching out” the data-loading process, for example with streaming rendering in Next.js and other client libraries; I think it would improve the user experience even more. And producing the data in binary format right when it is prepared on the server… But, as you know, that is a completely different story.

Thanks

Kirill Kaisarov helped me write the frontend part of this post; many thanks to him!
