PostgreSQL and OpenStreetMap

Not so long ago, anyone who claimed to control the whole world would have been put in the room next to Napoleon Bonaparte. I hope those times are behind us and that everyone can now analyze geodata for the entire Earth and get answers to their global questions in minutes, even seconds. I have published openstreetmap_h3 – my own project that lets you run geoanalytics on OpenStreetMap data in PostGIS or in any query engine that can work with Apache Arrow / Parquet.

First of all, greetings to the haters and skeptics. What I have developed is genuinely unique: it solves the problem of transforming and analyzing geodata with the ordinary, familiar tools available to every analyst and data scientist – no big data stack, GPGPU, or FPGA required. What now looks easy to use and easy to code is a personal project into which I have poured vacations, weekends, sleepless nights, and a great deal of personal time over the last three years. Perhaps I will share the project's backstory and the pitfalls I stepped on along the way, but first let me describe the end result.

This first post does not pretend to be a monograph; I will start with a brief overview. I plan to share my experience at conferences and in further publications, and although the topic is niche, it should be useful for many projects working with OSM geodata. To programmers and users unfamiliar with the OSM toolkit and data model, many things may seem a little strange; I will not dive into those basics here. I will only note that, despite the criticism it receives, OpenStreetMap and its services keep developing and the amount of data grows year by year. In some regions of the world the completeness and quality of OSM data is better than that of competing projects. So we simply accept the data model as it is: flexible and unrestrictive in how users tag objects, which in turn complicates analytics and data visualization because of how its data structures are designed.

There are specialized columnar, distributed open-source databases that I also experimented with, but PostgreSQL + PostGIS still turned out to be the most reliable and battle-tested solution available at the moment. And the icing on the cake: with a single parameter, the same data schema can be loaded into the columnar storage provided by the Citus extension inside the same PostgreSQL instance.
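For the curious: at the SQL level, a Citus columnar table is declared just like a regular one, plus an access-method clause. A minimal sketch with an illustrative column list (the project's real DDL comes from the generated scripts):

-- A sketch, not the project's actual DDL: assumes the Citus extension is
-- installed and loaded in this PostgreSQL instance. USING columnar stores
-- the table column-by-column, which compresses well and speeds up
-- analytical scans that touch only a few columns.
CREATE TABLE ways_columnar (
    id         bigint   NOT NULL,
    h3_3       smallint NOT NULL,
    tags       hstore,
    linestring geometry
) USING columnar;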

Why this project, and what was wrong with the existing solutions

Attempts to load the planet dump into a PostgreSQL database were long and painful… The Osmosis utility, written in Java, has rich functionality for working with OSM data, including the ability to load it into a PostgreSQL database with the pgSnapshot schema for analytics. The project is no longer developed in terms of functionality, only maintained. It can convert data from PBF into TSV files for COPYing into the database, and it ships scripts for creating the database schema. The pgSnapshot schema is lossless, so you can analyze all available information, build routes, and so on. But its main drawback is the huge nodes and ways tables, hundreds of gigabytes each, which rule out parallel loading while preserving the physical order of records and mean long builds of giant indexes and long full scans over giant tables. Osm2pgsql and Imposm can load data into PostGIS, but the import goes through a mapping step that drops part of the data; they are not suitable for routing and are oriented more toward rendering tiles or working with a transformed subset of data in user-defined tables. Osmium export is faster to script than Osmosis, but its PostGIS export functionality is limited and its data model is more primitive than the one Osmosis uses.

When the workhorse fails, it’s time to harness the elephant (PostgreSQL+PostGIS)

I wanted the data in PostgreSQL in a format as close as possible to pgSnapshot, with a fast mechanism for loading data from PBF and the option of loading into the database in parallel. Ideally, the data would be partitioned so that objects within one partition are located close to each other and the cells of the partition grid have roughly the same radius at the equator and at the poles. Among geopartitioning schemes I considered several options: division by administrative borders / countries, a rectangular grid, and the hierarchical systems S2 and H3. The choice fell on the latter: the hierarchical H3 indexing system preserves distances well (cells have approximately the same radius anywhere in the world), it is a great fit for routing tasks, and the H3 libraries are well supported, constantly evolving, and available for both PostgreSQL and the JVM.

As a result, I developed a Java utility and designed a pgSnapshot-compatible database schema that uses PostgreSQL declarative partitioning over ranges of the H3 index of each object's coordinates. The utility automatically generates the database schema scripts, prepares the data from PBF for loading into the tables, and loads it into the database in multiple threads.
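To give a feel for the generated schema, here is a simplified sketch of such a partitioned table. The real column set and partition boundaries are computed by the utility from the actual data distribution, so the values below are invented:

-- Declarative range partitioning over the compact resolution-3 H3 index (h3_3);
-- boundary values here are made up purely for illustration.
CREATE TABLE ways (
    id         bigint   NOT NULL,
    h3_3       smallint NOT NULL,
    tags       hstore,
    closed     boolean,
    linestring geometry
) PARTITION BY RANGE (h3_3);

CREATE TABLE ways_000 PARTITION OF ways FOR VALUES FROM (0)   TO (150);
CREATE TABLE ways_001 PARTITION OF ways FOR VALUES FROM (150) TO (300);
-- ... and so on, one partition per range of h3_3 values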

Getting there took numerous experiments, starting with modifications to the Osmosis code and osm-parquetizer. I gathered the coordinates of way points in Java using a hundred-gigabyte memory-mapped file, only to throw that code away and reuse osmium's add-locations-to-ways functionality – the most optimized solution available, which does the same thing much faster. Osmium can also assemble multipolygons, which is far faster and more reliable than doing it at the database level, so here too I decided to rely on a solution proven by the community.

If you are still drawn to big data, we call on Apache Arrow for help

We now have geodata partitioned by H3 indexes in PostgreSQL, with all the reliability and maturity of that ecosystem. But if the same partitioned data needs to go into the Apache Spark or Hadoop ecosystem, it would be nice to get it from PBF straight into Arrow / Parquet without PostgreSQL as an intermediary. Drawing on my earlier experience optimizing osm-parquetizer, I added the ability to export geodata through the Apache Arrow serialization format to my utility. To get Parquet files partitioned by the H3 index, an extra post-processing step over the resulting set of files is needed (for now I do this with an additional Python script). The advantages of my utility over osm-parquetizer are greater parallelism when transforming PBF data, collection of the coordinates for lines, and the presence of an H3 geo-index for nodes and ways. One open question for geoanalytics in Parquet is how to store the lines: some alternative solutions save them as an array of points, but for now I decided to encode the lines in WKB format.
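To show why this is convenient: once the data is in Parquet, any engine that reads the format can query it without a PostgreSQL instance at all. A sketch in DuckDB-style SQL, with an illustrative path and column name:

-- Count ways per resolution-3 H3 cell straight from the exported files
-- ('ways/*.parquet' and the h3_3 column name are illustrative).
SELECT h3_3, count(*) AS way_count
FROM read_parquet('ways/*.parquet')
GROUP BY h3_3
ORDER BY way_count DESC
LIMIT 20;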

Openstreetmap_h3 is written in Java

As I mentioned above, this project evolved from Osmosis and osm-parquetizer and was rewritten from scratch, taking into account all the pitfalls I had hit in those Java projects. The project is built with Maven, and the application itself is a command-line utility:

The main logic of the project:
package com.github.isuhorukov.osm.pgsnapshot;

import com.beust.jcommander.JCommander;
import com.github.isuhorukov.osm.pgsnapshot.model.*;
import com.github.isuhorukov.osm.pgsnapshot.model.statistics.BlockStat;
import com.github.isuhorukov.osm.pgsnapshot.model.statistics.MultipolygonTime;
import com.github.isuhorukov.osm.pgsnapshot.model.statistics.PbfStatistics;
import com.github.isuhorukov.osm.pgsnapshot.model.statistics.Stat;
import com.github.isuhorukov.osm.pgsnapshot.model.table.StatType;
import com.github.isuhorukov.osm.pgsnapshot.model.table.TableStat;
import com.github.isuhorukov.osm.pgsnapshot.util.CompactH3;
import com.github.isuhorukov.osm.pgsnapshot.util.PartitionSplitter;
import com.google.common.util.concurrent.MoreExecutors;
import com.uber.h3core.H3Core;
import com.uber.h3core.util.LatLng;
import net.postgis.jdbc.geometry.LineString;
import net.postgis.jdbc.geometry.LinearRing;
import net.postgis.jdbc.geometry.Point;
import net.postgis.jdbc.geometry.Polygon;
import net.postgis.jdbc.geometry.binary.BinaryWriter;
import org.apache.arrow.memory.ArrowBuf;
import org.apache.arrow.memory.BufferAllocator;
import org.apache.arrow.memory.RootAllocator;
import org.apache.arrow.vector.*;
import org.apache.arrow.vector.complex.ListVector;
import org.apache.arrow.vector.complex.MapVector;
import org.apache.arrow.vector.complex.impl.UnionListWriter;
import org.apache.arrow.vector.complex.impl.UnionMapWriter;
import org.apache.arrow.vector.ipc.ArrowFileWriter;
import org.apache.arrow.vector.types.FloatingPointPrecision;
import org.apache.arrow.vector.types.pojo.ArrowType;
import org.apache.arrow.vector.types.pojo.Field;
import org.apache.arrow.vector.types.pojo.FieldType;
import org.apache.arrow.vector.types.pojo.Schema;
import org.apache.commons.io.IOUtils;
import org.geotools.geometry.jts.JTS;
import org.geotools.referencing.CRS;
import org.h2gis.functions.spatial.properties.ST_IsClosed;
import org.locationtech.jts.algorithm.MinimumBoundingCircle;
import org.locationtech.jts.geom.*;
import org.locationtech.jts.io.WKBWriter;
import org.opengis.referencing.FactoryException;
import org.opengis.referencing.crs.CoordinateReferenceSystem;
import org.opengis.referencing.operation.TransformException;
import org.openstreetmap.osmosis.core.container.v0_6.EntityContainer;
import org.openstreetmap.osmosis.core.container.v0_6.NodeContainer;
import org.openstreetmap.osmosis.core.container.v0_6.RelationContainer;
import org.openstreetmap.osmosis.core.container.v0_6.WayContainer;
import org.openstreetmap.osmosis.core.domain.v0_6.Entity;
import org.openstreetmap.osmosis.core.domain.v0_6.RelationMember;
import org.openstreetmap.osmosis.core.domain.v0_6.Tag;
import org.openstreetmap.osmosis.core.domain.v0_6.WayNode;
import org.openstreetmap.osmosis.pbf2.v0_6.impl.PbfBlobDecoder;
import org.openstreetmap.osmosis.pbf2.v0_6.impl.PbfBlobDecoderListener;
import org.openstreetmap.osmosis.pbf2.v0_6.impl.RawBlob;
import org.openstreetmap.osmosis.pgsnapshot.v0_6.impl.MemberTypeValueMapper;

import java.io.*;
import java.nio.charset.StandardCharsets;
import java.util.*;
import java.util.concurrent.*;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.stream.Collectors;

import static java.util.stream.Collectors.toList;

public class OsmPbfTransformation {

    public static final String NODES_DIR = "nodes";
    public static final String RELATIONS_DIR = "relations";
    public static final String WAYS_DIR = "ways";
    public static final String MULTIPOLYGON_DIR = "multipolygon";
    public static final String SQL_DIR = "sql";
    public static final String ARROW_DIR = "arrow";
    public static final String IMPORT_RELATED_METADATA_DIR = "import_related_metadata";
    public static final String STATIC_DIR = "static";

    public static void main(String[] args) throws Exception{

        CliParameters parameters = parseCliArguments(args);
        if (parameters == null){
            return;
        }
        long commandStartTime = System.currentTimeMillis();

        String sourceFilePath = parameters.sourceFilePath;
        File sourcePbfFile = new File(sourceFilePath);
        if(!sourcePbfFile.exists() || sourcePbfFile.length()==0){
            throw new IllegalArgumentException("Input pbf should exists and should be non empty");
        }

        Splitter.Blocks blocks =ExternalProcessing.enrichSourcePbfAndSplitIt(sourcePbfFile);

        File inputDirectory = new File(blocks.getDirectory());
        File[] files = inputDirectory.listFiles();
        Arrays.sort(Objects.requireNonNull(files));
        File resultDirectory = prepareResultDirectories(new File(inputDirectory.getParent(),
                                                            resultDirectoryNameFromSource(inputDirectory)),
                parameters.savePostgresqlTsv, parameters.saveArrow);

        copyOsmiumSettings(resultDirectory);
        if(parameters.savePostgresqlTsv) {
            copyResources(resultDirectory, parameters.columnarStorage);
        }

        long processingStartTime = System.currentTimeMillis();

        final H3Core h3Core = H3Core.newInstance();


        ExecutorService saveExecutorService = getExecutorService(parameters.workers);
        ExecutorService executorService = getExecutorService(parameters.workers);

        Map<Long, BlockStat> blockStat= new ConcurrentHashMap<>();
        AtomicInteger currentBlockToSave= new AtomicInteger(0);
        for(File blockFile: files){

            executorService.submit(() -> {
                long threadStart = System.currentTimeMillis();
                Long blockNumber = Long.parseLong(blockFile.getName());
                if(blockNumber%1000==0){
                    System.out.println(blockNumber);
                }
                RawBlob rawBlob;
                try {
                    FileInputStream blobInputStream = new FileInputStream(blockFile);
                    rawBlob = new RawBlob("OSMData", IOUtils.toByteArray(blobInputStream));
                } catch (IOException e) {
                    throw new IllegalArgumentException(e);
                }

                PbfBlobDecoder blobDecoder = new PbfBlobDecoder(rawBlob, new PbfBlobDecoderListener() {
                    @Override
                    public void complete(List<EntityContainer> decodedEntities) {
                        long blockStartTime = System.currentTimeMillis();
                        GeometryFactory geometryFactory = new GeometryFactory();
                        MemberTypeValueMapper memberTypeValueMapper = new MemberTypeValueMapper();
                        Map<Short, StringBuilder> csvResultPerH33 =new HashMap<>();
                        BinaryWriter binaryWriter = new BinaryWriter();
                        ArrayList<ArrowNodeOrWay> arrowNodeOrWays = new ArrayList<>();
                        ArrayList<ArrowRelation> arrowRelations = new ArrayList<>();
                        WKBWriter wkbWriter = new WKBWriter();
                        final CoordinateReferenceSystem coordinateReferenceSystem;
                        try {
                            coordinateReferenceSystem = CRS.decode("EPSG:" + Serializer.SRID);
                        } catch (FactoryException e) {
                            throw new RuntimeException(e);
                        }

                        Map<Short, Stat> nodeStat =new HashMap<>();
                        long nodeRecords = decodedEntities.stream().
                                filter(entityContainer -> entityContainer instanceof NodeContainer).
                            map(entityContainer -> ((NodeContainer) entityContainer).getEntity()).map(entity -> {
                                prepareNodeData(csvResultPerH33, binaryWriter, arrowNodeOrWays,
                                        nodeStat, entity, h3Core, parameters.collectOnlyStat, parameters.saveArrow, parameters.savePostgresqlTsv);
                                return null;
                            }).filter(Objects::isNull).count();

                        Map<Short, Stat> wayStat =new HashMap<>();
                        long wayRecords = decodedEntities.stream().
                                filter(entityContainer -> entityContainer instanceof WayContainer).
                            map(entityContainer -> ((WayContainer) entityContainer).getEntity()).map(entity -> {
                            prepareWayData(geometryFactory, csvResultPerH33, binaryWriter,  wkbWriter,
                                    arrowNodeOrWays, wayStat, entity, h3Core,
                                    parameters.scaleApproximation, parameters.collectOnlyStat, parameters.skipBuildings,
                                    coordinateReferenceSystem, parameters.saveArrow, parameters.savePostgresqlTsv);
                                return null;
                            }).filter(Objects::isNull).count();
                        BlockStat blockStatistic = new BlockStat(blockNumber);
                        blockStatistic.setThreadStart(threadStart);
                        if(!nodeStat.isEmpty()) {
                            blockStatistic.setNodeStat(nodeStat);
                        }
                        if(!wayStat.isEmpty()) {
                            blockStatistic.setWayStat(wayStat);
                        }
                        long relationCount = decodedEntities.stream().
                            filter(entityContainer -> entityContainer instanceof RelationContainer).
                            map(entityContainer -> ((RelationContainer) entityContainer).getEntity()).
                            map(entity -> {
                                if(!parameters.collectOnlyStat) {
                                    long relationId = entity.getId();
                                    if(parameters.savePostgresqlTsv){
                                        StringBuilder relationCsv = csvResultPerH33.computeIfAbsent((short)0, h33Key -> new StringBuilder());
                                        Serializer.serializeRelation(relationCsv, relationId, entity.getTags());
                                    }

                                    ArrowRelation arrowRelation = null;
                                    if(parameters.saveArrow){
                                        arrowRelation = new ArrowRelation(relationId, TagsUtil.tagsToMap(entity.getTags()));
                                        arrowRelations.add(arrowRelation);
                                    }

                                    List<RelationMember> relationMembers = entity.getMembers();
                                    for(int sequenceId=0; sequenceId<relationMembers.size();sequenceId++){
                                        RelationMember relationMember = relationMembers.get(sequenceId);
                                        long memberId = relationMember.getMemberId();
                                        String memberType = memberTypeValueMapper.getMemberType(relationMember.getMemberType());
                                        String memberRole = relationMember.getMemberRole();
                                        if(parameters.saveArrow){
                                            arrowRelation.getRelationMembers().add(
                                                    new ArrowRelationMember(memberId, memberType.charAt(0), memberRole));
                                        }
                                        if(parameters.savePostgresqlTsv){
                                            StringBuilder relationMembersCsv = csvResultPerH33.computeIfAbsent((short)1, h33Key -> new StringBuilder());
                                            Serializer.serializeRelationMembers(relationMembersCsv, relationId,
                                                    memberId, memberType, memberRole, sequenceId);
                                        }
                                    }
                                }
                                return null;
                            }).count();
                        long multipolygonCount = relationCount==0 ? 0 : decodedEntities.stream().
                            filter(entityContainer -> entityContainer instanceof RelationContainer).
                            map(entity -> {
                                for(Tag tag: entity.getEntity().getTags()){
                                    if("type".equals(tag.getKey()) && "multipolygon".equals(tag.getValue())){
                                        return 1;
                                    }
                                }
                                return 0;
                            }).mapToLong(Integer::longValue).sum();
                        long relationMemberCount = decodedEntities.stream().
                                filter(entityContainer -> entityContainer instanceof RelationContainer).
                                map(entityContainer -> ((RelationContainer) entityContainer).getEntity()).
                                mapToLong(value -> value.getMembers().size()).sum();
                        blockStatistic.setNodeCount(nodeRecords);
                        blockStatistic.setWayCount(wayRecords);
                        blockStatistic.setRelationCount(relationCount);
                        blockStatistic.setRelationMembersCount(relationMemberCount);
                        blockStatistic.setMultipolygonCount(multipolygonCount);
                        blockStat.put(blockNumber, blockStatistic);
                        blockStatistic.setProcessingTime(System.currentTimeMillis()-blockStartTime);

                        if(!parameters.collectOnlyStat) {
                            if(parameters.saveArrow){
                                long startSaveTime = System.currentTimeMillis();
                                if(!arrowNodeOrWays.isEmpty()){
                                    saveArrowNodesOrWays(arrowNodeOrWays, blockNumber, new File(resultDirectory,ARROW_DIR));
                                }
                                if(!arrowRelations.isEmpty()){
                                    saveArrowRelations(arrowRelations, blockNumber, new File(resultDirectory, ARROW_DIR));
                                }
                                blockStatistic.setSaveTime(System.currentTimeMillis()-startSaveTime);
                            }
                            if(parameters.savePostgresqlTsv){
                                saveDataOnlyInOneThread(csvResultPerH33, nodeRecords, wayRecords,
                                    blockStatistic, relationCount,
                                    currentBlockToSave, blockNumber, resultDirectory, saveExecutorService);
                            }
                        }
                    }

                    @Override
                    public void error() {
                        System.out.println("ERROR in block "+blockNumber);
                    }
                });
                blobDecoder.run();
                long threadTime = System.currentTimeMillis() - threadStart;
                blockStat.get(blockNumber).setThreadTime(threadTime);
            });
        }
        executorService.shutdown();
        executorService.awaitTermination(2,TimeUnit.DAYS);
        saveExecutorService.shutdown();//stop executor only when all tasks in processing executor is finished

        List<BlockStat> blockStatistics = new ArrayList<BlockStat>(blockStat.values());
        long multipolygonCount = blockStatistics.stream().map(BlockStat::getMultipolygonCount).mapToLong(Long::longValue).sum();
        long dataProcessingTime = System.currentTimeMillis() - processingStartTime;
        System.out.println(files.length+" "+" time "+dataProcessingTime);
        System.out.println("diff between total and processing " + blockStatistics.stream().map(blockStat1 -> blockStat1.getThreadTime()-blockStat1.getProcessingTime()).mapToLong(Long::longValue).sum());
        System.out.println("total thread time "+ blockStatistics.stream().map(BlockStat::getThreadTime).mapToLong(Long::longValue).sum());
        System.out.println("total processing time "+ blockStatistics.stream().map(BlockStat::getProcessingTime).mapToLong(Long::longValue).sum());
        System.out.println("total save time "+ blockStatistics.stream().map(BlockStat::getSaveTime).mapToLong(Long::longValue).sum());
        System.out.println("total waiting for save time "+ blockStatistics.stream().map(BlockStat::getWaitingForSaveTime).mapToLong(Long::longValue).sum());
        System.out.println("thread max time "+ blockStatistics.stream().map(BlockStat::getThreadTime).mapToLong(Long::longValue).max().orElse(0));
        System.out.println("processing max time "+ blockStatistics.stream().map(BlockStat::getProcessingTime).mapToLong(Long::longValue).max().orElse(0));
        System.out.println("nodes "+ blockStatistics.stream().map(BlockStat::getNodeCount).mapToLong(Long::longValue).sum());
        System.out.println("ways "+ blockStatistics.stream().map(BlockStat::getWayCount).mapToLong(Long::longValue).sum());
        System.out.println("relations "+ blockStatistics.stream().map(BlockStat::getRelationCount).mapToLong(Long::longValue).sum());
        System.out.println("relation members "+ blockStatistics.stream().map(BlockStat::getRelationMembersCount).mapToLong(Long::longValue).sum());
        System.out.println("multipolygon count "+ multipolygonCount);


        if(!parameters.collectOnlyStat) {
            savePartitioningScripts(resultDirectory, parameters.scriptCount,
                    parameters.thresholdPercentFromMaxPartition, blockStatistics, parameters.columnarStorage);
        }

        MultipolygonTime multipolygonTime = new MultipolygonTime(); //multipolygonCount calculation is only one reason why this generator at the end of process
        if(!parameters.collectOnlyStat) {
            multipolygonTime = ExternalProcessing.prepareMultipolygonDataAndScripts(sourcePbfFile,
                    resultDirectory, parameters.scriptCount, multipolygonCount);
        }
        PbfStatistics statistics = new PbfStatistics(blockStatistics);
        statistics.setMultipolygonCount(multipolygonCount);
        statistics.setDataProcessingTime(dataProcessingTime);
        statistics.setAddLocationsToWaysTime(blocks.getAddLocationsToWaysTime());
        statistics.setPbfSplitTime(blocks.getPbfSplitTime());
        statistics.setMultipolygonExportTime(multipolygonTime.getMultipolygonExportTime());
        statistics.setSplitMultipolygonByPartsTime(multipolygonTime.getSplitMultipolygonByPartsTime());
        statistics.setTotalTime(System.currentTimeMillis()-commandStartTime);

        saveStatistics(resultDirectory, statistics);

    }
}

All of this will make it easy, in the future, to move the processing to Apache Spark, which would further increase the parallelism of processing PBF dump sections in a distributed environment. But again, the prototype needs to be finished and its performance measured to see whether the game is worth the candle. In my opinion, this approach would make sense for generating Parquet files, but hardly for PostgreSQL TSV files.

First steps with Openstreetmap_h3

So, all we need to get started is a PostgreSQL 14 database, OpenStreetMap project data, and an ordinary laptop. In my case it's a Dell Latitude 7320: i7-1165G7 @ 2.80GHz, 16 GB RAM, and a 2 TB NVMe drive. The project works both with a dump of the whole world in PBF format and with regional extracts from Geofabrik.

Clone and build the project:

git clone https://github.com/igor-suhorukov/openstreetmap_h3.git
cd openstreetmap_h3
mvn install

The project requires Docker to work. On startup, the utility displays a list of available options with their descriptions:

~/dev/projects/oss_contrib/openstreetmap_h3$ java -jar target/osm-to-pgsnapshot-schema-ng-1.0-SNAPSHOT.jar
The following option is required: [-source_pbf]
Usage: <main class> [options]
  Options:
  * -source_pbf
      Source path for OpenStreetMap data in PBF format
    -collect_only_statistics
      Collect only statistics from data - partition distribution
      Default: false
    -columnar_storage
      Use columnar storage in PostgreSql tables for nodes/ways/multipolygon
      Default: false
    -data_partition_ratio
      Filling ratio from maximum size of partition. This parameter change 
      PostgreSQL partitions count
      Default: 0.48
    -help
      Information about command line parameters
    -pg_script_count
      Script count for PostgreSQL parallel COPY
      Default: 4
    -result_in_arrow
      Save data transforming and enrichment in Apache Arrow format
      Default: false
    -result_in_tsv
      Save result data in TabSeparatedValue format for PostgreSQL COPY
      Default: true
    -scale_approx_calc
      Approximate scale calculation. Value 'false' - distance in meter
      Default: false
    -skip_buildings
      Skip any ways with 'building' tag
      Default: false
    -worker_threads
      Worker threads count for data processing
      Default: 4

After that, download the OSM PBF dump of the planet, or of a single country from Geofabrik.

After running the utility on the downloaded dump, we get a set of data files and scripts for parallel loading into the database:
~/dev/map/thailand/thailand-latest_loc_ways$ tree
.
├── import_related_metadata
│   ├── osm_file_block_content.tsv
│   ├── osm_file_block.tsv
│   └── osm_file_statistics.tsv
├── multipolygon
│   ├── multipolygon_aa
│   ├── multipolygon_ab
│   ├── multipolygon_ac
│   ├── multipolygon_ad
│   └── multipolygon_ae
├── nodes
│   ├── 16713.tsv
│   ├── 16717.tsv
│   ├── 16744.tsv
│   ├── 16745.tsv
│   ├── 16747.tsv
│   ├── 16748.tsv
│   ├── 16749.tsv
│   ├── 25600.tsv
│   ├── 25601.tsv
│   ├── 25606.tsv
│   ├── 25620.tsv
│   ├── 25622.tsv
│   ├── 25632.tsv
│   ├── 25634.tsv
│   ├── 25636.tsv
│   ├── 25638.tsv
│   ├── 25648.tsv
│   ├── 25649.tsv
│   ├── 25650.tsv
│   ├── 25651.tsv
│   ├── 25652.tsv
│   ├── 25653.tsv
│   ├── 25654.tsv
│   ├── 25728.tsv
│   ├── 25729.tsv
│   ├── 25730.tsv
│   ├── 25731.tsv
│   ├── 25732.tsv
│   ├── 25733.tsv
│   ├── 25734.tsv
│   ├── 25744.tsv
│   ├── 25745.tsv
│   ├── 25748.tsv
│   ├── 25749.tsv
│   ├── 25756.tsv
│   ├── 25760.tsv
│   ├── 25761.tsv
│   ├── 25762.tsv
│   ├── 25763.tsv
│   ├── 25764.tsv
│   ├── 25765.tsv
│   ├── 25766.tsv
│   ├── 25774.tsv
│   ├── 25776.tsv
│   ├── 25777.tsv
│   ├── 25778.tsv
│   ├── 25779.tsv
│   ├── 25780.tsv
│   ├── 25781.tsv
│   ├── 25782.tsv
│   ├── 25873.tsv
│   ├── 25880.tsv
│   ├── 25881.tsv
│   ├── 25882.tsv
│   ├── 25883.tsv
│   ├── 25884.tsv
│   ├── 25885.tsv
│   ├── 25886.tsv
│   ├── 25985.tsv
│   ├── 25989.tsv
│   ├── 25992.tsv
│   ├── 25993.tsv
│   ├── 25994.tsv
│   ├── 25995.tsv
│   ├── 25996.tsv
│   ├── 25998.tsv
│   ├── 26000.tsv
│   ├── 26001.tsv
│   ├── 26003.tsv
│   ├── 26008.tsv
│   ├── 26009.tsv
│   ├── 26010.tsv
│   ├── 26011.tsv
│   ├── 26012.tsv
│   ├── 26013.tsv
│   ├── 26014.tsv
│   ├── 26019.tsv
│   ├── 26026.tsv
│   ├── 26029.tsv
│   └── 26030.tsv
├── relations
│   ├── 00000.tsv
│   └── 00001.tsv
├── sql
│   ├── nodes_import_000.sql
│   ├── nodes_import_001.sql
│   ├── nodes_import_002.sql
│   ├── nodes_import_003.sql
│   ├── ways_import_000.sql
│   ├── ways_import_001.sql
│   ├── ways_import_002.sql
│   ├── ways_import_003.sql
│   ├── y_multipoly_aa.sql
│   ├── y_multipoly_ab.sql
│   ├── y_multipoly_ac.sql
│   ├── y_multipoly_ad.sql
│   └── y_multipoly_ae.sql
├── static
│   ├── database_after_init.sql
│   ├── database_init.sql
│   ├── h3_poly.tsv.gz
│   ├── multipolygon.sql
│   ├── multipolygon_tables.sql
│   └── osmium_export.json
└── ways
    ├── 16717.tsv
    ├── 16744.tsv
    ├── 16745.tsv
    ├── 16747.tsv
    ├── 16748.tsv
    ├── 16749.tsv
    ├── 25620.tsv
    ├── 25622.tsv
    ├── 25634.tsv
    ├── 25638.tsv
    ├── 25648.tsv
    ├── 25649.tsv
    ├── 25650.tsv
    ├── 25651.tsv
    ├── 25652.tsv
    ├── 25653.tsv
    ├── 25654.tsv
    ├── 25728.tsv
    ├── 25729.tsv
    ├── 25730.tsv
    ├── 25731.tsv
    ├── 25732.tsv
    ├── 25733.tsv
    ├── 25734.tsv
    ├── 25744.tsv
    ├── 25745.tsv
    ├── 25748.tsv
    ├── 25749.tsv
    ├── 25756.tsv
    ├── 25760.tsv
    ├── 25761.tsv
    ├── 25762.tsv
    ├── 25763.tsv
    ├── 25764.tsv
    ├── 25765.tsv
    ├── 25766.tsv
    ├── 25774.tsv
    ├── 25776.tsv
    ├── 25777.tsv
    ├── 25778.tsv
    ├── 25779.tsv
    ├── 25780.tsv
    ├── 25781.tsv
    ├── 25782.tsv
    ├── 25873.tsv
    ├── 25880.tsv
    ├── 25881.tsv
    ├── 25882.tsv
    ├── 25883.tsv
    ├── 25884.tsv
    ├── 25886.tsv
    ├── 25984.tsv
    ├── 25985.tsv
    ├── 25989.tsv
    ├── 25992.tsv
    ├── 25993.tsv
    ├── 25994.tsv
    ├── 25995.tsv
    ├── 25996.tsv
    ├── 25998.tsv
    ├── 26000.tsv
    ├── 26001.tsv
    ├── 26003.tsv
    ├── 26008.tsv
    ├── 26009.tsv
    ├── 26010.tsv
    ├── 26011.tsv
    ├── 26012.tsv
    ├── 26013.tsv
    ├── 26014.tsv
    ├── 26017.tsv
    ├── 26019.tsv
    ├── 26026.tsv
    ├── 26030.tsv
    └── 32767.tsv

7 directories, 184 files
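The sql/ directory above holds the scripts for parallel loading. I will not quote them verbatim, but each nodes_import_NNN.sql / ways_import_NNN.sql essentially runs COPY for its share of the TSV files, roughly like this (file names and paths are illustrative, mirroring the /input mount used below):

-- Sketch of what one of the parallel import scripts boils down to:
-- load the TSV files assigned to this script, then refresh statistics.
COPY nodes FROM '/input/nodes/16713.tsv';
COPY nodes FROM '/input/nodes/16717.tsv';
-- ... the remaining TSV files assigned to this script ...
ANALYZE nodes;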

To create the database automatically, I use my PostgreSQL Docker image (the Dockerfile is in the project repository) with pre-installed extensions and a database initialization script that supports multi-threaded data loading:

docker run --name postgis14-thailand --memory=12g --memory-swap=12g --memory-swappiness 0 --shm-size=1g -v /home/acc/dev/map/database/thailand:/var/lib/postgresql/data -v /home/acc/dev/map/thailand/thailand-latest_loc_ways:/input -e POSTGRES_PASSWORD=osmworld -e LD_LIBRARY_PATH=/usr/lib/jvm/java-11-openjdk-amd64/lib/server/ -d -p 5432:5432 -p 5005:5005 5d411c3be57f -c checkpoint_timeout="15 min" -c checkpoint_completion_target=0.9 -c shared_buffers="4096 MB" -c wal_buffers=-1 -c bgwriter_delay=200ms -c bgwriter_lru_maxpages=100 -c bgwriter_lru_multiplier=2.0 -c bgwriter_flush_after=0 -c max_wal_size="32768 MB" -c min_wal_size="16384 MB"
docker logs postgis14-thailand | tail -n 40
initdb: warning: enabling "trust" authentication for local connections
You can change this by editing pg_hba.conf or using the option -A, or
--auth-local and --auth-host, the next time you run initdb.
2023-01-29 10:22:09.841 UTC [1] LOG:  starting PostgreSQL 14.1 (Debian 14.1-1.pgdg110+1) on x86_64-pc-linux-gnu, compiled by gcc (Debian 10.2.1-6) 10.2.1 20210110, 64-bit
2023-01-29 10:22:09.841 UTC [1] LOG:  listening on IPv4 address "0.0.0.0", port 5432
2023-01-29 10:22:09.841 UTC [1] LOG:  listening on IPv6 address "::", port 5432
2023-01-29 10:22:09.847 UTC [1] LOG:  listening on Unix socket "/var/run/postgresql/.s.PGSQL.5432"
2023-01-29 10:22:09.855 UTC [163] LOG:  database system was shut down at 2023-01-29 10:22:08 UTC
2023-01-29 10:22:09.863 UTC [1] LOG:  database system is ready to accept connections
Time: 16.011 ms
ANALYZE
Time: 49.590 ms
CREATE TABLE
Time: 1.509 ms
COPY 1
Time: 1.110 ms
ANALYZE
Time: 1.167 ms
CREATE TABLE
Time: 1.284 ms
COPY 566
Time: 2.092 ms
ANALYZE
Time: 1.880 ms
CREATE TABLE
Time: 1.210 ms
COPY 17482
Time: 19.023 ms
ANALYZE
Time: 27.044 ms
SELECT 80
Time: 8.333 ms
ANALYZE
Time: 1.533 ms
SELECT 75
Time: 9.650 ms
ANALYZE
Time: 1.213 ms

2023-01-29 10:22:07.219 UTC [49] LOG:  received fast shutdown request
waiting for server to shut down....2023-01-29 10:22:07.222 UTC [49] LOG:  aborting any active transactions
2023-01-29 10:22:07.223 UTC [49] LOG:  background worker "logical replication launcher" (PID 56) exited with exit code 1
2023-01-29 10:22:07.392 UTC [51] LOG:  shutting down
..2023-01-29 10:22:09.631 UTC [49] LOG:  database system is shut down
 done
server stopped

PostgreSQL init process complete; ready for start up.

docker exec -it postgis14-thailand psql -U postgres -d osmworld
psql (14.1 (Debian 14.1-1.pgdg110+1))
Type "help" for help.

osmworld=# \d
                       List of relations
 Schema |          Name          |       Type        |  Owner   
--------+------------------------+-------------------+----------
 public | geography_columns      | view              | postgres
 public | geometry_columns       | view              | postgres
 public | h3_3_bounds_complex    | table             | postgres
 public | multipolygon           | partitioned table | postgres
 public | multipolygon_000       | table             | postgres
 public | multipolygon_001       | table             | postgres
 public | multipolygon_002       | table             | postgres
 public | multipolygon_003       | table             | postgres
 public | multipolygon_004       | table             | postgres
 public | multipolygon_005       | table             | postgres
 public | multipolygon_006       | table             | postgres
 public | multipolygon_007       | table             | postgres
 public | multipolygon_008       | table             | postgres
 public | multipolygon_009       | table             | postgres
 public | multipolygon_010       | table             | postgres
 public | multipolygon_32767     | table             | postgres
 public | nodes                  | partitioned table | postgres
 public | nodes_000              | table             | postgres
 public | nodes_001              | table             | postgres
 public | nodes_002              | table             | postgres
 public | nodes_003              | table             | postgres
 public | nodes_004              | table             | postgres
 public | nodes_005              | table             | postgres
 public | nodes_006              | table             | postgres
 public | nodes_007              | table             | postgres
 public | nodes_008              | table             | postgres
 public | nodes_009              | table             | postgres
 public | nodes_010              | table             | postgres
 public | osm_file_block         | table             | postgres
 public | osm_file_block_content | table             | postgres
 public | osm_file_statistics    | table             | postgres
 public | osm_stat_nodes_3_3     | table             | postgres
 public | osm_stat_ways_3_3      | table             | postgres
 public | relation_members       | table             | postgres
 public | relations              | table             | postgres
 public | spatial_ref_sys        | table             | postgres
 public | ways                   | partitioned table | postgres
 public | ways_000               | table             | postgres
 public | ways_001               | table             | postgres
 public | ways_002               | table             | postgres
 public | ways_003               | table             | postgres
 public | ways_004               | table             | postgres
 public | ways_005               | table             | postgres
 public | ways_006               | table             | postgres
 public | ways_007               | table             | postgres
 public | ways_008               | table             | postgres
 public | ways_009               | table             | postgres
 public | ways_010               | table             | postgres
 public | ways_32767             | table             | postgres
(49 rows)
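Before running anything heavier, it is easy to check how rows are spread across the physical partitions, for example with a query like this:

-- tableoid::regclass resolves each row to the partition it is stored in
SELECT tableoid::regclass AS partition, count(*)
  FROM ways
 GROUP BY 1
 ORDER BY 2 DESC;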

osmworld=# explain select h3_3, count(*) from ways group by 1;
                                        QUERY PLAN                                         
-------------------------------------------------------------------------------------------
 Gather  (cost=34456.99..97465.64 rows=76 width=10)
   Workers Planned: 4
   ->  Parallel Append  (cost=33456.99..96458.04 rows=19 width=10)
         ->  HashAggregate  (cost=96457.88..96457.95 rows=7 width=10)
               Group Key: ways_4.h3_3
               ->  Seq Scan on ways_004 ways_4  (cost=0.00..90269.92 rows=1237592 width=2)
         ->  HashAggregate  (cost=60359.46..60359.53 rows=7 width=10)
               Group Key: ways_9.h3_3
               ->  Seq Scan on ways_009 ways_9  (cost=0.00..56666.64 rows=738564 width=2)
         ->  HashAggregate  (cost=39018.48..39018.54 rows=6 width=10)
               Group Key: ways_7.h3_3
               ->  Seq Scan on ways_007 ways_7  (cost=0.00..36567.32 rows=490232 width=2)
         ->  HashAggregate  (cost=33456.99..33457.06 rows=7 width=10)
               Group Key: ways_2.h3_3
               ->  Seq Scan on ways_002 ways_2  (cost=0.00..31580.66 rows=375266 width=2)
         ->  HashAggregate  (cost=30029.44..30029.53 rows=9 width=10)
               Group Key: ways_5.h3_3
               ->  Seq Scan on ways_005 ways_5  (cost=0.00..28346.96 rows=336496 width=2)
         ->  HashAggregate  (cost=26775.62..26775.72 rows=10 width=10)
               Group Key: ways.h3_3
               ->  Seq Scan on ways_000 ways  (cost=0.00..25251.75 rows=304775 width=2)
         ->  HashAggregate  (cost=19023.96..19024.03 rows=7 width=10)
               Group Key: ways_1.h3_3
               ->  Seq Scan on ways_001 ways_1  (cost=0.00..17901.97 rows=224397 width=2)
         ->  HashAggregate  (cost=15682.35..15682.43 rows=8 width=10)
               Group Key: ways_6.h3_3
               ->  Seq Scan on ways_006 ways_6  (cost=0.00..14803.23 rows=175823 width=2)
         ->  HashAggregate  (cost=13881.85..13881.90 rows=5 width=10)
               Group Key: ways_3.h3_3
               ->  Seq Scan on ways_003 ways_3  (cost=0.00..13173.90 rows=141590 width=2)
         ->  HashAggregate  (cost=6970.89..6970.92 rows=3 width=10)
               Group Key: ways_8.h3_3
               ->  Seq Scan on ways_008 ways_8  (cost=0.00..6567.26 rows=80726 width=2)
         ->  HashAggregate  (cost=2385.09..2385.10 rows=1 width=10)
               Group Key: ways_11.h3_3
               ->  Seq Scan on ways_32767 ways_11  (cost=0.00..2327.06 rows=11606 width=2)
         ->  HashAggregate  (cost=37.55..37.59 rows=4 width=10)
               Group Key: ways_10.h3_3
               ->  Seq Scan on ways_010 ways_10  (cost=0.00..36.70 rows=170 width=2)
(39 rows)

osmworld=# select h3_3, count(*) from ways group by 1 order by 2 desc limit 20;
 h3_3  | count  
-------+--------
 25764 | 890643
 26010 | 329973
 25994 | 203645
 25730 | 188810
 25995 | 139373
 26011 | 136930
 25780 | 118165
 25765 | 105996
 26009 | 100196
 25762 |  91177
 26014 |  75555
 26003 |  71479
 26008 |  70353
 25883 |  60808
 25634 |  59705
 25777 |  58151
 16749 |  57887
 25782 |  57509
 25638 |  57296
 25880 |  56838
(20 rows)

We connect to the database with whatever client is convenient:

psql -h 127.0.0.1 -p 5432 -U postgres -d osmworld
And admire the statistics of the whole-world database (587 GB):
         Table          | Rows  |     Total Size     |     Table Size     |  Index(es) Size  |     TOAST Size     
------------------------+-------+--------------------+--------------------+------------------+--------------------
 *** TOTAL ***          | ~1B   | 587 GB (100.00%)   | 501 GB (100.00%)   | 54 GB (100.00%)  | 32 GB (100.00%)
                        |       |                    |                    |                  | 
 ways_051               | ~21M  | 12 GB (2.02%)      | 11 GB (2.14%)      | 1070 MB (1.94%)  | 73 MB (0.22%)
 relation_members       | ~113M | 9523 MB (1.58%)    | 6136 MB (1.20%)    | 3386 MB (6.15%)  | 8192 bytes (0.00%)
 ways_043               | ~14M  | 8102 MB (1.35%)    | 7334 MB (1.43%)    | 737 MB (1.34%)   | 31 MB (0.09%)
 ways_052               | ~14M  | 8013 MB (1.33%)    | 7200 MB (1.40%)    | 729 MB (1.32%)   | 84 MB (0.26%)
 ways_002               | ~17M  | 7923 MB (1.32%)    | 7013 MB (1.37%)    | 870 MB (1.58%)   | 40 MB (0.12%)
 ways_001               | ~15M  | 7360 MB (1.22%)    | 6485 MB (1.26%)    | 788 MB (1.43%)   | 87 MB (0.26%)
 ways_081               | ~14M  | 6963 MB (1.16%)    | 6127 MB (1.19%)    | 716 MB (1.30%)   | 120 MB (0.37%)
 ways_004               | ~13M  | 6886 MB (1.15%)    | 6108 MB (1.19%)    | 691 MB (1.25%)   | 87 MB (0.27%)
 ways_029               | ~11M  | 6824 MB (1.14%)    | 6197 MB (1.21%)    | 567 MB (1.03%)   | 60 MB (0.18%)
 ways_047               | ~11M  | 6750 MB (1.12%)    | 6101 MB (1.19%)    | 577 MB (1.05%)   | 71 MB (0.22%)
 ways_093               | ~13M  | 6522 MB (1.09%)    | 5749 MB (1.12%)    | 644 MB (1.17%)   | 128 MB (0.39%)
 ways_095               | ~13M  | 6451 MB (1.07%)    | 5689 MB (1.11%)    | 671 MB (1.22%)   | 91 MB (0.28%)
 ways_090               | ~12M  | 6405 MB (1.07%)    | 5708 MB (1.11%)    | 620 MB (1.13%)   | 77 MB (0.23%)
 ways_003               | ~12M  | 6401 MB (1.07%)    | 5553 MB (1.08%)    | 641 MB (1.16%)   | 206 MB (0.63%)
 multipolygon_32767     | ~187k | 6366 MB (1.06%)    | 338 MB (0.07%)     | 27 MB (0.05%)    | 6002 MB (18.32%)
 ways_005               | ~12M  | 6317 MB (1.05%)    | 5590 MB (1.09%)    | 601 MB (1.09%)   | 126 MB (0.38%)
 ways_092               | ~12M  | 6227 MB (1.04%)    | 5511 MB (1.07%)    | 610 MB (1.11%)   | 107 MB (0.33%)
 ways_074               | ~11M  | 6208 MB (1.03%)    | 5507 MB (1.07%)    | 587 MB (1.07%)   | 114 MB (0.35%)
 ways_091               | ~11M  | 6158 MB (1.02%)    | 5425 MB (1.06%)    | 578 MB (1.05%)   | 156 MB (0.47%)
 ways_098               | ~12M  | 6154 MB (1.02%)    | 5421 MB (1.06%)    | 595 MB (1.08%)   | 138 MB (0.42%)
 ways_089               | ~11M  | 6128 MB (1.02%)    | 5418 MB (1.06%)    | 576 MB (1.05%)   | 134 MB (0.41%)
 ways_080               | ~11M  | 5973 MB (0.99%)    | 5206 MB (1.01%)    | 549 MB (1.00%)   | 217 MB (0.66%)
 ways_097               | ~11M  | 5940 MB (0.99%)    | 5298 MB (1.03%)    | 547 MB (0.99%)   | 95 MB (0.29%)
 ways_045               | ~11M  | 5877 MB (0.98%)    | 5261 MB (1.03%)    | 546 MB (0.99%)   | 70 MB (0.21%)
 ways_053               | ~9M   | 5810 MB (0.97%)    | 5289 MB (1.03%)    | 487 MB (0.88%)   | 34 MB (0.10%)
 ways_019               | ~10M  | 5794 MB (0.96%)    | 5124 MB (1.00%)    | 517 MB (0.94%)   | 153 MB (0.47%)
 ways_006               | ~10M  | 5774 MB (0.96%)    | 5033 MB (0.98%)    | 510 MB (0.93%)   | 231 MB (0.70%)
 ways_046               | ~10M  | 5702 MB (0.95%)    | 5094 MB (0.99%)    | 505 MB (0.92%)   | 103 MB (0.31%)
 ways_042               | ~10M  | 5683 MB (0.95%)    | 5075 MB (0.99%)    | 496 MB (0.90%)   | 112 MB (0.34%)
 ways_049               | ~9M   | 5663 MB (0.94%)    | 5118 MB (1.00%)    | 456 MB (0.83%)   | 89 MB (0.27%)
 ways_037               | ~9M   | 5634 MB (0.94%)    | 5040 MB (0.98%)    | 447 MB (0.81%)   | 147 MB (0.45%)
 ways_054               | ~9M   | 5627 MB (0.94%)    | 4856 MB (0.95%)    | 485 MB (0.88%)   | 286 MB (0.87%)
 ways_017               | ~10M  | 5605 MB (0.93%)    | 4950 MB (0.97%)    | 507 MB (0.92%)   | 148 MB (0.45%)
 ways_075               | ~9M   | 5587 MB (0.93%)    | 4940 MB (0.96%)    | 488 MB (0.89%)   | 159 MB (0.49%)
 ways_026               | ~9M   | 5585 MB (0.93%)    | 5064 MB (0.99%)    | 482 MB (0.88%)   | 39 MB (0.12%)
 ways_048               | ~9M   | 5584 MB (0.93%)    | 4985 MB (0.97%)    | 484 MB (0.88%)   | 114 MB (0.35%)
 ways_082               | ~10M  | 5548 MB (0.92%)    | 4890 MB (0.95%)    | 500 MB (0.91%)   | 158 MB (0.48%)
 ways_000               | ~9M   | 5460 MB (0.91%)    | 4750 MB (0.93%)    | 461 MB (0.84%)   | 248 MB (0.76%)
 ways_094               | ~10M  | 5457 MB (0.91%)    | 4737 MB (0.92%)    | 493 MB (0.89%)   | 227 MB (0.69%)
 ways_033               | ~8M   | 5454 MB (0.91%)    | 4808 MB (0.94%)    | 403 MB (0.73%)   | 243 MB (0.74%)
 ways_087               | ~9M   | 5445 MB (0.91%)    | 4808 MB (0.94%)    | 463 MB (0.84%)   | 173 MB (0.53%)
 ways_010               | ~9M   | 5357 MB (0.89%)    | 4774 MB (0.93%)    | 440 MB (0.80%)   | 143 MB (0.44%)
 ways_040               | ~9M   | 5348 MB (0.89%)    | 4722 MB (0.92%)    | 467 MB (0.85%)   | 159 MB (0.48%)
 ways_088               | ~9M   | 5262 MB (0.88%)    | 4662 MB (0.91%)    | 455 MB (0.83%)   | 145 MB (0.44%)
 ways_060               | ~8M   | 5243 MB (0.87%)    | 4673 MB (0.91%)    | 420 MB (0.76%)   | 150 MB (0.46%)
 ways_096               | ~10M  | 5207 MB (0.87%)    | 4613 MB (0.90%)    | 495 MB (0.90%)   | 99 MB (0.30%)
 ways_024               | ~8M   | 5205 MB (0.87%)    | 4722 MB (0.92%)    | 431 MB (0.78%)   | 52 MB (0.16%)
 ways_071               | ~9M   | 5198 MB (0.87%)    | 4625 MB (0.90%)    | 469 MB (0.85%)   | 104 MB (0.32%)
:
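A report like the one above can also be approximated with plain catalog queries, for example:

-- Approximate per-table size breakdown, largest relations first
SELECT c.relname                                     AS table_name,
       pg_size_pretty(pg_total_relation_size(c.oid)) AS total_size,
       pg_size_pretty(pg_table_size(c.oid))          AS table_size,
       pg_size_pretty(pg_indexes_size(c.oid))        AS index_size
  FROM pg_class c
  JOIN pg_namespace n ON n.oid = c.relnamespace
 WHERE n.nspname = 'public' AND c.relkind = 'r'
 ORDER BY pg_total_relation_size(c.oid) DESC
 LIMIT 20;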

Let’s use the example from my previous posts and count the number of school buildings around the world, broken down by the H3 index:

docker start postgis14-planet-220704

psql -h 127.0.0.1 -p 5432 -U postgres -d osmworld

Password for user postgres: 
🧐 🐘 postgres_dba 6.0 installed. Use ":dba" to see menu
Timing is on.
psql (14.6 (Ubuntu 14.6-1.pgdg20.04+1), server 14.1 (Debian 14.1-1.pgdg110+1))
Type "help" for help.

osmworld=# \timing on
Timing is on.
osmworld=# create table school 
    as select h3_3, count(*) as "count" from ways 
        where closed and (tags->'building' = 'school' 
         or (tags->'building' is not null and tags->'amenity'='school')) 
       group by h3_3 order by 2 desc;
SELECT 6783
Time: 215154,128 ms (03:35,154)

Let’s look at the query plan to see what took the database those 3:35 minutes and how our query was executed in parallel across all partitions:

explain create table school 
    as select h3_3, count(*) as "count" from ways 
        where closed and (tags->'building' = 'school' 
         or (tags->'building' is not null and tags->'amenity'='school')) 
       group by h3_3 order by 2 desc

QUERY PLAN                                                                                              
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Gather Merge  (cost=13217952.21..13219300.04 rows=11052 width=10)
   Workers Planned: 6
   ->  Sort  (cost=13216952.12..13216956.72 rows=1842 width=10)
         Sort Key: (count(*)) DESC
         ->  Parallel Append  (cost=1107174.96..13216852.21 rows=1842 width=10)
               ->  GroupAggregate  (cost=1889369.47..1890603.83 rows=7 width=10)
                     Group Key: ways_51.h3_3
                     ->  Sort  (cost=1889369.47..1889780.90 rows=164572 width=2)
                           Sort Key: ways_51.h3_3
                           ->  Seq Scan on ways_051 ways_51  (cost=0.00..1873744.28 rows=164572 width=2)
                                 Filter: (closed AND (((tags -> 'building'::text) = 'school'::text) OR (((tags -> 'building'::text) IS NOT NULL) AND ((tags -> 'amenity'::text) = 'school'::text))))
               ->  HashAggregate  (cost=1278539.18..1278539.58 rows=40 width=10)
                     Group Key: ways_2.h3_3
                     ->  Seq Scan on ways_002 ways_2  (cost=0.00..1277795.15 rows=148806 width=2)
                           Filter: (closed AND (((tags -> 'building'::text) = 'school'::text) OR (((tags -> 'building'::text) IS NOT NULL) AND ((tags -> 'amenity'::text) = 'school'::text))))
               ->  GroupAggregate  (cost=1269524.73..1270332.33 rows=7 width=10)
                     Group Key: ways_43.h3_3
                     ->  Sort  (cost=1269524.73..1269793.90 rows=107671 width=2)
                           Sort Key: ways_43.h3_3
                           ->  Seq Scan on ways_043 ways_43  (cost=0.00..1260525.44 rows=107671 width=2)
                                 Filter: (closed AND (((tags -> 'building'::text) = 'school'::text) OR (((tags -> 'building'::text) IS NOT NULL) AND ((tags -> 'amenity'::text) = 'school'::text))))
               ->  GroupAggregate  (cost=1248285.94..1249040.42 rows=7 width=10)
                     Group Key: ways_52.h3_3
                     ->  Sort  (cost=1248285.94..1248537.41 rows=100588 width=2)
                           Sort Key: ways_52.h3_3
                           ->  Seq Scan on ways_052 ways_52  (cost=0.00..1239928.03 rows=100588 width=2)
                                 Filter: (closed AND (((tags -> 'building'::text) = 'school'::text) OR (((tags -> 'building'::text) IS NOT NULL) AND ((tags -> 'amenity'::text) = 'school'::text))))
 ...

 JIT:
   Functions: 806
   Options: Inlining true, Optimization true, Expressions true, Deforming true
(612 rows)

And visualize the result in QGIS:

select 
   h3_to_geo_boundary_geometry(h3_3::h3index), 
   count 
  from school 
      where count>100

PostgreSQL is supported by most data visualization software, and complex queries with window functions and subqueries work as well. Since Postgres runs on almost any hardware, this kind of geoanalytics is available everywhere – the only limit is performance.
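For example, a window function over the school table created above ranks the H3 cells by the number of school buildings and shows each cell's share of the worldwide total (just an illustration):

-- Rank resolution-3 cells by school-building count; the window functions
-- run over the result of the earlier aggregation
SELECT h3_3,
       count,
       rank() OVER (ORDER BY count DESC)            AS rank,
       round(100.0 * count / sum(count) OVER (), 2) AS percent_of_total
  FROM school
 ORDER BY rank
 LIMIT 10;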

Welcome to the world of geodata analytics!

What makes OpenStreetMap unique is that anyone on the planet can use its data, but given the volume of that data, doing so has not been easy. Now you no longer need a powerful specialized server to run the database, and I hope that my approach to partitioning OSM data and the ability to use PostgreSQL for geospatial analytics will let more people query this unique project and build new services that take distances and real-world objects into account – and improve the world around us.
