In a conventional row-oriented DBMS, such as MySQL, Postgres, or MS SQL Server, the data is stored in this order:
Here the values that belong to one row are physically stored side by side. In a columnar DBMS, values from different columns are stored separately, and the data of a single column is stored together:
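The difference between the two layouts can be sketched in a few lines of Python (an illustration only — the table contents and column names are made up, not ClickHouse internals):

```python
# The same three-row table laid out row-wise and column-wise.

rows = [
    (1, "alice", 30),   # (id, name, age)
    (2, "bob",   25),
    (3, "carol", 41),
]

# Row-oriented storage: the values of one row sit side by side.
row_layout = [value for row in rows for value in row]

# Columnar storage: all values of one column sit side by side.
col_layout = {
    "id":   [r[0] for r in rows],
    "name": [r[1] for r in rows],
    "age":  [r[2] for r in rows],
}

# An analytical query like SUM(age) touches only one contiguous column,
# not the full rows — this is why columnar scans are so much cheaper.
total_age = sum(col_layout["age"])
```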
Examples of columnar DBMSs are Vertica, Paraccel (Actian Matrix, Amazon Redshift), Sybase IQ, Exasol, Infobright, InfiniDB, MonetDB (VectorWise, Actian Vector), LucidDB, SAP HANA, Google Dremel, Google PowerDrill, Druid, and kdb+.
The mail-forwarding company Qwintry began using ClickHouse in 2018 for reporting and was very impressed by its simplicity, scalability, SQL support, and speed. The speed of this DBMS bordered on magic.
ClickHouse installs on Ubuntu with a single command. If you know SQL, you can start using ClickHouse for your needs immediately. However, this does not mean you can run "show create table" in MySQL and paste the resulting SQL into ClickHouse.
Compared with MySQL, this DBMS has important differences in the data types used in table schema definitions, so you will still need some time to adapt your table schemas and study the table engines before you can work comfortably.
ClickHouse works great without any additional software, but if you want to use replication, you will need to install ZooKeeper. Query performance analysis shows excellent results: the system tables contain all the information, and all of it can be retrieved with plain old SQL.
- A benchmark comparing ClickHouse with Vertica and MySQL on a server with the following configuration: two Intel® Xeon® E5-2650 v2 @ 2.60 GHz CPUs; 128 GiB RAM; md RAID-5 on eight 6 TB SATA HDDs, ext4.
- A benchmark comparing ClickHouse with the Amazon Redshift cloud data warehouse.
- Excerpts from Cloudflare's blog post on ClickHouse performance:
The ClickHouse database has a very simple design: all nodes in the cluster have the same functionality and use only ZooKeeper for coordination. We built a small cluster of several nodes and ran tests, during which we found that the system has quite impressive performance, matching the advantages claimed in benchmarks of analytical DBMSs. We decided to take a closer look at the concept behind ClickHouse. The first obstacle to our research was the lack of tooling and the small size of the ClickHouse community, so we dug into the design of this DBMS to understand how it works.
ClickHouse does not support receiving data directly from Kafka, since it is just a database, so we wrote our own adapter service in Go. It read Cap'n Proto-encoded messages from Kafka, converted them to TSV, and inserted them into ClickHouse in batches via the HTTP interface. Later, we rewrote this service to use a Go library with our own ClickHouse interface to improve performance. While evaluating ingestion performance, we discovered something important: in ClickHouse, insert performance depends strongly on batch size, that is, on the number of rows inserted at once. To understand why, we examined how ClickHouse stores data.
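The batching approach described above can be sketched in Python rather than Go (a minimal sketch, not Cloudflare's actual service; the table name and host are placeholder assumptions):

```python
# Batched TSV inserts into ClickHouse over its HTTP interface.
import urllib.parse
import urllib.request

def rows_to_tsv(rows):
    """Encode a batch of rows as tab-separated values, one row per line."""
    return "\n".join("\t".join(str(v) for v in row) for row in rows) + "\n"

def insert_batch(rows, table="logs", host="http://localhost:8123"):
    """POST one large batch; many tiny POSTs would hurt MergeTree badly."""
    query = urllib.parse.quote(f"INSERT INTO {table} FORMAT TabSeparated")
    req = urllib.request.Request(
        f"{host}/?query={query}",
        data=rows_to_tsv(rows).encode("utf-8"),
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return resp.status
```

The key point is that `insert_batch` is called once per large accumulated batch, not once per row.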
The main engine — or rather, the family of table engines — that ClickHouse uses to store data is MergeTree. This engine is conceptually similar to the LSM algorithm used in Google BigTable and Apache Cassandra, but it avoids building an intermediate in-memory table and writes data directly to disk. This gives it excellent write throughput, since each inserted batch is sorted by the primary key, compressed, and written to disk as a segment.
The absence of an in-memory table, or any notion of data "freshness", also means that data can only be appended; the system does not support updating or deleting it. Today, the only way to delete data is by calendar month, since segments never cross a month boundary. The ClickHouse team is actively working on making this configurable. On the other hand, this makes writing and merging segments contention-free, so ingestion throughput scales linearly with the number of parallel inserts until I/O or the CPU cores are saturated.
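The merge behavior described above is essentially a k-way merge of sorted runs, which can be illustrated in a few lines (a toy sketch with made-up integer keys, not MergeTree's actual on-disk format):

```python
# Each insert batch is sorted by the primary key and written as its own
# segment; background merges then combine segments in one sequential pass,
# exactly like the merge step of a merge sort.
import heapq

segment_a = [1, 4, 9]        # one sorted insert batch
segment_b = [2, 3, 10]       # another sorted insert batch
segment_c = [5, 6, 7, 8]

# Merging k sorted segments is O(n log k) and purely sequential on disk,
# which is why segment writes and merges are so cheap.
merged = list(heapq.merge(segment_a, segment_b, segment_c))
```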
However, this also means the system is poorly suited to small batches, which is why Kafka and inserter services are used for buffering. In the background, ClickHouse continuously merges segments, so many small pieces of data get combined and rewritten several times, increasing write amplification. Too many unmerged parts triggers aggressive throttling of inserts until merging catches up. We found that the best compromise between real-time ingestion and ingestion performance is to accept a limited number of inserts per second per table.
The key to table read performance is indexing and the placement of data on disk. No matter how fast the processing is, when the engine needs to scan terabytes of data from disk and use only a fraction of it, it will take time. ClickHouse is a column store, so each segment contains one file per column, with values sorted for each row. Whole columns absent from the query can thus be skipped entirely, and the remaining data can be processed in parallel with vectorized execution. To avoid a full scan, each segment has a small index file.
Given that all columns are sorted by the primary key, the index file contains only marks (the row positions) of every Nth row, so that it can be kept in memory even for very large tables. With the default setting of "mark every 8192nd row", the "sparse" index of a table with 1 trillion rows fits easily in memory, occupying only about 122 million entries.
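The sparse-index idea can be sketched as follows (a simplified illustration, not ClickHouse's actual index format; function names are made up):

```python
# A "sparse" primary index: keep one mark per 8192 rows, then
# binary-search the marks to find which granule of rows to scan.
import bisect

GRANULARITY = 8192

def build_marks(sorted_keys):
    """Record the key of every GRANULARITY-th row plus its row offset."""
    return [(sorted_keys[i], i) for i in range(0, len(sorted_keys), GRANULARITY)]

def granule_for(marks, key):
    """Find the index of the granule whose row range may contain `key`."""
    keys = [m[0] for m in marks]
    idx = bisect.bisect_right(keys, key) - 1
    return max(idx, 0)

# With one mark per 8192 rows, a trillion-row table needs only
# ~122 million marks, which still fits in RAM:
marks_for_a_trillion_rows = 10**12 // 8192
```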
ClickHouse's development and improvement can be followed in its GitHub repo, where you can see that it is maturing at an impressive pace.
ClickHouse seems to be growing exponentially, especially in the Russian-speaking community. Last year's HighLoad 2018 conference (Moscow, November 8–9, 2018) showed that giants such as vk.com and Badoo use ClickHouse to insert data (for example, logs) from tens of thousands of servers simultaneously. In a 40-minute video, Yuri Nasretdinov from the VKontakte team explains how this is done. We will soon post the transcript on Habr for convenient reading.
Areas of use
After spending some time on research, I think there are areas where ClickHouse can be useful or can completely replace other, more traditional and popular solutions, such as MySQL, PostgreSQL, ELK, Google BigQuery, Amazon Redshift, TimescaleDB, Hadoop, MapReduce, Pinot, and Druid. Below are the details of using ClickHouse to augment or completely replace each of them.
Extending MySQL and PostgreSQL
Quite recently, we partially replaced MySQL with ClickHouse for the Mautic email marketing platform. The problem was that Mautic, due to a questionable design decision, logged every sent email and every link in it with a base64 hash, creating a huge MySQL table (email_stats). After sending just 10 million emails to subscribers, this table occupied 150 GB of disk space, and MySQL began to struggle on simple queries. To address the disk space problem, we successfully used InnoDB table compression, which reduced it by a factor of 4. Still, it makes little sense to store more than 20-30 million emails in MySQL just to be able to read the history, since any simple query that for some reason requires a full scan causes swapping and heavy I/O load, about which we regularly received Zabbix alerts.
ClickHouse uses compression (LZ4 by default, with ZSTD as an option) that reduces data volume by about 3-4 times, but in this particular case the data was especially "compressible".
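Why log-like data compresses so well is easy to demonstrate: repetitive values (tracking URLs, hashes, status codes) shrink dramatically. The sketch below uses zlib as a stand-in for ClickHouse's LZ4/ZSTD codecs, and the simulated rows are made up:

```python
# Highly repetitive email_stats-like rows compress far beyond the
# typical 3-4x ratio quoted for mixed data.
import zlib

rows = "\n".join(
    "42\thttps://example.com/track/abcdef123456\topened"
    for _ in range(10_000)
)
raw = rows.encode("utf-8")
compressed = zlib.compress(raw)

ratio = len(raw) / len(compressed)   # much greater than 3-4x here
```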
In our experience, the ELK stack (ElasticSearch, Logstash, and Kibana — in this case primarily ElasticSearch) requires far more resources than log storage should. ElasticSearch is a great engine if you need good full-text search over logs (and I don't think you really do), but I wonder how it became the de facto standard logging engine. Its ingestion performance, combined with Logstash, caused us problems even at fairly small workloads and demanded ever more RAM and disk space. As a database, ClickHouse beats ElasticSearch for the following reasons:
- SQL dialect support;
- The best compression ratio of stored data;
- Support for regex searches instead of full-text searches;
- Improved query planning and higher overall performance.
Currently, the biggest problem in comparing ClickHouse with ELK is the lack of log-shipping solutions, as well as the lack of documentation and tutorials on the topic. Meanwhile, any user can set up ELK using the DigitalOcean tutorial, which matters a lot for rapid adoption of such technologies. The database engine is there, but there is no Filebeat equivalent for ClickHouse yet. Yes, there is fluentd and the loghouse log system, and there is the clicktail tool for loading log files into ClickHouse, but all of this takes more time. Still, ClickHouse leads by virtue of its simplicity, so even beginners can install it trivially and be using it fully within about 10 minutes.
Preferring minimalist solutions, I tried using FluentBit, a log shipper with a very small memory footprint, together with ClickHouse, while trying to avoid Kafka. However, minor incompatibilities, such as date format issues, must be resolved before this can be done without a proxy layer converting data from FluentBit to ClickHouse.
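The sort of glue such a proxy layer needs is tiny. A sketch of one such conversion is below (an assumption about the typical mismatch: FluentBit emits epoch timestamps, often fractional, while a ClickHouse DateTime column expects `YYYY-MM-DD HH:MM:SS`; the function name is made up):

```python
# Convert a FluentBit epoch timestamp into a ClickHouse DateTime string.
from datetime import datetime, timezone

def fluentbit_ts_to_clickhouse(epoch_seconds):
    """FluentBit emits e.g. 1546300800.123; ClickHouse DateTime wants
    'YYYY-MM-DD HH:MM:SS' (second precision, no fraction)."""
    dt = datetime.fromtimestamp(float(epoch_seconds), tz=timezone.utc)
    return dt.strftime("%Y-%m-%d %H:%M:%S")
```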
As an alternative to Kibana, you can use Grafana with ClickHouse as its data source. As I understand it, this can cause performance problems when rendering huge numbers of data points, especially with older versions of Grafana. At Qwintry we have not tried this yet, but complaints about it appear from time to time in the ClickHouse support channel on Telegram.
Replacing Google Big Query and Amazon RedShift (solution for large companies)
The ideal use case for BigQuery is loading 1 TB of JSON data and running analytical queries on it. BigQuery is a great product whose scalability is hard to overstate. It is far more complex software than ClickHouse running on an in-house cluster, but from the client's point of view it has much in common with ClickHouse. BigQuery can get expensive quickly, since you pay for every SELECT; it is a true SaaS solution with all the accompanying pros and cons.
ClickHouse is the better choice when you run many computationally expensive queries. The more SELECT queries you execute each day, the more sense it makes to replace BigQuery with ClickHouse: such a replacement can save you thousands of dollars once many terabytes of data are being processed. This does not apply to stored data, which is quite cheap to keep in BigQuery.
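The back-of-the-envelope arithmetic behind that claim is easy to check (the $5/TB-scanned figure is BigQuery's historical on-demand price and is an assumption here, as are the workload numbers; plug in your own):

```python
# Rough cost of a query-heavy workload on pay-per-scan pricing.
PRICE_PER_TB_SCANNED = 5.00     # USD per TB scanned, assumed
queries_per_day = 200
tb_scanned_per_query = 1.0

daily_cost = queries_per_day * tb_scanned_per_query * PRICE_PER_TB_SCANNED
monthly_cost = daily_cost * 30  # on self-hosted ClickHouse the marginal
                                # cost of one more SELECT is near zero;
                                # you pay for the hardware instead
```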
The article by Altinity co-founder Alexander Zaitsev, “Switching to ClickHouse,” talks about the benefits of such a DBMS migration.
TimescaleDB is a PostgreSQL extension that optimizes working with time series in a regular database (https://docs.timescale.com/v1.0/introduction, https://habr.com/en/company/zabbix/blog/458530/).
Although ClickHouse does not seriously compete in the time-series niche, its columnar structure and vectorized query execution make it much faster than TimescaleDB for most analytical queries. At the same time, ClickHouse's batch-ingestion performance is roughly 3 times higher, and it uses 20 times less disk space, which really matters when processing large volumes of historical data: https://www.altinity.com/blog/ClickHouse-for-time-series.
Unlike ClickHouse, the only way to save some disk space in TimescaleDB is to use ZFS or similar file systems.
Upcoming ClickHouse updates are likely to introduce delta compression, which will make it even better suited to processing and storing time-series data. TimescaleDB may still be a better choice than bare ClickHouse in the following cases:
- small installations with a very small amount of RAM (<3 GB);
- a large number of small INSERTs that you do not want to buffer into large fragments;
- stricter consistency and ACID requirements;
- PostGIS support;
- merging with existing PostgreSQL tables, since Timescale DB is essentially PostgreSQL.
Competition with Hadoop and MapReduce
Hadoop and other MapReduce products can perform many complex computations, but they usually run with huge latencies. ClickHouse fixes this by processing terabytes of data and delivering results almost instantly. ClickHouse is thus far more efficient for fast, interactive analytical exploration, which should interest data scientists.
Competition with Pinot and Druid
ClickHouse's closest competitors are Pinot and Druid — columnar, linearly scalable open-source stores. An excellent comparison of these systems was published in an article by Roman Leventov on February 1, 2018.
That article needs updating: it states that ClickHouse does not support UPDATE and DELETE operations, which is no longer entirely true — recent versions support them as asynchronous mutations via ALTER TABLE.
We don’t have enough experience with these DBMSs, but I don’t like the complexity of the infrastructure used to run Druid and Pinot – this is a whole bunch of “moving parts” surrounded by Java from all sides.
Druid and Pinot are Apache incubator projects, the development progress of which is covered in detail by Apache on the pages of its GitHub projects. Pinot appeared in the incubator in October 2018, and Druid was born 8 months earlier – in February.
The lack of information about how the ASF works raises some, perhaps silly, questions for me. I wonder whether Pinot's authors noticed that the Apache Foundation leans more toward Druid, and whether this attitude toward a competitor caused envy. Would Druid's development slow down and Pinot's accelerate if the sponsors backing the former suddenly became interested in the latter?
Disadvantages of ClickHouse
Immaturity: obviously, this is still not a boring, settled technology — though, in any case, nothing comparable is seen in other columnar DBMSs either.
Small inserts do not perform well at high rates: inserts must be grouped into large batches, because the performance of small inserts degrades in proportion to the number of columns in each row. This is simply how ClickHouse stores data on disk: each column means at least one file, so inserting one row with 100 columns requires opening and writing at least 100 files. This is why an intermediary is needed to buffer inserts (unless the client provides its own buffering) — usually Kafka or some queueing system. Alternatively, you can use the Buffer table engine, which later copies large chunks of data into MergeTree tables.
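When the client does its own buffering, the pattern is simple enough to sketch (a minimal illustration; the class name and thresholds are made up, and `flush_fn` stands in for the real insert call):

```python
# Client-side insert buffering: accumulate rows and flush them as one
# large batch instead of many tiny inserts.
class InsertBuffer:
    def __init__(self, flush_fn, max_rows=100_000):
        self.flush_fn = flush_fn      # e.g. a function doing one big INSERT
        self.max_rows = max_rows
        self.rows = []

    def add(self, row):
        self.rows.append(row)
        if len(self.rows) >= self.max_rows:
            self.flush()

    def flush(self):
        if self.rows:
            self.flush_fn(self.rows)  # one big INSERT, not many tiny ones
            self.rows = []
```

In production you would also flush on a timer, so that a slow trickle of rows does not sit in the buffer forever.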
Table joins are limited by server RAM — but at least they exist! Druid and Pinot, for example, have no such joins at all, since joins are hard to implement in distributed systems that do not support moving large chunks of data between nodes.
In the coming years, we plan to make extensive use of ClickHouse in Qwintry, as this DBMS provides an excellent balance of performance, low overhead, scalability, and simplicity. I am pretty sure that it will begin to spread rapidly as soon as the ClickHouse community comes up with more ways to use it on small and medium installations.