Prometheus vs VictoriaMetrics benchmark on node_exporter metrics

For prospective students of the “DevOps Practices and Tools” course, we have prepared a translation of this interesting material.

We also invite everyone to join the open webinar “Prometheus: Quick Start”. During the lesson, participants will look at the Prometheus architecture, how it works with metrics, and how to generate alerts and events in the system.


VictoriaMetrics recently added scraping of Prometheus targets, so now we can compare apples to apples: how many resources Prometheus and VictoriaMetrics use when scraping a large number of node_exporter targets.

Benchmark setup

Testing was carried out in Google Compute Engine on four machines (instances):

  • An instance with node_exporter v1.0.1 as the scrape target: an e2-standard-4 machine with 4 vCPUs, 16 GB RAM and a 1 TB HDD. Initial tests showed that node_exporter cannot handle more than a few hundred requests per second; Prometheus and VictoriaMetrics put too much load on it. So it was decided to put nginx in front of node_exporter with one-second caching of responses. This reduced the load on node_exporter to reasonable values, so that it could serve all incoming requests without scrape errors.

  • Two separate e2-highmem-4 instances for Prometheus v2.22.2 and VictoriaMetrics v1.47.0, each with 4 vCPUs, 32 GB RAM and a 1 TB disk. Both VictoriaMetrics and Prometheus were started with default settings, except for the path to the scrape configuration file (-promscrape.config=prometheus.yml for VictoriaMetrics and --config.file=prometheus.yml for Prometheus). The prometheus.yml file was generated from the following Jinja2 template:

global:
  scrape_interval: 10s
scrape_configs:
  - job_name: node_exporter
    static_configs:
{% for n in range(3400) %}
      - targets: ['host-node-{{n}}:9100']
        labels:
          host_number: cfg_{{n}}
          role: node-exporter
          env: prod
{% endfor %}

Every host-node-{{n}} hostname pointed to the machine running node_exporter; this was done via /etc/hosts. In this way we emulated scraping 3400 node_exporter instances.
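
As a rough illustration (not the exact scripts used in the benchmark), the configuration and the /etc/hosts entries can be generated with a few lines of Python; the template path, output file names and the node_exporter IP address below are assumptions:

# Sketch: render the prometheus.yml Jinja2 template shown above and emit
# matching /etc/hosts entries. File names and the node_exporter IP are assumptions.
from jinja2 import Template

NODE_EXPORTER_IP = "10.128.0.10"  # hypothetical address of the node_exporter machine
TARGET_COUNT = 3400

# Render prometheus.yml from the template (range() is available inside Jinja2 templates).
with open("prometheus.yml.j2") as f:
    template = Template(f.read())
with open("prometheus.yml", "w") as f:
    f.write(template.render())

# Point every host-node-{n} name at the single node_exporter machine.
with open("hosts.snippet", "w") as f:
    for n in range(TARGET_COUNT):
        f.write(f"{NODE_EXPORTER_IP} host-node-{n}\n")
# hosts.snippet is then appended to /etc/hosts on the VictoriaMetrics and Prometheus machines.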

  • An e2-standard-2 machine for monitoring VictoriaMetrics and Prometheus. The VictoriaMetrics instance on this machine was configured to receive application-specific metrics as well as node_exporter metrics from the machines running VictoriaMetrics and Prometheus. The graphs below were built from these metrics.

We chose node_exporter for the following reasons:

  • node_exporter is the most common exporter, used by most Prometheus installations.

  • node_exporter exports real metrics under load (CPU, memory, disk I/O, network, etc.), so test results can be extrapolated to production installations of Prometheus.

Both VictoriaMetrics and Prometheus scraped the same node_exporter and were launched at the same time. The benchmark lasted 24 hours.

Storage statistics

For VictoriaMetrics and Prometheus, the statistics are the same:

  • Ingestion rate: 280 thousand samples/sec.

  • Active time series: 2.8 million.

  • Samples scraped and stored: 24.5 billion.
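
A quick back-of-the-envelope check shows how these numbers fit together with the setup above (the per-scrape sample count is inferred here, not a separately reported figure):

# Sanity check of the storage statistics against the benchmark setup.
targets = 3400              # emulated node_exporter instances
scrape_interval_s = 10      # scrape_interval from prometheus.yml
duration_s = 24 * 3600      # the benchmark lasted 24 hours
ingestion_rate = 280_000    # samples/sec, as reported above

scrapes_per_sec = targets / scrape_interval_s            # 340 scrapes/sec
samples_per_scrape = ingestion_rate / scrapes_per_sec    # ~820 samples per node_exporter scrape
active_series = targets * samples_per_scrape             # ~2.8 million active time series
total_samples = ingestion_rate * duration_s              # ~24.2 billion, close to the reported 24.5 billion

print(round(samples_per_scrape), round(active_series), total_samples)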

Test results

Disk space usage:

Disk space usage.

Prometheus generates 15 GB spikes in disk space usage at regular intervals, while VictoriaMetrics produces infrequent and much smaller spikes, with a maximum of 4 GB. Let’s take a look at the summary statistics of disk space usage:

  • VictoriaMetrics: 7.2 GB. This corresponds to 0.3 bytes per sample (7.2 GB / 24.5 billion samples).

  • Prometheus: 52.3 GB (32.3 GB data + 18 GB WAL). This equates to 52.3 GB / 24.5 billion samples ≈ 2.1 bytes per sample. It turns out that Prometheus requires 7 times (2.1 / 0.3) more disk space to store the same amount of data.

Disk I/O usage:

Disk I/O: bytes written per second.
Disk I/O: bytes read per second.

In terms of disk reads, Prometheus generates bursts of up to 95 MB/s at regular intervals, while VictoriaMetrics’ maximum burst is 15 MB/s.

CPU usage:

CPU usage, vCPU cores.

  • Scraping 3400 node_exporter targets uses 1.5-1.75 vCPU cores. This means a system with four vCPUs has enough capacity to scrape roughly 4000 additional node_exporter targets (a rough estimate is sketched after this list).

  • The CPU peaks in both cases are related to background data compaction. These spikes are generally harmless for VictoriaMetrics, but they can lead to OOM (out of memory) for Prometheus. For more information on background compaction (also called merging) in VictoriaMetrics, see this article.
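
As noted in the first bullet above, the headroom estimate of roughly 4000 extra targets follows from simple proportions, assuming CPU cost grows linearly with the number of scraped targets; the reserve for compaction spikes is a hypothetical margin:

# Back-of-the-envelope estimate of scraping headroom on a 4 vCPU machine.
targets = 3400
cores_used = 1.75        # upper bound observed while scraping 3400 targets
cores_total = 4.0
cores_reserve = 0.25     # hypothetical margin for compaction spikes, queries, etc.

targets_per_core = targets / cores_used                    # ~1900 targets per vCPU core
spare_cores = cores_total - cores_used - cores_reserve     # 2.0 spare cores
extra_targets = spare_cores * targets_per_core             # ~3900 additional targets

print(round(extra_targets))  # roughly the ~4000 quoted above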

Memory usage:

RSS memory usage.

VictoriaMetrics consistently uses about 4.3 GB of RSS memory, while Prometheus starts at 6.5 GB and stabilizes at 14 GB, with spikes of up to 23 GB. Such spikes in memory usage often lead to OOM crashes and data loss if the machine does not have enough memory or if memory limits are set for Kubernetes pods running Prometheus. Fortunately, the test machine had 32 GB of RAM, so no crashes were observed. If you want to know more about why Prometheus can lose data after an abnormal shutdown, for example due to OOM, read this article.

According to the graph above, Prometheus requires 5.3x (23 GB / 4.3 GB) more RAM than VictoriaMetrics.

Findings

Both Prometheus and VictoriaMetrics can collect millions of metrics from thousands of targets on a machine with a couple of vCPU cores. According to these benchmarks, this is a much better result than InfluxDB and TimescaleDB achieve.

VictoriaMetrics requires up to 5x less RAM and up to 7x less disk space than Prometheus when scraping thousands of node_exporter targets. This translates into significant infrastructure savings.

P.S. If you haven’t used VictoriaMetrics yet, it’s time to give it a try. VictoriaMetrics is free and open source (including the cluster version). If you are looking for enterprise capabilities and support, visit the site.


Learn more about the “DevOps Practices and Tools” course.

Watch an open webinar on the topic “Prometheus: Quick Start”.
