Introducing PromQL + Cheatsheet

Download Cheatsheet on PromQL Queries

Getting started with PromQL can be challenging if you are new to Prometheus. This guide explains how PromQL works and includes practical tips to get you started.

Because Prometheus stores data as time series, PromQL queries are radically different from conventional SQL. Understanding how data is organized in Prometheus is key to writing efficient queries.

Do not forget to download the PromQL cheatsheet!

How time-series databases work

Time series are streams of values, each associated with a timestamp.

Each time series can be identified by the metric name and labels, for example:

mongodb_up{}

or

kube_node_labels{cluster="aws-01", label_kubernetes_io_role="master"}

In the above example, kube_node_labels is the metric name, and cluster and label_kubernetes_io_role are labels. In fact, the metric name is itself a label, stored under the reserved name __name__. The above query can also be written like this:

{__name__ = "kube_node_labels", cluster="aws-01", label_kubernetes_io_role="master"}
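To make this equivalence concrete, here is a minimal Python sketch (not Prometheus code) that models each series as a plain set of labels, with the metric name stored under __name__. The helpers matches and select and the second and third sample series are hypothetical, and only equality matchers on non-empty values are covered.

```python
# Toy model: a time series is identified by its full label set,
# and the metric name is just the value of the __name__ label.
series = [
    {"__name__": "kube_node_labels", "cluster": "aws-01",
     "label_kubernetes_io_role": "master"},
    {"__name__": "kube_node_labels", "cluster": "aws-02",
     "label_kubernetes_io_role": "node"},
    {"__name__": "mongodb_up"},
]


def matches(labels, selector):
    """Return True if every equality matcher in selector holds for labels."""
    return all(labels.get(name) == value for name, value in selector.items())


def select(all_series, selector):
    """Return the series whose labels satisfy all equality matchers."""
    return [s for s in all_series if matches(s, selector)]


# The metric name is matched exactly like any other label.
selected = select(series, {"__name__": "kube_node_labels", "cluster": "aws-01"})
print(selected)
```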

There are four types of metrics in Prometheus:

  • Gauge: a value that can go up or down. For example, the metric mongodb_up tells you whether the exporter has a connection to a MongoDB instance.

  • Counter: a cumulative value that only increases (or resets to zero on restart), usually with the suffix _total. For example, http_requests_total.

  • Histogram: a set of counters that track how observations, such as request durations or response sizes, fall into configurable buckets.

  • Summary: like a histogram, it tracks the count and sum of observations, but it calculates quantiles on the client side.
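The idea that a histogram is a combination of counters can be sketched in Python. The class below is a toy model rather than a real client library: ToyHistogram and its bucket boundaries are hypothetical, and it mimics only the cumulative buckets (each labeled with an upper bound), the running sum, and the observation count.

```python
import math


class ToyHistogram:
    """Toy histogram: cumulative bucket counters plus a sum and a count."""

    def __init__(self, upper_bounds):
        # An implicit +Inf bucket always counts every observation.
        self.upper_bounds = sorted(upper_bounds) + [math.inf]
        self.buckets = {le: 0 for le in self.upper_bounds}
        self.sum = 0.0
        self.count = 0

    def observe(self, value):
        # Buckets are cumulative: an observation increments every bucket
        # whose upper bound is greater than or equal to the value.
        for le in self.upper_bounds:
            if value <= le:
                self.buckets[le] += 1
        self.sum += value
        self.count += 1


h = ToyHistogram([0.1, 0.5, 1.0])
for duration in (0.05, 0.3, 2.0):  # request durations in seconds
    h.observe(duration)
print(h.buckets, h.count)
```

Every value in buckets only ever grows, which is why each bucket behaves like a counter.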

Introducing PromQL Data Fetching

Selecting data in PromQL is as easy as specifying the metric you want to query. In this example, we will use the metric http_requests_total.

Let’s say we want to know the number of requests to the /api path on the host 10.2.0.4. To do this, we use the host and path labels of this metric:

http_requests_total{host="10.2.0.4", path="/api"}

The request will return the following values:

name                 host      path  status_code  value
http_requests_total  10.2.0.4  /api  200          98
http_requests_total  10.2.0.4  /api  503          20
http_requests_total  10.2.0.4  /api  401          1

Each row in this table represents a time series with its most recent value. Because http_requests_total counts the requests made since the counter last restarted, we see that 98 successful requests have been made since then.

This is called an instant vector: the most recent value of each series at or before the time specified in the query. Because samples are scraped at irregular times, Prometheus does not require a sample at that exact moment; it uses the latest one within a lookback window (five minutes by default). If no time is specified, the query is evaluated at the current time and returns the latest available value.
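As a rough illustration of how an instant vector picks one sample per series, the Python sketch below looks up a single series at a given evaluation time. The function instant_value is hypothetical, the five-minute lookback mirrors the Prometheus default, and staleness markers are ignored for simplicity. The sample values are borrowed from the tables in this article.

```python
import bisect

LOOKBACK_SECONDS = 5 * 60  # mirrors the default Prometheus lookback window


def instant_value(samples, eval_time, lookback=LOOKBACK_SECONDS):
    """Return the most recent (timestamp, value) at or before eval_time.

    samples must be sorted by timestamp. Returns None when there is no
    sample yet, or when the newest candidate is older than the lookback.
    """
    timestamps = [ts for ts, _ in samples]
    idx = bisect.bisect_right(timestamps, eval_time) - 1
    if idx < 0:
        return None  # every sample is newer than eval_time
    ts, value = samples[idx]
    if eval_time - ts > lookback:
        return None  # too old: the series is treated as absent
    return ts, value


samples = [
    (1614690905.515, 641309),
    (1614690965.515, 641314),
    (1614691025.502, 641319),
]
print(instant_value(samples, 1614691000.0))
```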

Alternatively, you can get an instant vector as of an earlier point in time (for example, a day ago).

To do this, add the offset modifier, for example:

http_requests_total{host="10.2.0.4", path="/api", status_code="200"} offset 1d

To get the values of a metric over a time interval, specify the interval in square brackets:

http_requests_total{path="/api"}[10m]

The request will return the following values:

name                 host      path  status_code  value
http_requests_total  10.2.0.4  /api  200          641309@1614690905.515
                                                  641314@1614690965.515
                                                  641319@1614691025.502
http_requests_total  10.2.0.5  /api  200          641319@1614690936.628
                                                  641324@1614690996.628
                                                  641329@1614691056.628
http_requests_total  10.2.0.2  /api  401          368736@1614690901.371
                                                  368737@1614690961.372
                                                  368738@1614691021.372

The query returns multiple values for each time series because we requested data for a period of time, and each value is associated with a timestamp.

This is called a range vector: all the values of each series within the specified time interval.
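A range selector can be pictured as a filter over the samples of a series. The Python sketch below uses two hypothetical helpers, parse_duration and range_samples, and supports only single-unit durations such as 10m. Treating the left boundary of the window as exclusive is an assumption made here for simplicity.

```python
def parse_duration(text):
    """Convert a simple PromQL-style duration such as '10m' to seconds."""
    units = {"s": 1, "m": 60, "h": 3600, "d": 86400, "w": 604800}
    return int(text[:-1]) * units[text[-1]]


def range_samples(samples, eval_time, window):
    """Return every sample with eval_time - window < timestamp <= eval_time."""
    start = eval_time - parse_duration(window)
    return [(ts, value) for ts, value in samples if start < ts <= eval_time]


# Hypothetical series: (timestamp in seconds, value)
samples = [(100.0, 1), (160.0, 2), (220.0, 3), (900.0, 4)]
print(range_samples(samples, 700.0, "10m"))
```

Unlike an instant vector, the result keeps every sample in the window together with its timestamp.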

Introduction to Aggregators and PromQL Operators

As you can see, PromQL selectors help you get metrics data. But what if you want more complex results?

Let’s imagine that we have a metric node_cpu_cores with a cluster label. We could, for example, sum the values, grouping them by that label:

sum by (cluster) (node_cpu_cores)

The request will return the following values:

cluster  value
foo      100
bar      50

With this simple query, we see that there are 100 CPU cores in the cluster foo and 50 in the cluster bar.
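What sum by (cluster) does can be approximated in Python as grouping series by one label and adding up their values. The function sum_by is hypothetical, and so are the per-node values, which were chosen only so that they add up to the totals in the table.

```python
from collections import defaultdict


def sum_by(vector, label):
    """Group an instant vector by one label and sum the values per group."""
    totals = defaultdict(float)
    for labels, value in vector:
        # All labels other than the grouping label are discarded.
        totals[labels.get(label, "")] += value
    return dict(totals)


# Hypothetical per-node series for node_cpu_cores
vector = [
    ({"cluster": "foo", "node": "n1"}, 64),
    ({"cluster": "foo", "node": "n2"}, 36),
    ({"cluster": "bar", "node": "n3"}, 50),
]
print(sum_by(vector, "cluster"))
```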

In addition, we can use arithmetic operators in PromQL queries. For example, using the metric node_memory_MemFree_bytes, which returns the amount of free memory in bytes, we could get this value in megabytes using the division operator:

node_memory_MemFree_bytes / (1024 * 1024)

We can also get the percentage of free memory by dividing the previous metric by node_memory_MemTotal_bytes, which returns the total amount of memory on the node:

(node_memory_MemFree_bytes / node_memory_MemTotal_bytes) * 100

We can now use this query to generate an alert when there is less than 5% free memory left on the node:

(node_memory_MemFree_bytes / node_memory_MemTotal_bytes) * 100 < 5
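When both sides of an operator are vectors, PromQL pairs up series with identical label sets and drops those without a partner. The Python sketch below imitates that for the free-memory query and then applies the comparison as a filter. The helper names, the instance label, and the byte counts are all hypothetical.

```python
def labels(**kwargs):
    """Build a hashable label set from keyword arguments."""
    return tuple(sorted(kwargs.items()))


def free_memory_percent(free, total):
    """Divide two instant vectors, matching series on identical label sets."""
    result = {}
    for label_set, free_bytes in free.items():
        if label_set in total:  # series without a partner are dropped
            result[label_set] = free_bytes / total[label_set] * 100
    return result


def below(vector, threshold):
    """Keep only the series whose value is below threshold, like `< 5`."""
    return {ls: value for ls, value in vector.items() if value < threshold}


free = {
    labels(instance="node-a"): 2 * 1024**3,    # 2 GiB free
    labels(instance="node-b"): 256 * 1024**2,  # 256 MiB free
}
total = {
    labels(instance="node-a"): 8 * 1024**3,
    labels(instance="node-b"): 8 * 1024**3,
}
percent = free_memory_percent(free, total)
alerts = below(percent, 5)
print(alerts)
```

The comparison does not produce true or false. It filters the vector, so an alert fires for exactly the series that remain.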

Introducing PromQL Functions

PromQL supports a large number of functions that we can use to get more complex results. For example, building on the previous example, we could use the function topk to find the two nodes with the highest percentage of free memory:

topk(2, (node_memory_MemFree_bytes / node_memory_MemTotal_bytes) * 100)
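Conceptually, topk keeps the k series with the largest values. A minimal Python equivalent might look like this. The function name mirrors the PromQL one, but the node names and percentages are hypothetical.

```python
import heapq


def topk(k, vector):
    """Return the k (series, value) pairs with the largest values."""
    return heapq.nlargest(k, vector.items(), key=lambda item: item[1])


# Hypothetical free-memory percentages per node
free_percent = {"node-a": 25.0, "node-b": 3.125, "node-c": 60.5}
print(topk(2, free_percent))
```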

Prometheus lets you not only query past data but also make forecasts. The function predict_linear predicts the value a time series will have a given number of seconds from now, based on a linear regression over a range vector.

Imagine you want to know how much free disk space will be available 24 hours from now. You can apply predict_linear to the last week of samples from the metric node_filesystem_free_bytes, which returns the free disk space in bytes. The following query predicts the free disk space in gigabytes 24 hours from now and returns only the file systems expected to have less than 100 GB free:

predict_linear(node_filesystem_free_bytes[1w], 3600 * 24) / (1024 * 1024 * 1024) < 100
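The forecast can be understood as a least-squares line fitted through the samples and extended into the future. The Python sketch below is a simplified stand-in for the real function, not the Prometheus implementation. The sample data is hypothetical: a disk that loses 1 GB of free space per hour.

```python
def predict_linear(samples, eval_time, seconds_ahead):
    """Fit a least-squares line and extrapolate seconds_ahead past eval_time."""
    n = len(samples)
    if n < 2:
        return None  # a line needs at least two points
    # Measure time relative to eval_time so the intercept is the fitted
    # value at the moment of evaluation.
    xs = [ts - eval_time for ts, _ in samples]
    ys = [value for _, value in samples]
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    variance = sum((x - mean_x) ** 2 for x in xs)
    if variance == 0:
        return None  # all samples share one timestamp
    slope = covariance / variance
    intercept = mean_y - slope * mean_x
    return intercept + slope * seconds_ahead


# Hypothetical samples: (timestamp in seconds, free bytes)
samples = [(0, 200e9), (3600, 199e9), (7200, 198e9)]
predicted = predict_linear(samples, 7200, 3600 * 24)
print(predicted / 1e9)  # free space in GB expected 24 hours later
```

Because the fit is a straight line, the forecast is only as good as the assumption that the recent trend continues.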

When working with Prometheus counters, the function rate is especially convenient. It calculates the per-second average rate of increase of the time series in a range vector and automatically adjusts for counter resets. In addition, the calculation is extrapolated to the ends of the time range.

What if we need an alert that fires when we have not received any requests for 10 minutes? We cannot just compare raw values of http_requests_total, because if the counter resets within the time range, the results would be inaccurate:

http_requests_total[10m]

name                 host      path  status_code  value
http_requests_total  10.2.0.4  /api  200          100@1614690905.515
                                                  300@1614690965.515
                                                  50@1614691025.502

In the example above, the counter was reset, so the difference between 300 and 50 is negative, and the raw metric alone is not enough. We can solve the problem with the function rate. Because it accounts for counter resets, it treats the samples as if they were:

name                 host      path  status_code  value
http_requests_total  10.2.0.4  /api  200          100@1614690905.515
                                                  300@1614690965.515
                                                  350@1614691025.502

rate(http_requests_total[10m])

name                 host      path  status_code  value
http_requests_total  10.2.0.4  /api  200          0.83

Regardless of resets, there were an average of 0.83 requests per second in the last 10 minutes. Now we can set up the alert:

rate(http_requests_total[10m]) == 0
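The reset handling that rate performs can be approximated in a few lines of Python. This is a deliberately simplified sketch: the functions increase_with_resets and simple_rate are hypothetical, the rate is divided by the time between the first and last sample rather than by the full window, and no extrapolation is applied, so it is not meant to reproduce the figures Prometheus reports.

```python
def increase_with_resets(samples):
    """Sum the increases between consecutive samples.

    A drop in value is treated as a counter reset: the counter is assumed
    to have restarted from zero and climbed to the new value.
    """
    total = 0.0
    for (_, previous), (_, current) in zip(samples, samples[1:]):
        if current < previous:
            total += current  # reset: count everything since zero
        else:
            total += current - previous
    return total


def simple_rate(samples):
    """Per-second rate of increase between the first and last sample."""
    if len(samples) < 2:
        return None
    elapsed = samples[-1][0] - samples[0][0]
    if elapsed <= 0:
        return None
    return increase_with_resets(samples) / elapsed


# Samples from the reset example in this article
samples = [
    (1614690905.515, 100),
    (1614690965.515, 300),
    (1614691025.502, 50),
]
print(increase_with_resets(samples))
```

The total increase matches the adjusted series from the second table, where the last sample is treated as 350 rather than 50.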

What’s next?

In this article, we learned how Prometheus stores data and looked at examples of PromQL queries for selecting and aggregating data.

You can download the PromQL Cheatsheet to learn more about PromQL operators and functions. You can also try all the examples from the article and the Cheatsheet in our Prometheus playground service.
