Introducing PromQL + Cheatsheet
Download Cheatsheet on PromQL Queries
Getting started with PromQL can be challenging when you are new to Prometheus. This guide explains how PromQL works and offers practical tips to get you started.
Because Prometheus stores data as time series, PromQL queries are radically different from conventional SQL. Understanding how Prometheus models data is key to writing efficient queries.
Do not forget to download the PromQL Cheatsheet!
How time-series databases work
Time series are streams of values, each associated with a timestamp.
Each time series can be identified by the metric name and labels, for example:
mongodb_up{}
or
kube_node_labels{cluster="aws-01", label_kubernetes_io_role="master"}
In the example above, kube_node_labels is the metric name, and cluster and label_kubernetes_io_role are labels. In fact, the metric name is itself a label, stored as __name__, so the query above can also be written like this:
{__name__ = "kube_node_labels", cluster="aws-01", label_kubernetes_io_role="master"}
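To make the equivalence concrete, here is a minimal Python sketch (not Prometheus code) that models a series identity as a set of labels, with the metric name stored under __name__. The helper series_id is hypothetical and exists only for this illustration.

```python
def series_id(name=None, **labels):
    """Build a series identity as a frozenset of (label, value) pairs.

    Hypothetical helper: the metric name is stored as the __name__ label,
    mirroring how the two selector forms above describe the same series.
    """
    all_labels = dict(labels)
    if name is not None:
        all_labels["__name__"] = name
    return frozenset(all_labels.items())


# Form 1: metric name followed by labels.
short_form = series_id(
    "kube_node_labels",
    cluster="aws-01",
    label_kubernetes_io_role="master",
)

# Form 2: the name given explicitly as the __name__ label.
explicit_form = series_id(
    **{
        "__name__": "kube_node_labels",
        "cluster": "aws-01",
        "label_kubernetes_io_role": "master",
    }
)

print(short_form == explicit_form)  # True
```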
There are four types of metrics in Prometheus:
Gauge – a value that can go up or down. For example, the metric mongodb_up lets you know if the exporter has a connection to a MongoDB instance.
Counter – a cumulative value that only increases, or resets to zero on restart. Counter names usually have the suffix _total, for example, http_requests_total.
Histogram – a combination of several counters that count observations, such as request durations, in configurable buckets.
Summary – works like a histogram but also calculates quantiles on the client side.
Introducing PromQL Data Fetching
Selecting data in PromQL is as easy as specifying the metric you want to get data from. In this example, we will use the metric http_requests_total.
Let’s say we want to know the number of requests to the /api path on the host 10.2.0.4. For this, we will use the labels host and path of this metric:
http_requests_total{host="10.2.0.4", path="/api"}
The query returns the following values:
name | host | path | status_code | value |
(table rows missing from the source)
Each row in this table represents one time series with its last available value. Because http_requests_total counts the requests made since the counter was last reset, we see 98 successful requests.
This result is called an instant vector: the most recent value of each time series at the time the query is evaluated. Because samples are scraped at different moments, Prometheus uses, for each series, the latest sample at or before the evaluation time. If no time is specified, the last available value is returned.
You can also get an instant vector from another point in time (for example, a day ago). To do this, add the offset modifier:
http_requests_total{host="10.2.0.4", path="/api", status_code="200"} offset 1d
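The selection rule behind an instant vector, with and without offset, can be sketched in Python. This is a simplified model, not Prometheus code: the function instant_value and the sample data are hypothetical, and the lookback window that real Prometheus applies to old samples is not modeled.

```python
def instant_value(samples, eval_time, offset=0):
    """Return the latest sample value at or before (eval_time - offset).

    samples: list of (timestamp_seconds, value) pairs sorted by timestamp.
    Returns None when no sample exists at or before that moment.
    Simplification: real Prometheus also ignores samples older than a
    lookback window, which this sketch does not model.
    """
    target = eval_time - offset
    latest = None
    for ts, value in samples:
        if ts <= target:
            latest = value
        else:
            break
    return latest


# Hypothetical samples for one series, scraped at irregular times.
samples = [(0, 90), (14, 93), (31, 95), (44, 98)]

print(instant_value(samples, eval_time=50))             # 98
print(instant_value(samples, eval_time=50, offset=30))  # 93
```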
To get all values of a metric within a time interval, specify the interval in square brackets:
http_requests_total{host="10.2.0.4", path="/api"}[10m]
The query returns the following values:
name | host | path | status_code | value |
(table rows missing from the source)
The query returns multiple values for each time series because we requested data for a specific period of time, and each value is associated with a timestamp.
This is called a range vector: all values for each series within the specified time interval.
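As a rough illustration, a range selector can be modeled in Python as a filter over timestamps. The function range_values and the sample data are hypothetical. Whether a sample exactly on the left boundary is included differs between Prometheus versions; this sketch assumes it is excluded.

```python
def range_values(samples, eval_time, window):
    """Return every (timestamp, value) pair in (eval_time - window, eval_time].

    samples: list of (timestamp_seconds, value) pairs for one series.
    Assumption: the left boundary is exclusive in this sketch.
    """
    start = eval_time - window
    return [(ts, value) for ts, value in samples if start < ts <= eval_time]


# Hypothetical samples for one series; timestamps are in seconds.
samples = [(0, 90), (240, 93), (480, 95), (600, 98)]

# Equivalent in spirit to a [10m] selector evaluated at t = 700 s.
print(range_values(samples, eval_time=700, window=600))
# [(240, 93), (480, 95), (600, 98)]
```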
Introduction to Aggregators and PromQL Operators
As you can see, PromQL selectors help you get metrics data. But what if you want more complex results?
Let’s imagine that we have a metric node_cpu_cores with the label cluster. We could, for example, sum the results, grouping them by a specific label:
sum by (cluster) (node_cpu_cores)
The query returns the following values:
cluster | value |
foo | 100 |
bar | 50 |
With this simple query, we see that there are 100 CPU cores in the cluster foo and 50 in the cluster bar.
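The grouping performed by sum by (cluster) can be sketched in Python as follows. The function sum_by, the node label, and the per-node core counts are hypothetical; they are chosen so that the totals match the table above.

```python
from collections import defaultdict


def sum_by(series, label):
    """Sum series values, grouping them by the value of one label."""
    totals = defaultdict(int)
    for labels, value in series:
        # Series without the label fall into the empty-string group.
        totals[labels.get(label, "")] += value
    return dict(totals)


# Hypothetical per-node series of node_cpu_cores.
series = [
    ({"cluster": "foo", "node": "n1"}, 64),
    ({"cluster": "foo", "node": "n2"}, 36),
    ({"cluster": "bar", "node": "n3"}, 50),
]

print(sum_by(series, "cluster"))  # {'foo': 100, 'bar': 50}
```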
In addition, we can use arithmetic operators in PromQL queries. For example, using the metric node_memory_MemFree_bytes, which returns the amount of free memory in bytes, we could get this value in megabytes using the division operator:
node_memory_MemFree_bytes / (1024 * 1024)
We can also get the percentage of free memory by dividing the previous metric by node_memory_MemTotal_bytes, which returns the total amount of memory on the node:
(node_memory_MemFree_bytes / node_memory_MemTotal_bytes) * 100
We can now use this query to generate an alert when there is less than 5% free memory left on the node:
(node_memory_MemFree_bytes / node_memory_MemTotal_bytes) * 100 < 5
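To show what the division and the comparison do, here is a minimal Python sketch. It assumes both metrics carry identical label sets, represented here by a hypothetical node name used as the dictionary key; the functions percent_free and below are illustrative names, not part of any library.

```python
GIB = 1024 ** 3


def percent_free(mem_free, mem_total):
    """Divide series that share the same key and scale the result to percent."""
    result = {}
    for key, free in mem_free.items():
        total = mem_total.get(key)
        if total:  # skips unmatched series; a zero total is skipped too
            result[key] = free / total * 100
    return result


def below(vector, threshold):
    """Keep only the series whose value is under the threshold."""
    return {key: value for key, value in vector.items() if value < threshold}


# Hypothetical nodes and memory sizes.
mem_free = {"node-a": 1 * GIB, "node-b": 8 * GIB}
mem_total = {"node-a": 32 * GIB, "node-b": 32 * GIB}

pct = percent_free(mem_free, mem_total)
print(pct)            # {'node-a': 3.125, 'node-b': 25.0}
print(below(pct, 5))  # {'node-a': 3.125}
```

As in the PromQL query, the comparison acts as a filter: series that do not satisfy it are dropped from the result, which is what makes the expression usable as an alert condition.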
Introducing PromQL Functions
PromQL supports a large number of functions that we can use to get more complex results. For example, building on the previous query, we could use topk (strictly speaking, an aggregation operator) to find the two nodes with the highest percentage of free memory:
topk(2, (node_memory_MemFree_bytes / node_memory_MemTotal_bytes) * 100)
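The selection that topk performs can be sketched with Python's standard heapq module. The node names and percentages below are hypothetical.

```python
import heapq


def topk(k, vector):
    """Return the k (key, value) pairs with the largest values."""
    return heapq.nlargest(k, vector.items(), key=lambda item: item[1])


# Hypothetical percentages of free memory per node.
pct_free = {"node-a": 3.125, "node-b": 25.0, "node-c": 12.5}

print(topk(2, pct_free))  # [('node-b', 25.0), ('node-c', 12.5)]
```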
Prometheus allows you not only to get information about past events, but even to make forecasts. The function predict_linear uses linear regression over a range vector to predict the value of a time series a given number of seconds from now.
Imagine you want to know how much free disk space will be left 24 hours from now. You can apply predict_linear to the last week of data from the metric node_filesystem_free_bytes, which returns the available free disk space. The following query converts the prediction to gigabytes and returns the filesystems expected to have less than 100 GB free in 24 hours:
predict_linear(node_filesystem_free_bytes[1w], 3600 * 24) / (1024 * 1024 * 1024) < 100
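The idea behind the forecast can be sketched in Python as an ordinary least-squares fit followed by extrapolation. This is an illustration under simplifying assumptions, not the Prometheus implementation: the Python function below only borrows the name predict_linear, and the sample data (free space shrinking by 1 GiB per hour) is hypothetical.

```python
GIB = 1024 ** 3
HOUR = 3600


def predict_linear(samples, eval_time, seconds_ahead):
    """Fit a least-squares line to (timestamp, value) pairs and extrapolate.

    Returns the fitted value at eval_time + seconds_ahead, or None when
    a line cannot be fitted (fewer than two distinct timestamps).
    """
    n = len(samples)
    if n < 2:
        return None
    mean_t = sum(t for t, _ in samples) / n
    mean_v = sum(v for _, v in samples) / n
    cov = sum((t - mean_t) * (v - mean_v) for t, v in samples)
    var = sum((t - mean_t) ** 2 for t, _ in samples)
    if var == 0:
        return None
    slope = cov / var
    intercept = mean_v - slope * mean_t
    return intercept + slope * (eval_time + seconds_ahead)


# Hypothetical data: 200 GiB free at t = 0, dropping 1 GiB every hour for 48 hours.
samples = [(h * HOUR, (200 - h) * GIB) for h in range(49)]

predicted = predict_linear(samples, eval_time=48 * HOUR, seconds_ahead=24 * HOUR)
print(predicted / GIB)        # close to 128
print(predicted / GIB < 100)  # False: the alert condition would not match
```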
When working with Prometheus counters, it is convenient to use the function rate. It calculates the per-second average rate of increase of the time series in a range vector, automatically adjusting for counter resets. In addition, the calculation is extrapolated to the ends of the time range.
What if we need an alert that fires when we have not received a request for 10 minutes? We can’t just use the raw metric http_requests_total, because if the counter is reset within the specified time range, the results would be inaccurate:
http_requests_total[10m]
name | host | path | status_code | value |
(table rows missing from the source)
In the example above, the counter was reset: the value drops from 300 to 50, so the difference between consecutive samples is negative and this metric alone is not enough for us. We can solve the problem with the function rate. Since it accounts for counter resets, it treats the values as if they were:
name | host | path | status_code | value |
(table rows missing from the source)
rate(http_requests_total[10m])
name | host | path | status_code | value |
(table rows missing from the source)
Regardless of resets, there were an average of 0.83 requests per second in the last 10 minutes. Now we can set up the alert:
rate(http_requests_total[10m]) == 0
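To see why reset handling matters, here is a simplified Python sketch of a rate calculation. It is not the Prometheus algorithm: it skips the extrapolation to the ends of the time range, and the function names and sample values are hypothetical, chosen so that the counter drops from 300 to 50 and the result matches the 0.83 requests per second mentioned above.

```python
def naive_rate(samples):
    """Per-second change between the first and last sample, ignoring resets."""
    (t0, v0), (t1, v1) = samples[0], samples[-1]
    return (v1 - v0) / (t1 - t0)


def simple_rate(samples):
    """Per-second average increase of a counter, adjusting for resets.

    samples: list of (timestamp_seconds, value) pairs sorted by timestamp.
    Simplification: no extrapolation to the edges of the time window.
    """
    if len(samples) < 2:
        return None
    duration = samples[-1][0] - samples[0][0]
    if duration <= 0:
        return None
    increase = 0.0
    prev = samples[0][1]
    for _, value in samples[1:]:
        if value < prev:
            # Counter reset: it restarted from zero, so the whole
            # current value counts as increase.
            increase += value
        else:
            increase += value - prev
        prev = value
    return increase / duration


# Hypothetical samples over 10 minutes with a reset between 300 and 50.
samples = [(0, 0), (150, 150), (300, 300), (450, 50), (600, 200)]

print(round(naive_rate(samples), 2))   # 0.33, underestimates the traffic
print(round(simple_rate(samples), 2))  # 0.83
```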
What’s next?
In this article, we learned how Prometheus stores data and looked at examples of PromQL queries for fetching and aggregating data.
Download the PromQL Cheatsheet to learn more about PromQL operators and functions. You can also try all the examples from the article and the cheatsheet in our Prometheus playground.