How to build a Latency graph from Prometheus metrics

Metric collection with Prometheus is usually wired into web applications via client libraries that expose metrics on a /metrics endpoint. In this article, I want to show how to visualize latency using the Histogram metric type. It will be useful both for those who have not yet built graphs from Prometheus metrics and for those who want to understand how to interpret them.

4 golden signals - https://www.ibm.com/garage/method/practices/manage/golden-signals/

The picture shows the "four golden signals", a set of metrics that Google recommends tracking in the SRE (Site Reliability Engineering) approach: latency, traffic, errors, and saturation. Monitoring these four signals helps engineers respond quickly to performance and reliability issues, identify bottlenecks, and plan for scaling.

Prometheus

Prometheus is an open-source monitoring and alerting system originally developed at SoundCloud. It is widely used in distributed systems to collect and visualize metrics in real time.

Prometheus architecture - https://prometheus.io/docs/introduction/overview/

Prometheus collects and stores metrics as time series data; each time series is identified by a name and a set of labels (key-value pairs).

# Example metric from /metrics
request_latency_seconds_bucket{le="0.1"} 32.0
What is a metric

A metric is a quantitative indicator used to measure and evaluate the performance of a system.

Collection of metrics. Prometheus uses a pull model: Prometheus itself initiates the collection of metrics from the services it monitors. This distinguishes it from push models, where services send their metrics to the monitoring server themselves. That said, Prometheus does offer a Pushgateway for cases where pulling is impossible.

Latency is the time it takes to process a request; in the context of web services, this is typically the time taken to handle an HTTP request. Latency monitoring matters because it directly affects user experience. It is important to track not only the average latency but also its distribution, in order to understand how the system behaves under different load conditions.

Web application on FastAPI

To demonstrate, let's create a simple application. Grafana runs on port 3000 (login/password: admin/admin); in Grafana you need to add a data source pointing at http://prometheus:9090.

The application code fits in 28 lines 🙂
import time
from typing import Union

from prometheus_client import Histogram, make_asgi_app
from fastapi import FastAPI

app = FastAPI()

metrics_app = make_asgi_app()
app.mount("/metrics", metrics_app)

REQUEST_LATENCY = Histogram(
    name="request_latency_seconds",
    documentation='Time spent processing a request',
    buckets=(.1, .2, .3, .4, .5)
)


@app.get("/")
def read_root():
    return {"Hello": "World"}


@app.get("/items/{item_id}")
@REQUEST_LATENCY.time()
def read_item(item_id: int, q: Union[str, None] = None):
    time.sleep(item_id / 100)
    return {"item_id": item_id, "q": q}
Dockerfile for the application
FROM python:3.10

WORKDIR /code

COPY ./requirements.txt /code/requirements.txt

RUN pip install --no-cache-dir --upgrade -r /code/requirements.txt

COPY ./app /code/app

CMD ["uvicorn", "app.main:app", "--host", "0.0.0.0", "--port", "8000"]
docker-compose
version: '3.9'

services:
  web:
    build: .
    ports:
      - 8000:8000

  prometheus:
    image: prom/prometheus
    volumes:
      - ./prometheus.yml:/etc/prometheus/prometheus.yml
    ports:
      - 9090:9090
    depends_on:
      - web

  grafana:
    image: grafana/grafana
    ports:
      - 3000:3000
    depends_on:
      - prometheus
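The compose file mounts ./prometheus.yml, which is not shown above. A minimal version might look like this (the job name and scrape interval here are assumptions; the target matches the `web` service and port from the compose file):

```yaml
global:
  scrape_interval: 5s   # how often Prometheus pulls /metrics

scrape_configs:
  - job_name: "web"
    static_configs:
      - targets: ["web:8000"]   # the FastAPI service from docker-compose
```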

How to collect metrics for a Latency graph

Prometheus collects latency data through client libraries. There are four core metric types in the Prometheus world: Counter, Gauge, Summary, and Histogram. Histogram is the one commonly used to measure latency.

A Histogram counts observations (samples), such as HTTP requests, sorting them into configurable buckets by duration. It also tracks the total number of observations and the sum of all observed values.

For example, you can define buckets at 0.1, 0.2, 0.3, 0.4, and 0.5 seconds. Each time the application observes a new latency value, the client increments the counter of every bucket the value falls into.
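The bucket logic above can be sketched in a few lines of Python. This is a simplified model of what a client-side histogram does, not the real prometheus_client implementation:

```python
# A minimal sketch of how a histogram sorts observations into cumulative
# buckets; real clients also expose this state on /metrics.
BUCKETS = (0.1, 0.2, 0.3, 0.4, 0.5, float("inf"))

bucket_counts = {le: 0 for le in BUCKETS}
observed_sum = 0.0
observed_count = 0

def observe(value):
    """Record one latency observation, the way a client-side histogram would."""
    global observed_sum, observed_count
    observed_count += 1
    observed_sum += value
    for le in BUCKETS:
        if value <= le:
            bucket_counts[le] += 1  # cumulative: every boundary >= value grows

# Two 30 ms requests and one 468 ms request, as in the examples below.
for latency in (0.03, 0.03, 0.468):
    observe(latency)

print(bucket_counts[0.1])  # 2
print(bucket_counts[0.5])  # 3
```

Note that a single observation increments several buckets at once; this is what makes the counters cumulative.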

How to choose bucket boundaries

For example, the Python prometheus_client library offers the following default:

DEFAULT_BUCKETS = (.005, .01, .025, .05, .075, .1, .25, .5, .75, 1.0, 2.5, 5.0, 7.5, 10.0, INF)

The default buckets don't always make sense for your application. I would recommend choosing buckets relative to your SLO (Service Level Objective), plus several buckets above and below it.

For example, if the SLO of a microservice is around 500 ms, buckets that are too small (5 ms, 10 ms, 25 ms, 50 ms) or too large (2500 ms, 5000 ms, 7500 ms, 10000 ms) are useless.
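As an illustration, a bucket layout centered on a hypothetical 500 ms SLO could look like the tuple below (the exact boundaries are an assumption for illustration, not a recommendation from the Prometheus docs):

```python
# Hypothetical boundaries for a ~500 ms SLO: dense resolution around the
# objective, only a few buckets well below and above it.
SLO_SECONDS = 0.5
SLO_BUCKETS = (0.1, 0.25, 0.4, 0.5, 0.6, 0.75, 1.0)

# Keeping the SLO itself as a boundary lets you read the "within SLO"
# share of requests directly from the le="0.5" counter.
assert SLO_SECONDS in SLO_BUCKETS
```

This tuple can be passed as the `buckets` argument of the Histogram in the application code above.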

A Histogram consists of three elements:

  • _count — the number of observations;

  • _sum — the sum of all observed values;

  • a set of _bucket counters labeled le (less than or equal), each containing the number of observations whose value is less than or equal to the bucket's boundary.

There were two requests for 30 ms - data from /metrics

The bucket le="0.1" indicates that 2 requests completed in 0.1 seconds or faster.

Made another request for 468 ms - data from /metrics
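For reference, after two 30 ms requests and one 468 ms request the /metrics output should look roughly like this (reconstructed from the numbers in the text, abridged; the real exposition also includes HELP/TYPE and _created lines):

```
request_latency_seconds_bucket{le="0.1"} 2.0
request_latency_seconds_bucket{le="0.2"} 2.0
request_latency_seconds_bucket{le="0.3"} 2.0
request_latency_seconds_bucket{le="0.4"} 2.0
request_latency_seconds_bucket{le="0.5"} 3.0
request_latency_seconds_bucket{le="+Inf"} 3.0
request_latency_seconds_count 3.0
request_latency_seconds_sum 0.528
```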

The counts reported for each bucket by the Prometheus client are cumulative. This means that the counter for the bucket le="0.4" also includes the counters for le="0.3", le="0.2", and le="0.1".

A Histogram can be used to calculate quantiles (percentiles).

What is a percentile

A quantile is the value that a given random variable does not exceed with a fixed probability. If the probability is given as a percentage, the quantile is called a percentile. If 25% of all observations lie at or below a value, that value is the 25th percentile, written p25.

A percentile is a statistical measure showing what percentage of observations in a group does not exceed a certain value. Percentiles are used in statistics to give a better understanding of the distribution of observations.

In the context of latency, percentiles are commonly used to describe the distribution of a system’s response time.

Consider an example. Suppose you have a web server and you have measured the server response time for 1000 requests. You recorded these response times and sorted them in ascending order.

Let the 50th percentile (or median) be 200 milliseconds. This means that 50% of all requests were processed in 200 milliseconds or faster.

If the 95th percentile is 500 milliseconds, this means that 95% of all requests were processed in 500 milliseconds or faster, and 5% of requests took longer to process.

The 99th percentile, which could be 800 milliseconds, for example, shows that 99% of all requests were processed in 800 milliseconds or faster, and the slowest 1% of requests took longer to process.

Thus, percentiles provide an important insight into how a system behaves not only on average, but also in extreme conditions, allowing you to determine both the normal operation of the system and its behavior under peak load conditions.
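The web-server example above can be sketched directly in Python. The millisecond values below are made up for illustration, and the function uses the simple nearest-rank method:

```python
import math

def percentile(sorted_values, p):
    """Nearest-rank percentile: the smallest value such that at least
    p% of observations are less than or equal to it."""
    k = math.ceil(p / 100 * len(sorted_values))
    return sorted_values[max(k - 1, 0)]

# Hypothetical measured response times, sorted in ascending order.
latencies_ms = sorted([120, 150, 180, 200, 230, 260, 320, 410, 500, 800])

print(percentile(latencies_ms, 50))  # 230: half the requests finished in 230 ms or faster
print(percentile(latencies_ms, 95))  # 800: only the slowest 5% took longer
```

For real traffic you would not keep every raw observation; this is exactly the problem the histogram buckets solve.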

Execute requests to the web application

We get these metrics from /metrics

How to plot Latency in Grafana

For the latency graph we will use the 95th percentile (you can build other percentiles the same way). To render it in Grafana, you need to execute a PromQL (Prometheus Query Language) query:

histogram_quantile(
    0.95,
    sum(
      rate(
        request_latency_seconds_bucket[1m]
      )
    ) by (le)
)

request_latency_seconds_bucket — time series from Prometheus.

rate calculates the average per-second rate of increase of a time series over the specified time window, i.e. how often an event occurs per unit of time. Roughly speaking, it is like a derivative in mathematics.
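The idea behind rate() can be sketched by hand. The counter values below are hypothetical, and real rate() also handles counter resets and extrapolates at the window edges:

```python
# rate() over a cumulative counter: the per-second increase across the window.
window_seconds = 60
count_a_minute_ago = 120  # hypothetical bucket counter at the start of the window
count_now = 300           # the same counter now

per_second_rate = (count_now - count_a_minute_ago) / window_seconds
print(per_second_rate)  # 3.0 observations per second fell into this bucket
```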

Average rate of change in buckets

sum is an aggregation operator that sums values over a group of elements. It is used to get the total across all matching time series.

To summarize, nothing has changed in our case 🙂

histogram_quantile estimates the value that a given fraction of all requests does not exceed. You can interpret it like this: "N% of all requests are processed faster than this value."
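A simplified sketch of the idea behind histogram_quantile: find the bucket where the target rank lands and linearly interpolate within it. This is an assumption-level model of the core algorithm, not Prometheus's exact implementation:

```python
def histogram_quantile(q, buckets):
    """buckets: (le, cumulative_count) pairs sorted by le; the last le is +inf."""
    total = buckets[-1][1]
    rank = q * total                       # which observation we are looking for
    prev_le, prev_count = 0.0, 0
    for le, count in buckets:
        if count >= rank:
            if le == float("inf"):
                return prev_le             # can't interpolate in the open-ended bucket
            # linear interpolation between the bucket's boundaries
            return prev_le + (le - prev_le) * (rank - prev_count) / (count - prev_count)
        prev_le, prev_count = le, count

# Cumulative counts from the example: two 30 ms requests, one 468 ms request.
buckets = [(0.1, 2), (0.2, 2), (0.3, 2), (0.4, 2), (0.5, 3), (float("inf"), 3)]
print(histogram_quantile(0.95, buckets))  # ≈ 0.485, between the 0.4 and 0.5 boundaries
```

The interpolation is why histogram accuracy depends so heavily on the bucket boundaries: inside a bucket, the estimate is only as good as the assumption that observations are spread evenly.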

95% of requests are completed faster

50% of requests are completed faster

Conclusion

To build a latency graph, you can use the expression:

histogram_quantile(0.95, sum(rate(request_latency_seconds_bucket[1m])) by (le))

This expression gives you the 95th percentile latency of the request in the last minute. This means that 95% of all measured requests in the last minute had a latency that was less than or equal to the calculated value.

Histograms are not a precise tool for measuring latency, especially near bucket boundaries: if latency decreases by 20 ms, whether that shows up on the graph depends on the configured buckets. Such a graph is suitable for a qualitative assessment of an optimization, but not a quantitative one.

You also should not use too many buckets, since each bucket forms its own time series in the Prometheus database.


My Telegram channel, where I write notes and thoughts about backend, system design, architecture, and engineering.
