Monitoring a Rust Web Application Using Prometheus and Grafana

This article shows how to set up monitoring for a Rust web application. The application exposes metrics for Prometheus, and the metrics are visualized using Grafana. The monitored application is the mongodb-redis demo, which was described in detail here. The resulting architecture is as follows:

architecture

The monitoring system includes:

  • Prometheus – a monitoring platform that collects metrics in real time and stores them in a time series database (TSDB).
  • Grafana – a platform for analyzing and visualizing metrics.
  • AlertManager – an application that handles alerts sent by the Prometheus server (for example, when something goes wrong in your application) and notifies the user via email, Slack, Telegram, etc.
  • cAdvisor – a tool for container users that provides data on resource usage and performance of running containers. (In fact, cAdvisor collects data from all Docker containers shown in the diagram.)

To run all of these tools, you can use the following:

Docker Compose file

version: '3.8'
services:

  prometheus:
    image: prom/prometheus:latest
    container_name: prometheus
    restart: always
    ports:
      - '9090:9090'
    volumes:
      - ./monitoring/prometheus:/etc/prometheus
    command:
      - '--config.file=/etc/prometheus/prometheus.yml'
      - '--web.external-url=http://localhost:9090'

  grafana:
    image: grafana/grafana:latest
    container_name: grafana
    restart: always
    ports:
      - '3000:3000'
    volumes:
      - ./monitoring/grafana/data:/var/lib/grafana
      - ./monitoring/grafana/provisioning:/etc/grafana/provisioning
    environment:
      GF_SECURITY_ADMIN_USER: admin
      GF_SECURITY_ADMIN_PASSWORD: admin

  alertmanager:
    image: prom/alertmanager:latest
    container_name: alertmanager
    ports:
      - '9093:9093'
    volumes:
      - ./monitoring/alertmanager:/etc/alertmanager
    command:
      - '--config.file=/etc/alertmanager/alertmanager.yml'
      - '--web.external-url=http://localhost:9093'

  cadvisor:
    image: gcr.io/cadvisor/cadvisor:latest
    container_name: cadvisor
    restart: always
    ports:
      - '8080:8080'
    volumes:
      - /:/rootfs:ro
      - /var/run:/var/run:rw
      - /sys:/sys:ro
      - /var/lib/docker/:/var/lib/docker:ro

Exposing Prometheus metrics from a Rust application

Metrics are exposed using the prometheus crate.

There are four Prometheus metric types: counter, gauge, histogram, and summary. This article covers the first three (at the time of writing, the crate does not support summaries).

Creating metrics

Metrics can be created and registered as follows:

Creating and registering metrics

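use lazy_static::lazy_static;
use prometheus::{
    opts, register_histogram_vec, register_int_counter_vec, register_int_gauge, HistogramVec,
    IntCounterVec, IntGauge,
};

// HTTP_RESPONSE_TIME_CUSTOM_BUCKETS is defined in the histogram section below
lazy_static! {
    // Counter of HTTP requests, labeled by method and path
    pub static ref HTTP_REQUESTS_TOTAL: IntCounterVec = register_int_counter_vec!(
        opts!("http_requests_total", "HTTP requests total"),
        &["method", "path"]
    )
    .expect("Can't create a metric");
    // Gauge with the number of currently connected SSE clients
    pub static ref HTTP_CONNECTED_SSE_CLIENTS: IntGauge =
        register_int_gauge!(opts!("http_connected_sse_clients", "Connected SSE clients"))
            .expect("Can't create a metric");
    // Histogram of response times with custom buckets, labeled by method and path
    pub static ref HTTP_RESPONSE_TIME_SECONDS: HistogramVec = register_histogram_vec!(
        "http_response_time_seconds",
        "HTTP response times",
        &["method", "path"],
        HTTP_RESPONSE_TIME_CUSTOM_BUCKETS.to_vec()
    )
    .expect("Can't create a metric");
}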

The code above registers the metrics in the default registry. They can also be registered in a custom registry (example).

Counter

To count all incoming HTTP requests, you could use the IntCounter type. However, it is more useful to see not only the total number of requests but also additional dimensions such as the path and the HTTP method. This can be done with IntCounterVec; the HTTP_REQUESTS_TOTAL metric of this type is used in a custom Actix middleware as follows:

Using the HTTP_REQUESTS_TOTAL metric

let request_path = req.path();
let is_registered_resource = req.resource_map().has_resource(request_path);
// This check prevents a possible DoS attack: flooding the application with requests
// to many different unregistered paths would create a new time series for each path,
// causing high memory consumption in both the application and the Prometheus server
// and bloating Prometheus's TSDB
if is_registered_resource {
    let request_method = req.method().to_string();
    metrics::HTTP_REQUESTS_TOTAL
        .with_label_values(&[&request_method, request_path])
        .inc();
}

After a few API requests, the output will look something like this:

Output of the HTTP_REQUESTS_TOTAL metric

# HELP http_requests_total HTTP requests total
# TYPE http_requests_total counter
http_requests_total{method="GET",path="/"} 1
http_requests_total{method="GET",path="/events"} 1
http_requests_total{method="GET",path="/metrics"} 22
http_requests_total{method="GET",path="/planets"} 20634

Each sample carries the labels (metric attributes) method and path, which allow the Prometheus server to distinguish between samples.

As the snippet above shows, requests to GET /metrics (the endpoint from which the Prometheus server collects the application's metrics) are counted as well.
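To make the labeled-counter idea and the purpose of the has_resource check more concrete, here is a std-only sketch. LabeledCounter and track_request are hypothetical names, not the prometheus crate's API, and the set of registered paths stands in for Actix's resource map (using exact path matching, which is a simplification). It shows that every distinct (method, path) pair becomes a separate time series, which is why requests to unregistered paths are not counted.

```rust
use std::collections::{HashMap, HashSet};

/// Hypothetical std-only stand-in for an IntCounterVec with the labels
/// `method` and `path`; it mirrors the idea, not the prometheus crate.
#[derive(Default)]
pub struct LabeledCounter {
    // One counter value per (method, path) label combination.
    samples: HashMap<(String, String), u64>,
}

impl LabeledCounter {
    /// Increments the sample for the given label values, creating it at 0 first.
    pub fn inc(&mut self, method: &str, path: &str) {
        *self
            .samples
            .entry((method.to_string(), path.to_string()))
            .or_insert(0) += 1;
    }

    /// Current value for a label combination (0 if it was never incremented).
    pub fn get(&self, method: &str, path: &str) -> u64 {
        self.samples
            .get(&(method.to_string(), path.to_string()))
            .copied()
            .unwrap_or(0)
    }

    /// Number of distinct label combinations, i.e. exposed time series.
    pub fn series_count(&self) -> usize {
        self.samples.len()
    }
}

/// Counts a request only if its path belongs to a registered route,
/// like the `has_resource` check in the middleware.
pub fn track_request(
    counter: &mut LabeledCounter,
    registered: &HashSet<&str>,
    method: &str,
    path: &str,
) {
    if registered.contains(path) {
        counter.inc(method, path);
    }
}
```

Without the check, a client requesting /random-1, /random-2, and so on would add one new series per request.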

Gauge

A gauge differs from a counter in that its value can decrease. The gauge in this example shows how many clients are currently connected via SSE. It is used like this:

Using the HTTP_CONNECTED_SSE_CLIENTS metric

// When a new SSE client connects:
crate::metrics::HTTP_CONNECTED_SSE_CLIENTS.inc();

// After stale clients have been removed from the broadcaster:
crate::metrics::HTTP_CONNECTED_SSE_CLIENTS.set(broadcaster_mutex.clients.len() as i64);

When you open http://localhost:9000, the browser establishes an SSE connection, which increments the metric value. After that, the output will look like this:

Output of the HTTP_CONNECTED_SSE_CLIENTS metric

# HELP http_connected_sse_clients Connected SSE clients
# TYPE http_connected_sse_clients gauge
http_connected_sse_clients 1
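The gauge semantics can be sketched with the standard library alone. Gauge and CONNECTED_SSE_CLIENTS below are hypothetical stand-ins for the prometheus crate's IntGauge and the HTTP_CONNECTED_SSE_CLIENTS metric: an atomic integer that, unlike a counter, supports dec and set in addition to inc.

```rust
use std::sync::atomic::{AtomicI64, Ordering};

/// Hypothetical std-only gauge: its value can go down as well as up.
pub struct Gauge(AtomicI64);

impl Gauge {
    pub const fn new() -> Self {
        Gauge(AtomicI64::new(0))
    }

    pub fn inc(&self) {
        self.0.fetch_add(1, Ordering::Relaxed);
    }

    pub fn dec(&self) {
        self.0.fetch_sub(1, Ordering::Relaxed);
    }

    /// Overwrites the value, e.g. with the current number of live clients.
    pub fn set(&self, value: i64) {
        self.0.store(value, Ordering::Relaxed);
    }

    pub fn get(&self) -> i64 {
        self.0.load(Ordering::Relaxed)
    }
}

// Global gauge, analogous to HTTP_CONNECTED_SSE_CLIENTS (std-only).
pub static CONNECTED_SSE_CLIENTS: Gauge = Gauge::new();
```

Because the value is an atomic, the gauge can be shared between request handlers without a lock.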

Broadcaster

To implement the SSE client gauge, the application code had to be refactored and a broadcaster implemented. The broadcaster stores all clients connected via the sse function in a vector and periodically pings them (using the remove_stale_clients function) to make sure each connection is still active; disconnected clients are removed from the vector. The broadcaster also makes it possible to open just one Redis Pub/Sub connection; messages from it are sent (broadcast) to all clients.
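The broadcaster pattern can be sketched in a simplified, synchronous form. This is an illustration under stated assumptions, not the application's code: the real broadcaster is async and sends SSE frames, while here each client is a std::sync::mpsc channel and a failed send plays the role of a disconnected client. new_client is a hypothetical name for what the sse function does, and the value returned by remove_stale_clients is what the gauge would be set to.

```rust
use std::sync::mpsc::{channel, Receiver, Sender};

/// Hypothetical, synchronous broadcaster: one sender per connected client.
#[derive(Default)]
pub struct Broadcaster {
    clients: Vec<Sender<String>>,
}

impl Broadcaster {
    /// Registers a new client and returns its receiving end.
    pub fn new_client(&mut self) -> Receiver<String> {
        let (tx, rx) = channel();
        self.clients.push(tx);
        rx
    }

    /// Pings every client and keeps only those whose channel is still open.
    /// Returns the number of live clients (the value for the gauge).
    pub fn remove_stale_clients(&mut self) -> usize {
        self.clients.retain(|client| client.send("ping".to_string()).is_ok());
        self.clients.len()
    }

    /// Sends one message (e.g. from the single Redis Pub/Sub subscription)
    /// to all clients.
    pub fn broadcast(&self, msg: &str) {
        for client in &self.clients {
            // A failed send means the client is gone; the next ping prunes it.
            let _ = client.send(msg.to_string());
        }
    }

    pub fn client_count(&self) -> usize {
        self.clients.len()
    }
}
```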

Histogram

In this guide, a histogram is used to collect response time data. As with the request counter, tracking is done in the Actix middleware, using the following code:

Tracking response time

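// ...

// histogram_timer is assumed to be an Option<prometheus::HistogramTimer>
// declared earlier in the middleware and initialized to None
histogram_timer = Some(
    metrics::HTTP_RESPONSE_TIME_SECONDS
        .with_label_values(&[&request_method, request_path])
        .start_timer(),
);

// ...

// Record the elapsed time in the histogram
if let Some(histogram_timer) = histogram_timer {
    histogram_timer.observe_duration();
}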

This method is probably not very accurate (the open question is how much shorter the measured response time is than the real one), but the observations are still useful as an example of a histogram and for visualizing it in Grafana.
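The start_timer/observe_duration pattern boils down to remembering an Instant and recording the elapsed seconds later. Below is a std-only sketch with hypothetical types Observations and Timer; it is not the prometheus crate's HistogramTimer. Consuming the timer by value in observe_duration ensures that a request is recorded at most once.

```rust
use std::time::{Duration, Instant};

/// Hypothetical histogram stand-in that simply stores raw observations.
#[derive(Default)]
pub struct Observations {
    pub values: Vec<f64>,
}

/// Remembers the start time and the histogram it will record into.
pub struct Timer<'a> {
    start: Instant,
    target: &'a mut Observations,
}

impl Observations {
    pub fn start_timer(&mut self) -> Timer<'_> {
        Timer {
            start: Instant::now(),
            target: self,
        }
    }
}

impl Timer<'_> {
    /// Consumes the timer and records the elapsed time in seconds.
    pub fn observe_duration(self) {
        let elapsed: Duration = self.start.elapsed();
        self.target.values.push(elapsed.as_secs_f64());
    }
}
```

Everything between start_timer and observe_duration is included in the measurement; anything that happens outside the middleware is not, which is one reason the measured time can be shorter than the real response time.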

A histogram takes observations and counts them in configurable buckets. There are default buckets, but you will most likely need to define custom buckets suited to your use case, and to do that it helps to know the approximate distribution of the metric's values. In this application, response times are short, so the following configuration is used:

Buckets for response time metrics

const HTTP_RESPONSE_TIME_CUSTOM_BUCKETS: &[f64; 14] = &[
    0.0005, 0.0008, 0.00085, 0.0009, 0.00095, 0.001, 0.00105, 0.0011, 0.00115, 0.0012, 0.0015,
    0.002, 0.003, 1.0,
];

The output will look something like this (only part of the data is shown):

Output of the HTTP_RESPONSE_TIME_SECONDS metric

# HELP http_response_time_seconds HTTP response times
# TYPE http_response_time_seconds histogram
http_response_time_seconds_bucket{method="GET",path="/planets",le="0.0005"} 0
http_response_time_seconds_bucket{method="GET",path="/planets",le="0.0008"} 6
http_response_time_seconds_bucket{method="GET",path="/planets",le="0.00085"} 1307
http_response_time_seconds_bucket{method="GET",path="/planets",le="0.0009"} 10848
http_response_time_seconds_bucket{method="GET",path="/planets",le="0.00095"} 22334
http_response_time_seconds_bucket{method="GET",path="/planets",le="0.001"} 31698
http_response_time_seconds_bucket{method="GET",path="/planets",le="0.00105"} 38973
http_response_time_seconds_bucket{method="GET",path="/planets",le="0.0011"} 44619
http_response_time_seconds_bucket{method="GET",path="/planets",le="0.00115"} 48707
http_response_time_seconds_bucket{method="GET",path="/planets",le="0.0012"} 51495
http_response_time_seconds_bucket{method="GET",path="/planets",le="0.0015"} 57066
http_response_time_seconds_bucket{method="GET",path="/planets",le="0.002"} 59542
http_response_time_seconds_bucket{method="GET",path="/planets",le="0.003"} 60532
http_response_time_seconds_bucket{method="GET",path="/planets",le="1"} 60901
http_response_time_seconds_bucket{method="GET",path="/planets",le="+Inf"} 60901
http_response_time_seconds_sum{method="GET",path="/planets"} 66.43133770000004
http_response_time_seconds_count{method="GET",path="/planets"} 60901

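The data shows how many observations fall into each bucket. The buckets are cumulative: a bucket with le="0.001" counts every observation less than or equal to 0.001 seconds, so the +Inf bucket always equals the total. The output also provides the total number (_count) and the sum (_sum) of all observations.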
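The bucket counting can be reproduced in a few lines of std-only Rust. The Histogram type below is a hypothetical illustration of the semantics just described (inclusive le upper bounds, cumulative output, a +Inf bucket, _sum and _count), not the prometheus crate's implementation. A smaller set of bounds is used for readability; HTTP_RESPONSE_TIME_CUSTOM_BUCKETS could be passed in the same way.

```rust
/// Hypothetical histogram with Prometheus-style semantics.
pub struct Histogram {
    upper_bounds: Vec<f64>, // sorted bucket bounds, without +Inf
    counts: Vec<u64>,       // non-cumulative count per bucket; last slot is +Inf
    sum: f64,
    count: u64,
}

impl Histogram {
    pub fn new(upper_bounds: &[f64]) -> Self {
        Histogram {
            upper_bounds: upper_bounds.to_vec(),
            counts: vec![0; upper_bounds.len() + 1],
            sum: 0.0,
            count: 0,
        }
    }

    pub fn observe(&mut self, v: f64) {
        // First bucket whose upper bound is >= v ("le" is inclusive);
        // values above every bound land in the +Inf slot.
        let idx = self
            .upper_bounds
            .iter()
            .position(|&bound| v <= bound)
            .unwrap_or(self.upper_bounds.len());
        self.counts[idx] += 1;
        self.sum += v;
        self.count += 1;
    }

    /// Cumulative counts as exported in `_bucket{le="..."}` lines;
    /// the last entry is the +Inf bucket and always equals `_count`.
    pub fn cumulative(&self) -> Vec<u64> {
        let mut acc = 0;
        self.counts
            .iter()
            .map(|c| {
                acc += c;
                acc
            })
            .collect()
    }

    pub fn sum(&self) -> f64 {
        self.sum
    }

    pub fn count(&self) -> u64 {
        self.count
    }
}
```

Applied to the output above, _sum / _count is 66.43 / 60901, an average of roughly 1.09 ms per request over the lifetime of the process.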

System metrics

The process feature of the prometheus crate allows you to export process metrics such as CPU or memory usage. To enable it, specify the feature in Cargo.toml. After that, you get something like:

Process metrics output

# HELP process_cpu_seconds_total Total user and system CPU time spent in seconds.
# TYPE process_cpu_seconds_total counter
process_cpu_seconds_total 134.49
# HELP process_max_fds Maximum number of open file descriptors.
# TYPE process_max_fds gauge
process_max_fds 1048576
# HELP process_open_fds Number of open file descriptors.
# TYPE process_open_fds gauge
process_open_fds 37
# HELP process_resident_memory_bytes Resident memory size in bytes.
# TYPE process_resident_memory_bytes gauge
process_resident_memory_bytes 15601664
# HELP process_start_time_seconds Start time of the process since unix epoch in seconds.
# TYPE process_start_time_seconds gauge
process_start_time_seconds 1636309802.38
# HELP process_threads Number of OS threads in the process.
# TYPE process_threads gauge
process_threads 6
# HELP process_virtual_memory_bytes Virtual memory size in bytes.
# TYPE process_virtual_memory_bytes gauge
process_virtual_memory_bytes 439435264

Note that the prometheus crate supports exporting process metrics for applications running on Linux (for example, in a Docker container like this one).

Metrics endpoint

Actix is configured to handle GET /metrics requests with the following handler:

Handler for metrics

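use actix_web::{http::header, HttpResponse};
use prometheus::{Encoder, TextEncoder};

// CustomError is the application's own error type
pub async fn metrics() -> Result<HttpResponse, CustomError> {
    let encoder = TextEncoder::new();
    let mut buffer = vec![];
    // Encode all metrics from the default registry in the text exposition format
    encoder
        .encode(&prometheus::gather(), &mut buffer)
        .expect("Failed to encode metrics");

    // The buffer is not needed afterwards, so it is moved instead of cloned and cleared
    let response = String::from_utf8(buffer).expect("Failed to convert bytes to string");

    Ok(HttpResponse::Ok()
        .insert_header(header::ContentType(mime::TEXT_PLAIN))
        .body(response))
}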

With the application configured this way, all the metrics described above can be retrieved with GET http://localhost:9000/metrics. The Prometheus server uses this endpoint to collect the application's metrics.

The metrics are returned in plain text format.
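What the text encoder produces can be illustrated with a small formatter. render_counter is a hypothetical helper, not part of the prometheus crate, and it skips the escaping of label values that a real encoder such as TextEncoder performs. It writes the HELP and TYPE lines followed by one line per sample, matching the shape of the http_requests_total output shown earlier.

```rust
use std::fmt::Write;

/// Renders one counter metric family in the Prometheus text exposition format.
/// Simplified: label values are written as-is (no escaping of quotes,
/// backslashes or newlines).
pub fn render_counter(name: &str, help: &str, samples: &[(Vec<(&str, &str)>, u64)]) -> String {
    let mut out = String::new();
    writeln!(out, "# HELP {name} {help}").unwrap();
    writeln!(out, "# TYPE {name} counter").unwrap();
    for (labels, value) in samples {
        if labels.is_empty() {
            // Samples without labels are written without braces.
            writeln!(out, "{name} {value}").unwrap();
        } else {
            let pairs: Vec<String> = labels
                .iter()
                .map(|(key, val)| format!("{key}=\"{val}\""))
                .collect();
            writeln!(out, "{name}{{{}}} {value}", pairs.join(",")).unwrap();
        }
    }
    out
}
```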

Configuring Prometheus to Collect Metrics

Prometheus collects metrics using the following config:

Prometheus config for collecting metrics

scrape_configs:

  - job_name: mongodb_redis_web_app
    scrape_interval: 5s
    static_configs:
      - targets: ['host.docker.internal:9000']

  - job_name: cadvisor
    scrape_interval: 5s
    static_configs:
      - targets: ['cadvisor:8080']

There are two jobs defined in the config. The first collects the application metrics described above; the second collects resource usage and performance metrics of running containers (covered in detail in the section on cAdvisor). scrape_interval specifies how often to scrape the target. The metrics_path parameter is not specified, so Prometheus expects metrics at the default path /metrics on each target.
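As a minimal illustration of how the scrape URL follows from a static target when scheme and metrics_path are left at their defaults (http and /metrics), here is a hypothetical helper, scrape_url; Prometheus does this internally, and the function exists only for illustration.

```rust
/// Builds the URL Prometheus would scrape for a static target.
/// `scheme` and `metrics_path` fall back to Prometheus' defaults
/// ("http" and "/metrics") when they are not set in the job config.
pub fn scrape_url(target: &str, scheme: Option<&str>, metrics_path: Option<&str>) -> String {
    let scheme = scheme.unwrap_or("http");
    let path = metrics_path.unwrap_or("/metrics");
    format!("{scheme}://{target}{path}")
}
```

For the two jobs above this yields http://host.docker.internal:9000/metrics and http://cadvisor:8080/metrics.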

Expression browser and graphical interface

To use the built-in Prometheus expression browser, go to http://localhost:9090/graph and query any of the previously described metrics, for example http_requests_total. Use the “Graph” tab to visualize the data.

PromQL allows you to make more complex queries; let’s look at a couple of examples.

  • return all time series of the http_requests_total metric for the given job:

    http_requests_total{job="mongodb_redis_web_app"}

    The job and instance labels are added automatically to the time series collected by the Prometheus server.

  • return the number of requests per second (RPS), computed with the rate function over the last 5 minutes:

    rate(http_requests_total[5m])

You can find more examples here
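To show roughly what rate computes, here is a simplified std-only version. simple_rate is hypothetical and deliberately differs from Prometheus: the real rate() also extrapolates to the edges of the range window, whereas this sketch divides the counter increase (adjusted for counter resets) by the time between the first and last sample. Samples are (timestamp in seconds, counter value) pairs.

```rust
/// Simplified per-second rate of a counter; no extrapolation to the
/// window boundaries, only counter-reset handling.
pub fn simple_rate(samples: &[(f64, f64)]) -> Option<f64> {
    if samples.len() < 2 {
        return None;
    }
    let mut increase = 0.0;
    for pair in samples.windows(2) {
        let (prev, cur) = (pair[0].1, pair[1].1);
        // A drop in value means the counter was reset (e.g. the app restarted),
        // so the whole new value counts as increase.
        increase += if cur >= prev { cur - prev } else { cur };
    }
    let elapsed = samples[samples.len() - 1].0 - samples[0].0;
    if elapsed <= 0.0 {
        None
    } else {
        Some(increase / elapsed)
    }
}
```

With the 5-second scrape interval configured above, a 5-minute window holds about 60 samples of http_requests_total.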

Configuring Grafana to visualize metrics

In this project, Grafana is configured with the following parameters:

  • data sources (where Grafana will request data from)

    Data sources config for Grafana

    apiVersion: 1
    
    datasources:
      - name: Prometheus
        type: prometheus
        access: proxy
        url: http://prometheus:9090
        isDefault: true

  • dashboard provider (from where Grafana will load the dashboards)

    Dashboards config for Grafana

    apiVersion: 1
    
    providers:
      - name: 'default'
        folder: 'default'
        type: file
        allowUiUpdates: true
        updateIntervalSeconds: 30
        options:
          path: /etc/grafana/provisioning/dashboards
          foldersFromFilesStructure: true

After starting the project with the Docker Compose file, go to http://localhost:3000/, log in with admin/admin, and open the webapp_metrics dashboard. After a while, it will look something like this:

grafana

The dashboard shows the state of the application under a simple load test. (If you run a load test of your own, disable the MAX_REQUESTS_PER_MINUTE limit, for example by raising it drastically, so that the graphs, especially the histogram, are clearer.)

The dashboard panels use PromQL queries built on the metrics shown earlier, for example:

  • rate(http_response_time_seconds_sum[5m]) / rate(http_response_time_seconds_count[5m])

    Shows the average response time over the last 5 minutes.

  • sum(increase(http_response_time_seconds_bucket{path="/planets"}[30s])) by (le)

    Used to visualize the distribution of response times as a heat map. A heat map is similar to a histogram but over time: each time interval is a separate histogram.

  • rate(process_cpu_seconds_total{job="mongodb_redis_web_app"}[1m]), sum(rate(container_cpu_usage_seconds_total{name="mongodb-redis"}[1m])) by (name)

    Shows CPU usage as a per-second rate over a 1-minute window. The data comes from two sources and shows resource usage by the process and by the container, respectively; the two graphs are almost identical. (sum is used because container_cpu_usage_seconds_total provides data for each CPU core.)
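The first two queries reduce to simple arithmetic, sketched below in std-only Rust with hypothetical helper names. Because both rate() calls use the same window, the window length cancels out and the average is just the increase of _sum divided by the increase of _count. For the heat map, the cumulative le buckets have to be turned into per-bucket counts; as far as I know, Grafana can do this itself when a Prometheus query is formatted as a heatmap, so the second function only illustrates the idea.

```rust
/// rate(x_sum[w]) / rate(x_count[w]): both rates divide by the same window
/// length, so the result is the increase of the sum over the increase of the count.
pub fn average_latency(sum_increase: f64, count_increase: f64) -> Option<f64> {
    if count_increase > 0.0 {
        Some(sum_increase / count_increase)
    } else {
        // No requests in the window: there is no meaningful average.
        None
    }
}

/// Converts cumulative bucket values (sorted by `le`) into per-bucket counts,
/// i.e. the cells of one heat-map column.
pub fn deaccumulate(cumulative: &[f64]) -> Vec<f64> {
    let mut previous = 0.0;
    cumulative
        .iter()
        .map(|&value| {
            let per_bucket = value - previous;
            previous = value;
            per_bucket
        })
        .collect()
}
```

Returning None for an empty window is a design choice of this sketch; PromQL would produce NaN for 0/0 instead.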

Note: the “Memory usage” graph shows the memory used by:

  • the process (process_resident_memory_bytes{job="mongodb_redis_web_app"} / 1024 / 1024)
  • the container (container_memory_usage_bytes{name="mongodb-redis"} / 1024 / 1024)

For some reason unknown to me, the graph shows that the process consumes much more memory than the entire container. I created an issue about this. Please let me know if there is something wrong with this comparison or if you know how it can be explained.

Monitoring application container metrics with cAdvisor

In addition to the system metrics of the process (shown earlier), the system metrics of the Docker application container can also be exported. This can be done using cAdvisor.

The cAdvisor web interface is available at http://localhost:8080/. All running Docker containers are listed at http://localhost:8080/docker/:

cadvisor docker containers

You can view resource usage information for any container:

cadvisor container info

The Prometheus server collects the cAdvisor metrics from its /metrics endpoint (http://localhost:8080/metrics when accessed from the host).

The metrics exported by cAdvisor are listed here.

Host system metrics can be exported using Node exporter or Windows exporter.

Configuring Notifications Using Rules and AlertManager

In this project, the following part of the Prometheus config is responsible for notifications:

Prometheus config for notifications

rule_files:
  - 'rules.yml'

alerting:
  alertmanagers:
    - static_configs:
        - targets: ['alertmanager:9093']

The alerting section defines the AlertManager instance that the Prometheus server communicates with.

Alert rules allow you to define conditions based on PromQL expressions:

Example alert rules in rules.yml

groups:
- name: default
  rules:
  - alert: SseClients
    expr:  http_connected_sse_clients > 0
    for: 1m
    labels:
      severity: high
    annotations:
      summary: Too many SSE clients

  • alert – the name of the alert
  • expr – the rule condition as a PromQL expression
  • for – how long the condition must hold before the alert is sent. In our case, if the number of SSE clients stays above 0 for 1 minute, an alert is sent
  • labels – additional labels attached to the alert, such as the severity of the problem
  • annotations – additional information attached to the alert notification

A rule that fires whenever the number of SSE clients is greater than 0 is not something you would usually configure in an application. It is used here only as an example, because it is easy to trigger with a single request.
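The for clause turns each alert into a small state machine: Inactive while the condition is false, Pending once it becomes true, and Firing after it has stayed true for the whole for duration (these are the states shown at http://localhost:9090/alerts). The sketch below models this with a hypothetical evaluate function; it ignores the fact that Prometheus only evaluates rules at its configured evaluation interval, so in practice the transition can happen somewhat later.

```rust
/// States an alert goes through, as shown on the Prometheus /alerts page.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum AlertState {
    Inactive,
    /// The condition became true at `since` (seconds), but `for` has not elapsed yet.
    Pending { since: u64 },
    Firing,
}

/// One rule evaluation at time `now` (seconds). `condition` is the result of
/// the rule's expr (e.g. http_connected_sse_clients > 0) and `for_secs` is
/// the rule's `for` duration.
pub fn evaluate(state: AlertState, condition: bool, now: u64, for_secs: u64) -> AlertState {
    match (state, condition) {
        // As soon as the condition is false, the alert is no longer active.
        (_, false) => AlertState::Inactive,
        // Without a `for` clause, an alert fires on the first true evaluation.
        (AlertState::Inactive, true) if for_secs == 0 => AlertState::Firing,
        (AlertState::Inactive, true) => AlertState::Pending { since: now },
        // Pending becomes Firing once the condition has held for `for_secs`.
        (AlertState::Pending { since }, true) if now.saturating_sub(since) >= for_secs => {
            AlertState::Firing
        }
        // Otherwise stay Pending (or Firing).
        (state, true) => state,
    }
}
```

With for: 1m, opening http://localhost:9000 moves SseClients to Pending on the first evaluation and to Firing one minute later.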

When the rule's condition is met, the Prometheus server sends an alert to the AlertManager instance, which provides many features such as alert deduplication, grouping, silencing, and routing of notifications to end users. Here only routing is considered: the notification is sent to email.

The AlertManager is configured like this:

AlertManager config

route:
  receiver: gmail

receivers:
- name: gmail
  email_configs:
  - to: recipient@gmail.com
    from: email_id@gmail.com
    smarthost: smtp.gmail.com:587
    auth_username: email_id@gmail.com
    auth_identity: email_id@gmail.com
    auth_password: password

In this project, AlertManager is configured with a Gmail account. To generate an app password, you can use this guide.

To trigger the SseClients alert rule, open http://localhost:9000 in the browser. This increases the value of the http_connected_sse_clients metric to 1. You can follow the status of the SseClients alert at http://localhost:9090/alerts. Once the condition is met, the alert changes to the Pending state. After the for interval defined in rules.yml (1 minute in our case), it changes to the Firing state:

prometheus alert

This causes the Prometheus server to send the alert to AlertManager, which decides what to do with it. In our case, an email is sent:

gmail alert

Monitoring third-party systems with Prometheus exporters

For third-party systems such as MongoDB, Redis, and many others, monitoring can be set up using Prometheus exporters.

Running

docker compose up --build

Conclusion

This article showed how to expose metrics from a Rust web application, collect them with Prometheus, and visualize the data with Grafana. It also showed how to get started with cAdvisor for collecting container metrics and with AlertManager for notifications. Feel free to email me if you find any errors in the article or in the source code. Thank you for your attention!
