Monitoring NATS JetStream in Grafana
Hello, my name is Alexander, and I am a backend developer. In this post I would like to share my experience of setting up NATS JetStream monitoring: why it can be needed in the first place, and an example of the stack of services, run in Docker, required for monitoring. The article does not cover building dashboards in Grafana or the principles and internals of NATS.
A reasonable question is why monitoring is needed at all, especially for a software product that is already operational, since monitoring is not free: it requires setting up and maintaining several services. However, to optimize an application or add new features, you need to understand how its functionality is actually used, and this is exactly what metrics provide. A metric is a numerical measure of some property or behavior of software. Unlike logs, metrics do not record every detail, only an aggregated summary: for example, the number of requests to a service.
A simple example: there is an endpoint that takes, say, 10 seconds to respond, and at first glance it seems worth allocating resources to optimize it. But statistics collected over a month show that this endpoint was hardly used at all. A logical question then arises: is it worth spending resources on optimizing it?
First, a few words about NATS JetStream. NATS is a message queuing system written in Go that first appeared in 2010 and is well suited for real-time messaging. JetStream is the persistence layer of NATS, adding durability and guaranteed message delivery.
The following stack will be used to monitor NATS:
NATS JetStream
prometheus-nats-exporter – a service that acts as an adapter between NATS and Prometheus
Prometheus – a separate service that collects the telemetry
Grafana – for displaying the results
To run the stack in Docker, you can use the following docker-compose file:
version: "3.8"
services:
  nats:
    image: nats:latest
    command: --js --debug --trace --sd /data -p 4222 -m 8222
    ports:
      - "4222:4222"
      - "6222:6222"
      - "8222:8222"
    volumes:
      - ./jetstream-cluster/n1:/data
  prometheus-nats-exporter:
    image: natsio/prometheus-nats-exporter:latest
    command: "-connz -varz -channelz -serverz -subz -healthz -routez http://host.docker.internal:8222"
    ports:
      - "7777:7777"
  prometheus:
    image: prom/prometheus:latest
    hostname: prometheus
    volumes:
      - "./prometheus.yml:/etc/prometheus/prometheus.yml"
    ports:
      - "9090:9090"
  grafana:
    image: grafana/grafana
    hostname: grafana
    ports:
      - "3000:3000"
Now, a little more detail about configuring each of these services.
In NATS, you need to enable monitoring on the desired port. This is done with the -m flag (in the compose file above, -m 8222).
NATS supports various types of metrics. The following can be distinguished:
varz – general statistics
connz – connection statistics
routez – information about routes
subsz – information about subscriptions
jsz – JetStream information (streams and consumers)
The data behind these metrics can be viewed via the NATS monitoring endpoints at http://localhost:8222/.
Next, the NATS Prometheus exporter needs to be told its data source, i.e. the NATS service (the port specified in the -m parameter), and which metrics to collect. This is done with a command of the form

prometheus-nats-exporter -varz "http://localhost:8222"

where the required metrics are selected by flags prefixed with a hyphen (here, -varz).
The resulting metrics collected by prometheus-nats-exporter can be viewed at http://localhost:7777/metrics. This path can be overridden with the -path parameter.
Metrics are exposed in the Prometheus text format: first come a metric's description (# HELP) and type (# TYPE), and then the metric itself with its labels and value.
Label contents can be used in Grafana to group, sort, or filter results. For example, the metric “CPU resource consumption (CPU, %)” carries a server_id label:
gnatsd_varz_cpu{server_id=
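To make the format concrete, here is a minimal Python sketch that parses such exposition text; the sample below is hand-written for illustration, and the server_id value in it is a made-up placeholder, not real NATS output:

```python
import re

# Hand-written sample in the Prometheus text exposition format;
# the server_id value is a made-up placeholder, not real NATS output.
SAMPLE = """\
# HELP gnatsd_varz_cpu NATS server CPU usage in percent
# TYPE gnatsd_varz_cpu gauge
gnatsd_varz_cpu{server_id="NABC123"} 1.5
"""

# metric_name{label="value",...} numeric_value
LINE_RE = re.compile(r'^(\w+)\{(.*)\}\s+(\S+)$')

def parse_exposition(text):
    """Return a list of (name, labels, value) tuples, skipping # comments."""
    samples = []
    for line in text.splitlines():
        if not line or line.startswith("#"):
            continue
        m = LINE_RE.match(line)
        if m:
            name, raw_labels, value = m.groups()
            labels = {
                k: v.strip('"')
                for k, v in (pair.split("=", 1) for pair in raw_labels.split(","))
            }
            samples.append((name, labels, float(value)))
    return samples

print(parse_exposition(SAMPLE))
# [('gnatsd_varz_cpu', {'server_id': 'NABC123'}, 1.5)]
```

In practice Grafana and Prometheus do this parsing for you; the sketch only shows where the metric name, labels, and value live in each line.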
For more details about the available metrics, see the official NATS documentation.
The Prometheus service collects the telemetry: it has a list of data sources (targets) and polls them periodically. Prometheus is configured via the prometheus.yml file mounted in the docker-compose file above.
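A minimal sketch of such a prometheus.yml (the job name and scrape interval are illustrative assumptions; the target matches the exporter port from the compose file):

```yaml
# Minimal prometheus.yml sketch; job_name and scrape_interval are
# illustrative, the target points at prometheus-nats-exporter.
global:
  scrape_interval: 15s

scrape_configs:
  - job_name: "nats-exporter"
    static_configs:
      - targets: ["host.docker.internal:7777"]
```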
The targets block ([‘host.docker.internal:7777’]) specifies the data source (host and port), which in this case is prometheus-nats-exporter. Note that when running in Docker you cannot simply use localhost: inside a container, localhost does not point at the host machine's open ports. Use either host.docker.internal or an address with the container name. More about setting up Prometheus can be found in a separate article.
Next, you can check that the prometheus-nats-exporter data source is reachable: all data sources known to Prometheus are listed at http://localhost:9090/targets.
Finally, all that remains is to configure the display of the metrics of interest in Grafana, after which we can proceed directly to monitoring.
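Instead of adding the Prometheus data source by hand in the Grafana UI, it can be declared in a provisioning file. A sketch, assuming the file is mounted into the container at /etc/grafana/provisioning/datasources/ (this would require an extra volume in the compose file above); the url uses the prometheus hostname set in the compose file:

```yaml
# Grafana data source provisioning sketch; mount into
# /etc/grafana/provisioning/datasources/ via an extra volume.
apiVersion: 1
datasources:
  - name: Prometheus
    type: prometheus
    access: proxy
    url: http://prometheus:9090
    isDefault: true
```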
As an example of how metrics work, the following situation will be considered: 2 messages are published into a new stream, and after a few minutes a Subscriber appears reading this stream.
To send a message, the nats CLI utility is used; the publish command has the form:
nats publish <subject> <message>
The moment the Subscriber appears is marked on the graphs with a vertical line. The graph of the number of Subscribers (metric gnatsd_connz_total) looks like this:
The graph of unprocessed messages (metric jetstream_consumer_num_pending) shows that when Subscriber appears, the number of unprocessed messages becomes zero.
At the same time, no changes occur in the graph of the total number of messages (metric jetstream_server_total_messages).
The following important NATS metrics can be identified.
gnatsd_varz_cpu – CPU resource consumption (CPU, %)
gnatsd_varz_mem – RAM consumption
gnatsd_connz_total – Number of Subscribers
jetstream_server_total_messages – Total messages in jetstream
jetstream_server_total_message_bytes – Size of all messages in jetstream
jetstream_consumer_num_pending – Pending messages in jetstream
gnatsd_varz_in_msgs – Messages received by NATS from Publishers
gnatsd_varz_out_msgs – Messages delivered by NATS to Subscribers
All these metrics can be displayed on a separate screen, which will allow you to quickly analyze the current state of the entire NATS JetStream service.
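For such a screen, a few illustrative PromQL queries over the metrics above might look as follows (these are assumptions for a typical dashboard, not queries taken from a specific one):

```
# Message throughput in and out of NATS, per second (counters, hence rate):
rate(gnatsd_varz_in_msgs[5m])
rate(gnatsd_varz_out_msgs[5m])

# Current backlog per consumer (a gauge, used as-is):
jetstream_consumer_num_pending

# Memory consumption in MiB instead of bytes:
gnatsd_varz_mem / 1024 / 1024
```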
A wide range of different metrics are available in NATS. The necessary metrics should be selected based on the specific situation and depending on the purpose of monitoring.