Five Best Practices for Prometheus Exporters to Improve Productivity

The recommendations discussed in this article will help you implement Prometheus monitoring and work more productively.

Prometheus is one of the basic building blocks of a cloud-native architecture and has already become the de facto standard for monitoring Kubernetes. However, many third-party and cloud applications do not natively expose metrics in the Prometheus format. Linux, for example, does not. This is what exporters such as node exporter are for: it is easy to download and run, and it gives you hundreds of operating system metrics.
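To make the idea of "metrics in Prometheus format" concrete, here is a minimal sketch of what an exporter produces: plain text with optional HELP and TYPE comment lines followed by samples. It uses only the Python standard library. The function names and the demo_load1 metric are hypothetical, and the sketch assumes the caller passes the samples of one metric next to each other. A real exporter would normally rely on an official client library instead.

```python
from typing import Dict, List, Tuple

# One sample: (metric name, labels, value). All names here are hypothetical.
Sample = Tuple[str, Dict[str, str], float]


def escape_label_value(value: str) -> str:
    """Escape backslash, double quote and newline inside a label value."""
    return value.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")


def render_metrics(help_text: Dict[str, str], types: Dict[str, str],
                   samples: List[Sample]) -> str:
    """Render samples in the Prometheus text exposition format."""
    lines: List[str] = []
    seen = set()
    for name, labels, value in samples:
        # Emit the metadata comments once, before the first sample of a metric.
        if name not in seen:
            seen.add(name)
            if name in help_text:
                lines.append(f"# HELP {name} {help_text[name]}")
            if name in types:
                lines.append(f"# TYPE {name} {types[name]}")
        if labels:
            label_str = ",".join(
                f'{key}="{escape_label_value(val)}"'
                for key, val in sorted(labels.items())
            )
            lines.append(f"{name}{{{label_str}}} {value}")
        else:
            lines.append(f"{name} {value}")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(render_metrics(
        {"demo_load1": "One-minute load average (hypothetical metric)."},
        {"demo_load1": "gauge"},
        [("demo_load1", {"host": "web-1"}, 0.42)],
    ), end="")
```

An exporter serves text like this over HTTP, and Prometheus scrapes it at a regular interval.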

Secure DevOps, also known as DevSecOps, provides security and monitoring throughout the entire application lifecycle, from development to production. This allows you to create secure, stable and high-performance applications. The approach integrates into your development process and gives DevOps teams, developers, and security teams a single source of truth, which maximizes efficiency and makes troubleshooting and optimization transparent.

In this article, we’ll go over a few guidelines for monitoring apps and cloud services with Prometheus exporters.

One of our clients recently said: “For each integration, I spend one man-week trying to figure out which versions of exporters, dashboards and Prometheus alerts we should use, how to set them up, and how to keep up with the latest versions.”

The effort needed to maintain a large number of Prometheus exporters is often underestimated. Sometimes all these exporters and configurations can seem like the Wild West. By thinking ahead and arming yourself with best practices, you can spend less time supporting exporters and focus more on the things that really matter to your organization.

1. Find the right exporter

Once you start using Prometheus, you will quickly find that there are many exporters available to monitor your applications. That raises the problem of choosing among them.

One way to solve this problem is to evaluate for yourself how well a candidate exporter meets your requirements, whether it provides the metrics you are interested in, and how mature it is as a software product. For example, if the exporter was developed many years ago by one person, has no recent updates, and has few pull requests, issues, or stars on GitHub, then most likely it is neither used nor maintained.

Many aspects of the metrics and best practices of Prometheus exporters are not always obvious, so using curated exporter lists can help guide your selection.

Your first stop should be the Exporters and Integrations page on the Prometheus website. If your application, or the protocol it uses to expose metrics, is listed there, that exporter is likely to be the best choice.

There are also third-party exporter directories, such as PromCat.io, supported by Sysdig, or the default ports page, which inadvertently turned out to be a fairly complete list of exporters.

PromCat saves you the time needed to select and test exporters, dashboards and alerts. A team of Sysdig engineers maintains the site and continually checks that its content works.

2. Examine the exporter’s metrics

Each exporter has its own set of metrics. Usually they are described on the exporter’s project page, although sometimes you have to look in the help or documentation. If the exporter uses the OpenMetrics format, it can attach additional information to a metric, such as its type, info, and unit.
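When the documentation is thin, the exporter's own output is a useful reference, because the text format carries metadata in comment lines that start with "# HELP", "# TYPE" and, in OpenMetrics, "# UNIT". The sketch below collects those lines per metric. It is a simplified reader under stated assumptions, not a full parser: the function name and the sample text are hypothetical, and escaping rules are ignored.

```python
from typing import Dict

# Hypothetical exporter output used for illustration only.
SAMPLE = """\
# HELP demo_request_duration_seconds Time spent serving requests.
# TYPE demo_request_duration_seconds histogram
# UNIT demo_request_duration_seconds seconds
demo_request_duration_seconds_sum 12.5
# EOF
"""


def parse_metadata(exposition: str) -> Dict[str, Dict[str, str]]:
    """Collect HELP, TYPE and UNIT comment lines, keyed by metric name."""
    metadata: Dict[str, Dict[str, str]] = {}
    for line in exposition.splitlines():
        # Metadata lines look like: "# TYPE <metric_name> <text>"
        parts = line.split(" ", 3)
        if len(parts) < 3 or parts[0] != "#":
            continue  # a sample line, a blank line, or "# EOF"
        keyword, name = parts[1], parts[2]
        if keyword not in ("HELP", "TYPE", "UNIT"):
            continue
        text = parts[3] if len(parts) == 4 else ""
        metadata.setdefault(name, {})[keyword.lower()] = text
    return metadata


if __name__ == "__main__":
    for metric, fields in parse_metadata(SAMPLE).items():
        print(metric, fields)
```

Running this against the real output of an exporter gives a quick inventory of its metrics, their types and their units before you build dashboards or alerts on them.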

Another point to pay attention to in the exporter’s documentation is the use of labels.

Labels provide context: “Is this a production service or a development environment?”, “What host is the service running on?”, “What application is this service for?” For example, the backend team and the analytics team might have separate MySQL instances. Later, you will want to filter their metrics by application, environment (production or development), or region.

Besides helping you analyze what is happening inside an application, labels are useful for aggregating metrics across all deployed systems. Proper use of labels can help answer questions such as “How many CPUs are all applications worldwide currently using?” or “What is the total RAM usage of all applications owned by the frontend team in Europe?” You can see examples of using labels in this webinar.
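The aggregation questions above come down to filtering samples by label and then summing the values per group. The sketch below shows that logic in plain Python, similar in spirit to a PromQL "sum by (...)" over a label selector. The label names (team, region, env), the values and the function names are assumptions made for illustration.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

Sample = Tuple[Dict[str, str], float]  # (labels, value)

# Hypothetical memory usage samples in MiB.
MEMORY_MIB: List[Sample] = [
    ({"team": "frontend", "region": "eu", "env": "prod"}, 512.0),
    ({"team": "frontend", "region": "eu", "env": "dev"}, 128.0),
    ({"team": "frontend", "region": "us", "env": "prod"}, 256.0),
    ({"team": "backend", "region": "eu", "env": "prod"}, 1024.0),
]


def matching(samples: Iterable[Sample], **wanted: str) -> List[Sample]:
    """Keep only samples whose labels equal every requested value."""
    return [
        (labels, value)
        for labels, value in samples
        if all(labels.get(key) == val for key, val in wanted.items())
    ]


def sum_by(samples: Iterable[Sample], keys: List[str]) -> Dict[Tuple[str, ...], float]:
    """Sum values per combination of the selected label keys."""
    totals: Dict[Tuple[str, ...], float] = defaultdict(float)
    for labels, value in samples:
        group = tuple(labels.get(key, "") for key in keys)
        totals[group] += value
    return dict(totals)


if __name__ == "__main__":
    # Total RAM of all frontend applications in Europe.
    print(sum_by(matching(MEMORY_MIB, team="frontend", region="eu"), []))
    # RAM per team across all regions and environments.
    print(sum_by(MEMORY_MIB, ["team"]))
```

The sketch only works because every sample carries the same label keys, which is why consistent labelling across exporters matters.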

3. Set up really useful alerts

Setting up alerts can be challenging. If the thresholds are too low, your support team will quickly grow tired of the alerts. On the other hand, if alerts do not fire when they should, you may miss important information, and end users may be affected.

The first step in defining any alert strategy is to examine your applications and Prometheus exporters. By following DevOps best practices for Service Level Indicators and Service Level Objectives, along with monitoring the golden signals, you can identify the critical elements that require alerts. A good monitoring tool with deep visibility and Kubernetes context will help you find these critical factors.

Working with tools that natively use PromQL for alerts can save you time, since you do not need to translate the alerts into another format, which is error-prone. For example, promtool helps you test PromQL alert configurations, among other things.

But the alerting tool should offer more than plain PromQL processing. Apart from setting up alerts for any metric or event, it should also be able to send alerts to email, Slack, PagerDuty, ServiceNow, etc.
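One way to turn an SLI and an SLO into an alert condition is to measure how much of the allowed failure margin has been used up. The sketch below assumes an availability SLI defined as good requests divided by total requests, and it borrows the common "error budget" idea, which is not part of this article. The function names, the 99.5% objective and the request counts are all hypothetical.

```python
def availability_sli(good_requests: int, total_requests: int) -> float:
    """Availability SLI: fraction of requests that were served successfully."""
    if total_requests == 0:
        return 1.0  # no traffic means nothing failed
    return good_requests / total_requests


def error_budget_consumed(sli: float, slo: float) -> float:
    """Fraction of the error budget spent: (1 - SLI) / (1 - SLO)."""
    allowed_failure = 1.0 - slo
    if allowed_failure <= 0.0:
        raise ValueError("an SLO of 100% leaves no error budget")
    return (1.0 - sli) / allowed_failure


def should_alert(good_requests: int, total_requests: int, slo: float,
                 max_budget_fraction: float = 1.0) -> bool:
    """Alert once the measured failures use up the allowed budget fraction."""
    sli = availability_sli(good_requests, total_requests)
    return error_budget_consumed(sli, slo) >= max_budget_fraction


if __name__ == "__main__":
    # 10 failures in 10,000 requests against a 99.5% objective.
    print(should_alert(9_990, 10_000, slo=0.995))
    # 100 failures in 10,000 requests against the same objective.
    print(should_alert(9_900, 10_000, slo=0.995))
```

Tying the threshold to an objective rather than to a raw metric value helps against both problems described above: the alert stays quiet while the service is within its objective and fires when users are actually affected.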

4. Provide data to your team (or not)

Now that you have valuable information from the Prometheus exporters that you use to monitor, make sure all your colleagues can see and use it.

The most common way to interact with metrics is to visualize them on dashboards. To help you create them, PromCat.io provides templates ready to be imported into Grafana or Sysdig Monitor.

But how does your team organize dashboards? The best practice is to create a single dashboard that the entire DevOps team uses, rather than a separate dashboard for each team member. Team members can use it as a template and make only minor changes where necessary. For this to work, the monitoring tool must also let you control access rights to dashboards (View Only, or Collaborator with editing rights).

Sometimes access to metrics needs to be restricted. If your monitoring tool has full RBAC support, you can give each team only the data it needs. For example, developers should only have access to the metrics of their own namespace, while on-call engineers should have access to all production nodes.
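The access rule in that example can be pictured as a filter on the namespace label applied before metrics reach the user. The sketch below is a toy model of that idea, not how any particular monitoring tool implements RBAC. The role names, the role table and the use of a "namespace" label are assumptions made for illustration.

```python
from typing import Dict, List, Optional, Set, Tuple

Sample = Tuple[str, Dict[str, str], float]  # (metric name, labels, value)

# Hypothetical role table: None means "all namespaces".
ROLE_NAMESPACES: Dict[str, Optional[Set[str]]] = {
    "developer-payments": {"payments"},
    "on-call": None,
}


def visible_samples(role: str, samples: List[Sample]) -> List[Sample]:
    """Return only the samples that the given role is allowed to see."""
    if role not in ROLE_NAMESPACES:
        return []  # unknown roles see nothing
    allowed = ROLE_NAMESPACES[role]
    if allowed is None:
        return list(samples)
    return [s for s in samples if s[1].get("namespace") in allowed]


if __name__ == "__main__":
    data: List[Sample] = [
        ("demo_cpu_seconds_total", {"namespace": "payments"}, 120.0),
        ("demo_cpu_seconds_total", {"namespace": "search"}, 300.0),
    ]
    print(len(visible_samples("developer-payments", data)))
    print(len(visible_samples("on-call", data)))
```

Denying access to unknown roles by default keeps a misconfigured role from exposing every metric.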

PromQL is a powerful query language for metrics collected by Prometheus exporters. With PromQL, you can perform complex math operations and statistical analysis, and use a wide range of functions.

While learning PromQL may seem like a head start in monitoring, it has a steep learning curve that should not be overlooked. If you want to get new users up and running quickly, make sure that your tool lets them easily add data to dashboards through web forms. Also, do not ask new users to write complex PromQL queries with joins and functions. And your tool should not be complicated for non-technical users who want to create a simple report for data analysis.

5. Make a scaling plan

As you use Prometheus, the number of exporters keeps growing, and problems with visibility, horizontal scaling and long-term storage may arise. The best practice is to plan ahead for scaling your metrics.

Let’s take a look at some Prometheus scaling issues and ways to solve them.

Global Prometheus visibility: As you grow, you will need to see data across multiple clusters simultaneously.

Horizontal scaling: As your environment grows, the number of services in Kubernetes, the number of metrics, and the memory usage of Prometheus increase. Prometheus is not horizontally scalable by design, and once you hit the limit of vertical scaling, you have nowhere left to go.

Long-term storage: Prometheus can track millions of measurements in real time. However, the longer you store data, the more resources you spend. You can shorten the retention period of metrics, but then you cannot analyze data over weeks, months, or years.
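To see how quickly retention drives up storage, a rough estimate helps. The Prometheus storage documentation gives the rule of thumb that the disk space needed equals the retention time in seconds multiplied by the ingested samples per second and by the bytes per sample, with roughly 1 to 2 bytes per sample on average. The sketch below applies that formula. The function names, the series count and the scrape interval are assumptions chosen for illustration.

```python
SECONDS_PER_DAY = 86_400


def samples_per_second(active_series: int, scrape_interval_seconds: float) -> float:
    """Ingestion rate if every series is scraped once per interval."""
    return active_series / scrape_interval_seconds


def needed_disk_bytes(retention_days: float, ingested_samples_per_second: float,
                      bytes_per_sample: float = 2.0) -> float:
    """Rough disk estimate: retention seconds * samples per second * bytes per sample."""
    retention_seconds = retention_days * SECONDS_PER_DAY
    return retention_seconds * ingested_samples_per_second * bytes_per_sample


if __name__ == "__main__":
    # Hypothetical environment: 1.5 million series scraped every 15 seconds.
    rate = samples_per_second(1_500_000, 15)
    for days in (15, 365):
        gigabytes = needed_disk_bytes(days, rate) / 1e9
        print(f"{days} days of retention: about {gigabytes:.1f} GB")
```

The estimate grows linearly with retention, so keeping a year of data costs about 24 times as much disk as keeping 15 days in this example.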

To deal with scaling issues, you can try consolidating Grafana, deploying Thanos or Cortex, or using a commercial solution like Sysdig. Some of the scaling issues can be addressed by a SaaS solution, because it is easier for a SaaS provider to adapt to growth than it is for you. The potential issues associated with these scaling solutions are detailed in the article “Challenges using Prometheus at scale”.

Conclusion

Prometheus, as one of the basic components of the cloud-native architecture, has become the de facto monitoring standard for Kubernetes. To monitor apps and cloud services, you need exporters. It is important to remember that not all exporters are created equal, and monitoring solutions may not be ready to scale. I hope these tips help you succeed in monitoring by using proven projects, understanding your metrics, and planning ahead for growth.

Choosing the right tool is key. Sysdig lets you follow the best practices for Prometheus exporters and gives you a turnkey solution in less than five minutes. Our full Prometheus compatibility, out-of-the-box dashboards and long-term storage help lower MTTR while increasing the performance and availability of your environment.

You can see the process of monitoring applications and cloud services in action in our webinar “So Many Metrics, So Little Time: 5 Prometheus Exporter Best Practices”.