Monitoring Kubernetes with Prometheus and Thanos

Ahead of the start of the professional course “Monitoring and Logging: Zabbix, Prometheus, ELK”, we have prepared an interesting translation for you, and we also invite you to watch a demo lesson on the topic “Prometheus as a new round of monitoring systems”.


Introduction

Congratulations! You managed to convince your bosses to migrate your applications to a microservice architecture using containers and Kubernetes.

You are very happy and everything is going according to plan. You create your first Kubernetes cluster (all the major cloud providers: Azure, AWS and GCP, offer simple solutions for provisioning managed or unmanaged Kubernetes), develop your first containerized application, and deploy it to the cluster. It was easy, wasn’t it?

After a while, you realize that things get a little more complicated: you need to deploy multiple applications in the cluster, so you need an Ingress Controller. Next, you want to monitor the load, so you start looking for solutions and, fortunately, you find Prometheus. You deploy it, add Grafana, and that’s it!

Later, you start to wonder: why does Prometheus run with only one replica? What happens if the container is restarted? What happens during a simple version upgrade? How long can Prometheus store metrics? What happens if the cluster goes down? Do I need another cluster for HA and DR? How do I get a single view of the metrics from all Prometheus servers?

Well, keep reading, smart people have already figured out these questions.

Typical Kubernetes cluster

The diagram below shows a typical deployment using Kubernetes.

Typical Kubernetes cluster

Deployment has three levels:

  1. Virtual machines: master nodes and worker nodes.

  2. Kubernetes infrastructure components.

  3. Custom applications.

Within the cluster, components usually communicate with each other via HTTP(S) (REST or gRPC), but some of them provide API access from outside the cluster (Ingress). These APIs are mainly used for the following:

  1. Cluster management via Kubernetes API Server.

  2. Interaction with custom applications through the Ingress Controller.

In some scenarios, applications may send traffic outside the cluster to consume third-party services such as Azure SQL, Azure Blob, or any other.

What to monitor?

Kubernetes monitoring should include all three layers mentioned above.

Virtual machines. In order to make sure that virtual machines are healthy, you need to collect the following metrics:

  • Number of nodes.

  • Utilization of resources on a node (processor, memory, disk, network bandwidth).

  • Node status (Ready, NotReady, etc.).

  • The number of pods running on each node.

Kubernetes infrastructure. In order to make sure that the Kubernetes infrastructure is working, you need to collect the following metrics:

  • Pod status – ready (how many replicas are ready), status, restarts (number of restarts), age (how long the pod has been running).

  • Deployments status – desired (target number of replicas), current (current number of replicas), up-to-date (how many replicas have been updated), available (how many replicas are available), age (how long the application has been running).

  • StatefulSets status.

  • CronJobs execution statistics.

  • Pod resource utilization (processor and memory).

  • Health checks.

  • Kubernetes events.

  • API server requests.

  • Etcd statistics.

  • Mounted volume statistics.

Custom applications. Each application should provide its own metrics based on its functionality. However, there are common metrics for most applications, such as:

  • HTTP requests (total, latency, response code, etc.)

  • The number of outgoing connections (for example, to a database).

  • Number of threads.

Collecting the metrics mentioned above will allow you to create meaningful alerts and dashboards, which we’ll talk about briefly next.
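
As an illustration, a couple of Prometheus alerting rules built on kube-state-metrics metrics might look like this (the thresholds are arbitrary examples, and kube-state-metrics is assumed to be installed; the Prometheus Operator chart used later includes it):

groups:
  - name: cluster-health
    rules:
      # fire if a node has not been Ready for 5 minutes
      - alert: NodeNotReady
        expr: kube_node_status_condition{condition="Ready",status="true"} == 0
        for: 5m
        labels:
          severity: critical
        annotations:
          summary: "Node {{ $labels.node }} is not Ready"
      # fire if a container restarted more than 5 times within the last hour
      - alert: PodRestartingTooOften
        expr: increase(kube_pod_container_status_restarts_total[1h]) > 5
        labels:
          severity: warning
        annotations:
          summary: "Pod {{ $labels.namespace }}/{{ $labels.pod }} restarts too often"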

Thanos

Thanos is an open source project that lets you build a highly available metrics system with virtually unlimited storage capacity, and it integrates seamlessly with existing Prometheus instances.

For historical data, Thanos uses the Prometheus storage format and can keep metrics in any object storage. In addition, it provides a global query view across all Prometheus instances.

The main Thanos components:

  • Sidecar. Connects to Prometheus, serves its data for real-time queries through the Query Gateway and/or uploads it to cloud storage for long-term retention.

  • Query Gateway. Implements the Prometheus API to aggregate data from the underlying components (such as Sidecar or Store Gateway).

  • Store Gateway. Provides access to the contents of the cloud storage.

  • Compactor. Compacts and downsamples the data in the cloud storage.

  • Receiver. Receives data from the Prometheus remote-write WAL, exposes it and/or uploads it to cloud storage.

  • Ruler. Evaluates recording and alerting rules against the data in Thanos.

In this article, we will focus on the first three components.
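
To make the roles of these components more concrete, here is a rough sketch of how the Sidecar and the Query Gateway are typically started (the addresses and paths are placeholders, not the exact configuration used later in this article):

# Sidecar: runs next to Prometheus, reads its TSDB blocks and ships them to object storage
thanos sidecar \
  --prometheus.url=http://localhost:9090 \
  --tsdb.path=/prometheus \
  --objstore.config-file=/etc/thanos/thanos.yaml \
  --grpc-address=0.0.0.0:10901

# Query Gateway: fans out PromQL queries to all configured Store API endpoints
thanos query \
  --http-address=0.0.0.0:10902 \
  --store=thanos-sidecar:10901 \
  --store=thanos-store-gateway:10901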

Deploy Thanos

We’ll start by deploying the Thanos Sidecar into the same Kubernetes clusters that we use for our applications, Prometheus and Grafana.

There are many ways to install Prometheus, but I prefer to use the Prometheus Operator, which provides definitions for monitoring Kubernetes services and deployments, as well as for managing Prometheus instances.

The easiest way to install the Prometheus Operator is to use its Helm chart, which has built-in support for high availability and the Thanos Sidecar, plus many pre-configured alerts for monitoring the cluster virtual machines, the Kubernetes infrastructure, and your applications.

Before deploying the Thanos Sidecar, we need a Kubernetes Secret with the details of how to connect to the cloud storage.

For this demonstration, I will use Microsoft Azure.

Create an account for the blob storage:

az storage account create --name <storage_name> --resource-group <resource_group> --location <location> --sku Standard_LRS --encryption blob

Then create a folder (aka container) for metrics:

az storage container create --account-name <storage_name> --name thanos

Get the storage account keys:

az storage account keys list -g <resource_group> -n <storage_name>

Create a storage configuration file (thanos-storage-config.yaml):
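
For example, a minimal configuration for Azure Blob Storage in the Thanos object storage format might look like this (with placeholders for your account name and key):

type: AZURE
config:
  storage_account: "<storage_name>"
  storage_account_key: "<storage_key>"
  container: "thanos"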

Create Kubernetes Secret:

kubectl -n monitoring create secret generic thanos-objstore-config --from-file=thanos.yaml=thanos-storage-config.yaml

Create a prometheus-operator-values.yaml file in which we override the default settings for the Prometheus Operator.
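
As a minimal sketch (assuming the stable/prometheus-operator chart; exact keys can differ between chart versions), the important part is to run more than one replica and enable the Thanos Sidecar with the Secret created above:

prometheus:
  prometheusSpec:
    replicas: 2            # high availability: two Prometheus replicas
    retention: 12h         # local retention can be short, long-term data lives in object storage
    thanos:                # enables the Thanos Sidecar container
      image: quay.io/thanos/thanos:v0.17.2
      objectStorageConfig:
        name: thanos-objstore-config
        key: thanos.yaml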

And deploy:

helm install --namespace monitoring --name prometheus-operator stable/prometheus-operator -f prometheus-operator-values.yaml

You should now have a highly available Prometheus installation together with the Thanos Sidecar, which uploads your metrics to Azure Blob Storage with unlimited retention.

In order for the Thanos Query Gateway (which will run in a separate cluster) to access the Thanos Sidecars, you need to expose them outside the cluster via an Ingress. I use the Nginx Ingress Controller, but you can use any other Ingress Controller that supports gRPC (perhaps Envoy would be the best option).

For a secure connection between the Thanos Query Gateway and the Thanos Sidecars we will use mutual TLS, that is, the client authenticates the server and vice versa.

If you have a .pfx file, you can extract the private key, the certificate and the CA certificate from it with openssl:

# private key
openssl pkcs12 -in cert.pfx -nocerts -nodes | sed -ne '/-BEGIN PRIVATE KEY-/,/-END PRIVATE KEY-/p' > cert.key
# certificate (public part)
openssl pkcs12 -in cert.pfx -clcerts -nokeys | sed -ne '/-BEGIN CERTIFICATE-/,/-END CERTIFICATE-/p' > cert.cer
# certificate authority (CA)
openssl pkcs12 -in cert.pfx -cacerts -nokeys -chain | sed -ne '/-BEGIN CERTIFICATE-/,/-END CERTIFICATE-/p' > cacerts.cer

Create two Kubernetes Secrets:

# a secret to be used for TLS termination
kubectl create secret tls -n monitoring thanos-ingress-secret --key ./cert.key --cert ./cert.cer
# a secret to be used for client authentication using the same CA
kubectl create secret generic -n monitoring thanos-ca-secret --from-file=ca.crt=./cacerts.cer

Make sure you have a domain that resolves to your Kubernetes cluster, and create two subdomains that will be used to route to each of the Thanos Sidecars:

thanos-0.your.domain
thanos-1.your.domain

Now we can create Ingress rules (change the hostname):
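
For example, a rule for the first sidecar might look roughly like this with the Nginx Ingress Controller: gRPC as the backend protocol, TLS terminated with thanos-ingress-secret, and client certificates verified against thanos-ca-secret (the backend Service name and port are placeholders for whatever exposes the sidecar's gRPC endpoint):

apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: thanos-sidecar-0
  namespace: monitoring
  annotations:
    kubernetes.io/ingress.class: nginx
    nginx.ingress.kubernetes.io/backend-protocol: "GRPC"
    nginx.ingress.kubernetes.io/ssl-redirect: "true"
    nginx.ingress.kubernetes.io/auth-tls-verify-client: "on"
    nginx.ingress.kubernetes.io/auth-tls-secret: "monitoring/thanos-ca-secret"
spec:
  tls:
    - hosts:
        - thanos-0.your.domain
      secretName: thanos-ingress-secret
  rules:
    - host: thanos-0.your.domain
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: prometheus-thanos-sidecar-0   # placeholder: Service pointing at the sidecar's gRPC port
                port:
                  number: 10901

A second rule for thanos-1.your.domain is created in the same way.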

Now we have secure access to the Thanos Sidecars from outside the cluster!

Thanos cluster

In the Thanos deployment diagram above, you may have noticed that I decided to deploy Thanos in a separate cluster. This is because I want a dedicated cluster that can easily be recreated if needed, and that engineers can be given access to instead of the production cluster.

To deploy the Thanos components I decided to use this Helm chart (it is not official yet, but stay tuned for my PR).

Create a thanos-values.yaml file to override the default chart settings.
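
Since the chart is not official, the exact values schema may vary; roughly, it needs to tell the Query Gateway about the gRPC endpoints of the two sidecars exposed above and point the Store Gateway at the object storage Secret (the key names below are hypothetical):

query:
  # gRPC Store API endpoints of the Thanos Sidecars exposed via Ingress (hypothetical key name)
  stores:
    - thanos-0.your.domain:443
    - thanos-1.your.domain:443
store:
  # Secret with the object storage configuration, created below (hypothetical key name)
  objStoreSecret: thanos-objstore-config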

Since the Thanos Store Gateway needs access to the blob storage, we will also create the storage secret in this cluster.

kubectl -n thanos create secret generic thanos-objstore-config --from-file=thanos.yaml=thanos-storage-config.yaml

To deploy this chart, we will use the same certificates as before.

helm install --name thanos --namespace thanos ./thanos -f thanos-values.yaml --set-file query.tlsClient.cert=cert.cer --set-file query.tlsClient.key=cert.key --set-file query.tlsClient.ca=cacerts.cer --set-file store.tlsServer.cert=cert.cer --set-file store.tlsServer.key=cert.key --set-file store.tlsServer.ca=cacerts.cer

This command installs the Thanos Query Gateway and the Thanos Store Gateway, configuring them to use a secure channel.

Validation

To check that everything is working correctly, you can port-forward the HTTP service of the Thanos Query Gateway as follows:

kubectl -n thanos port-forward svc/thanos-query-http 8080:10902

After that, open http://localhost:8080 in your browser and you should see the Thanos UI!
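
Besides the UI (its Stores page should list both sidecars), you can also query the Prometheus-compatible API exposed by the Query Gateway, for example:

curl "http://localhost:8080/api/v1/query?query=up"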

Grafana

For dashboards, you can install Grafana using the Helm chart.

Create grafana-values.yaml with the following content:
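
For example, based on the stable/grafana chart values, something like this adds a Thanos data source pointing at the Query Gateway service and pulls three public dashboards from grafana.com (the dashboard IDs are illustrative):

datasources:
  datasources.yaml:
    apiVersion: 1
    datasources:
      - name: Thanos
        type: prometheus
        url: http://thanos-query-http:10902
        access: proxy
        isDefault: true
dashboardProviders:
  dashboardproviders.yaml:
    apiVersion: 1
    providers:
      - name: default
        orgId: 1
        folder: ''
        type: file
        options:
          path: /var/lib/grafana/dashboards/default
dashboards:
  default:
    kubernetes-cluster:
      gnetId: 7249      # "Kubernetes Cluster" dashboard
      datasource: Thanos
    node-exporter:
      gnetId: 1860      # "Node Exporter Full" dashboard
      datasource: Thanos
    prometheus-stats:
      gnetId: 2         # "Prometheus Stats" dashboard
      datasource: Thanos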

Note that I have added three default dashboards to it. You can also add your own dashboards (the easiest way is to use a ConfigMap).

Then deploy:

helm install --name grafana --namespace thanos stable/grafana -f grafana-values.yaml

And again port-forward:

kubectl -n thanos port-forward svc/grafana 8080:80

And… that’s it! You have deployed a Prometheus-based, highly available monitoring solution with long-term metrics storage and a centralized view across multiple clusters!

Other options

This article is about Prometheus and Thanos, but if you don’t need a global view across multiple clusters, you can get by with just Prometheus and persistent storage.

Another option is Cortex, an open source platform that is a little more complex than Thanos and takes a different approach.


Interested in developing in this direction? Sign up for a free master class, where OTUS instructors will talk about the training program in detail and answer your questions.
