How to set up simple metrics to evaluate network bandwidth

Why is this necessary?

Many tasks require renting a VPS (Virtual Private Server) from a cloud provider. Providers of cheap VPS servers usually do not guarantee network bandwidth at all. This rarely causes problems, especially if your project does not depend much on Internet speed.

My project, however, needs high bandwidth, and it needs it throughout the entire life of the server. I currently use two providers; I will not name them and will call them provider A and provider B.

When I rent a new server, I check its network speed by logging into it remotely and running a test with the speedtest command-line utility. I see high speeds and assume they will stay that way most of the time. However, once I introduced bandwidth metrics, I was very surprised.

Metrics

The setup works as follows. At a fixed interval, Prometheus sends a request to speedtest-exporter, which runs a speed test and returns the results as metrics; Prometheus stores them. Grafana pulls the data from Prometheus and displays it.
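
On every scrape, an exporter answers with plain-text metrics in the Prometheus exposition format, and Prometheus stores each sample line. The Python sketch below parses such a response into a dict to make that data flow concrete. The sample text is hypothetical and shortened (real output from billimek/prometheus-speedtest-exporter may contain more metrics, labels and comment lines), and the parser deliberately ignores escaping, timestamps and exemplars, so it is not a full implementation of the format.

```python
import re

# Matches "name{labels} value" or "name value"; labels are kept as raw text.
# Simplified: no support for escaped quotes, timestamps, or exemplars.
SAMPLE_RE = re.compile(r'^([a-zA-Z_:][a-zA-Z0-9_:]*)(\{[^}]*\})?\s+(\S+)$')


def parse_exposition(text: str) -> dict[str, float]:
    """Return {metric_name_with_labels: value} for every sample line."""
    samples: dict[str, float] = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):  # skip blank, HELP and TYPE lines
            continue
        match = SAMPLE_RE.match(line)
        if match is None:
            continue  # ignore lines this simplified parser does not understand
        name, labels, value = match.groups()
        samples[name + (labels or "")] = float(value)
    return samples


# Hypothetical, shortened exporter response used only for illustration.
EXAMPLE = """\
# TYPE speedtest_download_bytes gauge
speedtest_download_bytes 117500000
speedtest_upload_bytes 98000000
"""

if __name__ == "__main__":
    for key, value in parse_exposition(EXAMPLE).items():
        print(key, value)
```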

1. Install speedtest-exporter

My VPS servers are nodes of a Kubernetes cluster, where Grafana and Prometheus are already installed. For the speedtest-exporter Helm chart, I used the billimek/prometheus-speedtest-exporter Docker image as a base. The Helm chart for deploying speedtest-exporter is in my GitHub repositories.

To install the Helm chart, run the commands below. The chart uses a DaemonSet, so an instance of speedtest-exporter starts on every node of the cluster.

helm repo add tarmalonchik https://tarmalonchik.github.io/charts/charts
helm install mychart tarmalonchik/speedtest-prometheus
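
Because the chart uses a DaemonSet, every node should end up with exactly one speedtest-exporter pod; `kubectl get pods -o wide` lists the node of each pod. The Python sketch below performs the same comparison on plain data: given the cluster's node names and a mapping of pod name to node, it reports nodes without an exporter pod and nodes with more than one. The pod and node names are hypothetical; in practice you would fill them in from kubectl or the Kubernetes API.

```python
from collections import Counter


def check_daemonset_coverage(
    nodes: list[str], pod_to_node: dict[str, str]
) -> tuple[list[str], list[str]]:
    """Return (nodes_without_pod, nodes_with_several_pods), both sorted."""
    pods_per_node = Counter(pod_to_node.values())
    missing = sorted(n for n in nodes if pods_per_node[n] == 0)
    duplicated = sorted(n for n, count in pods_per_node.items() if count > 1)
    return missing, duplicated


if __name__ == "__main__":
    # Hypothetical cluster: three nodes, but node-c has no exporter pod yet.
    nodes = ["node-a", "node-b", "node-c"]
    pods = {
        "speedtest-exporter-x1": "node-a",
        "speedtest-exporter-x2": "node-b",
    }
    missing, duplicated = check_daemonset_coverage(nodes, pods)
    print("missing:", missing)        # nodes that will produce no data
    print("duplicated:", duplicated)  # nodes that would be measured twice
```

A node listed as missing simply produces no speed data, which is easy to overlook on a dashboard.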

2. Prometheus Configuration

Once the pods are running, add a configuration to Prometheus so that it collects metrics from the new pods.

I installed Prometheus on my Kubernetes cluster with this Helm chart. It comes with node-exporter, which collects many useful host metrics.

To make Prometheus request data from speedtest-exporter, add the following job to scrape_configs in the values.yaml of the Prometheus Helm chart.

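- job_name: 'speedtest'
  metrics_path: /probe
  params:
    script: [ speedtest ]
  # Each speedtest run loads the network and CPU, so measure rarely.
  scrape_interval: 10m
  # Must not exceed scrape_interval and must leave time for the test to finish.
  scrape_timeout: 10m
  # Discover every speedtest-exporter pod so that each node is measured.
  # No static Service target here: a Service would forward each scrape
  # to an arbitrary pod instead of measuring every node.
  kubernetes_sd_configs:
    - role: pod
      namespaces:
        names:
          - default
      selectors:
        - role: "pod"
          label: "app.kubernetes.io/name=speedtest-exporter"
  relabel_configs:
    # Copy the name of the node the pod runs on into a "node" label.
    - source_labels: [ __meta_kubernetes_pod_node_name ]
      action: replace
      target_label: node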

  1. scrape_interval sets how often Prometheus takes a measurement. Don't run speedtest too often, since every measurement loads the network and the CPU. scrape_timeout must not be longer than scrape_interval.

  2. kubernetes_sd_configs makes Prometheus discover every speedtest-exporter pod and measure the speed through each of them on every run, rather than through whichever single pod it happens to reach.

  3. relabel_configs attaches a node label (the name of the node running the pod) to the data, which makes the data easier to browse in Grafana.
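
To see what the job does for each discovered pod, here is a Python sketch of two steps: building the scrape URL from the target address, metrics_path and params, and applying the replace rule from relabel_configs with Prometheus's defaults (regex `(.*)`, replacement `$1`, separator `;`). It is a simplified model for illustration, not Prometheus code; the pod IP, port and node name are hypothetical, and real relabeling supports more actions and options.

```python
import re
from urllib.parse import urlencode


def scrape_url(address: str, metrics_path: str, params: dict[str, list[str]]) -> str:
    """Build the URL requested for one target (http scheme assumed)."""
    query = urlencode(params, doseq=True)  # doseq expands list values
    return f"http://{address}{metrics_path}?{query}"


def relabel_replace(labels: dict[str, str], source_labels: list[str],
                    target_label: str, regex: str = "(.*)",
                    replacement: str = "$1", separator: str = ";") -> dict[str, str]:
    """Simplified model of the 'replace' relabel action."""
    value = separator.join(labels.get(name, "") for name in source_labels)
    match = re.fullmatch(regex, value)  # the regex is anchored at both ends
    if match is None:
        return dict(labels)  # no match: labels stay unchanged
    # Translate Go-style $1 references into Python's \g<1> syntax.
    result = match.expand(re.sub(r"\$(\d+)", r"\\g<\1>", replacement))
    new_labels = dict(labels)
    if result:
        new_labels[target_label] = result
    else:
        new_labels.pop(target_label, None)  # an empty result removes the label
    return new_labels


if __name__ == "__main__":
    # Hypothetical pod discovered by kubernetes_sd_configs.
    target = {
        "__address__": "10.0.0.5:9469",
        "__meta_kubernetes_pod_node_name": "node-a",
    }
    print(scrape_url(target["__address__"], "/probe", {"script": ["speedtest"]}))
    print(relabel_replace(target, ["__meta_kubernetes_pod_node_name"], "node")["node"])
```

A target without the pod metadata (for example, a static Service address) ends up with no node label at all, which is one more reason to rely on pod discovery here. With scrape_interval: 10m, each node runs 24 * 60 / 10 = 144 speed tests a day.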

3. Grafana metrics

To run Grafana in the Kubernetes cluster, I used this Helm chart. Once Grafana is running, add Prometheus to it as a data source.
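
Grafana reads from Prometheus over the Prometheus HTTP API, so querying that API directly is a quick way to confirm that speedtest data exists before building panels. The Python sketch below builds an instant-query URL for the `/api/v1/query` endpoint and extracts per-node values from a vector response. The Prometheus address and the sample response are hypothetical; the parsing assumes the documented `{"status": ..., "data": {"resultType": "vector", "result": [...]}}` shape.

```python
import json
from urllib.parse import urlencode


def query_url(base_url: str, promql: str) -> str:
    """URL for an instant query against the Prometheus HTTP API."""
    query = urlencode({"query": promql})
    return f"{base_url.rstrip('/')}/api/v1/query?{query}"


def values_by_node(response_body: str) -> dict[str, float]:
    """Map the 'node' label to the sample value in an instant-vector response."""
    payload = json.loads(response_body)
    if payload.get("status") != "success":
        raise ValueError(f"query failed: {payload.get('error', 'unknown error')}")
    data = payload["data"]
    if data["resultType"] != "vector":
        raise ValueError(f"expected a vector, got {data['resultType']}")
    # Each element looks like {"metric": {...labels...}, "value": [timestamp, "value"]}
    return {
        item["metric"].get("node", ""): float(item["value"][1])
        for item in data["result"]
    }


if __name__ == "__main__":
    # Hypothetical Prometheus address and a hand-written response in the API's format.
    print(query_url("http://prometheus:9090", "speedtest_download_bytes * 8"))
    example = ('{"status": "success", "data": {"resultType": "vector", "result": '
               '[{"metric": {"node": "node-a"}, "value": [1700000000.0, "117500000"]}]}}')
    print(values_by_node(example))
```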

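Now add a Grafana panel that displays the collected metrics, using the two queries below. Multiplying by 8 converts bytes per second into bits per second, and the minus sign on the uplink query draws it below the zero line, so both directions fit on one graph. $node is a dashboard variable; it can be defined, for example, with the variable query label_values(speedtest_download_bytes, node).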

speedtest_download_bytes{node="$node"} * 8   # downlink speed, bit/s
-speedtest_upload_bytes{node="$node"} * 8    # uplink speed, bit/s (drawn below the axis)
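
To sanity-check a value read off the graph, the Python sketch below reproduces the panel arithmetic for one pair of samples (bytes per second in, signed bits per second out) and formats a rate with decimal prefixes. The function names and unit labels are illustrative choices, not part of Grafana or the exporter.

```python
BITS_PER_BYTE = 8


def panel_values(download_bytes_per_s: float, upload_bytes_per_s: float) -> tuple[float, float]:
    """Return (downlink, uplink) in bit/s as the two panel queries plot them."""
    downlink = download_bytes_per_s * BITS_PER_BYTE
    uplink = -upload_bytes_per_s * BITS_PER_BYTE  # negative: drawn below the axis
    return downlink, uplink


def human_bits(bits_per_s: float) -> str:
    """Format the absolute rate with decimal (SI) prefixes, as network speeds usually are."""
    value = abs(bits_per_s)
    for unit in ("b/s", "Kb/s", "Mb/s", "Gb/s"):
        if value < 1000:
            return f"{value:.1f} {unit}"
        value /= 1000
    return f"{value:.1f} Tb/s"


if __name__ == "__main__":
    down, up = panel_values(125_000_000, 14_250_000)  # hypothetical samples
    print(human_bits(down), human_bits(up))
```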

Result

These are the graphs we got.

Provider A (server in Germany). I clipped everything above 1 Gb/s on the graphs so that small values are easier to read. This graph looks quite good: the server has the required 1 Gb/s almost all the time, apart from a small dip at the very beginning.

Provider A (server in Germany). This server has quite severe network problems: speeds drop as low as 158 Mb/s downlink and 114 Mb/s uplink. That can already be a problem if your project needs a stable, fast network. Note that this is the same provider as in the first graph, and the server configuration is the same too.
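
Minimum speeds are only part of the picture; how often a server falls short of the target also matters. The Python sketch below summarizes one node's downlink samples in bits per second: minimum, median and the share of samples below a target. The sample values and the 900 Mb/s target are hypothetical placeholders, not the measurements behind these graphs. Inside Prometheus itself, min_over_time(speedtest_download_bytes[7d]) * 8 gives a comparable minimum over a window (the 7d window is an arbitrary choice).

```python
from statistics import median


def summarize(samples_bps: list[float], target_bps: float) -> dict[str, float]:
    """Min, median and share of samples below target for one node's bit/s series."""
    if not samples_bps:
        raise ValueError("no samples")
    below = sum(1 for s in samples_bps if s < target_bps)
    return {
        "min": min(samples_bps),
        "median": median(samples_bps),
        "share_below_target": below / len(samples_bps),
    }


if __name__ == "__main__":
    # Hypothetical 10-minute downlink samples in bit/s for one node.
    samples = [940e6, 935e6, 310e6, 620e6, 950e6, 925e6]
    print(summarize(samples, target_bps=900e6))
```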

Provider B (server in the USA). This server is even worse.

Conclusion

These cloud providers do not suit my purposes, and I am already looking for affordable servers with guaranteed bandwidth. If you need a fast and stable network, choose your cloud provider carefully. Unfortunately, GCP and Amazon servers don't work for me because of their high traffic prices, so I have to look for other options.
