Ways to Reduce Costs When Working with Kubernetes

Kubernetes is the most popular orchestrator in Russia. According to VK Cloud’s “State of Kubernetes” research, K8s is used by 56% of companies that work with orchestrators (and 53% of them run Kubernetes in the cloud). Moreover, demand for the technology keeps growing. At the same time, many companies face a lack of experience, which leads not only to technical difficulties but also to unjustified expenses.

We explain why working with Kubernetes can be a “heavy burden” on a company’s budget and how to avoid it.

Life Before Kubernetes: Let's Start Over

Kubernetes (K8s) is designed to work with a large number of hosts and containers. Accordingly, K8s is better suited to applications with a loosely coupled microservice architecture and clear module boundaries, whose components can be deployed, developed, and scaled independently.

The reason is obvious: microservices are convenient to package into separate containers and then manage through K8s, whereas a monolithic application cannot be cleanly split into containers.

Two conclusions can be drawn from this.

  • If the application does not exist yet, it is better to design it as microservices from the start. This way you get full compatibility with Kubernetes out of the box, without hacks or global rework at the architecture level. When developing a microservice application, it makes sense to take the “gold standard” — The Twelve-Factor App — as a basis: it contains the main recommendations on code methodology, dependencies, working with external services, administration, and other issues you will inevitably face. Pay particular attention to the recommendations that help you avoid overpaying from the start: assemble containers on the “minimum necessary” principle, which saves money on the registry; use cloud PaaS services, which saves administration and development time; and run load tests, so that resource usage is actually justified.

  • If the application exists but has a monolithic architecture, it will have to be broken down into microservices. The result should be a loosely coupled architecture in which each piece of functionality is a separate service containerized together with its dependencies. Obviously, in this case the changes affect not only the architecture design but also the application code, so this is a long and resource-intensive path. A small life hack helps here: you do not have to move the entire application to Kubernetes at once — you can deploy some of the services in K8s and leave the monolithic part outside the orchestrator.

Now let's take a closer look at the situations in which Kubernetes can become an “expensive pleasure” and how to avoid this.

All cases and recommendations are given in the context of working with Kubernetes in the cloud — 53% of companies working with the orchestrator choose this environment. But they are also applicable when working with K8s on other platforms.

Causes of “money leakage” at the architectural level and ways to eliminate them

Often, irrational resource consumption — and the extra costs it brings — stems from ignorance or misunderstanding of the mechanisms built into K8s. Let's look at the typical problems at this “level” and ways to solve them.

No autoscaling

Applications often run under a dynamic rather than constant load. At the same time, to withstand peak loads reliably, developers sometimes reserve large amounts of capacity and keep it running at all times. This approach is wasteful, because even idle but running capacity has to be paid for (and when running on your own hardware, you also have to spend money on buying the equipment). In other words, working with a fixed set of pods is uneconomical.

The best practice in this case is horizontal pod autoscaling with the HorizontalPodAutoscaler (HPA).

HPA is a built-in K8s autoscaling mechanism that automatically increases or decreases the number of running pods based on the current load. It can scale on both CPU and memory consumption.

Thus, using HPA eliminates the need to keep unnecessary capacity running and waste budget on it: idle pods are automatically removed, and new ones are added only when they are really needed.
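
As an illustration, here is a minimal HPA manifest — it assumes a Deployment named my-app already exists and that the metrics server is installed in the cluster:

```yaml
# A minimal sketch of an HPA: keeps between 2 and 10 replicas of the
# hypothetical "my-app" Deployment, adding pods when average CPU
# utilization across them exceeds 70%.
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: my-app-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: my-app
  minReplicas: 2
  maxReplicas: 10
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70
```

Apply it with kubectl apply -f and check the current state with kubectl get hpa.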

Using a fixed number of worker node groups

One of the most “voracious” consumers of resources in K8s is the worker node groups. Accordingly, as with pods, keeping them running without load is a direct path to quick and inefficient budget spending.

In this situation, node group autoscaling, which works in tandem with HPA, comes to the rescue.

The logic of how such a “link” works is simple and obvious:

  • As the load decreases, HPA automatically reduces the number of pods in the application's workloads;

  • If the load continues to fall, the number of nodes in the node group is reduced as well.

It is important to note that for autoscaling to work correctly, you must limit the maximum number of nodes it can add. If you don't, then in the event of a memory leak or a DDoS attack, capacity will keep being added automatically, driving up costs.
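
How the cap is set depends on the platform: in managed services such as VK Cloud it is usually part of the node group settings, while a self-managed Cluster Autoscaler takes it as a flag. A hedged sketch of the latter, with RBAC and provider-specific settings omitted (the node group name, bounds, and version are assumptions):

```yaml
# Sketch of a self-managed Cluster Autoscaler deployment. The key line is
# --nodes: it caps the hypothetical "workers" node group at 2..6 nodes, so a
# memory leak or a DDoS attack cannot scale the cluster (and the bill) without limit.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: cluster-autoscaler
  namespace: kube-system
spec:
  replicas: 1
  selector:
    matchLabels: { app: cluster-autoscaler }
  template:
    metadata:
      labels: { app: cluster-autoscaler }
    spec:
      containers:
        - name: cluster-autoscaler
          image: registry.k8s.io/autoscaling/cluster-autoscaler:v1.29.0  # match your cluster version
          command:
            - ./cluster-autoscaler
            - --nodes=2:6:workers              # min:max:node-group-name
            - --scale-down-unneeded-time=10m   # remove a node after 10 minutes of idling
```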

You can read more about how to configure cluster node scaling here.

Using dedicated capacity only

When working with K8s, temporary or even one-off tasks come up regularly. An application may also include stateless services that do not need to persist state. Without the necessary experience, or with a generous budget, such workloads often end up on dedicated capacity fully managed by the cloud client.

But this is not always justified: for temporary or non-critical loads, it is more rational to use Spot instances — VM instances built on capacity that the cloud provider is not currently using.

The main advantage of this approach is that discounts on Spot instances can reach 90%. But there is a downside: the reliability of such instances is not guaranteed, because the provider can turn them off at any time if it needs this capacity.

However, if Spot instances are managed correctly, they can be used effectively in different scenarios, significantly reducing costs.
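
If your cluster has a separate node group built on spot VMs, interruption-tolerant workloads can be steered onto it with a nodeSelector and a toleration. A minimal sketch, assuming the spot nodes carry a hypothetical node-lifecycle: spot label and a matching taint (real labels and taints are provider- and configuration-specific):

```yaml
# Sketch: a batch Job pinned to spot nodes. The "node-lifecycle: spot" label
# and the matching taint are hypothetical — use whatever your provider or
# node group configuration actually applies to spot VMs.
apiVersion: batch/v1
kind: Job
metadata:
  name: report-builder
spec:
  template:
    spec:
      nodeSelector:
        node-lifecycle: spot          # run only on (cheaper) spot nodes
      tolerations:
        - key: node-lifecycle
          operator: Equal
          value: spot
          effect: NoSchedule          # tolerate the taint that keeps other pods away
      restartPolicy: OnFailure        # survive a spot VM being reclaimed mid-run
      containers:
        - name: report-builder
          image: registry.example.com/report-builder:latest
```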

Incorrect handling of data

An application's operation is inextricably linked with processing and storing large volumes of data in many formats. If you keep all files inside the orchestrator's perimeter, K8s risks turning into slow and expensive storage. From a cost perspective, this is clearly not the best scenario.

The best practice when working with Kubernetes is to store large amounts of data in S3 storage – it is slower, but much cheaper and more capacious.

What is especially important, it is quite easy to connect Kubernetes and S3 storage in the cloud. For example, in VK Cloud you don't even need additional connectors or complex integrations — everything can be set up in a few clicks.
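
How the application talks to S3 is up to the application itself; on the Kubernetes side it usually comes down to keeping the keys in a Secret and passing the endpoint and bucket as environment variables. A minimal sketch with hypothetical names (the endpoint, bucket, and secret keys are assumptions):

```yaml
# Sketch: S3 credentials live in a Secret, the pod only receives environment
# variables — large files go to object storage instead of cluster disks.
apiVersion: v1
kind: Secret
metadata:
  name: s3-credentials
stringData:
  accessKey: "REPLACE_ME"
  secretKey: "REPLACE_ME"
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: uploader
spec:
  replicas: 1
  selector:
    matchLabels: { app: uploader }
  template:
    metadata:
      labels: { app: uploader }
    spec:
      containers:
        - name: uploader
          image: registry.example.com/uploader:latest
          env:
            - name: S3_ENDPOINT
              value: https://s3.example.com   # hypothetical — use your provider's endpoint
            - name: S3_BUCKET
              value: my-app-media
            - name: S3_ACCESS_KEY
              valueFrom:
                secretKeyRef: { name: s3-credentials, key: accessKey }
            - name: S3_SECRET_KEY
              valueFrom:
                secretKeyRef: { name: s3-credentials, key: secretKey }
```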

Savings at the level of Kubernetes itself

We've sorted out budget leaks at the application architecture level, and now let's move on to Kubernetes itself. Everything is a bit simpler here, since K8s has built-in optimization mechanisms – you just need to understand them and start using them.

Setting limits

Kubernetes has a mechanism of limits and requests — with their help you can specify how many resources each container may use. A limit is the maximum value that cannot be exceeded; a request is the minimum amount of resources that must be available to the container.

In the context of reducing costs, it is the value of the limits that is important – if it is not set, in the event of a memory “leak” the container will grow unjustifiably and occupy all available resources.

In effect, setting limits makes it possible to control capacity consumption and avoid paying for cloud capacity that is not actually needed.
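
For reference, here is a minimal sketch of what requests and limits look like in a Deployment (the image name and the numbers are illustrative assumptions, not recommendations):

```yaml
# Sketch: requests are what the scheduler reserves for the container,
# limits are the hard ceiling it can never exceed.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: api
spec:
  replicas: 2
  selector:
    matchLabels: { app: api }
  template:
    metadata:
      labels: { app: api }
    spec:
      containers:
        - name: api
          image: registry.example.com/api:latest
          resources:
            requests:
              cpu: 250m        # guaranteed minimum: a quarter of a CPU core
              memory: 256Mi
            limits:
              cpu: 500m        # hard cap: the container is throttled above this
              memory: 512Mi    # exceeding this gets the container OOM-killed
```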

One of the key challenges here is to set the limits correctly. If you set them too low, the application will simply crash during peak loads or will not work efficiently enough. Therefore, to avoid making a mistake at this stage, it is better to use special applications for calculating limits, such as Robusta or Katalyst.

Assigning resource quotas

Working with quotas is somewhat similar to working with limits – here you can also control resource usage, but for each namespace. That is, with the help of quotas you can limit how many resources a user or group of users can use in a K8s cluster.

Since there is a clear “volume of resources consumed = money” relationship, the consequences of ignoring this practice are obvious: you can end up with unjustified overspending of capacities at the namespace level and go far beyond the expected budgets.

Let's look at the algorithm for working with quotas using a simple example.

  • Create a namespace using the command kubectl create namespace <namespace-name>.

  • Create a resource quota using the command kubectl create quota <resource-quota-name> --hard=cpu=<cpu-limit>,memory=<memory-limit> --namespace=<namespace-name>.

  • For example, kubectl create quota my-quota --hard=cpu=3,memory=3Gi --namespace=my-namespace creates a quota with a limit of 3 CPUs and 3 Gi of memory.

  • Create a limit range using the command kubectl create -f <limit-range-definition-file> --namespace=<namespace-name>.

  • Apply the quota and limit range. In a standard Kubernetes cluster they take effect automatically on every pod created in the namespace; you only need to make sure pod specs include resource requests and limits (or that the LimitRange provides defaults), otherwise pod creation will be rejected by the quota.

The set quotas can be viewed using the command kubectl describe quota --namespace=<namespace-name>.
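
The same setup can be described declaratively. A hedged sketch of a ResourceQuota plus a LimitRange for the my-namespace example above (the default values in the LimitRange are illustrative assumptions):

```yaml
# Sketch: a quota that caps the whole "my-namespace" at 3 CPUs / 3 Gi,
# plus a LimitRange that gives containers default requests/limits so that
# pods created without explicit resources are not rejected by the quota.
apiVersion: v1
kind: ResourceQuota
metadata:
  name: my-quota
  namespace: my-namespace
spec:
  hard:
    requests.cpu: "3"
    requests.memory: 3Gi
    limits.cpu: "3"
    limits.memory: 3Gi
---
apiVersion: v1
kind: LimitRange
metadata:
  name: my-limits
  namespace: my-namespace
spec:
  limits:
    - type: Container
      defaultRequest:
        cpu: 100m
        memory: 128Mi
      default:
        cpu: 250m
        memory: 256Mi
```

Apply both with kubectl apply -f and verify the result with the kubectl describe quota command mentioned above.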

Determining the size of nodes

The correct calculation of the node size directly affects the efficiency of budget spending. Everything is simple here: the node size determines how many pods can be placed on it. If the pods consume, for example, only 50% of the node's resources, then the remaining 50% will be idle, but you will have to pay for them. Therefore, from the point of view of savings, it is important that the node size takes into account the potential load.

The calculation principle and detailed recommendations are usually provided by cloud providers — for example, this is described in detail in the VK Cloud technical documentation. The simplified approach is as follows:

  • if the application requires a lot of resources but is not used all the time, it is better to choose large nodes — they give enough headroom for peak loads;

  • if the application requires few resources but runs constantly, smaller nodes are more suitable — resources will be utilized more efficiently.

Disabling and removing what you don't need

When working with K8s, there are many scenarios where capacity is needed only temporarily — for example, for tests or small one-off tasks. The catch is that even in such cases pods and clusters are often allocated and then left running after use. What happens next is obvious: the test finishes, the cluster stays active, money for the allocated resources keeps being charged, and the budget runs out.

Preventing such scenarios is easy — it is enough to make stopping clusters and pods during long idle periods a standard practice. The savings are significant: after a cluster is stopped, you are charged only for disks, not for the full VM capacity.

In this case, of course, you need to first check that disabling a cluster or pod will not affect the dev, stage, and prod environments.

The situation with “garbage” on clusters and pods is similar. Cluster and pod resources are limited, and using them to host unnecessary test applications or old metadata is not the best practice, since the budget will be spent on storing “garbage”. Moreover, the problem can really be significant: the gradual accumulation of “garbage” that consumes even a small amount of resources can eventually lead to a noticeable overuse of capacity and irrational scaling of clusters.

Thus, regularly cleaning “garbage” out of clusters and pods is a great way to reduce costs. Moreover, you can automate these tasks — for example, with automatic removal of stale metadata.
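
One small example of such automation built into Kubernetes itself is the TTL mechanism for finished Jobs: a test run deletes itself some time after completion instead of lingering in the cluster as “garbage”. A minimal sketch (the Job name and image are assumptions):

```yaml
# Sketch: ttlSecondsAfterFinished tells Kubernetes to delete the Job and its
# pods one hour after it completes, so test runs do not accumulate as "garbage".
apiVersion: batch/v1
kind: Job
metadata:
  name: load-test
spec:
  ttlSecondsAfterFinished: 3600   # auto-delete 1 hour after completion
  template:
    spec:
      restartPolicy: Never
      containers:
        - name: load-test
          image: registry.example.com/load-test:latest
```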

Important life hack

The economic effect of optimization is often not visible immediately — it can take days, weeks, or even longer. Moving “blind” like this is not always justified: you may end up giving up capabilities you actually need for the sake of saving pennies.

To avoid such missteps and get a full picture of how your actions affect spending, you can use dedicated monitoring and observability tools such as OpenCost.

OpenCost is an open-source service for monitoring expenses in Kubernetes and cloud environments. It can be used to track past and current expenses on K8s and the cloud, as well as resource allocation. Moreover, the tool provides a detailed “breakdown” of expenses by clusters, nodes, namespaces, controllers, pods, services.

With such tools, the “blind spots” of the optimization disappear: you can immediately see which processes consume resources and how, and how much costs drop as a result of the changes made at the architecture and Kubernetes levels.

Instead of conclusions

It’s easier than you think to channel your Kubernetes expenses in the right direction if you know where to save. You can learn more about this in our webinar “How to work with Kubernetes without spending too much?”, where we talk about the “gold standard” The Twelve-Factor App, monitoring, and autoscaling. All of this is backed by examples of working with K8s in VK Cloud, so you can apply everything in practice right away.
