Machine learning is used in many fields, from business analytics to astrophysics. To use resources efficiently, models are deployed in containers on dedicated servers or in the cloud. Now you can work effectively with ML in ready-made Kubernetes clusters equipped with high-performance graphics cards.
Advantages of GPU over CPU
GPUs are used for 3D graphics, rendering, and more. Complex analytics and machine learning also involve computations that the GPU handles better than the CPU.
Most of the calculations in training ML models are matrix operations. Tensor and CUDA cores, the specialized cores built into the GPU, are well suited to this kind of work. This gives the GPU an advantage over the CPU in machine learning.
Let’s single out the most frequent tasks where the GPU is more efficient.
Training simple models
Imagine that you want to train a model to predict house prices in Boston. It should take into account the crime rate, the number of parks in the area, landscaping, and so on. For the model to learn to predict, you need to “feed” it history: a large dataset of these factors together with observations of how house prices changed.
The GPU can process this history faster, since each dataset is a matrix of factors and observations. The larger the matrix, the more noticeable the difference in training speed between the GPU and the CPU.
GPU vs CPU, comparison of training time on one sample. Source
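To make the "matrix of factors and observations" idea concrete, here is a minimal sketch in plain Python. It assumes a linear model trained by gradient descent, which the article does not specify, and the data, the factor meanings, and the "true" weights 2 and 3 are all made up for illustration. The point is that each training step reduces to matrix-vector products, which is exactly the kind of work a GPU parallelizes well.

```python
# Hypothetical toy data: each row is one observation (a neighborhood),
# each column is one factor (e.g. a crime index and a number of parks).
X = [
    [1.0, 2.0],
    [2.0, 1.0],
    [3.0, 4.0],
    [4.0, 3.0],
    [5.0, 5.0],
]
# Prices are generated from made-up weights 2 and 3 so the result can be checked.
y = [2.0 * a + 3.0 * b for a, b in X]


def predict(X, w):
    """Matrix-vector product X @ w: one predicted price per observation."""
    return [sum(x_ij * w_j for x_ij, w_j in zip(row, w)) for row in X]


def gradient(X, y, w):
    """Gradient of the mean squared error: (2 / n) * X^T @ (X @ w - y)."""
    n = len(X)
    residuals = [p - t for p, t in zip(predict(X, w), y)]
    return [
        2.0 / n * sum(row[j] * r for row, r in zip(X, residuals))
        for j in range(len(w))
    ]


def train(X, y, steps=2000, lr=0.02):
    """Plain gradient descent; every step is two matrix-vector products."""
    w = [0.0] * len(X[0])
    for _ in range(steps):
        g = gradient(X, y, w)
        w = [w_j - lr * g_j for w_j, g_j in zip(w, g)]
    return w


weights = train(X, y)
print([round(v, 3) for v in weights])
```

On this toy dataset the learned weights come out very close to the made-up values 2 and 3. With five rows the CPU is more than enough; the gap appears when the matrix has millions of rows and the same products are repeated thousands of times.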
Deep Learning: Working with Media Data
GPUs are especially useful for working with media data such as images, video, and speech, which can be represented on a computer as tensors. Such data appears in image classification and processing, computer vision, and generative adversarial networks.
Let’s say you need to train a neural network to generate photos of non-existent cat breeds. To do this, it needs to be trained on a large dataset of images. This is a computationally expensive process. For the network to work with images, they are converted into three-dimensional arrays (tensors) whose cells hold pixel values.
Working with tensors relies on matrix calculations: affine transformations, translations, rotations, and so on. The GPU handles them more efficiently. In addition, many neural network architectures are built on tensor operations.
Comparison of performance of image classification on GPU and CPU. The EfficientNet-B2 model was used for classification. Source
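The two ideas above, an image as a three-dimensional tensor and transformations as matrix calculations, can be sketched in a few lines of plain Python. This is only an illustration: the 2x2 image and all function names are made up, and a real pipeline would use a tensor library running on the GPU rather than nested lists.

```python
import math

# A tiny made-up RGB image: height x width x channels, values 0..255.
image = [
    [[255, 0, 0], [0, 255, 0]],
    [[0, 0, 255], [255, 255, 255]],
]


def shape(tensor):
    """Return the dimensions of a regular nested list, e.g. (2, 2, 3)."""
    dims = []
    while isinstance(tensor, list):
        dims.append(len(tensor))
        tensor = tensor[0]
    return tuple(dims)


def rotation(angle_deg):
    """3x3 homogeneous matrix: counter-clockwise rotation about the origin
    (in x-right, y-up coordinates)."""
    a = math.radians(angle_deg)
    return [
        [math.cos(a), -math.sin(a), 0.0],
        [math.sin(a), math.cos(a), 0.0],
        [0.0, 0.0, 1.0],
    ]


def translation(dx, dy):
    """3x3 homogeneous matrix: shift by (dx, dy)."""
    return [[1.0, 0.0, dx], [0.0, 1.0, dy], [0.0, 0.0, 1.0]]


def matmul(A, B):
    """Naive matrix product; each output cell is an independent dot product."""
    return [
        [sum(A[i][k] * B[k][j] for k in range(len(B))) for j in range(len(B[0]))]
        for i in range(len(A))
    ]


def apply_affine(M, point):
    """Apply a 3x3 affine matrix to a 2D point given as (x, y)."""
    x, y = point
    result = matmul(M, [[x], [y], [1.0]])
    return (result[0][0], result[1][0])


# Rotate by 90 degrees first, then shift 2 units along x.
rotate_then_shift = matmul(translation(2.0, 0.0), rotation(90.0))
print(shape(image))
print(apply_affine(rotate_then_shift, (1.0, 0.0)))
```

Composing the two transformations into a single matrix means every pixel coordinate needs only one matrix product. Since each output cell of `matmul` is computed independently of the others, thousands of GPU cores can work on them at the same time.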
Performance of ML models in production
GPUs are needed not only at the training stage, but also in production. If your ML service works with media data, a graphics card lets it serve users faster. In addition, models often continue training while working with client requests.
Let’s say a company is developing an application that determines the name of a dish from a photo. When the service was first launched, the model was often wrong, and users helped it classify dishes. Thousands of people could use the service at the same time, yet the application did not slow down, because it ran on GPUs.
OpenCV (CUDA core), efficiency of models on CPU and GPU. Source
Is it possible to train on the CPU?
Not every ML task shows a noticeable difference in speed between the CPU and the GPU. For models trained on small samples of numerical data (for example, simple forecasting or regression analysis), it does not matter which processor you use.
The quality of training does not depend on the type of processor. With a balanced, representative sample, the quality of the model will be the same. But on the CPU the training process will be slower. It can take months, which companies in a competitive environment do not always have.
Why containers are used for ML tasks
Trends show that ML models are best run in containers, which can be deployed both in the cloud and on dedicated servers. Containers are managed with the Kubernetes orchestrator. Since its appearance, it has become “overgrown” with the software needed for ML.
“K8s tools allow ML professionals to use the speed of the GPU in containers” – Ajit Raina, Senior Development Manager, Redis Labs.
Let’s highlight the main advantages of Kubernetes, which are important for deploying ML services.
Separate working environment
Each ML model needs its own environment: modules, libraries, and so on. The library versions, in turn, depend on the driver versions installed on the node. In Kubernetes, you can run models on nodes with pre-installed driver versions, and you can do so automatically.
Easy container administration
Container management is simpler and more convenient than in, say, Docker Swarm. Moreover, containers are autonomous. K8s manages resources itself, depending on the “needs of the application”.
“Kubernetes makes containers and applications easy and fast to organize and manage. The technology allows you to automate operational tasks such as managing application availability and scaling” – Thomas DiGiacomo, Director of Technology and Products, SUSE.
Availability of tools for ML experiments
Kubeflow, a machine learning platform that runs ML pipelines on Kubernetes clusters, is responsible for training, comparing, and selecting optimal models. Kubeflow lets you “isolate” different experiments by creating pods, as well as automate the selection of reference models.
This is important, since in practice it is not known which approach and which mathematical model to choose for solving the problem. This is determined by the results of experiments.
Several ML engineers and mathematicians can test their hypotheses and results on the same dataset, select the best models and not be afraid of “conflicts” between experiments in the namespace. But this is just one of the benefits.
Kubernetes can automatically adjust the amount of resources used based on the workload. Autoscaling combines application scaling (pod level) with adjusting the number of nodes within a cluster (cluster level).
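As an illustration of pod-level scaling, the sketch below reproduces the replica calculation documented for the Kubernetes Horizontal Pod Autoscaler: desired replicas = ceil(current replicas × current metric / target metric). The article does not name the specific autoscaler, so treating it as the Horizontal Pod Autoscaler is an assumption, and the function name, replica limits, and metric values are hypothetical.

```python
import math


def desired_replicas(current_replicas, current_metric, target_metric,
                     min_replicas=1, max_replicas=10, tolerance=0.1):
    """Replica count following the Horizontal Pod Autoscaler formula.

    min_replicas and max_replicas are hypothetical limits; tolerance mirrors
    the autoscaler's default of 0.1, below which no scaling happens.
    """
    ratio = current_metric / target_metric
    # Skip scaling when the metric is already close enough to the target.
    if abs(ratio - 1.0) <= tolerance:
        return current_replicas
    desired = math.ceil(current_replicas * ratio)
    # Keep the result within the configured bounds.
    return max(min_replicas, min(max_replicas, desired))


# 4 replicas at 90% average CPU with a 60% target: scale up.
print(desired_replicas(4, 90, 60))
# 4 replicas at 30% average CPU with a 60% target: scale down.
print(desired_replicas(4, 30, 60))
```

Cluster-level scaling then adds or removes nodes when the pods produced by this calculation no longer fit on the existing ones.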
K8s is also actively developing its network policy system and the use of namespaces. In recent versions, the orchestrator has strengthened protection at all levels. We talked about this in more detail in our article.
How Managed Kubernetes Helps Machine Learning
Deploying services can be very complicated because of the specifics of Kubernetes. To help administrators work with containers, there is Managed Kubernetes, which automates basic application support tasks. In addition, requests for clusters are processed within several minutes.
Consider the properties of ready-made Kubernetes clusters, which also facilitate the deployment of ML services.
In Managed Kubernetes, you can reserve application replicas. Before deploying the application, you only need to specify the number of running replicas that will be kept on standby.
This is useful for highly loaded ML services. If there is an unexpected influx of client requests, the service scales under load: it automatically starts new replicas to avoid downtime.
Wide choice of graphics cards
All Cloud Platform GPUs are available in Managed Kubernetes. You can create preconfigured nodes with NVIDIA Tesla T4, A2, A30, A100, A2000 and A5000 graphics cards. Of these, A2, A30 and A100 are accelerators designed for AI tasks. If you need to train a large model, the best option is the A100. For the smallest models, the T4 is suitable.
This choice lets you save money and reconfigure the cluster when needed. For example, use a node group with a powerful graphics card during model training and nodes with a weaker card during inference.
Relative productivity of accelerators. Source
Each graphics card has a certain number of tensor and CUDA cores. The more cores, the faster high-performance computing, model training, and inference run. The diagram shows the characteristics of each card.
Graphics card specifications
Autoscaling for groups of GPU nodes is currently not available, as it is rarely requested. Even relatively weak graphics cards can withstand heavy loads.
Who Might Need a GPU Cluster
When creating ML services, it takes a lot of time to collect and label data.
In addition, the model needs to be trained and rolled out to production.
One example is the story of how the voice assistant “Oleg” from Tinkoff was created. To account for all the text requests from customers, developers collect big data. The more data, the more accurately the language model determines the meaning of what was said.
It is difficult to say how long such a model must be trained before the “first version” is released. The process can take hundreds of hours on the GPU, and thousands on the CPU. You also need to periodically evaluate the results and train the model further if necessary.
Some companies can afford to spend that much time, and some cannot. The availability of GPUs in Managed Kubernetes is therefore especially relevant for companies that value development speed: instead of spending time and money on expensive CPU capacity for ML, it is better to use the GPU.
Starting a cluster with a video card
In Managed Kubernetes, you can create a cluster with a GPU in a few steps.
- Register in the Selectel control panel and create a cluster in the desired region.
- Choose one of the fixed GPU node configurations.
After the cluster is created, you will have access to nodes with the drivers required for the GPU already installed.
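Once GPU nodes are available, a workload has to request a GPU explicitly in order to be scheduled onto one of them. The sketch below builds a minimal Pod manifest in Python. It assumes the cluster exposes GPUs under the `nvidia.com/gpu` resource name, as clusters running the NVIDIA device plugin do. The article does not confirm this for Managed Kubernetes, and the pod name and container image are hypothetical placeholders.

```python
import json


def gpu_pod_manifest(name, image, gpus=1):
    """Build a Pod manifest whose container requests `gpus` NVIDIA GPUs."""
    return {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {"name": name},
        "spec": {
            "restartPolicy": "Never",
            "containers": [
                {
                    "name": name,
                    "image": image,
                    # Assumption: GPUs are exposed as the extended resource
                    # "nvidia.com/gpu" (the NVIDIA device plugin convention).
                    "resources": {"limits": {"nvidia.com/gpu": gpus}},
                }
            ],
        },
    }


# Hypothetical pod name and image, used only for illustration.
manifest = gpu_pod_manifest("gpu-check", "example.registry/ml-train:latest")
print(json.dumps(manifest, indent=2))
```

Kubernetes accepts manifests in JSON as well as YAML, so the printed output can be saved to a file and applied with `kubectl apply -f`.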
If you are interested in Managed Kubernetes, read:
→ Docker Swarm VS Kubernetes – how businesses choose orchestrators
→ Sitting on two clouds: comparing ways to organize a multi-cloud infrastructure
→ Deploy is lava! How Managed Kubernetes helps businesses put out fires