What is MLOps and how we implemented model cascades

Hello, my name is Alexander Egorov, and I am an MLOps engineer. In this article I'll talk about how we roll out a large number of models at the bank. We will look not only at the pipeline for deploying individual models, but also at entire cascades.

How does the need for MLOps arise?

As interest in artificial intelligence grew, everyone began rapidly developing their own models. However, just as with website development, data scientists, ML engineers, and other specialists ran into the problem of how quickly they could reproduce the whole process from scratch. And the process does not always succeed: according to statistics, only about 25% of models actually make it to production. So you will most likely have to repeat the entire cycle for a model several times before it really starts to benefit the business. On top of that, every system is subject to entropy: after some time, even a previously fully working model needs to be retrained, because it may lose its accuracy.

A simple conclusion follows: just like website development, the development of machine learning models needs its own CI/CD processes. This is where MLOps comes in. The MLOps approach brings DevOps principles into the field of machine learning.

That said, machine learning has its own specifics. Here we have to work with big data. Raw data is often not ready for models to be trained on, so developers build ETL processes that receive data, process it and pass the results on so that models can be trained on them. However, an ETL process does not cover training the model itself, let alone rolling it out to an environment.

The MLOps approach involves building a pipeline that trains a model to the required accuracy on prepared data and rolls it out to a test or production environment.

MLOps Toolkit

When we move from regular DevOps to MLOps, new tools appear that cover new kinds of problems. Below I will briefly introduce the tools already used in our pipeline at Alfa-Bank.

I have tried to arrange these tools in the order they are used during model development: data preparation, model training and deployment.

The first tool on the list is Airflow, a tool for developing, scheduling and monitoring complex workflows. For a general understanding: it is CronTab on steroids, capable of executing entire pipelines written in Python at a given time. For the most part, writing ETL processes revolves around this tool.
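As a minimal illustration (not our production code), a daily ETL pipeline in Airflow can look like this; the task logic, names and schedule are placeholders:

```python
# A minimal sketch of a daily ETL DAG; the task logic, names and schedule are placeholders.
from datetime import datetime

from airflow import DAG
from airflow.operators.python import PythonOperator


def extract(**context):
    # pull raw data from the source system
    ...


def transform(**context):
    # clean and aggregate the raw data
    ...


with DAG(
    dag_id="example_etl",
    start_date=datetime(2024, 1, 1),
    schedule_interval="@daily",   # "CronTab on steroids": any cron expression works here
    catchup=False,
) as dag:
    extract_task = PythonOperator(task_id="extract", python_callable=extract)
    transform_task = PythonOperator(task_id="transform", python_callable=transform)
    extract_task >> transform_task
```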

Hadoop is a fairly old ecosystem for storing and processing big data. It includes many different components; one of the younger ones is the Spark framework, which lets you run distributed computations over large data sets.

Within the Kubernetes cluster where all our tools live, Spark is deployed via the SparkOperator. Using this framework together with AirFlow, and storing data in a Hadoop cluster, gives a big boost in processing speed and storage efficiency.
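For illustration, here is the kind of PySpark job that such a SparkApplication Pod might execute; the HDFS paths and column names are hypothetical:

```python
# A sketch of a PySpark job that a SparkApplication Pod could run;
# the HDFS paths and column names are hypothetical.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("feature-prep").getOrCreate()

# Read raw data from the Hadoop cluster and build a simple feature table.
raw = spark.read.parquet("hdfs:///data/raw/transactions")
features = (
    raw.groupBy("client_id")
       .agg(F.count("*").alias("tx_count"), F.sum("amount").alias("tx_sum"))
)
features.write.mode("overwrite").parquet("hdfs:///data/features/transactions")

spark.stop()
```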

Next, after data processing, models are developed, and the tool can help with this MLFlow, which allows you to save and reproduce machine learning models. You create an MLFlow experiment, train a model within it, and all the results are automatically recorded in MLFlow along with all the necessary metrics, graphs and charts. The MLFlow project is constantly evolving and has a huge community, which certainly gives advantages in terms of using it in various environments and, indeed, in pipelines.
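A minimal sketch of what this logging looks like from the data scientist's side; the experiment name, model and metric here are placeholders:

```python
# A minimal MLFlow tracking sketch; the experiment name, model and metric are placeholders.
import mlflow
import mlflow.sklearn
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

X, y = make_classification(n_samples=1_000, random_state=42)

mlflow.set_experiment("demo-experiment")
with mlflow.start_run():
    model = LogisticRegression(max_iter=500).fit(X, y)
    mlflow.log_param("max_iter", 500)
    mlflow.log_metric("train_accuracy", model.score(X, y))
    mlflow.sklearn.log_model(model, artifact_path="model")  # saved for later deployment
```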

Once we have a trained model, we can move on to inference. Here Seldon Core helps: it is a whole platform for deploying ML models and currently one of the most popular solutions on the market. It wraps your trained model from MLFlow into an online service, provides an API for communicating with it, and launches it. For manual testing you get a Swagger UI.

On our platform, Seldon Core, just like Spark, runs in the form of a Kubernetes Operator.
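Once a SeldonDeployment is up, the wrapped model exposes a standard prediction endpoint. A hedged example of calling it; the host, namespace and deployment name below are assumptions about a particular cluster:

```python
# Calling a model served by Seldon Core; the host, namespace and deployment
# name are assumptions, the request format follows the Seldon v1 protocol.
import requests

url = (
    "http://seldon-gateway.example.local/seldon/"
    "models-namespace/my-model/api/v1.0/predictions"
)
payload = {"data": {"ndarray": [[5.1, 3.5, 1.4, 0.2]]}}

response = requests.post(url, json=payload, timeout=10)
response.raise_for_status()
print(response.json())  # predictions come back in the same JSON envelope
```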

Additional module

Typically, DevOps pipelines are built around some kind of CI/CD tool: an image is built and deployed, for example, to a Kubernetes cluster. In the context of ML, however, more specific tools such as AirFlow, Spark and Seldon Core are required.

To integrate Spark and Seldon with Kubernetes, we use special operators that control the launch of SparkApplication and SeldonDeployment resources inside Pods. AirFlow, in turn, works with DAGs: directed acyclic graphs described in Python. Each of these technologies has its own way of being run: either separate Kubernetes manifests or Python scripts.

So, to train and run a variety of models with different parameters across all these tools, we developed a special Python module which, based on an input configuration from the developer, lets them train and publish models without deep knowledge of the infrastructure. This module acts as a bridge between the data scientist's knowledge of the model and the infrastructure.

To begin, the data scientist creates a simple configuration file in their repository to work with our module. This file specifies everything needed to train or deploy the model: its type (batch or online), the time and interval for running DAGs, parameters for the Spark session, the Python version the model should run on, its name and anything else the data scientist can specify about the model.
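To give an idea of what such a config might contain, here is a hypothetical example; the field names are illustrative, not the module's real schema:

```python
# A hypothetical model config; field names are illustrative, not the module's real schema.
import yaml  # assumes PyYAML is installed

config_text = """
model:
  name: churn-scoring
  type: batch            # batch or online
  python_version: "3.9"
schedule:
  start: "2024-01-01"
  interval: "@daily"
spark:
  executor_instances: 4
  executor_memory: 8g
"""

config = yaml.safe_load(config_text)
print(config["model"]["type"])  # -> "batch"
```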

Thus, the main task of the module is to take a user configuration, create a Spark Application and a DAG based on it, and place it all in the AirFlow Scheduler if we are talking about batch models. If we are talking about deploying online models, the module creates a SeldonDeployment manifest instead.
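Schematically (and only schematically, the real module is far more involved), its dispatch logic can be pictured like this; all helpers here are hypothetical stubs:

```python
# A schematic sketch of the module's dispatch logic; every helper is a hypothetical stub.
def render_spark_application(config: dict) -> str:
    return f"SparkApplication manifest for {config['model']['name']}"

def render_dag(config: dict) -> str:
    return f"DAG for {config['model']['name']}"

def render_seldon_deployment(config: dict) -> str:
    return f"SeldonDeployment manifest for {config['model']['name']}"

def build_artifacts(config: dict) -> list:
    if config["model"]["type"] == "batch":
        # Batch model: a SparkApplication manifest plus a DAG for the AirFlow Scheduler.
        return [render_spark_application(config), render_dag(config)]
    # Online model: only a SeldonDeployment manifest is needed.
    return [render_seldon_deployment(config)]

print(build_artifacts({"model": {"name": "churn-scoring", "type": "batch"}}))
```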

Let's take a closer look at the pipeline steps

First, let's look at the training pipeline.

The process begins with the standard code push and tests. After the various code checks, the model environment is assembled (i.e., all libraries are installed), and an archive with it is saved to S3 storage. Next, the module creates the necessary manifests based on the configuration and hands them to the AirFlow Scheduler.

Now that we have a ready-made environment for the model to run, a DAG for AirFlow and a manifest for running SparkApplication in a cluster, we can start training.

Since all our models work with Spark, there are two main actions performed within the DAG: Spark Submit and Spark Wait (a sketch of both follows the list below).

  • The first task launches a Spark session: on the Kubernetes side, an entity called Spark Application is created, the manifest for which has already been passed to the Scheduler at the previous stage. Next, this Spark Application creates a Pod, in which the model training begins.

  • The second Spark Wait task simply monitors the state of the Spark Application after it has been created.
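A sketch of how such a pair of tasks could be expressed with Airflow's Kubernetes provider; the DAG id, namespace and manifest file are assumptions:

```python
# A sketch of a Spark Submit / Spark Wait pair using the cncf.kubernetes provider;
# the DAG id, namespace and manifest file are assumptions.
from datetime import datetime

from airflow import DAG
from airflow.providers.cncf.kubernetes.operators.spark_kubernetes import (
    SparkKubernetesOperator,
)
from airflow.providers.cncf.kubernetes.sensors.spark_kubernetes import (
    SparkKubernetesSensor,
)

with DAG(dag_id="train_model", start_date=datetime(2024, 1, 1), schedule_interval=None) as dag:
    spark_submit = SparkKubernetesOperator(
        task_id="spark_submit",
        namespace="ml-models",
        application_file="spark_application.yaml",  # the manifest built by the module
        do_xcom_push=True,
    )
    spark_wait = SparkKubernetesSensor(
        task_id="spark_wait",
        namespace="ml-models",
        application_name="{{ task_instance.xcom_pull(task_ids='spark_submit')['metadata']['name'] }}",
    )
    spark_submit >> spark_wait
```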

Once training is complete, the Spark Application's Pod exits. The DAG in Airflow is marked as completed successfully, which terminates the entire pipeline in Jenkins.

For batch models, the inference pipeline has a similar structure, but instead of training in a Spark session, the finished model is downloaded from MLFlow and deployed.

If our model is online, then instead of a DAG and a Spark Application the module creates a SeldonDeployment manifest.

Now that we have a full-fledged pipeline for training and subsequent deployment, the time it takes to bring models into production has dropped significantly.

Previously, the cycle to launch a model could take months. Now, for a similar model, training and the subsequent release to production with all the tests takes about a couple of days.

According to our regulations, the maximum time cannot exceed 5 days for batch models and 10 days for online ones. The actual run time of the inference pipeline in the development environment ranges from 7 to 25 minutes, depending heavily on the type of model.

  • If the model's environment uses fairly lightweight libraries like SkLearn, the time is at the lower threshold of about 5 minutes.

  • If the model uses a GPU and is built on TensorFlow, the time is at the upper limit, since the environment alone can reach about 15 GB.

Cascades: why they are needed and how they work

So, the speed of work has increased, and with it the number of models being created. And with that came a new problem: how to launch them all.

The first case we ran into was about 100 models appearing that all needed to be launched. With the pipeline implementation we had at the time, this would have taken an enormous amount of time: we would simply have had to call the pipeline with the necessary models 100 times.

In another case, we had a model that uses the outputs of several other models as its input. Essentially, we needed to deploy several models first and then launch another one on top of them, creating an ensemble.

Based on these cases, we needed a mechanism that would solve problems like these. That is how we came up with the idea of model cascades.

A model cascade means running several Batch-trained models within one DAG.

The cascade is also configured in the config file by the data scientist, using blocks. Each block contains models that run in parallel, while the blocks themselves are executed sequentially.
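The block structure in the config might look roughly like this; the block and model names are illustrative, not the real schema:

```python
# A hypothetical cascade section of the config; block and model names are illustrative.
import yaml  # assumes PyYAML is installed

cascade_text = """
cascade:
  - block: scoring
    models: [model_a, model_b, model_c]   # run in parallel
  - block: aggregation
    models: [ensemble_model]              # runs after the previous block
"""

cascade = yaml.safe_load(cascade_text)["cascade"]
for block in cascade:
    print(block["block"], "->", block["models"])
```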

Now let’s look in more detail at what’s going on inside them.

Each block is executed in three stages.

  • At the beginning, artifacts are downloaded for all models. These are datasets that will then be fed to them.

  • After this step, all the models specified in the block are launched. They run, receive the data from the previous step and produce their predictions.

  • Next, the final step is to export predictions to the Hadoop cluster.

After all the steps are completed, the block is marked as complete, and the same process is carried out for the remaining blocks (a simplified sketch of this structure follows below).
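One way to picture this structure in Airflow terms is a TaskGroup per block with the three stages inside it. This is a simplified sketch, not the DAG our module actually generates; in reality each stage is itself a group of tasks:

```python
# A simplified sketch of cascade blocks as Airflow TaskGroups;
# the real generated DAG is more complex (each stage is a group of tasks).
from datetime import datetime

from airflow import DAG
from airflow.operators.empty import EmptyOperator
from airflow.utils.task_group import TaskGroup

with DAG(dag_id="cascade_example", start_date=datetime(2024, 1, 1), schedule_interval=None) as dag:
    previous = None
    for block_name, models in [("scoring", ["model_a", "model_b"]), ("aggregation", ["ensemble"])]:
        with TaskGroup(group_id=block_name) as block:
            load = EmptyOperator(task_id="load_artifacts")
            export = EmptyOperator(task_id="export_predictions")
            for model in models:
                # models within one block run in parallel between load and export
                load >> EmptyOperator(task_id=f"run_{model}") >> export
        if previous:
            previous >> block  # the blocks themselves run sequentially
        previous = block
```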

However, I will point out an interesting feature.

The main goal of the pipeline and the module as a whole is to create a DAG, which is then launched either by AirFlow itself or by the user. That is, the DAG is either executed on a schedule or triggered by the data scientist later.

At this stage, the user can also indicate which models or entire blocks should be run and which should not. This is done through a parameterized DAG launch.
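For example, a parameterized run can be triggered through the Airflow REST API; the host, credentials and parameter names below are assumptions:

```python
# Triggering a parameterized DAG run via the Airflow REST API;
# the host, credentials and the structure of "conf" are assumptions.
import requests

response = requests.post(
    "http://airflow.example.local/api/v1/dags/cascade_example/dagRuns",
    auth=("user", "password"),
    json={"conf": {"blocks": ["scoring"], "models": ["model_a"]}},
    timeout=10,
)
response.raise_for_status()
print(response.json()["dag_run_id"])
```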

The mechanism for skipping certain models relies on technical tasks. The point is that loading artifacts, running the model and uploading the results are not individual tasks but groups of tasks, each containing 3 technical tasks. You are already familiar with the last two, Spark Submit and Spark Wait, which respectively launch a Spark session and wait for it to complete.

In front of them there is another task, Select, which compares the DAG's input parameters with the DAG's original description in the data scientist's config file, on the basis of which the DAG was created.

If a model or an entire block is missing from the parameters, the corresponding Spark session is simply not started. This reduces runtime during tests, saves cluster resources and, in general, makes working with cascades more flexible.
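A hedged sketch of what such a Select task could look like, here expressed with a ShortCircuitOperator that skips the downstream Spark tasks; the parameter names are assumptions:

```python
# A sketch of a "Select" task that skips the downstream Spark tasks when a model
# is not listed in the run parameters; the parameter names are assumptions.
from datetime import datetime

from airflow import DAG
from airflow.operators.python import ShortCircuitOperator


def model_selected(model_name, **context):
    # dag_run.conf holds the parameters of a triggered run;
    # an empty conf means "run everything from the original config".
    requested = (context["dag_run"].conf or {}).get("models")
    return requested is None or model_name in requested


with DAG(dag_id="select_example", start_date=datetime(2024, 1, 1), schedule_interval=None) as dag:
    select = ShortCircuitOperator(
        task_id="select",
        python_callable=model_selected,
        op_kwargs={"model_name": "model_a"},
    )
    # select >> spark_submit >> spark_wait: downstream tasks are skipped when False is returned
```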

Results

The use of cascades now allows us to run many models both in parallel and sequentially. During DAG runs, we can choose what to execute and what to skip, which greatly simplifies testing individual models after they reach production.

To summarize, model cascades free users from the routine launch of multiple pipelines and give them the ability to build entire ensembles of models, describing them in a very simple form in the config in their repository.
