Seldon in the MLops infrastructure of beeline business

A few words about MLops

No article about MLops is complete without the illustration below, and we will not break with tradition.

So, the term MLops is commonly understood as a set of practices at the intersection of Machine Learning, DevOps, and Data Engineering. The purpose of introducing MLops processes is to automate the stages of developing, testing, deploying, and monitoring machine learning models. The diagram of the stages of creating an ML solution often looks like this:

According to the figure above, creating an ML solution traditionally involves the following steps:

  • data preparation;

  • data analysis and feature engineering;

  • model building and quality validation;

  • deployment of the model to a production environment.

Then the model has to be maintained in production: keeping the model service running, monitoring the quality of the model's predictions, retraining the model when necessary, and alerting when target metrics degrade. Automating these processes is part of what MLops covers.

Implementing MLops practices makes sense when the effort required is outweighed by the benefits. For example, for a company with only a small share of projects using machine learning models, the value of implementing MLops is minimal.

Conversely, in companies with a large zoo of models across various projects, adopting MLops practices is a way to stay competitive. Properly organized MLops processes can significantly reduce the time-to-market of model deployment, improve monitoring, and reduce costs for other roles in the company (for example, DevOps engineers or testers).

The degree to which MLops penetrates a company’s processes varies. There are several generally accepted approaches to assessing the “maturity” of MLops practices in a company. Google has proposed a compact MLops maturity model.

The model consists of three levels, each characterized by a specific set of tools and automated processes:

  • MLops level 0: Manual process;

  • MLops level 1: ML pipeline automation;

  • MLops level 2: CI/CD pipeline automation.

We will not dwell on describing the levels; the concept’s authors have laid everything out in detail. Note that we have yet to see a company whose MLops processes fit exactly into the description of one of the levels. Rather, these levels should be treated as rough benchmarks.

A more detailed alternative is the model from GigaOm. The authors propose to evaluate maturity according to five criteria:

  1. strategy;

  2. architecture;

  3. modeling;

  4. processes;

  5. control.

Each criterion is described across five maturity levels in sufficient detail; you can get acquainted with it here. The GigaOm model is more flexible than Google’s, which makes it more convenient for planning the development and adoption of MLops practices in a company. There are also models from Microsoft and other IT giants.

A brief diagnosis of the MLops processes in beeline business: by Google’s criteria, we are between MLops level 1 and MLops level 2, confidently approaching the automation of most processes for creating and deploying ML solutions.

What Seldon Core is and its analogues

Let’s discuss what Seldon Core is and explain why we chose this particular technology over its competitors.

Seldon Core

Seldon is a Data Science ecosystem that provides ergonomic tools for implementing ML projects effectively. One of the open source elements of this ecosystem is the Seldon Core module. Seldon Core is an open source platform for easy and fast deployment of models and experiments, with the ability to scale services in a Kubernetes environment. Essentially, Seldon Core is a Kubernetes operator.

Here is who it can be useful for, and why.

Data scientists love to train models, but not all of them like to write the wrapper services that call those models. The path from a trained model to a service with an interface for accessing it can take quite a long time. Along the way, the model may pass through the hands of developers who write the service, DevOps engineers who deploy it, and testers who test it. That looks long and complicated.

Tools like Seldon Core simplify the path from a trained model to a service interface and make it a single step.

Out of the box, Seldon Core gives you:

  • autoscaling of services on both CPUs and GPUs;

  • the ability to deploy models of various types (TensorFlow, PyTorch, ONNX, scikit-learn, XGBoost), including those with custom data preprocessing;

  • testing services with models;

  • “snoozing”, that is, the ability to shut down inactive services and bring them back up when load appears;

  • versioning services with models;

  • various testing scenarios: canary rollouts, A/B tests, shadow deployments;

  • monitoring services with models;

  • monitoring outliers in the data that are fed into the models (Outlier Detector).

There are a number of similar tools: KFServing, MlFlow Serving, BentoML. Quite a lot has been written comparing them with Seldon Core and with each other; you can look here, here or here. We settled on Seldon Core because it has the largest feature set out of the box compared to its analogues, takes a no-code approach, and is relatively easy to deploy in Kubernetes.

Let’s look at the DS infrastructure in beeline business and the place of Seldon Core in it.

DS infrastructure

You can learn more about the development of the DS infrastructure and its internals in beeline from our colleagues’ talks here and here. We will cover this in detail in one of our upcoming articles.

In short, each DS team is provided with a working environment: a namespace in Kubernetes with certain resource quotas (CPU, RAM, GPU). JupyterHub is deployed in each working environment. In JupyterHub, you can spin up sessions with JupyterLab or VS Code with the specified resources.

Resources are allocated to the entire working environment based on the needs of the team. In JupyterLab, we train models and run experiments; we use MlFlow for experiment tracking and model storage, and Ceph storage for data exchange. There are shared Ceph directories available to everyone (for example, with Python environments), as well as team and personal ones. For seamless authorization in services, we use the KeyCloak single sign-on solution.
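
To make this workflow concrete, here is a minimal sketch of how an experiment run in JupyterLab might be logged to MlFlow. The tracking URI, experiment name, and toy dataset are hypothetical placeholders, not our real configuration.

```python
# A minimal sketch of experiment tracking from JupyterLab; the tracking URI,
# experiment name and toy dataset below are hypothetical placeholders.
import mlflow
import mlflow.sklearn
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

mlflow.set_tracking_uri("https://mlflow.internal.example")  # assumption: internal MlFlow instance
mlflow.set_experiment("customer-churn")                     # hypothetical experiment name

X, y = make_classification(n_samples=1_000, n_features=20, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

with mlflow.start_run(run_name="rf-baseline"):
    model = RandomForestClassifier(n_estimators=200, random_state=42)
    model.fit(X_train, y_train)
    auc = roc_auc_score(y_test, model.predict_proba(X_test)[:, 1])

    mlflow.log_param("n_estimators", 200)
    mlflow.log_metric("roc_auc", auc)
    # The model itself is also stored in MlFlow, so it can later be served via Seldon Core.
    mlflow.sklearn.log_model(model, artifact_path="model")
```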

The intended use of most models is to compute model scores on a Hadoop cluster, with the results usually stored as large Hive tables. To do this, we used to run pySpark scripts that computed the models, scheduled with Apache Oozie on the Hadoop cluster.

What we wanted to fix in this process:

  1. stop manually handing the model and its scores over to analysts to verify the correctness of the calculations;

  2. stop manually handing model developers serialized files with data for validating the deployment;

  3. introduce common model-testing approaches for all teams;

  4. for model serving, move from Apache Oozie to ArgoWorkflow.

We closed points 2 and 4 by introducing a seamless model deployment flow based on ArgoWorkflow, Gitlab and MlFlow. The dependencies between the services look like this:

DS experiments from JupyterLab are logged in MlFlow. When work on a model is finished, the necessary information about the model is saved in Gitlab. To store model information in Gitlab in a standardized way, we use the Cookiecutter framework. The repository template looks like this:

To deploy models, we built a pipeline on ArgoWorkflow. Simplified, it looks like this:

At the first step, a data preparation job runs in Hive, and the dependencies (requirements.txt) for the Python environment of the model are downloaded from MlFlow. The second step installs the dependencies from that file, then the model is loaded from MlFlow and a pySpark script is executed to compute the model scores. The result is stored in a Hive table.
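
To make the second step more concrete, here is a hedged sketch of how that scoring job could look. It assumes the dependencies from requirements.txt have already been installed earlier in the pipeline; the tracking URI, registered model name and table names are hypothetical placeholders.

```python
# A hedged sketch of the scoring step of the ArgoWorkflow pipeline; it assumes the
# Python dependencies pulled from MlFlow (requirements.txt) are already installed.
# The tracking URI, model name/stage and table names are hypothetical placeholders.
import mlflow
import mlflow.pyfunc
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder.appName("argo-model-scoring")
    .enableHiveSupport()
    .getOrCreate()
)

mlflow.set_tracking_uri("https://mlflow.internal.example")  # assumption: internal MlFlow instance
model_uri = "models:/customer_churn/Production"             # hypothetical registered model

# Wrap the pyfunc model (including its preprocessing code) into a Spark UDF,
# so scoring is distributed across the Hadoop cluster.
score_udf = mlflow.pyfunc.spark_udf(spark, model_uri=model_uri, result_type="double")

df = spark.table("tmp_db.prepared_features")                # output of the Hive preparation step
feature_cols = [c for c in df.columns if c != "customer_id"]

scored = df.withColumn("score", score_udf(*feature_cols))
scored.select("customer_id", "score").write.mode("overwrite").saveAsTable("scores_db.model_scores")
```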

We will cover this topic in more detail in one of our upcoming articles; below we will talk about the model testing approaches we have implemented.

Testing with Seldon Core

As mentioned above, Seldon Core makes it easy to deploy REST API or gRPC wrapper services for models. Since we use MlFlow for logging experiments and storing models, the MlFlow Server suited us well for deploying models straight from the MlFlow registry.

We log models using the mlflow.pyfunc module, which allows us to save, in addition to the model, a script with its data preprocessing. The Seldon Core service with such a model will first execute the preprocessing script and then compute the model scores. In our processes this is very convenient: there is no need to store the preprocessing code separately and apply it before feeding data to the model.
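
As an illustration, here is a hedged sketch of how such a model could be logged with mlflow.pyfunc. The preprocessing logic, serialized-model path and registry name are hypothetical placeholders.

```python
# A hedged sketch of logging a model together with its preprocessing via mlflow.pyfunc;
# the preprocessing logic, serialized-model path and registry name are hypothetical.
import mlflow
import mlflow.pyfunc
import pandas as pd


class ModelWithPreprocessing(mlflow.pyfunc.PythonModel):
    def load_context(self, context):
        import joblib

        # The serialized estimator is attached to the pyfunc model as an artifact.
        self.model = joblib.load(context.artifacts["estimator"])

    def predict(self, context, model_input: pd.DataFrame) -> pd.Series:
        # This preprocessing runs inside the Seldon Core service before scoring.
        prepared = model_input.fillna(0)
        prepared["spend_per_mb"] = prepared["spend"] / (prepared["traffic_mb"] + 1)  # hypothetical feature
        return pd.Series(self.model.predict_proba(prepared)[:, 1])


with mlflow.start_run():
    mlflow.pyfunc.log_model(
        artifact_path="model",
        python_model=ModelWithPreprocessing(),
        artifacts={"estimator": "model.pkl"},       # hypothetical path to the serialized estimator
        registered_model_name="customer_churn",     # hypothetical name in the MlFlow registry
    )
```

A model logged this way can then be served by the MlFlow Server straight from the registry, preprocessing included.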

Unit tests

When the code in a model’s git repository is updated, unit tests are run via CI/CD in Gitlab.

The pipeline in Gitlab looks like this:

At the first step, a script prepares the data for the ML model; then the service with the required model is raised, and a script computes the model scores and compares them with the reference.
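
A hedged sketch of such a test is shown below. The service URL, data files and tolerance are hypothetical; the request body follows the Seldon Core v1 REST prediction protocol.

```python
# A hedged sketch of a unit test from the Gitlab pipeline: prepared test data is sent
# to the raised Seldon Core service and the returned scores are compared with a
# pre-computed reference. The URL, file paths and tolerance are hypothetical.
import numpy as np
import pytest
import requests

SELDON_URL = (
    "http://models-test.internal.example/seldon/ds-team/churn-model/api/v1.0/predictions"  # assumption
)


@pytest.fixture
def test_batch():
    features = np.load("tests/data/features.npy")            # data prepared at the first step
    reference = np.load("tests/data/reference_scores.npy")   # scores previously approved by analysts
    return features, reference


def test_model_scores_match_reference(test_batch):
    features, reference = test_batch
    payload = {"data": {"ndarray": features.tolist()}}        # Seldon Core v1 prediction protocol
    response = requests.post(SELDON_URL, json=payload, timeout=60)
    assert response.status_code == 200

    scores = np.array(response.json()["data"]["ndarray"]).ravel()
    np.testing.assert_allclose(scores, reference, rtol=1e-5)
```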

The introduction of unit tests allowed us to free up the resources of analysts who previously had to verify the correctness of the calculation of model scores.

Retro Tests

Several product teams have a scenario of interaction with partners in which the partner wants to know what quality our models would demonstrate on historical data. In this case, we receive a request from the partner to compute a certain model over a certain period, and then hand the model scores back to the partner for quality control. Often the model was developed relatively long ago, and DS engineers had to raise special versions of the environments to use the ML model correctly.

With the help of Seldon Core, we moved away from using special environments and built the following retro-testing pipeline on ArgoWorkflow:

The first step prepares the data for the model; then we raise a service with a REST API interface to the model; the last step computes the model scores by querying the raised service via the REST API and saves the results.
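
For illustration, here is a hedged sketch of that final step: batch scoring of historical features through the temporarily raised service. The URL, file names and batch size are hypothetical placeholders.

```python
# A hedged sketch of the final retro-test step: historical features prepared earlier are
# scored in batches through the temporarily raised Seldon Core service, and the results
# are saved for the partner. The URL, file names and batch size are hypothetical.
import numpy as np
import pandas as pd
import requests

SELDON_URL = (
    "http://retro.internal.example/seldon/ds-team/churn-model-2021/api/v1.0/predictions"  # assumption
)
BATCH_SIZE = 1000

features = pd.read_parquet("historical_features.parquet")    # output of the data preparation step
customer_ids = features.pop("customer_id")

scores = []
for start in range(0, len(features), BATCH_SIZE):
    batch = features.iloc[start:start + BATCH_SIZE]
    payload = {"data": {"names": list(batch.columns), "ndarray": batch.values.tolist()}}
    response = requests.post(SELDON_URL, json=payload, timeout=120)
    response.raise_for_status()
    scores.extend(np.array(response.json()["data"]["ndarray"]).ravel().tolist())

pd.DataFrame({"customer_id": customer_ids, "score": scores}).to_parquet("retro_scores.parquet")
```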

Conclusions and plans

In conclusion, let’s sum up the interim results of using Seldon Core in our MLops processes:

  • introducing Seldon Core for unit tests made it possible to build a transparent pipeline on ArgoWorkflow, which removed analysts from the task of checking that the ML model is used correctly;

  • introducing Seldon Core for retro tests in the ArgoWorkflow pipeline reduced the time needed to process client requests.

In the future, we plan to use Seldon Core as the interface to models in all internal and external services, roll out various testing scenarios, and integrate Seldon Core with Prometheus to monitor the load on services.
