Using Seldon Core for Machine Learning

The widespread adoption of machine learning has spurred innovations that are increasingly hard to predict and has made it possible to build intelligent experiences into business products and services. Putting those models into production, however, requires advanced methods. Sergey Desyak, Leading Expert at the DevOps Competence Center at Neoflex, shares his experience of using Seldon Core for machine learning, in particular for “rolling out” models.

What is ML

Machine learning (ML) is the use of mathematical models of data that help a computer learn without direct instructions. It is considered a form of artificial intelligence (AI). Machine learning uses algorithms to identify patterns in data and build a predictive model based on them. The more data processed and used by such a model, the more accurate the results of its work become. This is very similar to how a person hones skills in practice.

All of this makes up the model’s life cycle. First, a technical task is posed to the data scientists, who develop an ML model. The model is then trained on historical (accumulated) data so that it can later work with new data. Depending on the task, the tuning parameters are “tweaked” as needed until the desired quality is reached.

This adaptive nature makes machine learning great for scenarios where query data and properties are constantly changing and it is virtually impossible to write code for a solution.

Why ML DevOps

Although machine learning can be found everywhere, it creates certain difficulties in implementation. One of them is the need to quickly and reliably move from the experimental phase to the production phase, where trained models can start working quickly to bring value to the business.

The ML industry offers many tools to help solve this problem. Public cloud providers have their own managed solutions for serving machine learning models, and alongside them there are many independent projects: some free and open source, some paid.

Data scientists and MLOps engineers work with all of this.

MLOps sits at the junction of DevOps, machine learning, and data engineering: some specialists create ML models, while others put them into production.

How models were created and used before

Initially, data scientists developed on their local computers. The order of their actions was as follows:

  1. Created a model;

  2. Trained it and selected the parameters needed to run it;

  3. Saved the result as a .pkl file.
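These steps can be sketched in a few lines; the TinyModel class below is a hypothetical stand-in for a real estimator (e.g. one from scikit-learn), used only to keep the sketch self-contained:

```python
import pickle

# Hypothetical stand-in for a real model; in practice this would be
# a scikit-learn estimator or similar object with fitted parameters.
class TinyModel:
    def __init__(self, weight=0.0):
        self.weight = weight

    def fit(self, xs, ys):
        # "Training": pick the weight that maps xs to ys on average.
        self.weight = sum(y / x for x, y in zip(xs, ys)) / len(xs)

    def predict(self, x):
        return self.weight * x

# 1. Create the model; 2. train it on accumulated data.
model = TinyModel()
model.fit([1, 2, 4], [2, 4, 8])

# 3. Save the trained model as a .pkl file.
with open("model.pkl", "wb") as f:
    pickle.dump(model, f)

# Later, the serving side loads the same file back.
with open("model.pkl", "rb") as f:
    restored = pickle.load(f)

print(restored.predict(10))
```

The .pkl file is all that leaves the data scientist’s machine; everything after this point is about serving it.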

After that, the model is ready, but it still needs to be “rolled out”. For this, additional steps were taken:

  1. Wrote a Flask “wrapper” to run the model as a REST API service (again, by hand);

  2. Collected the image;

  3. Then, based on the image, entities were created in Kubernetes in any convenient way (pods, deployments, replicasets, services, etc.).
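A minimal sketch of such a hand-written Flask “wrapper”, assuming Flask is installed; the /predict endpoint and the JSON payload shape are illustrative choices, and a trivial in-memory model stands in for the data scientist’s .pkl file so the example is self-contained:

```python
import pickle

from flask import Flask, jsonify, request

app = Flask(__name__)

# In the real workflow the model comes from the data scientist's .pkl
# file; here a trivial stand-in is pickled and loaded in place, so the
# sketch runs on its own.
class TinyModel:
    def predict(self, x):
        return 2 * x

blob = pickle.dumps(TinyModel())
model = pickle.loads(blob)

@app.route("/predict", methods=["POST"])
def predict():
    payload = request.get_json()
    return jsonify({"prediction": model.predict(payload["x"])})

if __name__ == "__main__":
    app.run(host="0.0.0.0", port=5000)
```

Every model change meant revisiting code like this, which is exactly the manual work described above.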

Both data scientists and DevOps engineers were necessarily involved in this process, because changes were constantly made to both the model code and launch parameters. In particular, when changing the model, it was necessary to rewrite the code for the REST API of the service, rebuild the image and, possibly, install new packages, involving data scientists for consultation.

How did you optimize the process?

Subsequently, they began to use one of the additional tools for running models – MLflow. It has a graphical interface that lets you watch how a model runs and with what results, and it also shows previous runs and experiments. With it, the procedure changed:

  1. Data scientist creates model (in Jupyter-Hub);

  2. Trains the model by selecting parameters in MLflow;

  3. An image of the working model is built from the path to the trained model, with MLflow running in serving mode;

  4. Using this image, DevOps creates manifests to run on Kubernetes.

But problems remained: the model could only be in MLflow format, and the language was still limited to Python. In addition, it was impossible to perform extra processing on the data the model received as input. And if the model changed, you had to rebuild the image and do everything over again.

There were other shortcomings as well:

• Limited support for model formats;

• No way to build a pipeline (conveyor) from several sequential models, feeding the output of one model into the input of another, and no way to perform preliminary transformations on incoming data;

• No monitoring of the model’s operation or reaction to failures;

• No ability to conduct A/B tests.

Companies began to look for a more modern solution, because using only MLflow is inconvenient, difficult, and costly in terms of man-hours.

KFServing vs. Seldon Core

The choice on the market came down to two fairly similar products.

KFServing

Uses Kubernetes CRDs to create a service from models. Its main capabilities are:

● Support for models of various types (Tensorflow, XGBoost, ScikitLearn, PyTorch, ONNX);

● Availability of autoscaling, including for the GPU;

● Health checks of the model server and its configuration at startup;

● Scale to Zero, that is, the ability to all but stop running while waiting for input data;

● Canary Rollouts for deployed services.

Seldon Core

Seldon Core is similar to KFServing. It has the same features, plus a few extras and support for a slightly larger number of model types. In addition, it can deploy a finished model as a REST API straight from a regular script (a program in Python, Java, or NodeJS) and work with it. There is no need to build a pile of intermediate solutions: just take the Python code and run it as a REST API service. Seldon packages everything into the required format itself, so no additional tricks are needed.
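In Seldon’s Python wrapper, that “regular script” is just a class whose name matches its file and that exposes a predict method. A minimal sketch (the MyModel.py file name follows the wrapper’s convention; the model logic here is a placeholder):

```python
# MyModel.py -- a class in the shape expected by Seldon Core's Python
# wrapper: the constructor loads the model, predict() serves requests.
class MyModel:
    def __init__(self):
        # A real model would load weights here, e.g. from a .pkl file;
        # a fixed coefficient stands in to keep the sketch runnable.
        self.coef = 2.0

    def predict(self, X, features_names=None):
        # Seldon calls predict() with the request payload; the return
        # value is sent back to the client as the prediction.
        return [self.coef * x for x in X]

model = MyModel()
print(model.predict([1.0, 2.5]))
```

Seldon wraps this class with the REST (or gRPC) endpoints itself, which is the whole point: no hand-written Flask layer.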

In addition, it lets you run A/B tests and canary roll-outs, and it has an Outlier Detector. This detector checks whether incoming data resembles the data the model was trained on. If the model was trained, for example, on Arctic temperature data, that temperature was never above +5 degrees; so if +34 suddenly shows up, the model will give an incorrect result, and the Outlier Detector will “catch” this and report that something went wrong. This is convenient, for example, for scoring in banks: when loan criteria are evaluated, the model will report, if necessary, that the input data is invalid and will not approve a loan for everyone.
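The idea behind the detector can be illustrated with a simple z-score check. This is only a toy sketch of the concept, not Seldon’s actual detector, which uses more sophisticated algorithms:

```python
# Toy illustration of outlier detection: flag inputs that lie far
# from the mean of the training data.
import statistics

def make_outlier_check(training_values, threshold=3.0):
    mean = statistics.mean(training_values)
    stdev = statistics.stdev(training_values)

    def is_outlier(value):
        # z-score: how many standard deviations from the training mean.
        return abs(value - mean) / stdev > threshold

    return is_outlier

# Arctic temperatures from the article's example: never above +5.
check = make_outlier_check([-30, -25, -18, -10, -2, 4, 5])
print(check(-20))  # a typical value
print(check(34))   # +34 degrees: something went wrong
```

When such a check fires, the prediction can be rejected instead of being served as if nothing happened.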

Thanks to its Language Wrappers, Seldon Core lets you build a model in different programming languages.

It also has more frequent git commits, meaning it is updated more often, and its documentation is slightly better than KFServing’s. Although, to be honest, it is not perfect: looking for something in the docs, you may well get a 404 :). That is how the documentation goes. But at the same time, Seldon Core has everything on GitHub; the site just can’t keep up with it.

How to use Seldon Core for machine learning?

In general, we will continue to use MLflow for experiment logging.

All data scientists know Jupyter Hub, since it is practically the only tool they work in. This is where models are created, launched, and trained: the necessary launch parameters are selected and the models are debugged until the required quality is reached. Each run is tracked in MLflow, so the logs can later show which parameters gave the best runs; this is needed to choose the best metric from the results. To do this, the MLflow library is simply imported and the model is logged to it.

Then, in the graphical interface, the data scientist can see which run suits him best.

Suppose he achieved his goal – the model works as it should. When the required results are achieved, he does a git push; the model is saved and sent to GitLab to build the image of the future model container. Here Seldon’s s2i (Source-to-Image) utility is used, which turns code in a given language (Python, Java, etc.) into a working image in the required format, ready for use in Seldon Core. The image can be run and test data sent to its input (a test stage) to check that the build succeeded.
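For the Python wrapper, s2i picks up its settings from a .s2i/environment file next to the code; a typical fragment might look like this (MyModel is a hypothetical model class/file name):

```
MODEL_NAME=MyModel
API_TYPE=REST
SERVICE_TYPE=MODEL
PERSISTENCE=0
```

The build itself is then a single s2i command against one of Seldon’s builder images, with the resulting image pushed to the registry by the pipeline.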

As a result, we get an image ready for Seldon: not just an image for some system, but one in exactly the format Seldon needs. Endpoints are already configured in it, input data is processed, and the result is returned. Then all of this is “pushed” to the repository for rollout to Kubernetes (the manifest is deployed).

If the build succeeds, the model is “rolled out” to Kubernetes with the necessary parameters using a Helm chart (with a SeldonDeployment) and ArgoCD. Thus, from one single manifest, all the entities needed for the model to work as a REST API service are rolled out. Seldon is powered by Custom Resource Definitions (CRDs) in Kubernetes: it sees the resource type (SeldonDeployment) and deploys the necessary services and pods, that is, everything needed to make the model work.
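A SeldonDeployment manifest for a single-model graph might look roughly like this (the names and the image reference are placeholders):

```
apiVersion: machinelearning.seldon.io/v1
kind: SeldonDeployment
metadata:
  name: my-model
spec:
  predictors:
    - name: default
      replicas: 1
      graph:
        name: classifier
        type: MODEL
      componentSpecs:
        - spec:
            containers:
              - name: classifier
                image: registry.example.com/my-model:0.1
```

Applying this one manifest is enough for the operator to create the deployment, service, and pods behind the REST endpoint.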

At build time, you only need to specify the model’s name, its type (router, classifier, splitter), and the list of packages it uses. After that, the only file to supply is the list of libraries the data scientist himself used; for DevOps, it does not matter what is inside. The data scientist adds a new library, lists it in the file, and the model gets built. A DevOps engineer sets up the pipeline once and hands it to the data scientist. A push to the repository rolls everything out, say with ArgoCD, to Kubernetes, and you can already send data to the “instance” exposed outside the cluster. The DevOps engineer is freed up, and the data scientist is not distracted from his work. It is convenient, fast, and saves a lot of resources.


Let’s summarize the advantages of using Seldon Core for machine learning:

● Most importantly, the CI/CD process for DevOps has become much easier;

● A Kubernetes-native solution: thanks to the operator, the number of steps in a model “deploy” is reduced automatically, with no need to involve a large number of employees;

● Greater flexibility to use different types of models and in different combinations. The ability to create model pipelines without writing a lot of code;

● Integration with modern solutions: Istio, Prometheus;

● Logging and management out of the box.

Additionally, Prometheus makes it possible to monitor the load and the results of the models. If we roll out A/B tests, we can see how each model processes the input data.

This is a fairly flexible solution, because data scientists can work in different languages. Mostly Python, but sometimes NodeJS or Java. An image with a working model will be assembled regardless of the language in which it is written.
