Deploying a Machine Learning Model with Docker – Part 2


In the first part of this tutorial, we saved our classification model to a local directory and finished all the model development work in the Jupyter Notebook. From now on, we will focus on deploying the model. To reuse the model for prediction, you simply load it and call its predict() method as you usually do in a Jupyter Notebook.

To test the model, create a Python file with the following code in the same folder as model.pkl:

import pickle
# Import all the packages your model needs
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

# Load the model into memory
with open('./model.pkl', 'rb') as model_pkl:
    knn = pickle.load(model_pkl)

# Unseen data (create a new observation for testing)
unseen = np.array([[3.2, 1.1, 1.5, 2.1]])
result = knn.predict(unseen)

# Print the result to the console
print('Predicted result for observation ' + str(unseen) + ' is: ' + str(result))

Reusing the model for prediction.

You can call the predict method as many times as you like on unseen observations without rerunning the training process. However, when you run this .py file in the terminal, you may encounter an error like this:

Traceback (most recent call last):
  File "", line 4, in <module>
    from sklearn.neighbors import KNeighborsClassifier
ImportError: No module named sklearn.neighbors

This happens because the package we import is not installed in the environment where you run the file. In other words, the environment used to develop the model (conda) is not identical to the runtime environment (Python outside of conda), which becomes a problem whenever we run our code in a different environment. I deliberately let you hit this error to illustrate the problem and to re-emphasize why we use containers to deploy our code. For now, you can simply install the required packages manually with the “pip install” command; we’ll come back to this later and automate it.
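If you want to know up front which packages are missing from the current environment, the standard library can check without actually importing them. This is a minimal sketch: the module list (numpy, sklearn) is assumed from the imports in the script above, and find_missing_packages is a hypothetical helper name.

```python
import importlib.util

# Top-level modules imported by the prediction script (assumed list)
REQUIRED_MODULES = ["numpy", "sklearn"]


def find_missing_packages(modules):
    """Return the modules that cannot be found in the current environment."""
    # find_spec returns None when a top-level module is not installed;
    # it locates the module without executing it
    return [name for name in modules if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = find_missing_packages(REQUIRED_MODULES)
    if missing:
        print("Missing packages:", ", ".join(missing))
    else:
        print("All required packages are installed.")
```

Keep in mind that the pip package name does not always match the import name: the sklearn module is installed with pip install scikit-learn.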

Once all the packages are installed, run the file again; the model should quickly return the following message:

Predicted result for observation [[3.2 1.1 1.5 2.1]] is: [1]

As you can see, we test the model with hardcoded unseen data. The four numbers represent the sepal length, sepal width, petal length and petal width, respectively. However, since we want to expose our model as a service, it has to be wrapped in a function that accepts requests containing these four parameters and returns a prediction. This function can then be used by an API server (backend) or deployed to a serverless runtime such as Google Cloud Functions… In this tutorial, we’ll build an API server together and put it in a Docker container.
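Before bringing in a web framework, the idea of “a function that accepts the four parameters and returns a prediction” can be sketched on its own. This is a minimal sketch under two assumptions: make_predictor is a hypothetical helper, and the model is anything with a scikit-learn-style predict method that takes a list of rows (in this tutorial, that would be the knn object loaded from model.pkl). The ThresholdModel class is a toy stand-in so the example runs without the pickled model.

```python
from typing import Callable, Protocol, Sequence


class Predictor(Protocol):
    """Anything with a scikit-learn-style predict(rows) method."""

    def predict(self, rows: Sequence[Sequence[float]]) -> Sequence[int]: ...


def make_predictor(model: Predictor) -> Callable[[float, float, float, float], int]:
    """Wrap a model in a function that takes the four iris measurements."""

    def predict_iris(sl: float, sw: float, pl: float, pw: float) -> int:
        # One observation: sepal length, sepal width, petal length, petal width
        row = [float(sl), float(sw), float(pl), float(pw)]
        # predict() expects a batch of rows, so wrap the row in a list
        # and take the single prediction back out
        return int(model.predict([row])[0])

    return predict_iris


class ThresholdModel:
    """Toy stand-in for the trained KNN model (illustration only)."""

    def predict(self, rows):
        # Hypothetical rule: label 1 if petal length exceeds 1.0, else 0
        return [1 if row[2] > 1.0 else 0 for row in rows]


if __name__ == "__main__":
    predict_iris = make_predictor(ThresholdModel())
    print(predict_iris(3.2, 1.1, 1.5, 2.1))  # prints 1 with the toy model
```

With the real model you would call make_predictor(knn) instead; the float() conversion also means the function accepts the string values that arrive in an HTTP query.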

How does the API work?

Let’s talk about how web applications work today. Most web applications have two main components that cover almost all the functionality an application needs: the frontend and the backend. The frontend serves the interface (web page) to the user, and the frontend server typically stores HTML, CSS, JS and other static files such as images and sounds. The backend server, on the other hand, handles all the business logic and responds to the requests sent from the frontend.

Web application structure illustration

This is what happens when you open Medium in your browser.

  1. Your browser sends an HTTP request to the address… Along the way, a number of operations take place on DNS servers, routers, firewalls, etc., but for simplicity we will ignore them in this article.
  2. The frontend server sends back the *.html, *.css, *.js and all other files needed to render the web page in your browser.
  3. You should now see the Medium page in your browser and can start interacting with it. Let’s say you just hit the clap button on an article.
  4. JavaScript in your browser sends an HTTP request to the backend server with the story ID. The request URL tells the backend what action to take; in this example, it tells the backend to update the number of claps for the story with ID XXXXXXX.
  5. The backend program (which can be written in any language) retrieves the current number of claps from the database and increments it by one.
  6. The backend program then writes the updated number of claps back to the database.
  7. The backend sends the new number of claps to the browser so that it can be reflected in the interface.
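Steps 4 to 7 can be sketched as a single backend handler. This is purely illustrative and not how Medium actually implements claps: the in-memory dictionary stands in for the database, and handle_clap and the story ID are hypothetical.

```python
from typing import Dict

# Stand-in for the database table: story ID -> number of claps
clap_counts: Dict[str, int] = {}


def handle_clap(story_id: str, db: Dict[str, int]) -> dict:
    """Backend logic for a 'clap' request (steps 5 to 7 above)."""
    # Step 5: read the current number of claps and increment it by one
    new_count = db.get(story_id, 0) + 1
    # Step 6: write the updated count back to the database
    db[story_id] = new_count
    # Step 7: return the new count so the browser can update the interface
    return {"story_id": story_id, "claps": new_count}


if __name__ == "__main__":
    # Step 4: in a real app, the story ID would come from the request URL
    print(handle_clap("XXXXXXX", clap_counts))
```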

Of course, this is probably not exactly what happens in the Medium web application, and the real process is much more complicated, but this simplified version helps explain how a web application works.

Now focus on the blue arrows in the picture above. These are HTTP requests (sent from the browser) and HTTP responses (sent back to the browser). The components on the backend server that process requests from the browser and return responses are called “APIs”.

Here is a definition of an API:

From Webopedia

An application program interface (API) is a collection of procedures, protocols and tools to create software applications… Essentially, an API defines how software components should interact.
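In code, the API part of a backend boils down to mapping each incoming request (HTTP method plus URL path) to the function that handles it and returns a response. The sketch below shows that idea with a plain dictionary; it is a simplified illustration, not how Flask or any other framework works internally, and the /claps path and add_clap handler are hypothetical.

```python
from typing import Callable, Dict, Tuple
from urllib.parse import parse_qs, urlsplit

Handler = Callable[[Dict[str, str]], Tuple[int, str]]


def add_clap(params: Dict[str, str]) -> Tuple[int, str]:
    # Hypothetical handler: would update the clap count for params["id"]
    return 200, "clap recorded for story " + params.get("id", "?")


# Routing table: (HTTP method, path) -> handler function
ROUTES: Dict[Tuple[str, str], Handler] = {
    ("POST", "/claps"): add_clap,
}


def dispatch(method: str, url: str) -> Tuple[int, str]:
    """Find the handler for a request and return (status code, body)."""
    parts = urlsplit(url)
    # Keep the first value of each query parameter: ?id=123 -> {"id": "123"}
    params = {key: values[0] for key, values in parse_qs(parts.query).items()}
    handler = ROUTES.get((method, parts.path))
    if handler is not None:
        return handler(params)
    # The path exists, but not for this HTTP method
    if any(path == parts.path for _, path in ROUTES):
        return 405, "Method Not Allowed"
    return 404, "Not Found"


if __name__ == "__main__":
    print(dispatch("POST", "/claps?id=123"))
```

The 405 branch mirrors what web frameworks typically do when a route exists but was not registered for the method you used.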

Building our own API!

There are many frameworks that help us build APIs with Python, including Flask, Django, Pyramid, Falcon and Tornado. Their advantages and disadvantages, as well as a comparison of these frameworks, are listed here… I’ll be using Flask for this tutorial, but the technique and workflow are the same for the others, so feel free to use your favorite framework instead.

The latest version of Flask can be installed via pip using this command:

pip install Flask

All you need to do now is turn the code from the previous step into a function and register an API endpoint for it after initializing your Flask application. By default, a Flask application runs on localhost and listens for requests on port 5000.

import pickle
# Import all the packages your model needs
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

# Import Flask to create the API
from flask import Flask, request

# Load the trained model from the current directory
with open('./model.pkl', 'rb') as model_pkl:
    knn = pickle.load(model_pkl)

# Initialize the Flask application
app = Flask(__name__)

# Create an API endpoint ('/api' is an assumed path; use the path your
# requests target). By default, the route accepts only GET requests.
@app.route('/api')
def predict_iris():
    # Read all the required query parameters (they arrive as strings)
    sl = request.args.get('sl')
    sw = request.args.get('sw')
    pl = request.args.get('pl')
    pw = request.args.get('pw')
    if None in (sl, sw, pl, pw):
        return 'Missing parameter: sl, sw, pl and pw are all required', 400

    # Use the model's predict method to
    # get a prediction for the unseen data
    unseen = np.array([[sl, sw, pl, pw]])
    # Convert the string values to floats before predicting
    result = knn.predict(unseen.astype(float))
    # Return the result
    return 'Predicted result for observation ' + str(unseen) + ' is: ' + str(result)

if __name__ == '__main__':
    # Start the development server (localhost, port 5000 by default)
    app.run()

Exposing your model as an API

Run the file; in the terminal, you should see the following:

* Serving Flask app "main" (lazy loading)
* Environment: production
  WARNING: This is a development server. Do not use it in a production deployment.
  Use a production WSGI server instead.
* Debug mode: off
* Running on (Press CTRL+C to quit)

Open your browser and enter the following query in the address bar:


If something like this appears in your browser, congratulations! You are now exposing your machine learning model as a service with an API endpoint.

Predicted result for observation [['3.2' '1.1' '1.5' '2.1']] is: [1]
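Instead of typing the URL into the browser, you can also call the endpoint from Python. This is a minimal sketch using only the standard library. The base URL http://127.0.0.1:5000/api is an assumption (Flask’s default host and port plus a hypothetical /api path), and build_query_url, parse_label and predict_remote are hypothetical helpers; parse_label relies on the response text format shown above.

```python
import re
import urllib.request
from urllib.parse import urlencode

# Assumed endpoint: Flask's default host and port plus a hypothetical /api route
BASE_URL = "http://127.0.0.1:5000/api"


def build_query_url(base_url: str, sl: float, sw: float, pl: float, pw: float) -> str:
    """Encode the four measurements as the sl, sw, pl and pw query parameters."""
    return base_url + "?" + urlencode({"sl": sl, "sw": sw, "pl": pl, "pw": pw})


def parse_label(response_text: str) -> int:
    """Extract the predicted class from '... is: [1]'."""
    match = re.search(r"is: \[(\d+)\]", response_text)
    if match is None:
        raise ValueError("unexpected response: " + response_text)
    return int(match.group(1))


def predict_remote(sl: float, sw: float, pl: float, pw: float) -> int:
    """Send a GET request to the running server and return the label."""
    url = build_query_url(BASE_URL, sl, sw, pl, pw)
    with urllib.request.urlopen(url) as response:
        return parse_label(response.read().decode("utf-8"))
```

predict_remote only works while the Flask server is running; the two helpers can be used and tested without it.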

API Testing with Postman

We just used the browser for a quick API test, but this is not a very efficient approach. For example, if we wanted to use the POST method with an authentication token in the header instead of the GET method, it would not be easy to make the browser send such a request. In software development, Postman is widely used for testing APIs, and it is free for basic use.
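For comparison, here is the kind of request that is awkward to send from a browser’s address bar: a POST with an authentication token in a header. This is a minimal sketch with the standard library’s urllib.request.Request; it only builds the request and does not send it, since our API accepts GET only. The URL, the token value and the Bearer scheme are hypothetical placeholders.

```python
import json
import urllib.request

# Hypothetical values: replace with your own API URL and credentials
API_URL = "http://127.0.0.1:5000/api"
TOKEN = "my-secret-token"


def build_post_request(url: str, token: str, features: dict) -> urllib.request.Request:
    """Build (but do not send) a POST request with a JSON body and an auth header."""
    body = json.dumps(features).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={
            "Content-Type": "application/json",
            "Authorization": "Bearer " + token,
        },
        method="POST",
    )


if __name__ == "__main__":
    req = build_post_request(API_URL, TOKEN, {"sl": 3.2, "sw": 1.1, "pl": 1.5, "pw": 2.1})
    print(req.get_method(), req.full_url)
    print(req.header_items())
    # urllib.request.urlopen(req) would send it; a GET-only route rejects POST
```

Note that urllib normalizes header names with str.capitalize(), so the content type is stored under "Content-type".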

Postman user interface (from the Postman download page)

After downloading and installing Postman, open the tool and follow the steps below to send your request.

Sending a GET request with Postman

  1. Make sure you select the GET method, since we configured the API to accept only GET requests. The request will not work if you accidentally select POST.
  2. Paste your request URL here.
  3. Update the query parameters in this table. Feel free to play around with them and see what you get.
  4. Click the Send button to send your request to our API server.
  5. The response from our server will be displayed here.
  6. You can also check for more information on this HTTP response. This can be very useful for debugging.
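The same steps (choose GET, set the URL and query parameters, send, then inspect the body, status code and headers) can be scripted with the standard library, which is handy once you want repeatable checks. This is a minimal sketch: send_get is a hypothetical helper, and because the real Flask server may not be running, run_demo starts a tiny stand-in server (StubHandler) that simply echoes the query string back. To test the real API, call send_get with your own endpoint URL instead.

```python
import threading
import urllib.error
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer
from urllib.parse import urlencode

# Opener that ignores proxy settings, so requests to localhost go straight out
_opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))


def send_get(url: str, params: dict) -> dict:
    """Send a GET request and collect what Postman displays: status, headers, body."""
    full_url = url + "?" + urlencode(params)
    try:
        with _opener.open(full_url) as resp:
            status, headers, body = resp.status, dict(resp.headers), resp.read()
    except urllib.error.HTTPError as err:
        # Error responses (e.g. 404 or 500) still carry headers and a body
        status, headers, body = err.code, dict(err.headers), err.read()
    return {"status": status, "headers": headers, "body": body.decode("utf-8")}


class StubHandler(BaseHTTPRequestHandler):
    """Stand-in server for the demo: echoes the query string back."""

    def do_GET(self):
        body = ("query: " + self.path.partition("?")[2]).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "text/plain; charset=utf-8")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # keep the demo output quiet


def run_demo() -> dict:
    # Port 0 lets the OS pick a free port for the stand-in server
    server = HTTPServer(("127.0.0.1", 0), StubHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    try:
        url = "http://127.0.0.1:%d/api" % server.server_address[1]
        return send_get(url, {"sl": 3.2, "sw": 1.1, "pl": 1.5, "pw": 2.1})
    finally:
        server.shutdown()
        server.server_close()


if __name__ == "__main__":
    result = run_demo()
    print(result["status"], result["headers"].get("Content-Type"))
    print(result["body"])
```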

Now that you know how to expose your machine learning model as a service through an API endpoint and test that endpoint with Postman, the next step is to containerize the application with Docker. There we’ll take a closer look at how Docker works and how it helps us solve all the dependency problems we encountered earlier.

Read the first part.
