Writing a simple ML web service with FastAPI


This tutorial walks you through the process of creating a web application for sentiment detection based on an NLP model.

We will use a model from the Hugging Face Hub, but the described approach is suitable for any machine learning task.

Plan:

  1. Loading and preparing a machine learning model for use in a web service.

  2. Creating a web service using FastAPI.

  3. Exploring the FastAPI user interface for easy manual testing and demonstrating how the application works.

  4. Writing automated tests using the pytest library.

  5. Running an application in a Docker container.

The code is available on GitHub.

0. Code organization. Separating ML code and application code

We will use the following structure:
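A sketch of the layout, reconstructed from the files and directories described below (exact names in the repository may differ slightly):

app/                  – web application code
ml/                   – machine learning code
tests/                – tests for the ML code and the application
setup.py
requirements.txt
requirements-dev.txt
Dockerfile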

Separating the ml and app packages organizes the project code more logically and makes it easier to maintain and extend.

  • ml contains code for working with a machine learning model.

  • app contains the code to run the web application.

In addition, the project has other important files and directories, such as:

  • tests: contains scripts for testing the code. Within the project, we will test the ML code and the application separately.

  • setup.py: contains information about the package and its dependencies.

  • requirements-dev.txt and requirements.txt: lists of dependencies for local development and for running the app, respectively.

  • Dockerfile: contains instructions for creating a Docker container.

1. Loading and preparing the machine learning model

It is good practice to organize the ML code so that it can be treated as a black box. Later, the service will receive all the ML logic through a single function, load_model.

Depending on your task, load_model will include:

  • all the feature-handling and preprocessing logic,

  • loading the model and the necessary artifacts from a repository,

  • model inference,

  • post-processing of predictions.

Let’s start by loading the model. In our example, we will load cointegrated/rubert-tiny-sentiment-balanced from the Hugging Face Hub:

from transformers import pipeline

model_hf = pipeline("sentiment-analysis", model="cointegrated/rubert-tiny-sentiment-balanced")

Let’s describe the format that the model will return and that the service will later work with. A dataclass is convenient for this:

from dataclasses import dataclass

@dataclass
class SentimentPrediction:
    """Class representing a sentiment prediction result."""

    label: str
    score: float

Now the main part: model, the function that the service will call to get predictions. It contains all the necessary model logic, including pre- and post-processing of data. In our case, model_hf is already a pipeline that handles text preprocessing and tokenization, model inference, and prediction postprocessing. We just keep the prediction for the best class and return the answer as an instance of SentimentPrediction:

def model(text: str) -> SentimentPrediction:
    pred = model_hf(text)
    pred_best_class = pred[0]
    return SentimentPrediction(
        label=pred_best_class["label"],
        score=pred_best_class["score"],
    )

The ML-related code is now finished. At this step, it is also worth remembering that we used hard-coded constants when loading the model from the Hugging Face Hub.

It is better to put any constants in separate configs in order to:

  • have quick access to all model parameters,

  • adjust the model parameters without digging into the code.

In our case, we will use a YAML file, config.yaml, for the configuration:

task: sentiment-analysis
model: cointegrated/rubert-tiny-sentiment-balanced

Then the model-loading script model.py will look like this:

from dataclasses import dataclass
from pathlib import Path

import yaml
from transformers import pipeline

# load config file
config_path = Path(__file__).parent / "config.yaml"
with open(config_path, "r") as file:
    config = yaml.load(file, Loader=yaml.FullLoader)


@dataclass
class SentimentPrediction:
    """Class representing a sentiment prediction result."""

    label: str
    score: float


def load_model():
    """Load a pre-trained sentiment analysis model.

    Returns:
        model (function): A function that takes a text input and returns a SentimentPrediction object.
    """
    model_hf = pipeline(config["task"], model=config["model"], device=-1)

    def model(text: str) -> SentimentPrediction:
        pred = model_hf(text)
        pred_best_class = pred[0]
        return SentimentPrediction(
            label=pred_best_class["label"],
            score=pred_best_class["score"],
        )

    return model

We have also added device=-1 to run the model on the CPU.
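As a quick sanity check (an illustrative snippet, not part of the repository code), we can load the model and run it on a sample text:

from ml.model import load_model

model = load_model()
pred = model("очень хорошо")
# Expected something like SentimentPrediction(label='positive', score=...)
# with a score close to 1; the exact value depends on the model version
print(pred)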

2. Writing the application with FastAPI

The simplest FastAPI application looks like this:

from fastapi import FastAPI

app = FastAPI()

# create a route
@app.get("/")
def index():
    return {"text": "Sentiment Analysis"}

But it cannot do anything useful yet; in particular, it knows nothing about the model we prepared in the ml package. Let’s add model loading at application startup:

from ml.model import load_model

model = None

# Register the function to run during startup
@app.on_event("startup")
def startup_event():
    global model
    model = load_model()
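A side note: on newer FastAPI versions (0.93+), the on_event hooks are deprecated in favor of the lifespan API. A minimal sketch of the same startup logic with lifespan, if you prefer it:

from contextlib import asynccontextmanager

from fastapi import FastAPI

from ml.model import load_model

model = None

@asynccontextmanager
async def lifespan(app: FastAPI):
    # Load the model once, before the app starts serving requests
    global model
    model = load_model()
    yield

app = FastAPI(lifespan=lifespan)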

Now it remains to add the model prediction. First, let’s define the response format, SentimentResponse. We use pydantic to validate the output:

from pydantic import BaseModel

class SentimentResponse(BaseModel):
    text: str
    sentiment_label: str
    sentiment_score: float

We will return:

  • text – the original text,

  • sentiment_label – the name of the class predicted by the model,

  • sentiment_score – the model’s confidence score for the prediction.

Let’s write a GET endpoint that returns a prediction for a given text. Since model encapsulates all the model logic, it is enough to pass raw text to it. Recall that model returns the prediction as a SentimentPrediction object, which we defined earlier in the ml package. We then form the response in the SentimentResponse format.

# Your FastAPI route handlers go here
@app.get("/predict")
def predict_sentiment(text: str):
    sentiment = model(text)

    response = SentimentResponse(
        text=text,
        sentiment_label=sentiment.label,
        sentiment_score=sentiment.score,
    )

    return response

The application is ready! The entire app.py fits in 40 lines:

from fastapi import FastAPI
from pydantic import BaseModel

from ml.model import load_model

model = None
app = FastAPI()


class SentimentResponse(BaseModel):
    text: str
    sentiment_label: str
    sentiment_score: float


# create a route
@app.get("/")
def index():
    return {"text": "Sentiment Analysis"}


# Register the function to run during startup
@app.on_event("startup")
def startup_event():
    global model
    model = load_model()


# Your FastAPI route handlers go here
@app.get("/predict")
def predict_sentiment(text: str):
    sentiment = model(text)

    response = SentimentResponse(
        text=text,
        sentiment_label=sentiment.label,
        sentiment_score=sentiment.score,
    )

    return response

3. Setting up the environment and running tests for the ML code

For local runs during development, it is convenient to use virtual environments (venv). Inside a virtual environment, you can install and use the necessary packages and libraries without affecting the global Python environment on the system. Let’s create and activate a virtual environment:

# Create a virtual environment
python3.11 -m venv env

# Activate the virtual environment
source env/bin/activate

Now let’s install the Python package described in setup.py together with the dependencies from requirements.txt. This creates a package that includes all the code in our repository, plus all the required libraries listed in those files.

# Install/upgrade dependencies
pip install -U -e .

The next step is testing the ML code. Tests help you catch bugs in your code early in development, prevent new bugs from being introduced when you make changes to your code, and reduce the time and cost of manual testing.

For testing we will use the pytest library. To install it and the other dependencies that are only needed for development, not for running the project in production, we listed them in requirements-dev.txt. This dependency information is also wired into setup.py, so we can install everything with:

pip install -U -e .[dev]
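For reference, here is a minimal sketch of what such a setup.py might look like (the actual file in the repository may differ, and the package name here is hypothetical). The dev extra is what lets pip install -e .[dev] pull in the development dependencies:

from pathlib import Path

from setuptools import find_packages, setup


def read_requirements(filename: str) -> list[str]:
    # Read a requirements file, skipping blank lines and comments
    return [
        line.strip()
        for line in Path(filename).read_text().splitlines()
        if line.strip() and not line.strip().startswith("#")
    ]


setup(
    name="sentiment-service",  # hypothetical name; the real package name may differ
    version="0.1.0",
    packages=find_packages(),
    install_requires=read_requirements("requirements.txt"),
    extras_require={"dev": read_requirements("requirements-dev.txt")},
)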

The tests for the machine learning code live in test_ml.py. In this example, we have three tests that check whether the model correctly detects positive, negative, and neutral sentiment in a text:

import pytest

from ml.model import SentimentPrediction, load_model


@pytest.fixture(scope="function")
def model():
    # Load the model once for each test function
    return load_model()


@pytest.mark.parametrize(
    "text, expected_label",
    [
        ("очень плохо", "negative"),
        ("очень хорошо", "positive"),
        ("по-разному", "neutral"),
    ],
)
def test_sentiment(model, text: str, expected_label: str):
    model_pred = model(text)
    assert isinstance(model_pred, SentimentPrediction)
    assert model_pred.label == expected_label

The pytest library provides a convenient and intuitive syntax for writing tests. Here we used:

  • fixtures – set up initial conditions for tests. In our case, the model fixture loads the model before each test run.

  • parameterization – runs the same test with different parameter values, reducing code duplication.

Here the test checks that the model correctly determines the sentiment of the text: for each pair (text, expected_label) it compares the model’s prediction against the expected label. If they do not match, the test fails.

To run tests, use the command:

pytest tests/test_ml.py

4. Launching the application and the convenient FastAPI interface

Using uvicorn, we can start our application and serve incoming HTTP requests. To launch the application with uvicorn, run the following command:

# Run app
uvicorn app.app:app --host 0.0.0.0 --port 8080

Here app.app is the module path to our application file, app is the name of the application instance, --host sets the IP address the server listens on (in this case 0.0.0.0), and --port sets the port (in this case 8080).

After executing this command, uvicorn will launch our application and start accepting incoming HTTP requests on the specified port. The terminal will show startup information, something like the sample below.
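A typical uvicorn startup log looks roughly like this (the exact process ID and wording may vary by uvicorn version):

INFO:     Started server process [12345]
INFO:     Waiting for application startup.
INFO:     Application startup complete.
INFO:     Uvicorn running on http://0.0.0.0:8080 (Press CTRL+C to quit)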

Open http://localhost:8080 in the browser to see the same message we defined at the root route when we started writing the FastAPI application.

FastAPI additionally provides a very convenient interface for sending requests. It is available if you append /docs to the address in the browser (http://localhost:8080/docs).

Here you can try the application by hand. Click “Try it out”, enter any input data, and check how the application responds.
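You can also query the endpoint programmatically, for example with the requests library (assuming the service is running locally on port 8080, as launched above):

import requests

# Ask the running service for a prediction on a sample Russian text
response = requests.get(
    "http://localhost:8080/predict",
    params={"text": "очень хорошо"},
)
# Expected: sentiment_label == "positive" with a sentiment_score close to 1
print(response.json())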

Using a virtual environment is convenient for developing and testing an application on a local machine. Next, we will discuss how to run applications in a Docker container.

5. Running the application in a Docker container and testing the application

Docker makes it possible to package an application and run it on any machine. Some of its benefits:

  1. Abstraction from the host system: a Docker container allows you to package an application with all its dependencies and settings into a single image that can be run on any machine where Docker is installed.

  2. Isolation: running an application in a Docker container provides isolation from other processes and applications on the host machine, which reduces the risk of interaction with other applications and allows you to manage container resources.

  3. Dependency Management: A Docker container allows you to explicitly define all dependencies and versions required to run an application.

For an initial introduction to Docker, see the official Docker documentation; it also links to installation instructions for different systems.

Let’s start with the Dockerfile. A Dockerfile is a text file that contains instructions for building a Docker image. It is used to automatically build an image that includes all the dependencies, settings, and code needed to run the application in an isolated container. Our Dockerfile:

FROM python:3.11

COPY requirements.txt requirements-dev.txt setup.py /workdir/
COPY app/ /workdir/app/
COPY ml/ /workdir/ml/

WORKDIR /workdir

RUN pip install -U -e .

# Run the application
CMD ["uvicorn", "app.app:app", "--host", "0.0.0.0", "--port", "80"]

  • The first instruction indicates that we want to use a prebuilt Python 3.11 image as the base for our image.

  • We then copy the project code and dependency files into the /workdir/ directory inside the container.

  • The line WORKDIR /workdir sets the working directory for subsequent commands in the Dockerfile. This means all following commands are executed relative to this directory.

  • Next, we install the package, just as we did in the virtual environment.

  • The last line specifies the command that will be executed when the container is started: launching the application with uvicorn on port 80.

Create a new Docker image named ml-app using the Dockerfile located in the current directory:

docker build -t ml-app .

After the image has been built, the command

docker run -p 80:80 ml-app

launches a container from the ml-app image and binds port 80 inside the container to port 80 on the host.

The running container will be available at http://localhost:80 in the browser. The application can also be tested manually through the FastAPI UI, as described in the previous section.

We can also write tests against the application running in the container. We will use the same examples as for the ML code, only now we will send HTTP requests to the containerized service using the requests library.

import pytest
import requests


@pytest.mark.parametrize(
    "input_text, expected_label",
    [
        ("очень плохо", "negative"),
        ("очень хорошо", "positive"),
        ("по-разному", "neutral"),
    ],
)
def test_sentiment(input_text: str, expected_label: str):
    response = requests.get("http://0.0.0.0/predict/", params={"text": input_text})
    assert response.json()["text"] == input_text
    assert response.json()["sentiment_label"] == expected_label

To run the tests, you can reuse the env virtual environment we created earlier, since pytest is already installed there. In another terminal, without stopping the running container, activate the environment and run the tests:

source env/bin/activate
pytest tests/test_app.py

deactivate

Conclusion

  • In this tutorial, we have created a web application for detecting the sentiment of a text using FastAPI.

  • We also touched on important aspects of application development: code organization, testing, configuration, running the application in a Docker container.

  • The described approach can be used for any machine learning problem.

  • The application code is available on GitHub and can be used as a starting point for building your own web service.


I plan to write the next article about launching an ML pipeline with Airflow. Until then, subscribe to my Telegram channel, where I post announcements of new articles, work tips, and shorter thoughts on DS/ML/AI.
