Recognizing license plates with TorchServe

There are so many neural network inference frameworks around that it is easy to get lost. We continue the series of serving the same task with different tools. Last time the implementation was on Nvidia Triton Inference Server (announcements are in my telegram channel). The code for the article is in the repository.

Task

The task was to recognize Russian license plates. Models were taken from this repository.

The recognition pipeline is as follows:
1. License plate detection using YOLOv5;
2. The cropped plates are passed through a Spatial Transformer Network (STN) for alignment;
3. The plate text is recognized with LPRNet.

Framework

TorchServe is used for inference. The framework is part of the PyTorch ecosystem and is actively developed.

The documentation says the following about it:

TorchServe is a performant, flexible and easy to use tool for serving PyTorch eager mode and torchscripted models.


Converting Models

Like Triton, TorchServe requires the user to convert models into its own format. There are utilities for this: torch-model-archiver and torch-workflow-archiver, for models and workflow graphs respectively.

To convert we need:

  1. The model in TorchScript/ONNX/etc. format;

  2. Script describing the pipeline of the model.

Such a script is called a handler. It defines the main stages of the model's life cycle (initialization, preprocessing, prediction, postprocessing, etc.). For typical tasks, handlers are already provided out of the box.

The STN and LPR models convert to TorchServe easily, so no additional libraries are used in their handlers. The imports look like this:

import json
import logging
from abc import ABC
import numpy as np
import torch
from ts.torch_handler.base_handler import BaseHandler
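
To give an idea of the structure, here is a minimal sketch of such a handler. It is an illustration built on the imports above, not the exact code from the repository; in particular, the preprocessing assumes the input arrives as JSON-serialized arrays.

class STNHandler(BaseHandler, ABC):
    # a sketch of the handler life cycle; details differ in the real handler

    def initialize(self, context):
        # BaseHandler loads the serialized model from the .mar archive
        super().initialize(context)
        logging.info("STN handler initialized on %s", self.device)

    def preprocess(self, data):
        # TorchServe passes a batch of requests; the payload sits under "data" or "body"
        batch = [json.loads(row.get("data") or row.get("body")) for row in data]
        return torch.tensor(np.array(batch), dtype=torch.float32, device=self.device)

    def inference(self, batch):
        with torch.no_grad():
            return self.model(batch)

    def postprocess(self, outputs):
        # one response item per request in the batch
        return outputs.cpu().numpy().tolist()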

Yolo could not simply be converted to TorchScript, because part of the request-handling logic lives outside the model. Since there was no desire to dig into this, and also for the sake of a more realistic scenario, Yolo is initialized from TorchHub inside the model handler. In the imports we already see third-party modules:

from inference_torchserve.data_models import PlatePrediction
from nn.inference.predictor import prepare_detection_input, prepare_recognition_input
from nn.models.yolo import load_yolo
from nn.settings import settings
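
The initialization itself might look roughly like the sketch below. The real handler uses load_yolo and settings from the repository above; the torch.hub call and the weights path here are placeholders for illustration.

# hypothetical sketch of the Yolo handler initialization; the real code
# uses load_yolo and settings from the repository instead
from abc import ABC

import torch
from ts.torch_handler.base_handler import BaseHandler


class YoloHandler(BaseHandler, ABC):
    def initialize(self, context):
        self.device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
        # YOLOv5 is pulled from TorchHub with a custom checkpoint (placeholder path)
        self.model = torch.hub.load(
            "ultralytics/yolov5", "custom", path="weights/yolov5_plates.pt"
        ).to(self.device)
        self.model.eval()
        self.initialized = True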

For this to work, you need to install the required packages into the global interpreter in the dockerfile.

In TorchServe, you do not need to hardcode the types and dimensions of model inputs and outputs, so you do not need to define any configs for the models. On the one hand this is convenient; on the other, it creates chaos if a single format is not agreed on.

The converted model is a zip archive with the extension .mar, which contains all the artifacts (service information, weights, scripts and additional files).

.
├── MAR-INF
│   └── MANIFEST.json
├── stn.pt
└── stn.py
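
For reference, an archive like this can be built with torch-model-archiver; the exact flags depend on your setup, but the call looks roughly like this:

torch-model-archiver \
  --model-name stn \
  --version 1.0 \
  --serialized-file stn.pt \
  --handler stn.py \
  --export-path model_store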

In my opinion, the archive solution is inconvenient for development. After any change, you must re-convert the model. I also experienced problems when running a remote debugger in it.

For TorchServe to load the models, they need to be put into a single folder – the model store – and its path passed in the startup parameters. To load all models at startup, specify --models all.
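
For example, a launch along these lines (the folder name is a placeholder):

torchserve --start --model-store model_store --models all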

Making a recognition pipeline

The chosen license plate recognition pipeline consists of sequential predictions by several models. For this, TorchServe provides Workflows. They allow you to define both serial and parallel processing graphs:

# serial graph
dag:
  pre_processing : [m1]
  m1 : [m2]
  m2 : [postprocessing]
input -> function1 -> model1 -> model2 -> function2 -> output
# parallel graph
dag:
  pre_processing: [model1, model2]
  model1: [aggregate_func]
  model2: [aggregate_func]
                          model1
                         /       \
input -> preprocessing ->         -> aggregate_func
                         \       /
                          model2

For the problem under consideration, the following series-parallel graph was obtained. The aggregate node combines the plate coordinates with the recognized texts.

    ┌──────┐
    │ YOLO ├─────┐
    └──┬───┘     │
       │         v
       │      ┌─────┐
plate  │      │ STN │
coords │      └──┬──┘
       │         │
       │         v
       │      ┌──────┐
       │      │LPRNET│
       │      └──┬───┘
       v         │
   ┌─────────┐   │ plate
   │aggregate│<──┘ texts
   └─────────┘
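
In the dag syntax shown above, this graph could be described roughly as follows (the node names are mine; the real workflow spec also lists the .mar files and the workflow handler with the aggregate function):

dag:
  yolo: [stn, aggregate]
  stn: [lprnet]
  lprnet: [aggregate]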

For convenience and simplicity, data between the models is passed as dictionaries. Serializing such data in TorchServe is very inefficient (it is converted to a string with line breaks added), so try to pass tensors or bytes instead.

Note that a workflow cannot be started automatically when the server starts – you must explicitly send a request to register it. If you really want this to happen when the server comes up, you can do it like this:

curl -X POST http://localhost:8081/workflows?url=plate_recognition

Using Models

Models are defined. The server is running.

To execute a previously registered model or the plate_recognition workflow, you need to send a request to TorchServe (I used REST, but there is also gRPC). Models are served via the predictions endpoint, workflows via wfpredict.

response = requests.post(
    "http://localhost:8080/predictions/yolo", data=image.open("rb").read()
)

response = requests.post(
    "http://localhost:8080/wfpredict/plate_recognition", 
    data=image.open("rb").read()
)
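
The response body is whatever the last node of the workflow returns; assuming the aggregate node produces JSON, it can be read like this:

# the exact schema depends on the aggregate node implementation
plates = response.json()
print(plates)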

Conclusion

Well, the inference is written. The example is not so simple as to be completely useless, yet not so complex that it covers every feature of this inference framework.

This tutorial did not cover all of TorchServe's features, so I advise you to dig further into the documentation.

Subscribe to my channel – there I talk about neural networks with an emphasis on serving.
