Recognizing license plates: developing a multihead model in Catalyst
Recording traffic violations, access control, and tracing and tracking vehicles are just some of the tasks that require reading a license plate number (state registration plate) from a photograph.
In this article, we look at building a recognition model with Catalyst, one of the most popular high-level frameworks for PyTorch. It removes a large amount of code that is otherwise repeated from project to project (the training loop, metric calculation, model checkpointing, and so on) and lets you focus directly on the experiment.
A recognition model can be built in different ways, for example, by detecting and identifying individual characters, or as an image-to-text task. We will consider a multihead model. As the dataset, we take the set of Russian license plates from the Nomeroff Net project. Examples of images from the dataset are shown in Fig. 1.
Fig. 1. Examples of images from the dataset
General approach to solving the problem
We need to develop a model that receives a license plate image as input and outputs a string of recognized characters. The model consists of a feature extractor and several classification heads. The dataset contains license plates with 8 and 9 characters, so there will be nine heads. Each head predicts one character from the alphabet “1234567890ABEKMHOPCTYX”, plus the special character “-” (hyphen) to indicate the absence of the ninth character in an eight-character license plate. The architecture is shown schematically in Fig. 2.
Fig. 2. Model architecture
We use standard cross-entropy as the loss function. We apply it to each head separately and then combine the per-head values into the overall loss of the model (Listing 4 averages them with mode="mean", which differs from a plain sum only by a constant factor). The optimizer is Adam, and we use OneCycleLRWithWarmup as the learning rate scheduler. The batch size is 128, and training runs for 10 epochs.
As preprocessing, we normalize the input images and resize them to a common size.
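The per-head loss described above can be sketched in plain PyTorch, outside of Catalyst. This is only an illustration: the function name multihead_cross_entropy and its reduction argument are hypothetical, and the toy tensors stand in for real model outputs.

```python
import torch
import torch.nn.functional as F


def multihead_cross_entropy(logits_per_head, targets_per_head, reduction="sum"):
    """Combine per-head cross-entropy losses into one scalar.

    logits_per_head: list of tensors, each of shape [batch, num_classes]
    targets_per_head: list of tensors, each of shape [batch] with class indices
    reduction: "sum" or "mean" over the heads
    """
    head_losses = torch.stack([
        F.cross_entropy(logits, targets)
        for logits, targets in zip(logits_per_head, targets_per_head)
    ])
    return head_losses.sum() if reduction == "sum" else head_losses.mean()


# Toy example: 9 heads, 23 classes (22 alphabet characters plus "-"), batch of 4.
torch.manual_seed(0)
num_heads, num_classes, batch_size = 9, 23, 4
logits = [torch.randn(batch_size, num_classes) for _ in range(num_heads)]
targets = [torch.randint(0, num_classes, (batch_size,)) for _ in range(num_heads)]

loss_sum = multihead_cross_entropy(logits, targets, reduction="sum")
loss_mean = multihead_cross_entropy(logits, targets, reduction="mean")
```

The summed and averaged variants differ only by the factor num_heads, so they point the optimizer in the same direction; the choice mainly affects the scale of the reported loss.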
Coding
Next, let’s look at the main points of the code. The dataset class (Listing 1) is fairly typical for CV tasks in PyTorch. The only thing worth noting is how we return the target as a list of character codes. The label_encoder parameter receives a utility class that converts alphabet characters to their codes and back.
import glob
import json
import os

import cv2
from torch.utils.data import Dataset


class NpOcrDataset(Dataset):
    def __init__(self, data_path, transform, label_encoder):
        super().__init__()
        self.data_path = data_path
        self.image_fnames = glob.glob(os.path.join(data_path, "img", "*.png"))
        self.transform = transform
        self.label_encoder = label_encoder

    def __len__(self):
        return len(self.image_fnames)

    def __getitem__(self, idx):
        img_fname = self.image_fnames[idx]
        img = cv2.imread(img_fname)
        if self.transform:
            transformed = self.transform(image=img)
            img = transformed["image"]
        # HWC -> CHW, the layout PyTorch models expect
        img = img.transpose(2, 0, 1)

        # The annotation is stored in ann/<image name>.json
        label_fname = os.path.join(self.data_path, "ann",
                                   os.path.basename(img_fname).replace(".png", ".json"))
        with open(label_fname, "rt") as label_file:
            label_struct = json.load(label_file)
            label = label_struct["description"]
        label = self.label_encoder.encode(label)

        # Return the target as a list of character codes, one per head
        return img, [c for c in label]
Listing 1. Dataset class
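The label_encoder passed to the dataset is not shown in the article. A minimal sketch of what such a helper might look like is given below. The class name LabelEncoder, its constructor arguments, the ordering of the codes, and the right-padding of eight-character plates with "-" are all assumptions made for illustration.

```python
class LabelEncoder:
    """Hypothetical helper: maps plate characters to integer codes and back."""

    def __init__(self, alphabet="1234567890ABEKMHOPCTYX", pad_char="-", max_len=9):
        # The padding character gets the last code, after the 22 alphabet characters.
        self.chars = alphabet + pad_char
        self.pad_char = pad_char
        self.max_len = max_len
        self.char_to_code = {ch: i for i, ch in enumerate(self.chars)}

    def encode(self, text):
        # Pad shorter plates on the right so every label has max_len codes.
        padded = text.ljust(self.max_len, self.pad_char)
        return [self.char_to_code[ch] for ch in padded]

    def decode(self, codes):
        return "".join(self.chars[code] for code in codes)


encoder = LabelEncoder()
codes = encoder.encode("X365PC17")   # eight-character plate, padded to nine codes
restored = encoder.decode(codes)
```

With this layout every label has exactly nine codes, so each of the nine heads always has a target, and the number of classes per head is len(encoder.chars).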
In the model class (Listing 2), we use the PyTorch Image Models (timm) library to create the feature extractor. We add each classification head to a ModuleList so that their parameters are visible to the optimizer. The logits from the output of each head are returned as a list.
import timm
import torch
import torch.nn as nn


class MultiheadClassifier(nn.Module):
    def __init__(self, backbone_name, backbone_pretrained, input_size, num_heads, num_classes):
        super().__init__()
        # num_classes=0 makes timm return pooled features instead of class logits
        self.backbone = timm.create_model(backbone_name, backbone_pretrained, num_classes=0)
        # Run a dummy batch to find the feature size; input_size is (width, height)
        backbone_out_features_num = self.backbone(torch.randn(1, 3, input_size[1], input_size[0])).size(1)
        self.heads = nn.ModuleList([
            nn.Linear(backbone_out_features_num, num_classes) for _ in range(num_heads)
        ])

    def forward(self, x):
        features = self.backbone(x)
        logits = [head(features) for head in self.heads]
        return logits
Listing 2. Model class
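Since the model returns a list of per-head logits, turning a prediction into a plate string means taking the argmax of each head and mapping the indices back to characters. The sketch below illustrates this. The function decode_predictions and the assumption that class indices follow the alphabet order with "-" last are hypothetical and are not taken from the article's code.

```python
import torch

ALPHABET = "1234567890ABEKMHOPCTYX-"  # 22 plate characters plus "-" for "no character"


def decode_predictions(logits_per_head, alphabet=ALPHABET):
    """Turn a list of per-head logits ([batch, num_classes] each) into strings."""
    # argmax over classes for each head, then stack to shape [batch, num_heads].
    indices = torch.stack([logits.argmax(dim=1) for logits in logits_per_head], dim=1)
    return ["".join(alphabet[i] for i in row.tolist()) for row in indices]


# Build fake logits that encode a known plate, to check the round trip.
plate = "X365PC17-"
fake_logits = []
for ch in plate:
    head_logits = torch.zeros(1, len(ALPHABET))
    head_logits[0, ALPHABET.index(ch)] = 5.0  # make the true class the clear winner
    fake_logits.append(head_logits)

decoded = decode_predictions(fake_logits)
```

A trailing "-" in the decoded string can then be stripped to recover an eight-character plate.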
The central link that connects all the components and trains the model is the Runner. It provides an abstraction over the training and validation loop and its individual components. For training a multihead model, we are interested in the implementation of the handle_batch method and in the set of callbacks.
The handle_batch method, as the name suggests, is responsible for processing a batch of data. In it, we only call the model on the batch; the processing of the results (calculating the loss, metrics, and so on) is implemented with callbacks. The method code is shown in Listing 3.
from catalyst import dl


class MultiheadClassificationRunner(dl.Runner):
    def __init__(self, num_heads, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self.num_heads = num_heads

    def handle_batch(self, batch):
        x, targets = batch
        logits = self.model(x)
        # Expose each head's targets and logits under its own key for the callbacks
        batch_dict = {"features": x}
        for i in range(self.num_heads):
            batch_dict[f"targets{i}"] = targets[i]
        for i in range(self.num_heads):
            batch_dict[f"logits{i}"] = logits[i]
        self.batch = batch_dict
Listing 3. Runner implementation
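The indexing targets[i] in handle_batch works because PyTorch's default collate function transposes the per-sample lists of codes: a batch of samples, each with a list of nine integers, becomes a list of nine tensors of shape [batch]. The sketch below demonstrates this with a toy dataset. ToyPlateDataset is a hypothetical stand-in for NpOcrDataset, and it assumes the data loader uses the default collate function.

```python
import numpy as np
import torch
from torch.utils.data import DataLoader, Dataset


class ToyPlateDataset(Dataset):
    """Hypothetical stand-in for NpOcrDataset: blank images, fixed 9-code labels."""

    def __len__(self):
        return 8

    def __getitem__(self, idx):
        img = np.zeros((3, 32, 128), dtype=np.float32)  # CHW image
        label = [idx] * 9  # one integer code per head
        return img, label


loader = DataLoader(ToyPlateDataset(), batch_size=4)
x, targets = next(iter(loader))

# targets is a sequence of 9 tensors, one per head, each holding the codes
# for that character position across the whole batch.
for i, head_targets in enumerate(targets):
    print(i, tuple(head_targets.shape))
```

So targets[i] is exactly the tensor of class indices that the i-th CriterionCallback needs.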
We will use the following callbacks:
CriterionCallback – to calculate the loss. We need a separate instance for each of the model heads.
MetricAggregationCallback – to aggregate the losses of the individual heads into a single loss of the model.
OptimizerCallback – to run the optimizer and update the model weights.
SchedulerCallback – to step the learning rate scheduler.
AccuracyCallback – to track the classification accuracy of each head during training.
CheckpointCallback – to save the best model weights.
The code that generates the list of callbacks is shown in Listing 4.
import os

from catalyst import dl


def get_runner_callbacks(num_heads, num_classes_per_head, class_names, logdir):
    cbs = [
        # One criterion callback per head: loss{i} from logits{i} and targets{i}
        *[
            dl.CriterionCallback(
                metric_key=f"loss{i}",
                input_key=f"logits{i}",
                target_key=f"targets{i}"
            )
            for i in range(num_heads)
        ],
        # Aggregate the per-head losses into the single "loss" that is optimized
        dl.MetricAggregationCallback(
            metric_key="loss",
            metrics=[f"loss{i}" for i in range(num_heads)],
            mode="mean"
        ),
        dl.OptimizerCallback(metric_key="loss"),
        dl.SchedulerCallback(),
        # One accuracy callback per head; the suffix keeps the metric names distinct
        *[
            dl.AccuracyCallback(
                input_key=f"logits{i}",
                target_key=f"targets{i}",
                num_classes=num_classes_per_head,
                suffix=f"{i}"
            )
            for i in range(num_heads)
        ],
        # Keep only the checkpoint with the lowest validation loss
        dl.CheckpointCallback(
            logdir=os.path.join(logdir, "checkpoints"),
            loader_key="valid",
            metric_key="loss",
            minimize=True,
            save_n_best=1
        )
    ]
    return cbs
Listing 4. Code for getting callbacks
The rest of the code is routine for PyTorch and Catalyst, so we do not include it here. The full code for the article is available on GitHub.
Experiment Results
Fig. 3. Model loss during training. Orange line: train loss; blue line: validation loss
Below are some of the mistakes that the model made on the test set:
Incorrect prediction: T970XT23- instead of T970XO123
Incorrect prediction: X399KT161 instead of X359KT163
Incorrect prediction: E166EP133 instead of E166EP123
Incorrect prediction: X225YY96- instead of X222BY96-
Incorrect prediction: X125KX11- instead of X125KX14-
Incorrect prediction: X365PC17- instead of X365PC178
All the possible error types are present here: incorrectly recognized letters and digits in the main part of the license plate, incorrectly recognized digits of the region code, an extra digit in the region code, and an incorrectly predicted absence of the last digit.
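To see which part of the plate each mistake falls in, the predictions can be compared with the expected strings position by position. The sketch below does this for the pairs listed above. The helper names are hypothetical, and the split assumes that the first six characters are the main part of the plate and the remaining ones are the region code.

```python
def diff_positions(predicted, expected):
    """Return the indices at which two equal-length plate strings differ."""
    return [i for i, (p, e) in enumerate(zip(predicted, expected)) if p != e]


def describe(predicted, expected, main_len=6):
    """Split the differing positions into main part and region code (assumed layout)."""
    positions = diff_positions(predicted, expected)
    return {
        "main_part": [i for i in positions if i < main_len],
        "region_code": [i for i in positions if i >= main_len],
    }


# (predicted, expected) pairs copied from the list above.
errors = [
    ("T970XT23-", "T970XO123"),
    ("X399KT161", "X359KT163"),
    ("E166EP133", "E166EP123"),
    ("X225YY96-", "X222BY96-"),
    ("X125KX11-", "X125KX14-"),
    ("X365PC17-", "X365PC178"),
]

for predicted, expected in errors:
    print(predicted, expected, describe(predicted, expected))
```

Counting such positions over the whole test set would show which heads are the weakest and whether the region code or the main part deserves more attention.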
Conclusion
In this article, we examined a way to implement a multihead model for recognizing car license plates using the Catalyst framework. The main components were the model itself, the runner, and a set of callbacks for it. The model was successfully trained and showed high accuracy on the test set.
Thanks for your attention! We hope that our experience is helpful to you.