How a computer evaluates the external condition of POS terminals

Our service engineers handle several types of field work:

  • Restoring the functionality of POS terminals.

  • Replacing terminals and their parts.

  • Organizing maintenance and repair work.

  • Installing POS terminals.

  • Scheduled preventive maintenance work.

When an engineer arrives on site, they must take several photos. The first is of the POS terminal itself, so that the quality of the completed work can be judged.

The engineer also needs to photograph the paper receipt from a test transaction, which confirms that the terminal is working. If a paper receipt cannot be produced (for example, because the receipt tape has run out), the engineer instead photographs the electronic receipt on the screen of a smartphone or laptop.

We receive on average over 10,000 applications per day from all over the country, and each application usually contains 3-5 photos. Manual verification of the completed work takes a lot of time, so we developed a solution based on machine learning algorithms that processes the photos and produces results in ~2 hours.

How it all works

Our algorithm works in three stages:

  1. Load the application data.

  2. Process each application sequentially with neural networks.

  3. Aggregate the results per application.

Application data can be downloaded either via a direct link or as an archive of photographs. Each photo is then fed through four neural networks in sequence.
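As a rough sketch of this flow (the four networks are described in more detail below), processing one application might look like the following; the class and helper names here are illustrative, not the production code:

```python
from dataclasses import dataclass, field

@dataclass
class PhotoResult:
    has_paper_receipt: bool = False          # output of the 1st network
    has_electronic_receipt: bool = False     # output of the 2nd network
    terminal_boxes: list = field(default_factory=list)  # 3rd network: terminal coordinates
    defect_boxes: list = field(default_factory=list)    # 4th network: detected defects

def process_application(photos, paper_clf, e_receipt_clf, terminal_det, defect_det):
    """Run the four models on every photo of an application and aggregate the results."""
    per_photo = []
    for image in photos:
        r = PhotoResult()
        r.has_paper_receipt = paper_clf.predict(image)
        r.has_electronic_receipt = e_receipt_clf.predict(image)
        r.terminal_boxes = terminal_det.detect(image)
        r.defect_boxes = defect_det.detect(image)
        per_photo.append(r)
    # Aggregation per application (assumed logic): the application needs at least
    # one receipt photo (paper or electronic) and at least one terminal photo.
    return {
        "receipt_ok": any(r.has_paper_receipt or r.has_electronic_receipt for r in per_photo),
        "terminal_ok": any(r.terminal_boxes for r in per_photo),
        "photos": per_photo,
    }
```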

The first neural network detects whether a paper receipt is present in a photograph, the second detects an electronic receipt, the third detects whether there is a terminal in the image, and the last detects various defects on the terminal. They were trained on a PC with an RTX 3090 Ti video card. Augmentations and "anti-classes" (images that do not contain the object of interest) were used during training. Random transformations were applied to each existing image to change the geometry, add distortions, weather effects and glare, alter the contrast, and so on. In total, the training sample for the terminal detection model included ~60,000 photographs, and ~1,500 photographs were used to train the defect detection model. After training, pruning was applied to the classifiers to increase performance; photo processing time decreased by 30%.
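For illustration, this kind of augmentation pipeline could be assembled with a library such as albumentations; the specific transforms and parameters below are examples, not the exact training configuration:

```python
import albumentations as A
import cv2

# Example augmentation pipeline: geometry changes, distortions, weather effects,
# glare and contrast changes, as described above. All parameters are illustrative.
augment = A.Compose([
    A.ShiftScaleRotate(shift_limit=0.1, scale_limit=0.2, rotate_limit=25, p=0.7),
    A.Perspective(scale=(0.02, 0.08), p=0.3),    # geometric distortion
    A.RandomBrightnessContrast(p=0.5),           # contrast / brightness changes
    A.RandomRain(p=0.1),                         # weather effect
    A.RandomSunFlare(src_radius=80, p=0.1),      # glare
    A.GaussNoise(p=0.2),
])

image = cv2.imread("terminal_photo.jpg")         # file name is just an example
augmented = augment(image=image)["image"]
```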

The models are currently running on a virtual machine with one Intel Xeon Gold 6242R processor, 20 cores, 3.1 GHz, and 32 GB of RAM. We also ran them in SberCloud using one Nvidia Tesla V100 video card (16 GB of RAM, 2.4 GHz).

Let's look at each model in more detail.

Paper receipt model

This is a binary classification model with the MobileNetV3 architecture. It has two output classes: a paper receipt is present in the photo, or it is not.

About 2,500 photos from real applications were used for training. When training a neural network, a separate set of labeled images is set aside, each marked as containing a receipt or not; this set is used to evaluate the quality of the neural network. The confusion matrix on the right of the illustration shows that the share of correct answers is 94%. Processing one photo with the model in ONNX format takes ~150 ms.
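A minimal sketch of how such an ONNX classifier could be run with onnxruntime; the model file name, 224x224 input size, preprocessing, and class index are assumptions rather than the production settings:

```python
import cv2
import numpy as np
import onnxruntime as ort

# File name and input size are assumptions; adjust to the actual exported model.
session = ort.InferenceSession("paper_receipt_mobilenetv3.onnx")
input_name = session.get_inputs()[0].name

def has_paper_receipt(image_path: str, threshold: float = 0.5) -> bool:
    img = cv2.imread(image_path)
    img = cv2.resize(img, (224, 224)).astype(np.float32) / 255.0
    img = np.transpose(img, (2, 0, 1))[np.newaxis]         # HWC -> NCHW batch of 1
    logits = session.run(None, {input_name: img})[0][0]    # two outputs: receipt / no receipt
    probs = np.exp(logits) / np.exp(logits).sum()          # softmax over the two classes
    return float(probs[1]) > threshold                     # "receipt" class index is assumed
```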

Electronic receipt model

This is also a MobileNetV3 binary classification model with two output classes.

This model posed quite a few problems: encoding errors are common, device screens are highly heterogeneous, backgrounds vary, and engineers sometimes display the contents of a receipt in Notepad or another application. Many tricks were therefore needed to achieve high quality. Today the share of correct answers is 95%. About 1,500 photos were used for training, and the per-photo processing speed is the same as for the paper receipt model, since a similar architecture and model format are used.

Terminal detection model

This model uses the YOLOv8n architecture. Rather than simply flagging whether an object is present in the photo, it outputs the object's coordinates.

On the left is a photo from a real application showing the new Sber Kozen P12 terminal with biometrics. The red frame outlines the model's output: the coordinates of the area containing the terminal. From these coordinates you can calculate what share of the photo the terminal occupies, which is needed to assess the quality of the engineer's photo. In a correctly completed application, the terminal should occupy at least 60% of the photo's area.
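A sketch of this area check using the ultralytics YOLOv8 API; the weights file name and the helper are illustrative, and only the 60% threshold comes from the rule above:

```python
from ultralytics import YOLO

model = YOLO("terminal_yolov8n.pt")   # weights file name is an assumption

def terminal_area_fraction(image_path: str) -> float:
    """Return the fraction of the photo occupied by the largest detected terminal box."""
    result = model(image_path)[0]
    img_h, img_w = result.orig_shape
    best = 0.0
    for x1, y1, x2, y2 in result.boxes.xyxy.tolist():
        best = max(best, (x2 - x1) * (y2 - y1) / (img_w * img_h))
    return best

# A correctly completed application requires the terminal to cover at least 60% of the photo.
photo_ok = terminal_area_fraction("application_photo.jpg") >= 0.60
```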

Now let's look at the confusion matrix. This model outputs two classes: a classic POS terminal or a new terminal with biometrics. It can also return an empty result, which means it did not detect a terminal in the photo. Errors in the matrix are marked with a blue background.

For old terminals, the model looks only for the screen and keyboard in photographs; for new ones, only the screen. Other sides of the devices, including the stands, are not of interest. Incidentally, by the end of 2024 the company plans to deploy more than 500,000 terminals with biometrics.

Visual defect model

Initially we tried to identify defects with a classifier, but abandoned this idea because of the model's low quality. We then chose a model with the YOLOv8m architecture. Today it recognizes 8 different types of defects: dirt, abrasions, and scratches on the screen, keys, and case.

The total operating time of the two models is less than 1 second.

As its output, the model returns a string containing the areas where defects were detected, the defect classes, and the probability of assigning an area to a particular defect. Such a result is very hard to process manually, so rules were developed empirically to reduce the prediction to a binary label: "defect" / "no defect". The concept of a "terminal with a visual defect" is difficult to formalize, so the rules and their strictness are constantly refined to reflect the wishes of the business.

To assess the condition of new-model terminals, we apply the following rules:

A detection is NOT counted as a defect if:

  • the screen dirt area (screen_dirt) is ≤5% of the terminal area;

  • the screen scratch area (screen_scratch) is ≤5% of the terminal area.

A terminal is considered defective if one of the following conditions is met:

  • there is some wear on the screen;

  • the number of defects with dirt (screen_dirt, body_dirt) is more than 5;

  • there is a scratch whose area is >5% of the terminal area;

  • there is a dirt defect (screen_dirt, body_dirt) whose area is more than 7% of the terminal area.

In other cases, the terminal is considered free of defects.

Here are the evaluation rules for old-model terminals:

A terminal is considered defective if one of the following conditions is met:

  • there is some wear on the screen;

  • the total area of all body abrasions is ≥7%;

  • there is a scratch whose area is greater than 5% of the terminal area;

  • the total area of dirt defects (screen_dirt, body_dirt) is ≥7%;

  • there is a dirt defect (screen_dirt, key_dirt, body_dirt) whose area is more than 7% of the terminal area.
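A minimal sketch of how these rules might be reduced to a binary label. The class names screen_dirt, screen_scratch, key_dirt, and body_dirt come from the rules above, while screen_wear and body_abrasion and the detection format (class name plus area in pixels) are assumptions for illustration:

```python
def is_defective_new_model(detections, terminal_area):
    """New-model rules; each detection is a (class_name, area_in_pixels) pair."""
    rel = lambda area: area / terminal_area
    # Small screen dirt and scratches (<=5% of the terminal area) are not counted as defects.
    detections = [d for d in detections
                  if not (d[0] in ("screen_dirt", "screen_scratch") and rel(d[1]) <= 0.05)]
    dirt = [d for d in detections if d[0] in ("screen_dirt", "body_dirt")]
    if any(d[0] == "screen_wear" for d in detections):        # any wear on the screen
        return True
    if len(dirt) > 5:                                          # more than 5 dirt defects
        return True
    if any(d[0] == "screen_scratch" and rel(d[1]) > 0.05 for d in detections):
        return True                                            # scratch larger than 5% of the terminal
    if any(rel(d[1]) > 0.07 for d in dirt):                    # a single dirt spot larger than 7%
        return True
    return False

def is_defective_old_model(detections, terminal_area):
    """Old-model rules; same detection format as above."""
    rel = lambda area: area / terminal_area
    if any(d[0] == "screen_wear" for d in detections):
        return True
    if sum(rel(d[1]) for d in detections if d[0] == "body_abrasion") >= 0.07:
        return True                                            # total body abrasion area >= 7%
    if any(d[0] == "screen_scratch" and rel(d[1]) > 0.05 for d in detections):
        return True
    if sum(rel(d[1]) for d in detections if d[0] in ("screen_dirt", "body_dirt")) >= 0.07:
        return True                                            # total dirt area >= 7%
    if any(d[0] in ("screen_dirt", "key_dirt", "body_dirt") and rel(d[1]) > 0.07
           for d in detections):
        return True
    return False
```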

After the model processes the applications, all photos in which defects were found are sent to the business for verification, on average 40-50 applications with photos. This both collects data for further training of the models and makes it possible to penalize unscrupulous engineers for installing faulty equipment. According to the business customer's verification of the data, the share of correctly recognized defective devices is ~80%.

Intermediate results of the model

At the beginning of 2024, the project was launched in pilot mode with 3 service companies. On average, more than 10,000 applications are processed daily, of which 40-60 are flagged as problematic and sent to the business for verification. In total, over the entire period, more than 10,000 out of approximately 2,000,000 applications have been recognized as defective. Checking absolutely every application manually would require enormous human resources, so the solution delivers substantial time savings: at 10 seconds per application, checking without the model would take more than 20 days, whereas with the model the business spent no more than 3 hours in total.

Next steps

  1. The model will be built into the engineers' mobile app. After completing each application, the employee will take photos of the installed device; if defects are found, the application cannot be considered completed.

  2. Adding new Kozen devices to the terminal recognition model.

  3. Replacing the classic device detection model with one that uses areas rotated relative to the center of the device (oriented bounding boxes).

  4. Expanding the project to smart cash registers. Engineers work not only on installing terminals but also on other equipment, in particular smart cash registers. A pilot project has been launched to process applications for this type of equipment using the same model as in the POS terminal project; a training sample is currently being collected and labeled to train a defect recognition model.

Project developers:

  • Zharikov Dmitry

  • Smirnov Alexander

  • Zalipaeva Alexandra, head
