How to recognize industrial parts from photographs using machine vision

Hello, Habr! Today we’ll talk about how neural networks can help recognize parts and why this is needed at all. Recently one of our clients contacted us: a large industrial company that manufactures trucks and their components. Its product range contained a very large number of part names, and because of this, employees made mistakes when identifying parts visually. We decided to create an application based on computer vision and neural networks that checks whether the worker picked the right part (Fig. 1). The application also had to verify that the name of the recognized part matched the name given in the order invoice.

Figure: 1


The initial data were photographs of parts and their drawings, with an average of 500-700 photos per part, taken by plant employees. After studying the pictures in detail, it became clear that many of them differed very little from each other: they had been shot in burst mode with minimal change of angle. Near-identical photographs carry almost the same information, so using them as training data was impractical. We decided to discard most of these photos, wrote detailed requirements for new photographs, and asked for a new set.
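The article does not say how near-duplicate burst shots were detected. One common approach is a perceptual "average hash": shrink each image to a tiny grid, compare cells with the mean brightness, and treat images whose hashes differ in only a few bits as duplicates. The sketch below assumes grayscale images given as lists of pixel rows; the 8x8 grid and the 5-bit tolerance are illustrative values, not the authors' settings.

```python
from typing import List, Sequence

Image = List[List[int]]  # grayscale image as rows of pixel values 0..255


def average_hash(img: Image, size: int = 8) -> int:
    """Average hash: shrink to size x size by block averaging, then set one
    bit per cell depending on whether it is brighter than the mean cell.
    Assumes the image is at least size x size pixels."""
    h, w = len(img), len(img[0])
    cells = []
    for by in range(size):
        for bx in range(size):
            block = [
                img[y][x]
                for y in range(by * h // size, (by + 1) * h // size)
                for x in range(bx * w // size, (bx + 1) * w // size)
            ]
            cells.append(sum(block) / len(block))
    mean = sum(cells) / len(cells)
    bits = 0
    for value in cells:
        bits = (bits << 1) | int(value > mean)
    return bits


def hamming(a: int, b: int) -> int:
    """Number of differing bits between two hashes."""
    return bin(a ^ b).count("1")


def drop_near_duplicates(images: Sequence[Image], max_distance: int = 5) -> List[int]:
    """Greedy filter: keep an image only if its hash differs by more than
    max_distance bits from every image kept so far. Returns kept indices."""
    kept: List[int] = []
    kept_hashes: List[int] = []
    for i, img in enumerate(images):
        h = average_hash(img)
        if all(hamming(h, k) > max_distance for k in kept_hashes):
            kept.append(i)
            kept_hashes.append(h)
    return kept
```

A slightly brighter copy of a frame gets the same hash and is dropped, while a frame with a different composition survives. On real photos, the tolerance would have to be tuned by looking at which pairs it merges.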

Model selection

There are many image recognition models, and they differ in speed, accuracy, and a number of other nuances. Our choice was driven by how the parts were arranged in the photographs: each photo contained several objects that overlapped each other.

Bounding boxes alone (object detection) would not have been enough: the boxes could overlap each other heavily during annotation, training, and recognition. So for training we decided to choose a model that supports image segmentation, more precisely instance segmentation (Fig. 2).

Figure: 2
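To make the overlap problem concrete, here is a small synthetic sketch (not taken from the project data). Two thin, diagonally oriented parts lying side by side have almost identical bounding boxes, yet their pixel masks do not overlap at all. Boxes are end-exclusive `(x0, y0, x1, y1)` tuples and masks are lists of 0/1 rows, both chosen for illustration.

```python
from typing import List, Tuple

Box = Tuple[int, int, int, int]  # (x0, y0, x1, y1), end-exclusive pixel coordinates
Mask = List[List[int]]           # 1 = object pixel, 0 = background


def mask_to_box(mask: Mask) -> Box:
    """Tightest bounding box around the non-zero pixels of a mask."""
    ys = [y for y, row in enumerate(mask) for v in row if v]
    xs = [x for row in mask for x, v in enumerate(row) if v]
    return (min(xs), min(ys), max(xs) + 1, max(ys) + 1)


def box_iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    iw = max(0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union else 0.0


def mask_iou(a: Mask, b: Mask) -> float:
    """Intersection over union of two same-sized binary masks."""
    inter = sum(pa & pb for ra, rb in zip(a, b) for pa, pb in zip(ra, rb))
    union = sum(pa | pb for ra, rb in zip(a, b) for pa, pb in zip(ra, rb))
    return inter / union if union else 0.0


# Two thin diagonal parts lying side by side on a 10x10 image.
part_a = [[int(x == y) for x in range(10)] for y in range(10)]
part_b = [[int(x == y + 1) for x in range(10)] for y in range(10)]
print(box_iou(mask_to_box(part_a), mask_to_box(part_b)))  # 0.81
print(mask_iou(part_a, part_b))                           # 0.0
```

A box IoU of 0.81 would typically cause one of the two detections to be suppressed or confused with the other, while the masks keep the two parts cleanly separated.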

Because for this task it was more important to determine the class of an object than its location, we chose the Mask R-CNN model. This simple and flexible model efficiently detects objects in an image while generating a high-quality segmentation mask for each instance. Mask R-CNN extends Faster R-CNN by adding a branch that predicts an object mask, running in parallel with the existing bounding box recognition branch. The mask makes it possible to outline the contour of an object in detail, which solved the problem of boxes overlapping each other when annotating parts in photographs. However, such annotation takes significantly longer.
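Mask R-CNN returns results per detected instance (torchvision's implementation, for example, returns one dict per image with `boxes`, `labels`, `scores` and `masks`). Since the question here is which parts are in the photo, the output can be reduced to per-class counts. The sketch below is a hypothetical post-processing step, not the authors' code; the 0.7 score threshold and the class names are assumptions.

```python
from collections import Counter
from typing import Dict, Mapping, Sequence


def parts_in_photo(
    labels: Sequence[int],
    scores: Sequence[float],
    class_names: Mapping[int, str],
    min_score: float = 0.7,  # hypothetical confidence threshold
) -> Dict[str, int]:
    """Collapse per-instance detections into {part name: instance count},
    ignoring detections below min_score. Boxes and masks are not needed here."""
    counts = Counter(
        class_names[label]
        for label, score in zip(labels, scores)
        if score >= min_score
    )
    return dict(counts)


# Hypothetical class ids and names, for illustration only.
names = {1: "bracket", 2: "flange", 3: "bolt"}
print(parts_in_photo([1, 2, 1, 3], [0.95, 0.9, 0.8, 0.3], names))
```

The masks still matter during training and for separating touching parts; they are simply not needed to build the list of parts shown to the user.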

In our case, objects in the images were annotated manually using a third-party cloud service. It allowed several employees to annotate the same dataset remotely and, once annotation was complete, to download the entire dataset (Fig. 3, 4, 5, 6).

Figure: 3
Figure: 4
Figure: 5
Fig. 6
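The article does not name the annotation service or its export format. Many tools export instance outlines as polygons, as COCO-style files do, and these must be rasterized into binary masks for Mask R-CNN training. The sketch below assumes each polygon is a list of `(x, y)` vertices in pixel coordinates and fills a pixel when its center lies inside the polygon (even-odd rule). In practice, `cv2.fillPoly` from OpenCV or pycocotools does this much faster; the pure-Python version only shows the idea.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float]  # (x, y) in pixel coordinates (assumed export format)


def polygon_to_mask(polygon: Sequence[Point], height: int, width: int) -> List[List[int]]:
    """Rasterize a polygon into a binary mask: a pixel is set when its center
    lies inside the polygon according to the even-odd (ray casting) rule."""
    mask = [[0] * width for _ in range(height)]
    n = len(polygon)
    for y in range(height):
        cy = y + 0.5
        for x in range(width):
            cx = x + 0.5
            inside = False
            j = n - 1
            for i in range(n):
                xi, yi = polygon[i]
                xj, yj = polygon[j]
                # Does edge (j -> i) cross the horizontal line through the pixel center?
                if (yi > cy) != (yj > cy):
                    x_cross = xi + (cy - yi) * (xj - xi) / (yj - yi)
                    if cx < x_cross:
                        inside = not inside
                j = i
            mask[y][x] = int(inside)
    return mask


square = [(1, 1), (4, 1), (4, 4), (1, 4)]
for row in polygon_to_mask(square, 6, 6):
    print(row)
```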

After annotating enough photos for experiments, we trained the first part recognition models on an HPE DL380 server with two NVIDIA Tesla V100 GPUs. Training the first models took 8 to 12 hours on average.

The training results revealed two problems that hindered recognition:

1. The photographs of some parts contained leaks: objects or features that are not part of the parts themselves, but which the model relies on when determining the class. This caused the network to learn incorrectly.

2. Because of the way neural networks work, the model did not distinguish mirror-image parts from each other and recognized such a part as both parts at once. This became a significant problem, as the customer had a large number of mirrored parts.

What to do with leaks?

To solve this problem, we compiled detailed instructions for taking photos for machine learning. This reduced the number of errors and the number of leaks in the next training round.

The photographs below (Fig. 7 and 8) show an example of a leak. The first set shows first-class parts in bright light; the second shows second-class parts in darker lighting. On such a dataset, the model will rely on the background and lighting when determining the class, because these parts of the image differ between the first and second classes. Such behavior is incorrect: the model should classify parts based on their structure.

Fig. 7. Sample First Class Photos
Figure: 8. Sample photos of the second class
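A simple sanity check for the lighting leak described above (our suggestion, not a step from the article): see how well a trivial rule that looks only at mean image brightness can separate two classes. If it gets close to 100%, the model can "cheat" the same way. The sketch assumes binary labels 0/1 and grayscale images as lists of pixel rows.

```python
from statistics import mean
from typing import List, Sequence, Tuple

Image = List[List[int]]  # grayscale pixel values


def mean_brightness(img: Image) -> float:
    return mean(v for row in img for v in row)


def brightness_only_accuracy(samples: Sequence[Tuple[Image, int]]) -> float:
    """Best accuracy on binary labels (0/1) achievable by a rule that looks
    only at mean brightness: 'class 1 if brightness > t', for every
    candidate threshold t and either polarity. Values near 1.0 mean that
    lighting alone separates the classes, i.e. a likely leak."""
    feats = [(mean_brightness(img), label) for img, label in samples]
    thresholds = [float("-inf")] + [f for f, _ in feats]
    best = 0.0
    for t in thresholds:
        correct = sum((f > t) == (label == 1) for f, label in feats)
        # max(...) also covers the inverted rule 'class 1 if brightness <= t'
        best = max(best, max(correct, len(feats) - correct) / len(feats))
    return best
```

The same idea works with any cheap image statistic (background color, image size, EXIF camera model); if one of them predicts the class well on its own, the shooting instructions need another look.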

What to do with mirrored parts?

To solve the problem of recognizing mirrored parts, we used an ensemble of models. The first model classified parts into mirrored and non-mirrored ones; at this stage each mirrored part was still recognized as two objects (its left and right versions). Mirrored parts were then passed to the next models, which were trained to recognize only mirrored parts: for each pair of mirrored parts, a separate model was created that classified the part as ‘left’ or ‘right’ (Fig. 9).

Fig. 9
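The two-stage ensemble can be sketched as a routing function. This is a hypothetical illustration, assuming one part per image crop: the general model proposes labels, a lookup table maps mirrored labels to a pair id, and the pair's own binary model decides the side. All names (`hub_left`, `pair_of`, `side_models`) are made up for the example.

```python
from typing import Callable, Dict, Sequence

GeneralModel = Callable[[object], Sequence[str]]
SideModel = Callable[[object], str]  # returns "left" or "right"


def classify_part(
    image: object,
    general_model: GeneralModel,
    pair_of: Dict[str, str],
    side_models: Dict[str, SideModel],
) -> str:
    """Two-stage ensemble: the general model proposes labels; if any label
    belongs to a mirrored pair, the pair's own binary model decides the side."""
    labels = list(general_model(image))
    if not labels:
        raise ValueError("no part recognized")
    for label in labels:
        pair_id = pair_of.get(label)
        if pair_id is not None:
            side = side_models[pair_id](image)
            return f"{pair_id}_{side}"
    return labels[0]  # ordinary (non-mirrored) part


# Hypothetical usage with stub models.
pair_of = {"hub_left": "hub", "hub_right": "hub"}
side_models = {"hub": lambda img: "right"}
print(classify_part("photo", lambda img: ["hub_left", "hub_right"], pair_of, side_models))
```

Keeping one small binary model per pair means that adding a new mirrored pair does not require retraining the general model's mirrored/non-mirrored split or the other pair models.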

How to create a model for annotation

To create the models that classify mirrored parts, we requested about 2000 photographs per class. Since each of these models performed binary classification (‘left’ or ‘right’ part), 4000 photographs were used for each model.

Annotating so many photos by hand would take a long time. Each mirrored-part model used 4000 photographs, and there were many such models, one per pair of mirrored parts. So we decided to build a model that generates the masks and saves them in the required format. We manually annotated 120 photographs of each class and trained this model on them. After the model annotated the remaining parts, inaccurate masks were corrected manually. This approach reduced time costs and eliminated the need to annotate a large number of images from scratch.
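The article says inaccurate model-generated masks were corrected by hand but not how they were found. One possible way to prioritize that work, assuming each predicted instance carries a confidence score, is to accept images where every instance is confident and queue the rest for review. The 0.9 threshold and the data layout are assumptions for illustration.

```python
from typing import Dict, List, Sequence, Tuple

Instance = Dict[str, float]  # at least {"score": ...}; masks omitted here


def triage_pre_annotations(
    predictions: Dict[str, Sequence[Instance]],
    min_score: float = 0.9,  # hypothetical threshold
) -> Tuple[List[str], List[str]]:
    """Split model-generated annotations into images whose masks can be taken
    as-is and images that a person should check and correct first."""
    accepted: List[str] = []
    review: List[str] = []
    for image_id, instances in predictions.items():
        if instances and all(inst["score"] >= min_score for inst in instances):
            accepted.append(image_id)
        else:
            review.append(image_id)  # low confidence or nothing found
    return accepted, review
```

Confidence is only a proxy: a confidently wrong mask would still slip through, so spot-checking part of the accepted set remains worthwhile.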

After that, the recognition models were trained and their parameters were tuned (Fig. 10, 11).

Fig. 10
Fig. 11

Recognition of tags and delivery notes

To match the part selected by the worker with the part on the invoice, the application had to recognize the part name. This can be done using a barcode that contains the part number, so there was no need to read the entire text of the consignment note.

For convenient recognition at the plant, we developed a mobile interface. It allowed users to take photos with the phone or load them from memory and send them to the model for recognition. The user's phone then received the photo with the result and a list of the parts found in it.

To simplify model deployment, the entire backend was moved to SAP Data Intelligence.

Interface and SAP Data Intelligence

SAP Data Intelligence allows not only publishing and embedding models, but also creating custom operators based on them (for example, from the Python operator). This makes it possible to reuse existing models and embed them in the required form (batch processing, streaming, or REST services). In addition, a pipeline built with a flow-based approach can be quickly adapted to changing requirements: if integration with ERP/MES or any other system is needed in the future, it will be easy to add. All received photos can also be collected in the connected data lake for further training of the model and improving recognition quality. Scaling the service is straightforward as well: it is enough to set the parallelism level (the multiplicity parameter), and the corresponding number of replicas of the model is started at the Kubernetes level. Built-in tools are available for pipeline debugging, logging, tracing, and monitoring.

By the way, a few words about the platform this project was built on. Since the system is expected to go into commercial operation, it is advisable to use industrial-grade hardware. For the pilot, Cisco provided a Cisco HyperFlex hyperconverged system, which has already been covered in several earlier posts on Habr.

Since SAP Data Intelligence is fully containerized, it is important to ensure fault tolerance of the Kubernetes cluster and its integration with the networks of the data center where the solution will be deployed. We fully replicated a typical validated Cisco & SAP design in the lab, so infrastructure no longer gave us any headaches.

We created a container in SAP Data Intelligence with all the required libraries and used the standard OpenAPI operator to publish the service. The entire backend ran in a container on the server, and the pipeline could also be run on any other Data Intelligence server (Fig. 12).

Fig. 12. The architecture used to implement the tasks
Fig. 13. Pipeline in SAP Data Intelligence
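For illustration, a custom Python operator in such a pipeline might look roughly like the sketch below. The inference call `predict_parts`, the port names `input`/`output`, and the JSON response shape are hypothetical. The `api.set_port_callback`, `api.send` and `api.Message` calls follow the Python3 operator interface as we understand it and should be checked against the documentation for your Data Intelligence version. Outside Data Intelligence the `api` object does not exist, so the wiring is skipped and the request handling can be tested on its own.

```python
import json
from typing import Callable, List


def predict_parts(image_bytes: bytes) -> List[dict]:
    """Hypothetical placeholder for the Mask R-CNN inference call."""
    return []


def build_response(image_bytes: bytes, predict: Callable[[bytes], List[dict]] = predict_parts) -> str:
    """Run recognition on the uploaded photo and serialize the found parts."""
    return json.dumps({"parts": predict(image_bytes)})


try:
    api  # injected by the SAP Data Intelligence Python3 operator runtime
except NameError:
    api = None


def on_input(msg) -> None:
    # Pass the incoming attributes through so the response can be matched
    # to the original request (an assumption about this pipeline).
    result = build_response(msg.body)
    api.send("output", api.Message(body=result, attributes=msg.attributes))


if api is not None:
    api.set_port_callback("input", on_input)
```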

The client interface was written with the open-source OpenUI5 framework. The application can be used in a browser via a pinned link or installed on a smartphone.

The application sent photos saved in the phone’s memory to the server or let the user take a new one with the phone’s camera. After recognition, the user could see a list of parts in the submitted photo and view the drawings of the recognized parts.

To compare the part with the invoice line, the user opens a separate menu on the main page, photographs the invoice, and then photographs the selected part. If the part numbers do not match, the application warns the user that the part is incorrect and shipment is prohibited.
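The final comparison step can be sketched as follows. The sketch assumes the invoice barcode has already been decoded into a part number string (decoding is not shown) and that each recognized part class has been mapped to its part number by some lookup table. Normalizing case and separators before comparing is our assumption, and the part numbers in the example are made up.

```python
import re
from typing import Iterable, Tuple


def normalize_part_number(raw: str) -> str:
    """Uppercase and drop separators so 'ab-1234-56' and 'AB123456' compare equal."""
    return re.sub(r"[^0-9A-Z]", "", raw.upper())


def check_shipment(invoice_part_number: str, recognized_part_numbers: Iterable[str]) -> Tuple[bool, str]:
    """Allow shipment only if the invoice part number is among the recognized parts."""
    expected = normalize_part_number(invoice_part_number)
    found = {normalize_part_number(p) for p in recognized_part_numbers}
    if expected in found:
        return True, "OK: the part matches the invoice"
    return False, "Warning: the part does not match the invoice, shipment is prohibited"


# Hypothetical part numbers, for illustration only.
print(check_shipment("AB-1234-56", ["AB123456"]))
print(check_shipment("AB-1234-56", ["AB123457"]))
```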

Today we told you how we built an application for recognizing parts with neural networks. Was the truck and component manufacturer satisfied? I think so: the application significantly reduced the errors employees made when identifying parts on their own. Over the past few years, many neural network models and systems have been built for forecasting and evaluating parameters such as the state of an enterprise, the likelihood of equipment failure, or the assessment and prediction of income and expenses. Image recognition with neural networks is less widespread, since few enterprises know how to integrate this technology into their processes to benefit the company. This example shows that, whatever the task, knowing the strengths and weaknesses of neural networks lets you increase efficiency, raise the level of automation across the enterprise, and reduce the workload on staff.

Author – Alexey Khristenko, Consultant Data Science, SAP CIS
