In the last post, I started talking about the IBM Maximo Visual Inspection solution, which allows you to use computer vision technology without a team of data scientists or impressive development costs. Now I will tell you more about how to apply this solution.
Maximo Visual Inspection has great tutorials posted by the company itself (for example, you can start with the story of using this tool to analyze lung images at the start of the COVID-19 pandemic), so I don’t think it is worth repeating the details. But I would like to show, at least superficially, why this tool is worth your attention as a way to simplify entry into the field, so there will be several examples of how data preprocessing works (and you can check the truth of my words about the “basic computer literacy” level required!).
As a dataset for an example of manual labeling and image segmentation, I took a dataset from Kaggle – the Severstal competition for detecting defects in photographs of steel sheets moving along a conveyor. My colleagues and I discussed this problem at length at the time, since our team took part in the competition. It was an interesting task, and in the end Severstal implemented it in an interesting way – you can read about it on Kaggle.
So, labeling along a contour. We have a sheet of steel with defects of different classes on it. The dataset was provided already labeled by Severstal’s specialists, but in this case I imagined that there was no labeling and that I was a defect assessor or a project manager well versed in metallurgy.
The labeling can be done with a regular mouse using bounding boxes, if we only need to locate the defect in the image, or using a polygon, if we need to outline the defect area precisely.
If you are working with a regular mouse, labeling for pixel-level image segmentation will involve some “pixel-hunting”. But it is possible that this task will be simplified in future releases with a tool similar to the “magic wand” in graphics editors.
Maximo Visual Inspection can apply filters based on how images are labeled. If I had already prepared the labeling for a significant part of the images, I could filter the sheets that have some kind of defect from the sheets that have none.
All of this is highlighted, so you can immediately, without additional tools, get images suitable for showing in presentations. Classes are conveniently labeled, and you can select polygons.
After labeling the first five images, you can try applying a model to your dataset, which will attempt auto-labeling over several iterations. How well it will perform on your specific data is difficult to predict. The documentation recommends running four iterations of auto-labeling; quite often you can reach acceptable quality above the 90% level. Obviously, good auto-labeling accuracy may require more than five images, and some variety, if the objects differ significantly from one another.
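To make the idea of iterative auto-labeling concrete, here is a toy sketch of the general scheme (my own illustration, not MVI’s internals): a simple nearest-centroid “model” is fit on the labeled points, and on each iteration the confident predictions from the unlabeled pool are pulled into the labeled set. The function names, threshold, and 1-D “images” are all hypothetical.

```python
# Toy sketch of iterative auto-labeling (pseudo-labeling); this is an
# illustration of the idea, not how Maximo Visual Inspection implements it.
def nearest_centroid_label(x, centroids):
    """Return the class whose centroid is closest to x."""
    return min(centroids, key=lambda c: abs(x - centroids[c]))

def auto_label(labeled, unlabeled, iterations=4, threshold=2.0):
    labeled = dict(labeled)          # {sample: class}
    pool = list(unlabeled)
    for _ in range(iterations):
        # "train": recompute per-class centroids from the current labels
        centroids = {}
        for lab in set(labeled.values()):
            xs = [x for x, l in labeled.items() if l == lab]
            centroids[lab] = sum(xs) / len(xs)
        # "predict": keep only confident points (close to a centroid)
        remaining = []
        for x in pool:
            lab = nearest_centroid_label(x, centroids)
            if abs(x - centroids[lab]) < threshold:
                labeled[x] = lab     # accept the pseudo-label
            else:
                remaining.append(x)  # still too ambiguous
        pool = remaining
    return labeled, pool

seed = {0.0: "defect", 10.0: "ok"}   # the five-image "manual" seed, in miniature
labeled, leftover = auto_label(seed, [1.0, 9.0, 5.0])
print(labeled)   # 1.0 and 9.0 get pseudo-labels; 5.0 stays ambiguous
```

The point of the sketch is the loop structure: each pass both grows the labeled set and refines the model, which is why the documentation suggests several iterations rather than one.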
Video labeling is similar. In Maximo Visual Inspection, you can upload a video to mark where certain objects are located and how they move around the frame. From a computer’s point of view, a video is simply a sequence of images ordered in time. The temporal aspect makes the task harder, since different frames need to be correlated with each other. Individual frames from the video can be labeled just as in the manual image labeling mode.
A common step in preparing images for training a neural network is “augmentation” – small changes to the original images used to increase diversity in the data (and to avoid ridiculous accidents where the desired object was always in the right-hand corner, so the neural network decided this was a rule and now always looks only there). The main such techniques are built into Maximo Visual Inspection: color conversion, cropping (precisely so that the model does not overfit to absolute coordinates in the images), and rotating images in different directions.
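For readers who want to see what such augmentations do mechanically, here is a minimal sketch of two of them – a horizontal flip and a random crop – on a grayscale image stored as a list of rows. This is an illustration only; a real pipeline would use a library such as Pillow or torchvision, and MVI performs these transforms internally.

```python
# Toy grayscale augmentations: mirror and random crop.
import random

def hflip(image):
    """Mirror the image left-to-right."""
    return [list(reversed(row)) for row in image]

def random_crop(image, size, rng=random):
    """Cut a size x size patch at a random position, shifting the
    object's absolute coordinates between training examples."""
    h, w = len(image), len(image[0])
    top = rng.randrange(h - size + 1)
    left = rng.randrange(w - size + 1)
    return [row[left:left + size] for row in image[top:top + size]]

img = [[1, 2, 3],
       [4, 5, 6],
       [7, 8, 9]]
print(hflip(img))           # [[3, 2, 1], [6, 5, 4], [9, 8, 7]]
print(random_crop(img, 2))  # e.g. [[5, 6], [8, 9]] - position varies
```

Each augmented copy is a plausible new training example, which is exactly why cropping breaks the model’s dependence on absolute position.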
In general, all of these techniques are very useful, but think about whether your input images, in their natural environment, could actually look the way they do after augmentation. Taking the Severstal case as an example: because of the fixed position of the camera over the mill along which the sheet travels, images always arrive in the same orientation, so rotation does not make sense. Whether it makes sense to blur the image is a big question if the camera is always the same, as is the level of dust in the workshop. In general, questions about the applicability of a specific type of augmentation to a specific dataset have to be resolved case by case.
Maximo Visual Inspection offers model training with automatically selected hyperparameters and shows training curves in real time. Even a layman knows that if the error goes up, something went wrong. Different neural network architectures can be used, but R-CNNs, among the most popular today, are offered as the first-choice models.
In terms of analyzing a model’s results, Maximo Visual Inspection offers the same tools a data scientist would use; for classification problems, these are metrics related to how accurately objects are identified. The accuracy metric is used in classification tasks, but it is very sensitive to class balance: if you have one image of class A and all other images belong to class B, problems are possible (a model that assigns everything to class B will barely hurt its accuracy). The balance of precision and recall, and how valuable overall accuracy is to you, depends on the business problem. Do you need to find as many objects of a certain class as possible (even at the cost of many false positives)? Or is it important that when the model says it found something, it really is what it reports?
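The class-imbalance trap is easy to demonstrate with a few lines of standard-definition metrics (my own illustration, not MVI output): a model that always predicts the majority class looks very accurate while being completely useless for the rare class.

```python
# Why accuracy alone misleads on imbalanced classes.
def accuracy(y_true, y_pred):
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def precision_recall(y_true, y_pred, positive):
    tp = sum(t == positive and p == positive for t, p in zip(y_true, y_pred))
    fp = sum(t != positive and p == positive for t, p in zip(y_true, y_pred))
    fn = sum(t == positive and p != positive for t, p in zip(y_true, y_pred))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall

y_true = ["A"] + ["B"] * 99      # one image of class A, 99 of class B
y_pred = ["B"] * 100             # the model never predicts A

print(accuracy(y_true, y_pred))                        # 0.99 - looks great
print(precision_recall(y_true, y_pred, positive="A"))  # (0.0, 0.0) - useless for A
```

This is exactly the “finding the most objects vs. trusting each detection” trade-off: recall measures the former, precision the latter.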
The confusion matrix is a very useful tool that lets you gauge which classes tend to be mistaken for which. Let’s say, hypothetically, that you have employees who must wear a headscarf and employees who must wear a surgical cap. You trained the model and found that, with otherwise good metrics, it often mistakes a cap for a kerchief – and this is unacceptable, since people who are required to wear a cap must actually be wearing one. The confusion matrix gives you the answers to these kinds of questions.
Our colleagues at IBM have several solutions built on pre-trained models using Maximo Visual Inspection; in particular, there is a packaged solution for manufacturing that allows you to use detection of certain parts out of the box, including from mobile devices. In addition, IBM traditionally offers tailor-made solutions.
If you are interested in what the entire pipeline of working with Maximo Visual Inspection looks like on real data, I recommend this repository: https://github.com/IBM/powerai-counting-cars – it shows the whole process from labeling to model training. The model is implemented in a Jupyter Notebook, there are detailed screenshots for each step, and it is clear how the data was labeled. This example counts objects in video, but the task for still images is similar, with minor differences.
For machine vision and video analytics solutions, we actively cooperate with the IBM Client Center in Moscow. This helps us understand the details and features of these solutions and find effective technology configurations for object recognition and video analytics tasks.
I hope this review will help those who would like to try implementing a computer vision solution but worry that a team of data scientists, dressed in robes decorated with secret mathematical symbols, will spend the entire project budget without achieving results. Boxed solutions such as Maximo Visual Inspection will help you assess how realistic the task is and learn to speak the same language before the time comes to call in those guys for something really complex and custom.
Tsareva Alexandra, Leading Machine Learning Specialist at Jet Infosystems.