Classification of the components of the microstructure of steels using computer vision

The purpose of this work is to develop a computer vision model for the recognition and classification of the components of the steel microstructure.

In metal science, the components of the microstructure are customarily called phases. Identifying the type of phase matters because the size and the volume-fraction ratio of the phases affect the mechanical properties of steel. In practice, the type of microstructure is mostly determined by experts “by eye”, which in some cases leads to disagreement in the assessment.

In this work, the model was trained to classify phases such as ferrite, bainite, and pearlite. The model was trained on the steel microstructure in the state after rolling without additional heat treatment. This is an important caveat, because the difference in the specific “pattern” of the microstructure between the heat-treated and non-heat-treated phases is significant.

Figure 1 shows an image of the microstructure of steel with a separated phase.

Figure 1. General view of the steel microstructure


When preparing the training data, only the grains of a single phase (ferrite, bainite, or pearlite) were left in each image. For each microstructure class, 100 grayscale (8-bit) images of 224×224 pixels were selected. The dataset was divided into train and test samples in an 80/20 ratio. Images are sorted into folders corresponding to each phase (class). The “ImageFolder” class assigns each image a label, which is the name of the corresponding folder.

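import torch
import torchvision.datasets as datasets
import torchvision.transforms as transforms
# transformations / transformations_s: transform pipelines for the train and test images
train = datasets.ImageFolder("path/train", transform=transformations)
val = datasets.ImageFolder("path/test", transform=transformations_s)
# Wrap the datasets in loaders that serve batches of 4 images
train_loader = torch.utils.data.DataLoader(train, batch_size=4, shuffle=True)
val_loader = torch.utils.data.DataLoader(val, batch_size=4, shuffle=False)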

The loaders, which serve the data in batches, were created with the DataLoader class.
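The 80/20 split into train and test folders mentioned above is not shown in the article. A minimal sketch of how the file names of one class could be split is given below; the helper `split_train_test`, the seed, and the file names are hypothetical and only illustrate the ratio.

```python
import random


def split_train_test(filenames, train_fraction=0.8, seed=0):
    """Shuffle file names reproducibly and split them into train and test lists."""
    shuffled = sorted(filenames)  # sort first so the result does not depend on input order
    random.Random(seed).shuffle(shuffled)
    n_train = int(round(len(shuffled) * train_fraction))
    return shuffled[:n_train], shuffled[n_train:]


# Hypothetical file names for one class (100 images per class, as in the text)
ferrite_files = [f"ferrite_{i:03d}.png" for i in range(100)]
train_files, test_files = split_train_test(ferrite_files)
print(len(train_files), len(test_files))
```

Each list would then be copied into the matching class folder under the train or test directory, so that ImageFolder can pick up the labels from the folder names.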

To compensate for the small amount of data, the “Transfer Learning” approach was applied: a model already trained on a large number of images is reused. The last layer of the pre-trained model was replaced by a classifier for the 3 microstructure classes. Densenet161 was chosen as the pre-trained model, and pretrained=True was set to load the pre-trained weights.

import torch.nn as nn
num_labels = 3
# 512 input features matches the final fc layer of ResNet18/ResNet34
classifier = nn.Sequential(nn.Linear(512, num_labels))
# Replace the last fully connected layer of the model with our classifier
model.fc = classifier

Before being loaded into the model, the images are converted to tensors, and the tensors are then normalized.

transformations = transforms.Compose([
    transforms.ToTensor(),  # convert the image to a tensor
    transforms.Normalize(mean=[0.485], std=[0.229])])  # normalize the tensor

To assess the classification quality of the model, the “accuracy” metric was chosen. When training the baseline model, a spread in “accuracy” of 0.1 was observed, and the metric did not improve from epoch to epoch, from which it was concluded that the model was underfitting. To make the model easier to train, it was decided to diversify the dataset using augmentations and to reduce the number of features by binarizing the images. In addition, the pre-trained model was changed from densenet161 to ResNet18, which uses residual connections (the input tensor of a block is passed around its layers and added to their output).
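As a reminder of what the metric measures, accuracy is the share of images whose predicted class equals the true class. The sketch below is a plain-Python illustration; the function name and the example labels are made up and are not taken from the article's data.

```python
def accuracy(predicted, actual):
    """Return the fraction of predictions that match the true labels."""
    if len(predicted) != len(actual):
        raise ValueError("predicted and actual must have the same length")
    if not actual:
        raise ValueError("at least one label is required")
    correct = sum(p == a for p, a in zip(predicted, actual))
    return correct / len(actual)


# Made-up labels for four images: three of the four predictions are right
predicted = ["ferrite", "bainite", "pearlite", "pearlite"]
actual = ["ferrite", "bainite", "pearlite", "bainite"]
print(accuracy(predicted, actual))  # 0.75
```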

Figure 2 shows binarized images converted back from the tensor after applying perspective distortions, vertical and horizontal reflections.

Figure 2. Augmentations


Binarization was carried out by Otsu’s method, which uses the histogram of an image to calculate the threshold that decides whether a pixel becomes 0 or 1.
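The idea behind the threshold search can be sketched in plain Python: every candidate threshold splits the histogram into two groups, and the one that maximizes the between-class variance is kept. This is an illustrative implementation on a made-up list of pixel values, not the code used in the article, which most likely relied on a library routine.

```python
def otsu_threshold(pixels, levels=256):
    """Return the threshold t that maximizes between-class variance (pixels > t become 1)."""
    hist = [0] * levels
    for p in pixels:
        hist[p] += 1
    total = len(pixels)
    sum_all = sum(i * h for i, h in enumerate(hist))
    best_t, best_var = 0, -1.0
    w0 = 0    # number of pixels with value <= t
    sum0 = 0  # sum of pixel values <= t
    for t in range(levels):
        w0 += hist[t]
        sum0 += t * hist[t]
        w1 = total - w0
        if w0 == 0 or w1 == 0:
            continue  # one of the two classes is empty, nothing to compare
        mean0 = sum0 / w0
        mean1 = (sum_all - sum0) / w1
        # Between-class variance up to a constant factor of 1 / total**2
        between = w0 * w1 * (mean0 - mean1) ** 2
        if between > best_var:
            best_var, best_t = between, t
    return best_t


# Made-up 8-bit pixel values: a dark group and a bright group
pixels = [10, 12, 11, 13, 200, 210, 205, 198]
t = otsu_threshold(pixels)
binary = [1 if p > t else 0 for p in pixels]
print(t, binary)
```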

To identify the most effective type of augmentation, each type was applied separately. The results of applying the various types of augmentation are shown in Table 1.

Table 1. Application of augmentations

Table 1 shows that the best results are obtained with vertical and horizontal reflections and with perspective distortion. Therefore, only these types of augmentation are used for image transformation from here on.

To determine the optimal learning rate, lr_scheduler (hereinafter the scheduler) was applied; it multiplies the learning rate by 0.1 every 2 epochs. The multiplier (gamma=0.1) means that the learning rate decreases by a factor of 10 at each step.

import torch.optim as optim
# Set the initial learning rate to 0.01
optimizer = optim.Adam(model.fc.parameters(), lr=1e-2)
# Import the scheduler
from torch.optim import lr_scheduler
# The learning rate decay is built into this scheduler
exp_lr_scheduler = lr_scheduler.StepLR(optimizer, step_size=2, gamma=0.1)
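The schedule produced by these settings can be worked out by hand. The sketch below reproduces the StepLR rule in plain Python (the helper name `step_lr` is hypothetical) and prints the learning rate at the start of every second epoch, assuming training starts at 1e-2 as in the snippet above.

```python
def step_lr(initial_lr, epoch, step_size=2, gamma=0.1):
    """Learning rate in a given epoch under a step decay schedule."""
    return initial_lr * gamma ** (epoch // step_size)


for epoch in range(0, 10, 2):
    print(epoch, f"{step_lr(1e-2, epoch):.0e}")
```

Under these assumptions the learning rate reaches 1e-5 in epoch 6 and 1e-6 in epoch 8.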

The scheduler was tested using two pre-trained models, ResNet18 and ResNet34, which differ in the number of layers. The test results are presented in Table 2.

Table 2. Summary of results

The best accuracy is obtained at learning rate values of 1e-5 and 1e-6.

Next, we check how accurately the model can predict the microstructure component in real images. Grayscale images of the three microstructure components were selected that had not previously appeared in either the training or the test dataset.

The results of applying the model based on the pre-trained ResNet18 and ResNet34 models are presented in Figures 3 and 4, respectively.

Figure 3. Applying a model based on ResNet18

Figure 4. Applying a model based on ResNet34

Figures 3 and 4 show that in one case the ResNet34-based model made a mistake: it reported a microstructure that is actually bainite as 100% pearlite.
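The article does not say how the percentages in the figures were computed. A common way, assumed here, is to pass the raw model outputs through a softmax and report the largest value. The sketch below uses made-up output values and hypothetical class folder names; ImageFolder orders classes alphabetically by folder name.

```python
import math


def softmax(logits):
    """Convert raw model outputs into probabilities that sum to 1."""
    largest = max(logits)
    exps = [math.exp(x - largest) for x in logits]  # subtract the max for numerical stability
    total = sum(exps)
    return [e / total for e in exps]


classes = ["bainite", "ferrite", "pearlite"]  # hypothetical folder names, alphabetical order
logits = [0.5, 3.2, 0.1]                      # made-up model outputs for one image
probs = softmax(logits)
best = max(range(len(classes)), key=probs.__getitem__)
print(classes[best], f"{probs[best]:.0%}")
```

A value close to 100% only shows that the model is confident, not that it is right, which is consistent with the misclassified bainite image.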

A check on another bainite image showed that the ResNet34-based model is unstable on bainite images. This may be due to the small size of the dataset. In addition, the specific “pattern” of bainite is more varied than that of ferrite and pearlite, which also points to the need for a larger dataset.

Figure 5. Additional image with bainite for a model based on ResNet34


Based on the results of the study, the following conclusions can be drawn:

  • When previously unseen images are fed to the model, it predicts the microstructure component with a probability of 93–100%;

  • Since the ResNet34-based model makes errors in identifying the “bainite” phase, the dataset needs to be enlarged, with particular attention to bainite images.

In the future, it is planned to build on this component classifier and develop a model that determines the type of microstructure in real images consisting of 2 or more components.

Training a model to classify the components of the metal microstructure in the heat-treated state is also an interesting task.

The author thanks the people who made this study possible: Anton Vitvitsky, for help in selecting image processing methods for computer vision and for working through errors, and Maria Tikhonova, the author’s first machine learning teacher.
