Computer vision lessons in Python + OpenCV from the very beginning. Part 10. My pet project

In the previous lesson, I talked about my pet project related to computer vision and introduced its ideas and architecture. Today I will continue and describe how I added new classes to the project and what came of it. Let me remind you that the idea was to write a full-fledged image processing pipeline, starting with a simple task such as license plate recognition. The experiment showed that the well-known Tesseract character recognition library does not recognize license plate numbers well, so I decided to write my own recognizer for them. But first we need to find where the plate is located in the image.

Let me remind you what steps were taken in the last lesson:

· Apply median filtering to the image.

· Perform binarization.

Today we will go a little further: extract the contours and find the license plate rectangle among them. First, let’s write a class that performs contour extraction:

class ContourProcessingStep(ImageProcessingStep):
    """Step responsible for extracting contours"""

    def process(self,info):
        """Perform the processing"""

        contours, hierarchy  = cv2.findContours(info.image, cv2.RETR_TREE, cv2.CHAIN_APPROX_SIMPLE)

        height, width = info.image.shape[:2]
        contours_image = np.zeros((height, width, 3), dtype=np.uint8)

        # draw the contours
        cv2.drawContours(contours_image, contours, -1, (255, 0, 0), 1, cv2.LINE_AA, hierarchy, 1)

        # fill in the data
        new_info=ImageInfo(contours_image)
        new_info.contours=contours
        new_info.hierarchy=hierarchy

        return new_info

These contours now need to be approximated, so let’s write the following class:

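class ContourApproximationProcessingStep(ImageProcessingStep):
    """Step responsible for approximating contours"""

    def __init__(self,eps = 0.005, filter=None):
        """Constructor
        eps - approximation accuracy as a fraction of the contour's arc length
        filter - optional function that takes an approximated contour and returns True to keep it"""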

        self.eps=eps
        self.filter=filter

    def process(self, info):
        """Perform the processing"""

        approx_countours=[]
        img_contours = np.uint8(np.zeros((info.image.shape[0], info.image.shape[1])))
        for countour in info.contours:
            arclen = cv2.arcLength(countour, True)
            epsilon = arclen * self.eps
            approx = cv2.approxPolyDP(countour, epsilon, True)
            append=False
            if not(self.filter is None):
                if self.filter(approx):
                    append=True
            else:
                append=True
            if append:
                approx_countours.append(approx)

        cv2.drawContours(img_contours, approx_countours, -1, (255, 255, 255), 1)

        # fill in the data
        new_info=ImageInfo(img_contours)
        new_info.contours=approx_countours

        return new_info

The filter parameter takes a reference to a function that selects contours by some criterion (we only need rectangular contours).

Let’s test it, first without a filter:

import cv2

from Libraries.Core import Engine
from Libraries.ImageProcessingSteps import MedianBlurProcessingStep, ThresholdProcessingStep, ContourProcessingStep, \
    ContourApproximationProcessingStep

def my_filter(approx):
    if len(approx)==4:
        return True
    return False

my_photo = cv2.imread('../Photos/car.jpg')
core=Engine()
core.steps.append(MedianBlurProcessingStep(5))
core.steps.append(ThresholdProcessingStep())
core.steps.append(ContourProcessingStep())
core.steps.append(ContourApproximationProcessingStep(0.02))
#core.steps.append(ContourApproximationProcessingStep(0.02,my_filter))
res,history=core.process(my_photo)

i=1
for info in history:
    cv2.imshow('image'+str(i), info.image) # show the image in a window
    i=i+1
cv2.imshow('res', res.image)

cv2.waitKey()
cv2.destroyAllWindows()
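
The Engine, ImageInfo and ImageProcessingStep classes come from the previous lesson and are not shown here. As a reminder of the idea, here is a rough sketch of how such a pipeline might be organized. It is an assumption based only on how the classes are used in this lesson (steps is a list, process returns the final result and a history whose first element holds the original image), not the actual code of the project.

```python
import numpy as np

class ImageInfo:
    """Container for an image plus any data a step attaches to it (assumed layout)."""
    def __init__(self, image):
        self.image = image

class ImageProcessingStep:
    """Base class: a step takes an ImageInfo and returns a new ImageInfo."""
    def process(self, info):
        return info

class Engine:
    """Runs the steps in order and keeps every intermediate result."""
    def __init__(self):
        self.steps = []

    def process(self, image):
        info = ImageInfo(image)
        history = [info]              # history[0] holds the original image
        for step in self.steps:
            info = step.process(info)
            history.append(info)
        return info, history

class InvertStep(ImageProcessingStep):
    """Toy step used only to exercise the sketch."""
    def process(self, info):
        return ImageInfo(255 - info.image)

engine = Engine()
engine.steps.append(InvertStep())
res, history = engine.process(np.zeros((2, 2), dtype=np.uint8))
print(len(history), res.image.max())
```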

Let’s see what we got:

Here, for clarity, I showed a scaled-down photo of the car. Let’s try to process a full-size photo:

So, we see something like a rectangle (yes, it’s crooked, but the other shapes don’t look like a rectangle at all).

Now the question arises: how can we find our “rectangle” among all these lines? First, let’s apply a filter (this is what the filter parameter is for) and discard all the shapes that are not quads. As you may have noticed, the program already contains a function for this:

def my_filter(approx):
    if len(approx)==4:
        return True
    return False

All that remains is to change this line of code

core.steps.append(ContourApproximationProcessingStep(0.02))

to this one:

core.steps.append(ContourApproximationProcessingStep(0.02,my_filter))

And voila, we are left with only quads:

As you can see, there are significantly fewer objects left, but there is still a lot of garbage. Let’s filter it by removing objects that are too small:

def my_filter(approx):
    if len(approx)==4:
        if abs(approx[2,0,0]-approx[0,0,0])<10:
            return False
        if abs(approx[2,0,1]-approx[0,0,1])<10:
            return False
        return True
    return False

And here’s what we got:

Only 5 objects are left. In theory, of course, additional filtering can be applied, for example by excluding objects with the wrong length-to-width ratio (a license plate has specific dimensions according to GOST, which means that its length-to-width ratio is also fixed). You can also exclude obviously “crooked rectangles”, in which the difference in the lengths of opposite sides is much larger than the expected error. However, keep in mind that while trying to filter out unnecessary objects, you can also throw away the ones you need. So you have to be careful here.

However, let’s try. First, let’s write a function for calculating the length of the sides:

def dist(point1,point2):
    d1 = point1[0] - point2[0]
    d2 = point1[1] - point2[1]
    return math.sqrt(d1*d1+d2*d2)

And make changes to the filter:

def my_filter(approx):
    if len(approx)==4:
        if abs(approx[2,0,0]-approx[0,0,0])<10:
            return False
        if abs(approx[2,0,1]-approx[0,0,1])<10:
            return False
        if abs(dist(approx[0,0],approx[1,0])/dist(approx[2,0],approx[3,0])-1)>0.4:
            return False
        if abs(dist(approx[0,0],approx[3,0])/dist(approx[1,0],approx[2,0])-1)>0.4:
            return False
        return True
    return False

Here’s what happened:

As you can see, there are only three objects left. One of them, by the way, could be filtered out by checking its corners (discarding a shape if any angle deviates strongly from 90 degrees). But we will not do this yet; we will assume that one extra object is not critical.
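
For completeness, here is one way the corner check might look. This is a sketch and not part of the project: the function names and the 20 degree tolerance are assumptions. It measures the angle between the two edges that meet at each vertex and assumes that no two neighbouring vertices coincide.

```python
import numpy as np

def corner_angles(approx):
    """Return the angle in degrees between the two edges meeting at each vertex."""
    points = approx.reshape(-1, 2).astype(np.float64)
    count = len(points)
    angles = []
    for i in range(count):
        to_prev = points[i - 1] - points[i]
        to_next = points[(i + 1) % count] - points[i]
        cosine = np.dot(to_prev, to_next) / (np.linalg.norm(to_prev) * np.linalg.norm(to_next))
        angles.append(float(np.degrees(np.arccos(np.clip(cosine, -1.0, 1.0)))))
    return angles

def has_right_angles(approx, max_deviation=20.0):
    """Return True if every corner is within max_deviation degrees of 90."""
    return all(abs(angle - 90.0) <= max_deviation for angle in corner_angles(approx))

rectangle = np.array([(0, 0), (100, 0), (100, 40), (0, 40)], dtype=np.int32).reshape(-1, 1, 2)
skewed = np.array([(0, 0), (100, 0), (150, 50), (50, 50)], dtype=np.int32).reshape(-1, 1, 2)

print(has_right_angles(rectangle), has_right_angles(skewed))
```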

Let’s visualize the license plate candidates we found. To do this, we need to extract points from the contour. Here is how two opposite corner points of the first contour can be extracted:

x1=res.contours[0][0][0][0]
y1=res.contours[0][0][0][1]
x2=res.contours[0][2][0][0]
y2=res.contours[0][2][0][1]
cv2.rectangle(finish_result,(x1,y1),(x2,y2),(255,0,0),3)

And this is what will be drawn in the end:

Of course, this is not how it should be done. At the very least, we should write a function that extracts these points:

def get_rect(countur_item):
    x1 = countur_item[0][0][0]
    y1 = countur_item[0][0][1]
    x2 = countur_item[2][0][0]
    y2 = countur_item[2][0][1]
    return (x1,y1), (x2,y2)

And then we can draw the first shape like this:

p1,p2=get_rect(res.contours[0])
cv2.rectangle(finish_result,p1,p2,(255,0,0),3)

And all the shapes like this:

for item in res.contours:
    p1,p2=get_rect(item)
    cv2.rectangle(finish_result,p1,p2,(255,0,0),3)

And here’s what happens:

That is, now we only need to analyze these three areas and look for letters and digits there. But first I would like to clean up the code. This is what our run2.py file looks like:

import cv2
import math

from Libraries.Core import Engine
from Libraries.ImageProcessingSteps import MedianBlurProcessingStep, ThresholdProcessingStep, ContourProcessingStep, \
    ContourApproximationProcessingStep

def dist(point1,point2):
    d1 = point1[0] - point2[0]
    d2 = point1[1] - point2[1]
    return math.sqrt(d1*d1+d2*d2)

def my_filter(approx):
    if len(approx)==4:
        if abs(approx[2,0,0]-approx[0,0,0])<10:
            return False
        if abs(approx[2,0,1]-approx[0,0,1])<10:
            return False
        if abs(dist(approx[0,0],approx[1,0])/dist(approx[2,0],approx[3,0])-1)>0.4:
            return False
        if abs(dist(approx[0,0],approx[3,0])/dist(approx[1,0],approx[2,0])-1)>0.4:
            return False
        return True
    return False

def get_rect(countur_item):
    x1 = countur_item[0][0][0]
    y1 = countur_item[0][0][1]
    x2 = countur_item[2][0][0]
    y2 = countur_item[2][0][1]
    return (x1,y1), (x2,y2)

my_photo = cv2.imread('../Photos/6108249.jpg')
#my_photo = cv2.imread('../Photos/car.jpg')
core=Engine()
core.steps.append(MedianBlurProcessingStep(5))
core.steps.append(ThresholdProcessingStep())
core.steps.append(ContourProcessingStep())
#core.steps.append(ContourApproximationProcessingStep(0.02))
core.steps.append(ContourApproximationProcessingStep(0.02,my_filter))
res,history=core.process(my_photo)

i=1
for info in history:
    cv2.imshow('image'+str(i), info.image) # show the image in a window
    i=i+1
cv2.imshow('res', res.image)

finish_result = history[0].image.copy()

for item in res.contours:
    p1,p2=get_rect(item)
    cv2.rectangle(finish_result,p1,p2,(255,0,0),3)
cv2.imshow('Finish', finish_result)


cv2.waitKey()
cv2.destroyAllWindows()

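Not very pretty, so let’s do some refactoring. Let’s add a Utils.py file to the Libraries folder and move the get_rect and dist functions there, along with a new show_history function that wraps the loop displaying the intermediate images. We import these functions:

from Libraries.Utils import dist, get_rect, show_history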

Now the executable file looks like this:

import cv2

from Libraries.Core import Engine
from Libraries.ImageProcessingSteps import MedianBlurProcessingStep, ThresholdProcessingStep, ContourProcessingStep, \
    ContourApproximationProcessingStep
from Libraries.Utils import dist, get_rect, show_history


def my_filter(approx):
    if len(approx)==4:
        if abs(approx[2,0,0]-approx[0,0,0])<10:
            return False
        if abs(approx[2,0,1]-approx[0,0,1])<10:
            return False
        if abs(dist(approx[0,0],approx[1,0])/dist(approx[2,0],approx[3,0])-1)>0.4:
            return False
        if abs(dist(approx[0,0],approx[3,0])/dist(approx[1,0],approx[2,0])-1)>0.4:
            return False
        return True
    return False


my_photo = cv2.imread('../Photos/6108249.jpg')
#my_photo = cv2.imread('../Photos/car.jpg')
core=Engine()
core.steps.append(MedianBlurProcessingStep(5))
core.steps.append(ThresholdProcessingStep())
core.steps.append(ContourProcessingStep())
#core.steps.append(ContourApproximationProcessingStep(0.02))
core.steps.append(ContourApproximationProcessingStep(0.02,my_filter))
res,history=core.process(my_photo)

show_history(res,history)

finish_result = history[0].image.copy()

for item in res.contours:
    p1, p2 = get_rect(item)
    cv2.rectangle(finish_result, p1, p2, (255, 0, 0), 3)
cv2.imshow('Finish', finish_result)


cv2.waitKey()
cv2.destroyAllWindows()

And the utility file looks like this:

import math
import cv2

def get_rect(countur_item):
    x1 = countur_item[0][0][0]
    y1 = countur_item[0][0][1]
    x2 = countur_item[2][0][0]
    y2 = countur_item[2][0][1]
    return (x1,y1), (x2,y2)

def dist(point1,point2):
    d1 = point1[0] - point2[0]
    d2 = point1[1] - point2[1]
    return math.sqrt(d1*d1+d2*d2)

def show_history(res,history):
    i = 1
    for info in history:
        cv2.imshow('image' + str(i), info.image)  # show the image in a window
        i = i + 1
    cv2.imshow('res', res.image)

Now we can think about how to “read” the plate number. In this lesson, I will show how to recognize digits using a neural network, and we will write the recognizer itself in the next lesson.

So, meet His Majesty Keras. To install it on Windows, first install TensorFlow:

pip3 install tensorflow

and then Keras itself:

pip3 install Keras

And here is an example of code that trains a neural network on the standard MNIST dataset built into Keras:

from keras import layers
from keras import models

from keras.datasets import mnist
import tensorflow as tf


model = models.Sequential()
model.add(layers.Conv2D(32, (3, 3), activation='relu', input_shape=(28, 28, 1)))
model.add(layers.MaxPooling2D((2, 2)))
model.add(layers.Conv2D(64, (3, 3), activation='relu'))
model.add(layers.MaxPooling2D((2, 2)))
model.add(layers.Conv2D(64, (3, 3), activation='relu'))
model.add(layers.Flatten())
model.add(layers.Dense(64, activation='relu'))
model.add(layers.Dense(10, activation='softmax'))

(train_images, train_labels), (test_images, test_labels) = mnist.load_data()
train_images = train_images.reshape((60000, 28, 28, 1))
train_images = train_images.astype('float32') / 255
test_images = test_images.reshape((10000, 28, 28, 1))
test_images = test_images.astype('float32') / 255
train_labels = tf.keras.utils.to_categorical(train_labels)
test_labels = tf.keras.utils.to_categorical(test_labels)

model.compile(optimizer="rmsprop", loss="categorical_crossentropy", metrics=['accuracy'])
model.fit(train_images, train_labels, epochs=5, batch_size=64)

test_loss, test_acc = model.evaluate(test_images, test_labels)
print(test_acc)

Here a convolutional neural network is used. Its input is a 28 by 28 pixel grayscale image, and its output is a ten-element vector of probabilities, one for each digit from 0 to 9, that the image shows that digit. The model reaches an accuracy of about 99% on the test set.
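
It can help to trace how the 28 by 28 image shrinks as it passes through the layers. The arithmetic below is a sketch that relies on the Keras defaults used in the model above: Conv2D without padding and with stride 1, and MaxPooling2D with a stride equal to the pool size. The helper names are made up for illustration.

```python
def conv_valid(size, kernel):
    """Output size of a stride-1 convolution without padding."""
    return size - kernel + 1

def pool(size, window):
    """Output size of non-overlapping max pooling (incomplete windows are dropped)."""
    return size // window

size = 28
size = conv_valid(size, 3)    # Conv2D(32, (3, 3))   -> 26
size = pool(size, 2)          # MaxPooling2D((2, 2)) -> 13
size = conv_valid(size, 3)    # Conv2D(64, (3, 3))   -> 11
size = pool(size, 2)          # MaxPooling2D((2, 2)) -> 5
size = conv_valid(size, 3)    # Conv2D(64, (3, 3))   -> 3
flattened = size * size * 64  # Flatten              -> 576

print(size, flattened)
```

So the Flatten layer hands a vector of 576 values to the first Dense layer.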

If you do not believe that MNIST really contains digits, you can check this by visualizing an element of the dataset:

from keras.datasets import mnist
import cv2

(train_images, train_labels), (test_images, test_labels) = mnist.load_data()
print(train_labels)

cv2.imshow("Digit", train_images[0])
cv2.waitKey(0)
cv2.destroyAllWindows()

For element number zero, we will see the digit 5:

And this element actually corresponds to label 5:

Let’s try to feed an image to the trained neural network. By the way, once the neural network has been trained, it would be nice to save it:

model.save('mnist_model.h5')

Well, now let’s try to load a picture with a digit (it must already be 28 by 28 pixels) and let the neural network recognize it:

import cv2
from keras import models

my_photo = cv2.imread('imgs/Digit0.png',cv2.IMREAD_GRAYSCALE) # load the image

# convert the image to the format the neural network expects
normal_photo=my_photo/255.0
input_data=normal_photo.reshape(1,28,28,1)

# feed the image to the neural network and get the result
model = models.load_model('mnist_model.h5')
result=model.predict(input_data)

print(result)

Here is the picture:

The output:

[[9.9990845e-01 1.4144711e-08 8.4316625e-08 3.7920216e-11 2.4454723e-06
  4.7663391e-08 8.7873021e-05 4.1903621e-07 3.8488349e-08 6.0560058e-07]]

You can see that in the first cell (index 0, counting from zero), which corresponds to the digit 0, the probability is almost 1, while in the others it is almost 0.
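
In code, the recognized digit is simply the index of the largest element of this vector. A short sketch using the values printed above (the variable names are arbitrary):

```python
import numpy as np

# The network output quoted above for the image of a zero.
result = np.array([[9.9990845e-01, 1.4144711e-08, 8.4316625e-08, 3.7920216e-11, 2.4454723e-06,
                    4.7663391e-08, 8.7873021e-05, 4.1903621e-07, 3.8488349e-08, 6.0560058e-07]])

digit = int(np.argmax(result[0]))       # index of the largest probability
confidence = float(result[0][digit])    # probability assigned to that digit

print(digit, confidence)
```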

Let’s see how the digit 1 is recognized:

[[5.2682775e-08 9.9998152e-01 9.0230742e-07 1.2926430e-09 9.7749239e-07
  6.3665328e-07 5.2730784e-06 1.0716837e-05 2.0985880e-08 1.3917042e-11]]

As you can see, the digit was recognized correctly here as well.

That’s all for now. Next time we will integrate the neural network into the project with “beautiful” code.

Let me remind you that the examples can be downloaded here: megabax/CVContainer, my pet computer vision project (github.com)
