Image novelty detection with Python and the scikit-learn library

In this article, I will show how to use the scikit-learn, OpenCV, NumPy, and imutils libraries to detect novelty in input images. Many applications need to decide whether a new observation belongs to the same distribution as the existing observations (it is an inlier) or whether it should be treated as a novelty. This capability is often used to clean up real-world datasets.
First, let’s install the necessary libraries:

pip install numpy
pip install opencv-contrib-python
pip install imutils
pip install scikit-learn

We will detect novelties with the Isolation Forest algorithm, which scikit-learn provides as the IsolationForest class in sklearn.ensemble.

One effective way to detect novelty in a dataset is to use an isolation forest, an ensemble of randomly built trees. The algorithm isolates observations by randomly choosing a feature and then randomly choosing a split value between the minimum and maximum values of the selected feature. Unusual observations tend to be isolated after fewer splits than typical ones, so a short path through the trees signals a novelty.
Let's look at how the algorithm works on a concrete example: deciding whether a picture shows the sea.
We train the model on a dataset of photographs of the sea.
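The isolation idea can be tried out without any library. The following is a minimal sketch, not scikit-learn's implementation: the helper names isolation_depth and average_depth, the toy 2-D points, and the trial count are all assumptions made for illustration. It repeatedly picks a random feature and a random split value, keeps the side containing the point of interest, and counts the splits until that point is alone.

```python
import random


def isolation_depth(points, target, rng, max_depth=64):
    """Count the random splits needed until `target` is alone in its partition."""
    current = list(points)
    n_features = len(target)
    depth = 0
    while len(current) > 1 and depth < max_depth:
        # Only features whose values still differ can be split.
        splittable = [
            f for f in range(n_features)
            if min(p[f] for p in current) < max(p[f] for p in current)
        ]
        if not splittable:
            break  # the remaining points are identical
        feature = rng.choice(splittable)
        low = min(p[feature] for p in current)
        high = max(p[feature] for p in current)
        split = rng.uniform(low, high)
        # Keep only the side of the split that contains the target.
        if target[feature] < split:
            current = [p for p in current if p[feature] < split]
        else:
            current = [p for p in current if p[feature] >= split]
        depth += 1
    return depth


def average_depth(points, target, trials=200, seed=0):
    """Average the isolation depth of `target` over several random trials."""
    rng = random.Random(seed)
    return sum(isolation_depth(points, target, rng) for _ in range(trials)) / trials


# Hypothetical toy data: a dense cluster plus one far-away point.
data_rng = random.Random(1)
cluster = [(data_rng.gauss(0, 1), data_rng.gauss(0, 1)) for _ in range(200)]
inlier = (0.0, 0.0)
outlier = (8.0, 8.0)
points = cluster + [inlier, outlier]

inlier_depth = average_depth(points, inlier)
outlier_depth = average_depth(points, outlier)
print(f"average depth, inlier:  {inlier_depth:.1f}")
print(f"average depth, outlier: {outlier_depth:.1f}")
```

The far-away point should need noticeably fewer splits on average than the point in the middle of the cluster, which is the signal an isolation forest turns into a score.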

When presented with a new input image, our novelty detection algorithm must decide whether the image fits the "sea" distribution or is a novelty, and return 1 or -1 accordingly. If 1 is returned, we conclude "Yes, this is the sea"; otherwise, "No, it does not look like the sea".
To evaluate our novelty detection algorithm, we use three test images: a city, the sea, and a highway.
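To make the 1 / -1 convention concrete, here is a small sketch on synthetic 2-D points instead of images. The data, the two query points, and the variable names are assumptions for illustration only; the IsolationForest parameters mirror the ones used later in this article.

```python
import numpy as np
from sklearn.ensemble import IsolationForest

# Hypothetical training data: 300 points drawn around the origin.
rng = np.random.RandomState(42)
train = rng.normal(loc=0.0, scale=1.0, size=(300, 2))

model = IsolationForest(n_estimators=100, contamination=0.01, random_state=42)
model.fit(train)

# predict() returns 1 for samples that look like the training data
# and -1 for samples the model treats as new.
pred_typical = model.predict(np.array([[0.0, 0.0]]))[0]
pred_unusual = model.predict(np.array([[10.0, 10.0]]))[0]

for name, pred in (("typical", pred_typical), ("unusual", pred_unusual)):
    label = "new" if pred == -1 else "normal"
    print(f"{name}: {pred} ({label})")
```

The same mapping from -1 to "new" and 1 to "normal" is what the test script in this article applies to image features.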

First, we implement a function that extracts a color histogram with OpenCV. This lets us represent each image as a fixed-length feature vector that describes its color distribution:

from imutils import paths
import numpy as np
import cv2
def histogram_image(image, bins=(4, 6, 3)):
	# compute a 3D color histogram over the image and normalize it
	histogram = cv2.calcHist([image], [0, 1, 2], None, bins,
		[0, 180, 0, 256, 0, 256])
	histogram = cv2.normalize(histogram, histogram).flatten()
	# return the flattened histogram as the feature vector
	return histogram
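
The size of the resulting feature vector follows directly from the bins argument: a (4, 6, 3) histogram flattens to 4 * 6 * 3 = 72 values. The sketch below reproduces this with NumPy only, on a random synthetic HSV image. It is an analog of histogram_image rather than the OpenCV code itself, and it assumes that cv2.normalize with default arguments scales the histogram to unit L2 norm.

```python
import numpy as np

bins = (4, 6, 3)
height, width = 32, 32

# Hypothetical HSV pixels: OpenCV stores H in [0, 180) and S, V in [0, 256).
rng = np.random.RandomState(0)
hsv_pixels = np.column_stack([
    rng.randint(0, 180, size=height * width),
    rng.randint(0, 256, size=height * width),
    rng.randint(0, 256, size=height * width),
]).astype(np.float64)

# Count pixels per (H, S, V) bin, then flatten the 3D histogram.
hist, _ = np.histogramdd(
    hsv_pixels, bins=bins, range=[(0, 180), (0, 256), (0, 256)]
)
features = hist.flatten()

# Scale to unit L2 norm so images of different sizes are comparable.
features = features / np.linalg.norm(features)

print(features.shape)
```

Coarser bins give a shorter, more forgiving descriptor; finer bins capture more color detail but need more training images to cover the space.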

Then we load the dataset:

def loading_dataset(path_dataset, bins):
    # collect the paths of all images in the dataset directory, then
    # initialize the list of feature vectors
    path_s_image = list(paths.list_images(path_dataset))
    data = []
    # loop over the image paths
    for path in path_s_image:
        image = cv2.imread(path)
        # cv2.imread returns None for unreadable files; skip them
        if image is None:
            continue
        # convert to HSV and describe the image by its color histogram
        image = cv2.cvtColor(image, cv2.COLOR_BGR2HSV)
        features = histogram_image(image, bins)
        data.append(features)
    return np.array(data)

Next, we create a Python script that trains the model on the loaded dataset and run it. The imports assume that the two functions above are saved in a file named function.py:

from function import loading_dataset
from sklearn.ensemble import IsolationForest
import pickle
print("[INFO] preparing dataset")
data = loading_dataset('sea/', bins=(3, 3, 3))
# train the model
print("[INFO] fitting novelty detection model")
model = IsolationForest(n_estimators=100, contamination=0.01,
	random_state=42)
model.fit(data)
# serialize the trained model to disk
with open('detect_anomaly.model', "wb") as f:
	f.write(pickle.dumps(model))
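
The contamination argument deserves a note: it sets the share of training samples that the fitted model places below its decision threshold. A rough sketch on synthetic data (the data and the variable names are assumptions, not part of the sea dataset) shows the effect of two different values:

```python
import numpy as np
from sklearn.ensemble import IsolationForest

# Hypothetical training data standing in for image feature vectors.
rng = np.random.RandomState(0)
train = rng.normal(size=(300, 2))

flagged = {}
for contamination in (0.01, 0.1):
    model = IsolationForest(n_estimators=100, contamination=contamination,
                            random_state=42)
    labels = model.fit(train).predict(train)
    # Fraction of the training samples that the model itself marks as -1.
    flagged[contamination] = float(np.mean(labels == -1))
    print(f"contamination={contamination}: {flagged[contamination]:.3f} flagged")
```

With contamination=0.01, as used above, roughly one training image in a hundred is treated as not belonging to the "sea" class, so the value should reflect how clean the training folder is believed to be.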

Create a test_anomaly file to test the model:

from function import histogram_image
import pickle
import cv2
print("[INFO] loading novelty detection model")
with open("detect_anomaly.model", "rb") as f:
	model = pickle.loads(f.read())
# load the image and convert it to a color histogram
image = cv2.imread('examplescities.jpg')
hsv = cv2.cvtColor(image, cv2.COLOR_BGR2HSV)
features = histogram_image(hsv, bins=(3, 3, 3))
# predict() returns -1 for a novelty and 1 for a normal image
preds = model.predict([features])[0]
label = "new" if preds == -1 else "normal"
color = (0, 0, 255) if preds == -1 else (0, 255, 0)
# draw the label on the image
cv2.putText(image, label, (10,  25), cv2.FONT_HERSHEY_SIMPLEX,
	0.7, color, 2)
# show the result on screen
cv2.imshow("Output", image)
cv2.waitKey(0)

We test the model on three photos – the city, the sea and the highway.
Test 1. We load a photo of the city into the model.

The output shows that the model marked the photo of the city as a novelty.
Test 2. We load a photo of the sea into the model.

The model marked the photo of the sea as normal. That is, the picture shows the sea.
Test 3. We load a photo of the highway into the model.

The model marked the photo of the highway as new.
As a result, our trained model passed all three tests and classified every photo correctly: two photos were flagged as new and one as normal.
