Recommender system through similar image search with ResNet50

Recommender systems in brief

Broadly, there are two approaches to building recommender systems: content-based and collaborative filtering. The underlying assumption of collaborative filtering is that if A and B buy similar products, A is more likely to buy a product that B bought than a product that a random person bought. Unlike the content-based approach, it uses no features describing users or items; the recommender relies solely on a matrix of user interactions. A content-based system, by contrast, relies on knowledge about the items themselves. For example, if a user is looking at silk t-shirts, they might be interested in other silk t-shirts.
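
To make the collaborative idea concrete, here is a minimal sketch on a made-up interaction matrix. The users, items and values are purely illustrative and have nothing to do with the dataset used later; `recommend` is a hypothetical helper, not part of any library. It ranks the items a user has not bought by the votes of users with similar purchase histories.

```python
import numpy as np

# Hypothetical user x item purchase matrix (1 = bought); rows are users A, B, C.
interactions = np.array([
    [1, 1, 0, 0],   # A
    [1, 1, 1, 0],   # B: overlaps heavily with A
    [0, 0, 0, 1],   # C: nothing in common with A
], dtype=float)

def recommend(user, interactions):
    """Rank items by similarity-weighted votes of the other users."""
    norms = np.linalg.norm(interactions, axis=1)
    # Cosine similarity between the target user and every user.
    sims = interactions @ interactions[user] / (norms * norms[user])
    sims[user] = 0.0                           # ignore the user's own row
    scores = sims @ interactions               # weighted vote per item
    scores[interactions[user] > 0] = -np.inf   # push already-bought items to the end
    return np.argsort(-scores)

ranking = recommend(0, interactions)
print(ranking[0])  # item 2, the one that similar user B bought
```

Note that no item features appear anywhere: the ranking comes entirely from who bought what.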

In this article, I want to talk about an approach based on searching for similar images. Why prepare additional data when almost all the main characteristics of some products, such as clothing, are visible in the image?

Figure: excluding the softmax classifier

The essence of the approach is to extract features from product images with a convolutional network. In my example I used ResNet50, since its feature vector has a relatively small dimension. Extracting a feature vector with a pretrained network is very simple: you just drop the softmax classifier, which determines which class the image belongs to, and take the network's output as the feature vector. Next, you compare the vectors and look for similar ones: the more similar the images, the smaller the Euclidean distance between their vectors.
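
The comparison step can be sketched in a few lines of NumPy. The vectors below are toy 4-dimensional stand-ins (real ones would come from the network), and `nearest` is a hypothetical helper written for this illustration.

```python
import numpy as np

def nearest(query_vec, vectors, k=3):
    """Return indices and distances of the k vectors closest to query_vec (Euclidean)."""
    dists = np.linalg.norm(vectors - query_vec, axis=1)
    order = np.argsort(dists)[:k]
    return order, dists[order]

# Toy "feature vectors": the first two are close, the third is far away.
features = np.array([
    [1.0, 0.0, 0.0, 0.0],
    [0.9, 0.1, 0.0, 0.0],
    [0.0, 0.0, 1.0, 1.0],
])

idx, dist = nearest(features[0], features, k=2)
print(idx)  # the query itself first (distance 0), then its closest neighbour
```

This brute-force scan is fine for a handful of vectors; for a whole catalogue a dedicated index is the more practical choice.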

Code and dataset

The dataset can be downloaded from Kaggle: https://www.kaggle.com/datasets/paramaggarwal/fashion-product-images-small

Initializing the pretrained ResNet50 from the PyTorch library and extracting features from the dataset

from torchvision.io import read_image
from torchvision.models import resnet50, ResNet50_Weights
import torch
import glob
import pickle
from tqdm import tqdm
from PIL import Image

def pil_loader(path):
    # Some images in the dataset are not stored in RGB format, so they need to be converted to RGB
    with open(path, 'rb') as f:
        img = Image.open(f)
        return img.convert('RGB')


# Initialize the model pretrained on the ImageNet dataset
weights = ResNet50_Weights.DEFAULT
model = resnet50(weights=weights)
model.eval()
preprocess = weights.transforms()

use_precomputed_embeddings = True  # set to False on the first run to compute and cache the embeddings
emb_filename = "fashion_images_embs.pickle"
if use_precomputed_embeddings:
    with open(emb_filename, 'rb') as fIn:
        img_names, img_emb_tensors = pickle.load(fIn)
    print("Images:", len(img_names))
else:
    img_names = list(glob.glob('images/*.jpg'))
    img_emb = []
    # Extract features from the images in the dataset. On my CPU this took about an hour
    with torch.no_grad():  # inference only, so skip gradient tracking
        for image in tqdm(img_names):
            img_emb.append(model(preprocess(pil_loader(image)).unsqueeze(0)).squeeze(0))
    # Stack the per-image vectors into one (n_images, n_features) tensor
    img_emb_tensors = torch.stack(img_emb)

    with open(emb_filename, 'wb') as handle:
        pickle.dump([img_names, img_emb_tensors], handle, protocol=pickle.HIGHEST_PROTOCOL)

A function that reduces the dimension of the feature vectors and builds a search index with faiss

# faiss is used to compare the vectors
import faiss
import numpy as np
from sklearn.decomposition import PCA

def build_compressed_index(n_features):
    # Reduce the embeddings to n_features dimensions with PCA
    embeddings = np.asarray(img_emb_tensors)
    pca = PCA(n_components=n_features)
    pca.fit(embeddings)
    # faiss expects float32 input
    compressed_features = pca.transform(embeddings).astype(np.float32)

    # Exact (brute-force) L2 index over the compressed vectors
    index_compressed = faiss.IndexFlatL2(compressed_features.shape[1])
    index_compressed.add(compressed_features)
    return [pca, index_compressed]
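
Before picking `n_features` by trial and error, it can help to check how much variance a given number of principal components retains. The sketch below does this with plain NumPy on synthetic data. `components_for_variance` is a hypothetical helper, and the 95% target is an arbitrary assumption, not a recommendation from the article.

```python
import numpy as np

def components_for_variance(embeddings, target=0.95):
    """Smallest number of principal components whose cumulative
    explained variance reaches `target` (a fraction between 0 and 1)."""
    centered = embeddings - embeddings.mean(axis=0)
    # Squared singular values of the centered data are proportional
    # to the variance captured by each principal component.
    singular_values = np.linalg.svd(centered, compute_uv=False)
    variance_ratio = singular_values**2 / np.sum(singular_values**2)
    cumulative = np.cumsum(variance_ratio)
    return int(np.searchsorted(cumulative, target) + 1)

# Synthetic stand-in for embeddings: 200 vectors whose variance lives
# almost entirely in the first 3 of 50 dimensions.
rng = np.random.default_rng(0)
fake_embeddings = rng.normal(size=(200, 50)) * 0.01
fake_embeddings[:, :3] += rng.normal(size=(200, 3)) * 10.0

n_needed = components_for_variance(fake_embeddings, target=0.95)
print(n_needed)
```

On real embeddings the same function could be called with the embedding matrix converted to a NumPy array, and the result used as a starting point for `n_features`.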

Helpers for displaying results

import matplotlib.pyplot as plt
import matplotlib.image as mpimg

def main_image(img_path, desc):
    plt.imshow(mpimg.imread(img_path))
    plt.xlabel(img_path.split('.')[0] + '_Original Image',fontsize=12)
    plt.title(desc,fontsize=20)
    plt.show()

def similar_images(indices, suptitle):
    plt.figure(figsize=(15,10), facecolor="white")
    # Show at most four results on a 2x2 grid
    for plotnumber, index in enumerate(indices[0:4], start=1):
        plt.subplot(2,2,plotnumber)
        plt.imshow(mpimg.imread(img_names[index]))
        plt.xlabel(img_names[index],fontsize=12)
    plt.suptitle(suptitle,fontsize=15)
    plt.tight_layout()
    plt.show()

The search function itself. It takes the number of features as input, so you can experiment to find how many features are sufficient.

import numpy as np
# Search: query either by an index into the pre-extracted images or by the path to a new image
def search(query, factors):
    if isinstance(query, str):
        img_path = query
    else:
        img_path = img_names[query]
    # Load through pil_loader so the query is converted to RGB like the indexed images
    with torch.no_grad():
        one_img_emb = model(preprocess(pil_loader(img_path)).unsqueeze(0)).squeeze(0).numpy()
    main_image(img_path, 'Query')
    # PCA and the index are rebuilt on every call
    compressor, index_compressed = build_compressed_index(factors)
    D, I = index_compressed.search(np.float32(compressor.transform([one_img_emb])), 5)
    # The first hit is skipped: for an image from the dataset it is the query itself
    similar_images(I[0][1:], "faiss compressed " + str(factors))

And now the main event: calling search

search(100,300)
search("t-shirt.jpg", 500)
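
Since `search` calls `build_compressed_index` each time, PCA is refitted and the index rebuilt for every query. One possible way around that, sketched below, is to memoise the builder by the number of features. `expensive_build` is a hypothetical stand-in for `build_compressed_index` so that the snippet runs on its own, and `get_index` is likewise an illustrative name.

```python
from functools import lru_cache

build_calls = []  # records how many times the expensive step actually runs

def expensive_build(n_features):
    """Hypothetical stand-in for build_compressed_index(n_features)."""
    build_calls.append(n_features)
    return ("pca", "index", n_features)

@lru_cache(maxsize=8)
def get_index(n_features):
    # Cached per n_features: repeated queries reuse the fitted PCA and index.
    return expensive_build(n_features)

get_index(300)
get_index(300)   # served from the cache, no rebuild
get_index(500)
print(build_calls)  # [300, 500]
```

The cache would need to be cleared (`get_index.cache_clear()`) whenever the underlying embeddings change.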

Conclusions

As a result, in a couple of hours you can assemble a fairly high-quality recommender system based on image similarity, which is enough for some use cases. The images require no preliminary preparation, labeling or meta-information, which greatly simplifies the process.

To improve the quality of recommendations, you can fine-tune some layers of the network on the dataset you are using.
