A recommender system via similar-image search with ResNet50
Briefly about recommender systems
Broadly, there are two approaches to building recommender systems: content-based and collaborative filtering. The underlying assumption of collaborative filtering is that if users A and B buy similar products, A is more likely to buy a product that B bought than one that a random person bought. Unlike the content-based approach, there are no features describing users or items; the system works from a matrix of user-item interactions. A content-based system, by contrast, relies on knowledge about the items themselves: for example, if a user is looking at silk t-shirts, they might be interested in seeing other silk t-shirts.
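To make the collaborative idea concrete, here is a toy sketch (the matrix and users are invented purely for illustration): users are compared by their rows in the interaction matrix, and the purchases of the most similar user become recommendation candidates.

import numpy as np

# Toy user-item interaction matrix (1 = bought); purely illustrative
#                        shirt jeans dress shoes
interactions = np.array([[1,    1,    0,    0],   # user A
                         [1,    1,    0,    1],   # user B
                         [0,    0,    1,    1]])  # user C
a, b, c = interactions

def cos(u, v):
    return u @ v / (np.linalg.norm(u) * np.linalg.norm(v))

# A is much closer to B than to C, so the item B bought that A did not
# (shoes) is a natural candidate recommendation for A
print(cos(a, b), cos(a, c))  # ~0.82 and 0.0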
In this article, I want to describe an approach based on searching for similar images. Why prepare additional data when, for some products such as clothing, almost all of the important characteristics are already visible in the image?

The essence of the approach is to extract features from product images with a convolutional network. In my example I used ResNet50, since its feature vector has a relatively small dimension. Extracting a feature vector with a pretrained network is very simple: just drop the final softmax classification step (the part that turns the output into class probabilities) and take the network's output as the feature vector. Then the vectors are compared to find similar ones: the more similar the images, the smaller the Euclidean distance between their vectors.
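As a minimal illustration of this step (random tensors stand in for real images; the 1000-dim output is ResNet50's final layer with no softmax applied):

import torch
from torchvision.models import resnet50, ResNet50_Weights

model = resnet50(weights=ResNet50_Weights.DEFAULT)
model.eval()

with torch.no_grad():
    x = torch.rand(2, 3, 224, 224)  # two random "images", just to show the shapes
    emb = model(x)                  # raw outputs, no softmax applied: shape [2, 1000]
    # A common alternative is model.fc = torch.nn.Identity(), which yields
    # the 2048-dim penultimate-layer features instead
# The more similar two images are, the smaller the Euclidean distance between their vectors
print(torch.dist(emb[0], emb[1], p=2))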

Code and dataset
The dataset can be downloaded from here: https://www.kaggle.com/datasets/paramaggarwal/fashion-product-images-small
Initializing the pretrained ResNet50 from torchvision (PyTorch) and extracting features from the dataset
from torchvision.models import resnet50, ResNet50_Weights
import torch
import glob
import pickle
from tqdm import tqdm
from PIL import Image

def pil_loader(path):
    # Some images in the dataset are not in RGB format; they need to be converted to RGB
    with open(path, 'rb') as f:
        img = Image.open(f)
        return img.convert('RGB')

# Initialize a model pretrained on the ImageNet dataset
weights = ResNet50_Weights.DEFAULT
model = resnet50(weights=weights)
model.eval()
preprocess = weights.transforms()

use_precomputed_embeddings = True
emb_filename = "fashion_images_embs.pickle"

if use_precomputed_embeddings:
    with open(emb_filename, 'rb') as fIn:
        img_names, img_emb_tensors = pickle.load(fIn)
    print("Images:", len(img_names))
else:
    img_names = list(glob.glob('images/*.jpg'))
    img_emb = []
    # Extract features from every image in the dataset; this took about an hour on my CPU
    for image in tqdm(img_names):
        img_emb.append(
            model(preprocess(pil_loader(image)).unsqueeze(0)).squeeze(0).detach().numpy()
        )
    img_emb_tensors = torch.tensor(img_emb)
    with open(emb_filename, 'wb') as handle:
        pickle.dump([img_names, img_emb_tensors], handle, protocol=pickle.HIGHEST_PROTOCOL)
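If extraction time matters, a possible speed-up (not part of the original pipeline, and the gain depends on your hardware) is to batch images through the model and use a GPU when available; a sketch:

# Optional: batched feature extraction (a sketch; replaces the loop above)
device = "cuda" if torch.cuda.is_available() else "cpu"
model.to(device)

batch_size = 64
img_emb_batches = []
with torch.no_grad():
    for i in tqdm(range(0, len(img_names), batch_size)):
        batch = torch.stack([preprocess(pil_loader(p)) for p in img_names[i:i + batch_size]])
        img_emb_batches.append(model(batch.to(device)).cpu())
img_emb_tensors = torch.cat(img_emb_batches)
model.to("cpu")  # move back so the rest of the article's code runs unchanged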
A function that builds a faiss search index and reduces the dimensionality of the feature vectors with PCA
# faiss is used to compare the vectors
import faiss
import numpy as np
from sklearn.decomposition import PCA

def build_compressed_index(n_features):
    # Compress the embeddings with PCA before indexing
    pca = PCA(n_components=n_features)
    pca.fit(img_emb_tensors)
    compressed_features = pca.transform(img_emb_tensors)
    dataset = np.float32(compressed_features)
    # Exact search by L2 (Euclidean) distance over the compressed vectors
    index_compressed = faiss.IndexFlatL2(dataset.shape[1])
    index_compressed.add(dataset)
    return [pca, index_compressed]
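A quick sanity check of what the function returns (the choice of 200 components here is arbitrary):

pca, index = build_compressed_index(200)
query = np.float32(pca.transform(img_emb_tensors[:1]))  # the first dataset image as a query
D, I = index.search(query, 5)  # D: squared L2 distances, I: indices of the 5 nearest images
print(I[0])  # the first index should be 0, i.e. the query image itself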
Helpers for displaying results
import matplotlib.pyplot as plt
import matplotlib.image as mpimg

# Show the query image
def main_image(img_path, desc):
    plt.imshow(mpimg.imread(img_path))
    plt.xlabel(img_path.split('.')[0] + '_Original Image', fontsize=12)
    plt.title(desc, fontsize=20)
    plt.show()

# Show up to four of the nearest images in a 2x2 grid
def similar_images(indices, suptitle):
    plt.figure(figsize=(15, 10), facecolor="white")
    plotnumber = 1
    for index in indices[0:4]:
        if plotnumber <= len(indices):
            plt.subplot(2, 2, plotnumber)
            plt.imshow(mpimg.imread(img_names[index]))
            plt.xlabel(img_names[index], fontsize=12)
            plotnumber += 1
    plt.suptitle(suptitle, fontsize=15)
    plt.tight_layout()
    plt.show()
The search function itself. It takes the number of PCA components as a parameter, so you can experiment and find how many features are sufficient.
import numpy as np

# Search: query by an index into the pre-extracted images, or pass a path to a new image
def search(query, factors):
    if isinstance(query, str):
        img_path = query
    else:
        img_path = img_names[query]
    # pil_loader is used here too, so a non-RGB query image is handled correctly
    one_img_emb = torch.tensor(
        model(preprocess(pil_loader(img_path)).unsqueeze(0)).squeeze(0).detach().numpy()
    )
    main_image(img_path, 'Query')
    compressor, index_compressed = build_compressed_index(factors)
    D, I = index_compressed.search(
        np.float32(compressor.transform([one_img_emb.detach().numpy()])), 5
    )
    # Drop the first hit: for a dataset image it is the query itself
    similar_images(I[0][1:], "faiss compressed " + str(factors))
Finally, the hero of the occasion: calling search
search(100, 300)            # query by index into the dataset, 300 PCA components
search("t-shirt.jpg", 500)  # query with a new image, 500 PCA components
Conclusions
As a result, in a couple of hours you can put together a reasonably high-quality recommender system based on image similarity, which is enough for many use cases. The images require no preliminary preparation, labeling, or extra meta-information, which greatly simplifies the process.
To improve the quality of the recommendations, you can fine-tune some layers of the network on the dataset being used, as sketched below.
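A minimal sketch of what such fine-tuning could look like; everything here is an assumption for illustration (num_classes and train_loader would come from a labeled split of the same dataset):

import torch

# Freeze the whole backbone, then unfreeze only the last residual block
for p in model.parameters():
    p.requires_grad = False
for p in model.layer4.parameters():
    p.requires_grad = True

num_classes = 45  # hypothetical number of product categories
model.fc = torch.nn.Linear(model.fc.in_features, num_classes)  # new trainable head

optimizer = torch.optim.Adam((p for p in model.parameters() if p.requires_grad), lr=1e-4)
criterion = torch.nn.CrossEntropyLoss()

model.train()
for images, labels in train_loader:  # train_loader: labeled DataLoader (assumed to exist)
    optimizer.zero_grad()
    loss = criterion(model(images), labels)
    loss.backward()
    optimizer.step()

After fine-tuning, the embeddings would need to be re-extracted with the updated weights before rebuilding the index.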