Recognizing, storing and searching for faces in a database
In this article, I will explain as briefly and simply as possible the principle of recognizing, storing and searching faces in a database. The Insightface library and PostgreSQL database will be used as an example.
There are quite a few articles online about facial recognition, but to understand how to implement face search against a database in your own project, you would have to read several of them. I therefore decided to write my own, in the hope that it saves people a few hours by giving a complete picture of how database face search is built, all in one article.
First, let’s briefly go through the entire chain of actions to understand the general scheme:
1. We run photos with faces through the insightface library and get a vector (embedding) for each face.
2. The resulting vector is written to the database.
3. To search by face, we compare the embedding of the query face with those stored in the database.
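As a rough illustration of this chain, here is a minimal in-memory sketch. It does not use insightface or a database: the three-element vectors are made-up stand-ins for real 512-dimensional embeddings, and `store`, `add_face` and `search` are hypothetical names used only for this example.

```python
import math

# In-memory stand-in for the database table: id -> embedding.
store = {}

def add_face(face_id, embedding):
    """Step 2: save the embedding under an id."""
    store[face_id] = list(embedding)

def search(query, max_distance):
    """Step 3: return (id, distance) pairs closer than max_distance, nearest first."""
    hits = []
    for face_id, embedding in store.items():
        distance = math.dist(query, embedding)  # Euclidean distance
        if distance < max_distance:
            hits.append((face_id, distance))
    return sorted(hits, key=lambda hit: hit[1])

# Step 1 is replaced by toy vectors; a real system would get them from the face model.
add_face(1, [0.0, 0.0, 1.0])
add_face(2, [0.0, 1.0, 0.0])
add_face(3, [0.1, 0.0, 0.9])

matches = search([0.0, 0.0, 1.0], max_distance=0.5)
print(matches)
```

Faces 1 and 3 are returned because their vectors lie close to the query, while face 2 is filtered out by the distance limit. The rest of the article replaces each stand-in with the real component.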
Now let's go through each point in more detail.
Converting a photo of a face into a vector
First, install the library for face recognition:
pip install insightface
There are many other libraries for face recognition (for example, DeepFace). You can use any of them; the principle of operation does not change.
Next, we run the image through the neural network:
from insightface.app import FaceAnalysis
import cv2

app = FaceAnalysis(name="buffalo_sc", providers=['CPUExecutionProvider'])
app.prepare(ctx_id=0, det_size=(256, 256))  # prepare the neural network
img = cv2.imread("G:/pic.jpg")  # read the image
if img is None:  # cv2.imread returns None when the file cannot be read
    raise FileNotFoundError("G:/pic.jpg")
faces = app.get(img)  # find faces in the image and get information about them
for face in faces:
    print(face)
If you need higher accuracy, or gender and age recognition, use the "buffalo_l" model.
You can use a GPU instead of the CPU (providers=['CUDAExecutionProvider']). In that case you will also have to install the onnx and onnxruntime-gpu packages.
Each face object in the result contains, among other fields:
face.bbox: the area of the picture where the face is located
face.det_score: the detector's confidence that this area contains a face
face.embedding: a point (vector) in a 512-dimensional space, which can then be used to compare how similar faces are
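A common follow-up is to discard low-confidence detections and, when a photo is expected to show one person, keep only the largest face. The sketch below relies only on the `bbox` and `det_score` attributes described above. The `SimpleNamespace` objects are stand-ins for real detections, and the 0.5 cutoff and the function names are illustrative assumptions, not insightface defaults.

```python
from types import SimpleNamespace

def bbox_area(face):
    """Area of the bounding box given as (x1, y1, x2, y2)."""
    x1, y1, x2, y2 = face.bbox
    return max(0.0, x2 - x1) * max(0.0, y2 - y1)

def pick_main_face(faces, min_score=0.5):
    """Drop detections below min_score and return the largest remaining face, or None."""
    confident = [face for face in faces if face.det_score >= min_score]
    if not confident:
        return None
    return max(confident, key=bbox_area)

# Stand-in detections; real ones would come from app.get(img).
faces = [
    SimpleNamespace(bbox=(10, 10, 60, 70), det_score=0.91),
    SimpleNamespace(bbox=(0, 0, 200, 200), det_score=0.30),  # large but unreliable
    SimpleNamespace(bbox=(100, 40, 130, 80), det_score=0.85),
]

main_face = pick_main_face(faces)
print(main_face.bbox)
```

The large second box is ignored because its score is below the cutoff, so the first face is selected.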
In order to see which faces were found, you can use this code:
x, y, x2, y2 = face.bbox  # get the face boundaries
cropped = img[int(y):int(y2), int(x):int(x2)]  # crop the face from the image
cv2.imshow('image', cropped)  # show the face
cv2.waitKey(0)  # wait for a key press before closing the window
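For faces near the edge of a photo, the detected box may extend slightly past the image border. A negative coordinate would then be interpreted by array slicing as an offset from the end of the array and produce a wrong or empty crop. A small helper, shown here as a sketch under the hypothetical name `clip_bbox`, clamps the box to the image before cropping.

```python
def clip_bbox(bbox, width, height):
    """Clamp an (x1, y1, x2, y2) box to the image and convert it to integers."""
    x1, y1, x2, y2 = (int(round(value)) for value in bbox)
    x1 = min(max(x1, 0), width)
    x2 = min(max(x2, 0), width)
    y1 = min(max(y1, 0), height)
    y2 = min(max(y2, 0), height)
    return x1, y1, x2, y2

# A box that sticks out past the top-left corner of a 640x480 image.
print(clip_bbox((-12.4, -3.0, 150.6, 200.2), width=640, height=480))
```

With OpenCV, the image size can be read as `height, width = img.shape[:2]`, and the clamped values are then used in the slice `img[y1:y2, x1:x2]`.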
Storing vectors in a database
There are many options for storing vectors, but if you have tens of thousands of faces, or even millions, you cannot do without a good database. In this article, as an example, I will show how vector storage works in my project (Django + PostgreSQL).
For easier storage and search of vectors, you will need to install the "pgvector" database extension (https://github.com/pgvector/pgvector) and also the Python library: pip install pgvector
After installing the extension and enabling it in your database (CREATE EXTENSION vector;), you can create a table in which we will store the resulting vectors.
This is what the Django models.py looks like:
from django.db import models
from pgvector.django import VectorField

class Faces(models.Model):
    id = models.AutoField(primary_key=True)
    embedding = VectorField(dimensions=512, null=True)  # 512-dimensional face embedding
This is what it looks like in pgAdmin:
Adding a new face to the database in Django:
Faces.objects.create(embedding=face.embedding)
Once we have a database with faces stored in it, we can proceed to the search.
Database Search
So, our faces are stored as vectors. What do we need to do with these vectors to find a face in the database? The two most common approaches are to compute the distance between the ends of the vectors (the smaller the distance, the more similar the faces) or the cosine of the angle between the vectors (the closer it is to 1, the more similar the faces).
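To make the two measures concrete, here is a small sketch in plain Python with made-up two-dimensional vectors (real embeddings have 512 components). It also illustrates a known relationship: when both vectors are scaled to unit length, the squared Euclidean distance equals 2 - 2 * cosine, so the two measures rank candidates in the same order.

```python
import math

def l2_distance(a, b):
    """Euclidean distance between the ends of two vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors (1.0 means same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.hypot(*a) * math.hypot(*b))

def normalize(v):
    """Scale a vector to unit length."""
    length = math.hypot(*v)
    return [x / length for x in v]

a = [3.0, 4.0]
b = [6.0, 8.0]  # same direction as a, twice as long
c = [4.0, 3.0]

# a and b are far apart by distance but identical by cosine.
print(l2_distance(a, b), cosine_similarity(a, b))
print(l2_distance(a, c), cosine_similarity(a, c))

# For unit-length vectors: distance ** 2 == 2 - 2 * cosine
ua, uc = normalize(a), normalize(c)
print(l2_distance(ua, uc) ** 2, 2 - 2 * cosine_similarity(ua, uc))
```

The first pair shows why the choice matters for raw embeddings: vector length affects the Euclidean distance but not the cosine.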
In this article we will look at searching by Euclidean distance. It is done like this:
from pgvector.django import L2Distance

# Faces whose stored embedding is closer than 22 to the query embedding
fbase = Faces.objects.alias(
    distance=L2Distance('embedding', face.embedding)
).filter(distance__lt=22)
In this piece of code, we search the database for vectors whose distance from the query vector is less than 22. If you need a stricter search with fewer false matches, use a lower number. If missing a match is worse than an occasional false match, increase the number. The scale of the distances also depends on the model and on the dimension of the vector; if you have 128-dimensional vectors, the distances will most likely be smaller. In general, select the threshold empirically based on your tasks.
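One way to choose the threshold empirically is to collect distances for pairs of photos you have labeled as "same person" or "different people" and pick the cutoff that classifies the most pairs correctly. The sketch below does this in plain Python. The distances are invented for illustration, and `best_threshold` is a hypothetical helper name.

```python
def best_threshold(same_distances, different_distances):
    """Return (threshold, accuracy) that maximizes correct decisions under the
    rule 'distance < threshold means same person'."""
    total = len(same_distances) + len(different_distances)
    candidates = sorted(set(same_distances) | set(different_distances))
    best = (None, -1.0)
    for threshold in candidates:
        correct = sum(d < threshold for d in same_distances)
        correct += sum(d >= threshold for d in different_distances)
        accuracy = correct / total
        if accuracy > best[1]:
            best = (threshold, accuracy)
    return best

# Invented example distances, not measurements from a real model.
same = [14.2, 17.5, 19.8, 21.0, 23.4]
different = [20.5, 24.1, 26.3, 27.9, 30.2]

threshold, accuracy = best_threshold(same, different)
print(threshold, accuracy)
```

Maximizing plain accuracy treats a false match and a missed match as equally bad. If one kind of error is more costly in your task, shift the threshold accordingly.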
If we just need to calculate the distance between two vectors, then we can use the numpy library:
import numpy as np
distance = np.linalg.norm(embedding1 - embedding2)
In the example given, the table with faces stores only an id alongside the embedding. You can store any other data there, such as gender or age, or link the id to data in other tables. It all depends on your tasks.
Conclusion
In this article, we examined the general principles of recognizing, storing and searching for faces in a database. From here, choose the databases and libraries that are most convenient and effective for your framework and your tasks.