Recognition, storage and search of faces in the database

In this article, I will explain as briefly and simply as possible the principle of recognizing, storing and searching faces in a database. The Insightface library and PostgreSQL database will be used as an example.

There are quite a few articles on the Internet about facial recognition, but to understand how to implement face search over a database in your own project, you would have to read more than one of them. So I decided to write my own material. I hope it saves people several hours by giving a complete picture of how face search in a database is built, all in one article.

First, let’s briefly go through the entire chain of actions to understand the general scheme:

  1. We run photos with faces through the insightface library and get a vector (embedding) for each face

  2. The resulting vector is written to the database

  3. To search by face, we compare the embedding of the query face with those stored in the database

Now let's go through each point in more detail.

Converting a photo of a face into a vector

First, install the library for face recognition:

pip install insightface

There are many other libraries for face recognition (for example, DeepFace). You can use any of them; the principle of operation will not change.

Next, we run the image through the neural network:

from insightface.app import FaceAnalysis
import cv2

app = FaceAnalysis(name="buffalo_sc", providers=['CPUExecutionProvider'])
app.prepare(ctx_id=0, det_size=(256, 256))  # prepare the neural network
img = cv2.imread("G:/pic.jpg")  # read the image
faces = app.get(img)  # find faces in the image and get information about them
for face in faces:
    print(face)
Notes:

  • if you need higher accuracy or gender and age recognition, use the "buffalo_l" model

  • you can use the GPU instead of the CPU (providers=['CUDAExecutionProvider']). You will also have to install the onnx and onnxruntime-gpu libraries

Each face object in the result gives us:

  • face.bbox is the area in the picture where the face is located

  • face.det_score – the detector's confidence that this area really contains a face

  • face.embedding – a point, or a vector in a 512-dimensional space, which can then be used to compare the similarity of faces
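Low-confidence detections are often not worth storing. The sketch below filters faces by det_score and sorts them best first. It is only an illustration: the confident_faces helper and the 0.5 cutoff are assumptions, not part of insightface, and SimpleNamespace objects stand in for real detection results so the example runs without the library.

```python
from types import SimpleNamespace

def confident_faces(faces, min_score=0.5):
    """Return faces whose det_score is at least min_score, best first.

    min_score=0.5 is a hypothetical default; tune it on your own photos.
    """
    kept = [f for f in faces if f.det_score >= min_score]
    return sorted(kept, key=lambda f: f.det_score, reverse=True)

# Stand-in objects with the same attribute names as the detection results
faces = [
    SimpleNamespace(bbox=[10, 20, 110, 140], det_score=0.91),
    SimpleNamespace(bbox=[300, 40, 340, 90], det_score=0.32),
    SimpleNamespace(bbox=[150, 30, 250, 150], det_score=0.78),
]

best = confident_faces(faces)
print([f.det_score for f in best])  # [0.91, 0.78]
```

With real data you would pass the list returned by app.get(img) instead of the stand-in objects.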

In order to see which faces were found, you can use this code:

x, y, x2, y2 = face.bbox  # get the face boundaries
cropped = img[int(y):int(y2), int(x):int(x2)]  # cut the face out of the image
cv2.imshow('image', cropped)  # show the face
cv2.waitKey(0)
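A detector can return a box that extends slightly past the image border when a face is close to the edge. A negative coordinate in a numpy slice counts from the end of the array, so the crop above could then come out wrong or empty. Here is a minimal sketch of a safer crop; crop_face is a hypothetical helper name, and a blank numpy array stands in for a real photo.

```python
import numpy as np

def crop_face(img, bbox):
    """Crop bbox (x1, y1, x2, y2) from img, clamped to the image borders.

    Returns None when the clamped box is empty.
    """
    h, w = img.shape[:2]
    x1, y1, x2, y2 = (int(round(float(v))) for v in bbox)
    x1, x2 = max(0, x1), min(w, x2)
    y1, y2 = max(0, y1), min(h, y2)
    if x2 <= x1 or y2 <= y1:
        return None
    return img[y1:y2, x1:x2]

img = np.zeros((100, 200, 3), dtype=np.uint8)  # stand-in for a 200x100 photo
crop = crop_face(img, np.array([-12.4, 30.2, 80.7, 130.9]))
print(crop.shape)  # (70, 81, 3)
```

Checking for None before calling cv2.imshow avoids an error on an empty image.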

Original image

Detected faces. insightface finds faces even if part of the face is hidden by a mask

Storing vectors in a database

There are many options for storing vectors, but if you have tens of thousands of faces, or even millions, you cannot do without a good database. In this article, as an example, I will show how vector storage works in my project (Django + PostgreSQL).

To make storing and searching vectors easier, install the "pgvector" database extension (https://github.com/pgvector/pgvector) and enable it in your database with CREATE EXTENSION vector; also install the Python library: pip install pgvector

After installing the extension, you can create a table in which we will store the resulting vectors.

This is what django models.py looks like:

from django.db import models
from pgvector.django import VectorField

class Faces(models.Model):
    id = models.AutoField(primary_key=True)
    embedding = VectorField(dimensions=512, null=True)

This is what it looks like in pgAdmin:

How to add new faces to the database in django:

Faces.objects.create(embedding=face.embedding)

Once we have a database with faces stored in it, we can proceed to the search.

Database Search

So, our faces are stored as vectors. What do we need to do with these vectors to find a face in the database? The most common options are the distance between the ends of the vectors (the smaller the distance, the more similar the faces) and the cosine of the angle between the vectors (the closer it is to 1, the more similar the faces).
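To make the two measures concrete, here is a small numpy sketch on 2-dimensional vectors. The function names are chosen for this example only, and the vectors are toy values, not real face embeddings.

```python
import numpy as np

def l2_distance(a, b):
    """Euclidean distance between the ends of two vectors."""
    return float(np.linalg.norm(a - b))

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors (1.0 means same direction)."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

a = np.array([3.0, 4.0])
b = np.array([6.0, 8.0])    # same direction as a, twice as long
c = np.array([-4.0, 3.0])   # perpendicular to a

print(l2_distance(a, b))        # 5.0
print(cosine_similarity(a, b))  # 1.0
print(cosine_similarity(a, c))  # 0.0
```

Note that a and b point in exactly the same direction, so their cosine is 1.0, yet the distance between them is 5.0: the cosine ignores vector length, the distance does not. If all vectors are first scaled to unit length, the two measures rank candidates identically, because then d² = 2 − 2·cos.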

Distance between ends of vectors in 2-dimensional space

In this article we will search by Euclidean distance. It is done like this:

from pgvector.django import L2Distance
fbase = Faces.objects.alias(distance=L2Distance('embedding', face.embedding)).filter(distance__lt=22)

In this piece of code, we search the database for vectors whose distance from the query vector is less than 22. If you need a stricter search with fewer false matches, use a lower number. If missing a possible match is worse than an occasional false one, increase the number. The suitable value also depends on the model and on the dimension of its vectors; if you have 128-dimensional vectors, the distances will most likely be smaller. In general, select the threshold empirically based on your tasks.
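One possible way to pick the threshold empirically is to measure distances for pairs of photos you know show the same person and for pairs you know show different people, then take the cutoff that separates them best. The sketch below assumes you have already collected such distances; pick_threshold is a hypothetical helper, and the numbers are made up for illustration, not measurements.

```python
import numpy as np

def pick_threshold(same_dists, diff_dists):
    """Return (threshold, accuracy) for the best-separating candidate.

    A pair counts as "same person" when its distance is strictly below
    the threshold, matching the distance__lt filter used above.
    """
    same = np.asarray(same_dists, dtype=float)
    diff = np.asarray(diff_dists, dtype=float)
    candidates = np.sort(np.concatenate([same, diff]))
    best_t, best_acc = None, -1.0
    for t in candidates:
        correct = np.sum(same < t) + np.sum(diff >= t)
        acc = correct / (same.size + diff.size)
        if acc > best_acc:
            best_t, best_acc = float(t), float(acc)
    return best_t, best_acc

# Illustrative values only
same_person = [14.2, 17.5, 19.1, 20.3]
different_people = [21.8, 24.0, 26.7, 29.4]

print(pick_threshold(same_person, different_people))  # (21.8, 1.0)
```

On real data the two groups usually overlap, so the accuracy will be below 1.0 and you will have to decide which kind of error matters more for your task.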

If we just need to calculate the distance between two vectors, then we can use the numpy library:

import numpy as np
distance = np.linalg.norm(embedding1 - embedding2)
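The same call extends to a whole array of stored vectors, which is roughly what the database query does for us. The following is a brute-force sketch in plain numpy, not how pgvector works internally; the search function is a name invented for this example, and random vectors stand in for saved embeddings.

```python
import numpy as np

def search(query, stored, max_distance):
    """Return (row index, distance) for rows closer than max_distance, nearest first."""
    stored = np.asarray(stored, dtype=float)
    query = np.asarray(query, dtype=float)
    # One Euclidean distance per stored row
    dists = np.linalg.norm(stored - query, axis=1)
    hits = np.flatnonzero(dists < max_distance)
    hits = hits[np.argsort(dists[hits])]
    return [(int(i), float(dists[i])) for i in hits]

rng = np.random.default_rng(0)
stored = rng.normal(size=(1000, 512))              # stand-ins for saved embeddings
query = stored[42] + 0.01 * rng.normal(size=512)   # slightly noisy copy of row 42

results = search(query, stored, max_distance=5.0)
print(results[0][0])  # 42
```

This is fine for a few thousand vectors held in memory; for larger collections, let the database do the comparison.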

In the example given, the table with faces stores only the id and the vector. You can store any other data there, such as gender or age, or link the id to data in other tables. It all depends on your tasks.

Conclusion

In this article, we examined the general principles of recognizing, storing and searching faces in a database. From here, choose the database and libraries that are most convenient and effective for your framework and your tasks.
