Face recognition with InsightFace, or how CatBoost guessed names

The purpose of this article is to describe a simple yet working way to build a face recognition system using only out-of-the-box models: the InsightFace library for image preprocessing and CatBoost for classification.

Before proceeding, let’s divide the task into stages.


1) Find the faces in the images of the training dataset

2) Convert the detected faces into vector form (embeddings)

3) Train the classification model on the resulting data

4) Classify faces in images the model has not seen (the test set)


First, let’s install the necessary libraries.

pip install -U insightface
pip install onnx 
pip install onnxruntime

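InsightFace models are run through onnx and onnxruntime. For GPU inference you need the GPU build of ONNX Runtime (the onnxruntime-gpu package) instead of the CPU-only onnxruntime. ONNX (Open Neural Network Exchange) is an open format for representing machine learning models, which makes it easy to exchange them between frameworks and developers. You can see the compatibility of ONNX Runtime and CUDA versions in the table here.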

Here is the complete list of libraries I used:

import os
import pickle
from PIL import Image
import numpy as np
from typing import List
import onnxruntime as ort
from insightface.app import FaceAnalysis
from catboost import CatBoostClassifier
import shutil


To test the algorithm, I used the Real-World Masked Face Recognition Dataset (RMFRD). It contains 5,000 masked face images of 525 people and 90,000 unmasked face images. All images are in .jpg format. In this example, I will only use the images of people with uncovered faces.

We start by splitting the data into a training set and a test set.

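def get_test_data(dir):
    # Move one image per person into the test folder
    os.makedirs('dataset/test_faces/', exist_ok=True)
    for subdir in os.listdir(dir):
        path = os.path.join(dir, subdir)
        filenames = sorted(os.listdir(path))
        # Skip folders with a single image so that the class stays in the training set
        if len(filenames) > 1:
            face_path = os.path.join(path, filenames[0])
            shutil.move(face_path, 'dataset/test_faces/')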

The function get_test_data() moves one image from each directory into dataset/test_faces/, provided that it is not the only image there.
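To see what this split does without touching the real dataset, here is a minimal sketch that runs the same idea on a throwaway directory tree. The helper name hold_out_one_per_class and the folder and file names (alice, bob) are made up for illustration and are not part of the dataset.

```python
import os
import shutil
import tempfile


def hold_out_one_per_class(src_root, dst_dir):
    """Move one image per person folder into dst_dir; return the moved file names."""
    os.makedirs(dst_dir, exist_ok=True)
    moved = []
    for person in sorted(os.listdir(src_root)):
        person_dir = os.path.join(src_root, person)
        if not os.path.isdir(person_dir):
            continue
        files = sorted(os.listdir(person_dir))
        if len(files) > 1:  # keep at least one image for training
            shutil.move(os.path.join(person_dir, files[0]), dst_dir)
            moved.append(files[0])
    return moved


# Toy tree with hypothetical names: "alice" has three images, "bob" only one.
root = tempfile.mkdtemp()
train_root = os.path.join(root, "unmasked_users")
test_dir = os.path.join(root, "test_faces")
for person, count in [("alice", 3), ("bob", 1)]:
    os.makedirs(os.path.join(train_root, person))
    for k in range(count):
        open(os.path.join(train_root, person, f"{person}_{k}.jpg"), "w").close()

moved = hold_out_one_per_class(train_root, test_dir)
print(moved)  # ['alice_0.jpg']
```

Only alice contributes a test image; bob's single photo stays in the training set, so the classifier still gets to see that class.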

Getting embeddings

The InsightFace library already contains pre-trained models. One of these models is buffalo_l. It will suit us both for detecting a face in an image, and for finding embeddings.

app = FaceAnalysis(name="buffalo_l",providers=['CUDAExecutionProvider'])
app.prepare(ctx_id=0, det_size=(256, 256))
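If CUDA is not available on your machine, the provider list above will not give you GPU inference. One possible way to handle this, shown as a sketch, is to build the provider list from what ONNX Runtime reports. The helper choose_providers is a hypothetical name introduced here, not part of InsightFace or ONNX Runtime.

```python
def choose_providers(available):
    """Prefer CUDA when it is listed; keep the CPU provider as a fallback when listed."""
    preferred = ["CUDAExecutionProvider", "CPUExecutionProvider"]
    return [p for p in preferred if p in available]


print(choose_providers(["CUDAExecutionProvider", "CPUExecutionProvider"]))
# ['CUDAExecutionProvider', 'CPUExecutionProvider']
print(choose_providers(["CPUExecutionProvider"]))
# ['CPUExecutionProvider']
```

In the article's setup, the list of available providers would come from ort.get_available_providers(), and the result would be passed as the providers argument of FaceAnalysis.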

Now let’s preprocess the data to feed it into the classification model. The following functions will help us with this.

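def extract_face(filename, required_size=(256, 256)):
    # Load the image, force 3-channel RGB and resize it to the detector input size
    # Note: InsightFace's bundled examples feed BGR arrays (cv2.imread); this article passes RGB
    image = Image.open(filename)
    im = image.convert('RGB').resize(required_size)
    rgb_arr = np.array(im)
    # app.get() returns one entry per detected face (an empty list if there are none)
    emb_res = app.get(rgb_arr)
    try:
        face_array = emb_res[0].embedding
        return face_array
    except IndexError:
        print('no embedding found for this image')
        return None

def load_face(dir):
    # Embed every image in one person's folder (None where no face was detected)
    faces = list()
    for filename in os.listdir(dir):
        path = dir + filename
        face = extract_face(path)
        faces.append(face)
    return faces

def load_dataset(dir):
    # Each subdirectory name is used as the class label for the images inside it
    X, y = list(), list()
    for subdir in os.listdir(dir):
        path = dir + subdir + '/'
        faces = load_face(path)
        labels = [subdir for _ in range(len(faces))]
        print("loaded %d sample for class: %s" % (len(faces), subdir))
        X.extend(faces)
        y.extend(labels)
    # dtype=object keeps the None placeholders for images without a detected face
    return np.array(X, dtype=object), np.array(y)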

Let’s go through all the data in our dataset and get an embedding for every image in which the model can detect a face. load_dataset() walks through each directory, load_face() visits each image in it, and extract_face() returns the vector of values corresponding to the face in that image. If the neural network cannot detect a face in the image, extract_face() prints the message:

no embedding found for this image

After all the photos in a directory have been processed, a message of the following form is displayed:

loaded 137 sample for class: linjunjie

As output we get two arrays of the same length: the image embeddings and the labels (the names of the people in the photos, which are the target variable).

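def filter_empty_embs(img_set: np.ndarray, img_labels: List[str]):
    # Indices of the images for which an embedding was obtained
    good_idx = [i for i, x in enumerate(img_set) if x is not None]
    clean_labels = np.array([img_labels[i] for i in good_idx])
    # Stack the remaining embeddings into a regular 2-D float array
    clean_embs = np.array([img_set[i] for i in good_idx], dtype=np.float32)
    return clean_embs, clean_labels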

The function filter_empty_embs() removes the entries, together with their labels, for images in which no face was detected and, consequently, no embedding was obtained.
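The filtering idea is easy to check on toy data. The sketch below uses plain Python lists and made-up two-dimensional "embeddings" (real buffalo_l embeddings are much longer); drop_missing is a hypothetical helper name used only for this illustration.

```python
def drop_missing(embeddings, labels):
    """Keep only the (embedding, label) pairs whose embedding is not None."""
    pairs = [(emb, label) for emb, label in zip(embeddings, labels) if emb is not None]
    kept_embs = [emb for emb, _ in pairs]
    kept_labels = [label for _, label in pairs]
    return kept_embs, kept_labels


# The second image stands for a photo in which no face was detected.
embs = [[0.1, 0.2], None, [0.3, 0.4]]
names = ["alice", "alice", "bob"]

clean_embs, clean_names = drop_missing(embs, names)
print(clean_embs)   # [[0.1, 0.2], [0.3, 0.4]]
print(clean_names)  # ['alice', 'bob']
```

The important property is that embeddings and labels are dropped together, so the two sequences stay aligned.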

Let’s run our image transformation functions:

trainX, trainy = load_dataset('dataset/unmasked_users/')

assert len(trainX) == len(trainy)
train_emb, train_labels = filter_empty_embs(trainX, trainy)

assert len(train_emb) == len(train_labels)
print("Train_X size is {} , train_y size is {} ".format(train_emb.shape, train_labels.shape))

Classification model

We will use the classifier from the CatBoost library. Its gradient boosting models have a solid track record on real-world problems.

Define the classifier model:

# Only the number of iterations is set here; all other parameters keep their defaults
clf_model = CatBoostClassifier(iterations=100)

You can play around with the classifier parameters and see how they affect the performance of the model. Parameter selection could be the topic of a separate article. You can read more about CatBoost here.

Let’s train our model on the embeddings obtained from the training set, train_emb, and the names of the people stored in the array train_labels.

clf_model.fit(train_emb, train_labels)

Model validation

After training the model, we will check it on data that it has not yet seen: the images in test_faces.

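preds = []
true_labels = []
required_size = (256, 256)
for filename in os.listdir('dataset/test_faces/'):
    image = Image.open('dataset/test_faces/' + filename)
    rgb_arr = np.array(image.convert('RGB').resize(required_size))
    emb_res = app.get(rgb_arr)
    try:
        face_array = emb_res[0].embedding
    except IndexError:
        print('no embedding found for this image')
        continue

    predict = clf_model.predict(face_array)
    # Probability of the most likely class (not used in the accuracy computation)
    max_proba = clf_model.predict_proba(face_array).max()
    preds.append(predict[0])
    # Assumed bookkeeping: the file name contains the person's name, so a
    # prediction counts as correct when it appears in the file name
    if predict[0] in filename:
        true_labels.append(predict[0])
    else:
        true_labels.append(filename)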

As a quality metric for the model, we will use the simplest one: accuracy.

from sklearn.metrics import accuracy_score

print(accuracy_score(true_labels, preds))

The output is 0.8958333333333334, that is, about 90% of the test faces are classified correctly, which is pretty good.
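The validation loop also computes max_proba, the probability of the most likely class, but does not use it. One possible use, shown here as a sketch rather than as part of the original pipeline, is to reject low-confidence predictions instead of always returning a name. The function label_or_unknown, the class names and the 0.5 threshold are assumptions chosen for illustration.

```python
def label_or_unknown(probabilities, class_names, threshold=0.5):
    """Return the most probable class name, or 'unknown' if it is below the threshold."""
    best = max(range(len(probabilities)), key=lambda i: probabilities[i])
    if probabilities[best] < threshold:
        return "unknown"
    return class_names[best]


classes = ["alice", "bob", "carol"]  # hypothetical class names
print(label_or_unknown([0.05, 0.90, 0.05], classes))  # bob
print(label_or_unknown([0.40, 0.35, 0.25], classes))  # unknown
```

With the trained model, the probabilities would come from clf_model.predict_proba(face_array) and the class names from clf_model.classes_. A suitable threshold depends on the data and would have to be tuned on a validation set.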


I hope this article will be useful to those who are just getting started with computer vision. I showed a simple way to build a face recognition system that delivers fairly high quality. As always, there is room for improvement, and in this field more than most.
