Neural networks in a big city. We figure out how they help to identify people, and launch our own neural network

Algorithms for face detection have become part of our life, although not everyone notices it. It all started in 2015 with the entertainment industry. Looksery, an AR filter startup, was bought by Snapchat. The application recognized the face of the person in the photo and put funny faces on it. A little later, in early 2016, Facebook bought the Belarusian startup MSQRD and launched masks on Facebook Stories. But this can only be considered a test run of such technologies.

In this article, you can read how identification systems are used, learn about the weaknesses of computer algorithms, and also try to run a neural network for face detection and identification on your own computer.

Before continuing, so as not to get confused, let’s define some terms.

Detection – finding objects of a given class in an image, for example, any face.

Identification – recognizing one specific object, for example, the face of Vasya Pupkin.

Further development of neural networks has raised identification accuracy enough to solve protection and security problems. In 2017, Apple introduced Face ID, a scanner that unlocks the phone by recognizing the owner’s face. Biometrics have also become a new way to pay for purchases: in the summer of 2020, the Moscow cafe chain Prime began testing a system that lets customers pay with their face. Instead of issuing passes for turnstiles, organizations are beginning to identify people through a video camera.

The number of cameras installed in large cities alone shows how important and relevant the topic is. The governments of the largest countries are interested in automation and are investing huge sums in such projects.

Recognition using algorithms

In 2001, Paul Viola and Michael Jones proposed an object detection method that came to be widely used for detecting faces. In 2005, Navneet Dalal and Bill Triggs presented their Histogram of Oriented Gradients (HOG) method, which could also detect faces.
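The Viola-Jones method is built on simple rectangular (Haar-like) features that are computed quickly from an integral image. The sketch below illustrates only that building block on a toy 4x4 "image"; it is not a full detector, and the function names are illustrative assumptions rather than any library's API.

```python
def integral_image(img):
    """Return a summed-area table with one extra row and column of zeros."""
    h, w = len(img), len(img[0])
    ii = [[0] * (w + 1) for _ in range(h + 1)]
    for y in range(h):
        row_sum = 0
        for x in range(w):
            row_sum += img[y][x]
            ii[y + 1][x + 1] = ii[y][x + 1] + row_sum
    return ii


def rect_sum(ii, x, y, w, h):
    """Sum of pixels in the rectangle with top-left (x, y), width w, height h."""
    return ii[y + h][x + w] - ii[y][x + w] - ii[y + h][x] + ii[y][x]


def two_rect_feature(ii, x, y, w, h):
    """Two-rectangle Haar-like feature: upper half minus lower half."""
    half = h // 2
    upper = rect_sum(ii, x, y, w, half)
    lower = rect_sum(ii, x, y + half, w, half)
    return upper - lower


# Toy image: a dark band (like the eye region) above a bright band (like cheeks)
image = [
    [10, 10, 10, 10],
    [10, 10, 10, 10],
    [200, 200, 200, 200],
    [200, 200, 200, 200],
]
ii = integral_image(image)
feature = two_rect_feature(ii, 0, 0, 4, 4)
print(feature)  # strongly negative: the upper half is much darker
```

Once the integral image is built, any rectangle sum costs four lookups regardless of its size, which is what makes scanning many windows affordable.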

Unfortunately, the quality of these approaches left much to be desired. The rate of errors of the first and second kind was high. Here the hypothesis being tested is “there is a face in the frame”.

An error of the first kind is the rejection of a correct hypothesis: a face in the frame is not detected.
An error of the second kind is the acceptance of an incorrect hypothesis: something that is not a face is detected as one.
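To make the two kinds of error concrete, here is a small sketch that counts them for a detector evaluated on labelled frames. It follows the definitions given above (first kind: a missed face; second kind: a false alarm). The data and the function name are made up for illustration.

```python
def error_rates(ground_truth, predictions):
    """Both arguments are lists of booleans; True means 'a face is present'."""
    pairs = list(zip(ground_truth, predictions))
    missed = sum(1 for truth, pred in pairs if truth and not pred)
    false_alarms = sum(1 for truth, pred in pairs if not truth and pred)
    faces = sum(1 for truth in ground_truth if truth)
    non_faces = len(ground_truth) - faces
    # Error of the first kind: share of real faces that were not detected
    first_kind = missed / faces if faces else 0.0
    # Error of the second kind: share of face-free frames flagged as faces
    second_kind = false_alarms / non_faces if non_faces else 0.0
    return first_kind, second_kind


# Ten hypothetical frames: four contain a face, six do not
truth = [True, True, True, True, False, False, False, False, False, False]
predicted = [True, True, True, False, True, False, False, False, False, False]

miss_rate, false_alarm_rate = error_rates(truth, predicted)
print(miss_rate, false_alarm_rate)
```

In this toy run one of four faces is missed and one of six empty frames triggers a false detection.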

The algorithms did not work well in low light or when the head was tilted or turned. There was no significant difference in accuracy between the two methods.

High detection accuracy was achieved only with neural networks. Unlike the classical algorithms, neural networks can detect faces under difficult conditions:

  • Poor illumination level (there is enough light from a monitor in a dark room for recognition);
  • The head is tilted or slightly turned to the side;
  • The face is not fully included in the frame or is partially covered by the palm;
  • Beard, glasses – no problem.

So if you are still using solutions based on the old algorithms, drop them! Solutions that require the same processing power but give accuracy close to 100% have been available for a long time.

Face recognition in cities

The number of cameras with face recognition systems installed as of 2019:

  • Beijing – 470 thousand;
  • Great Britain – 420 thousand;
  • Washington – 30 thousand.

There are currently about 193,000 HD cameras installed in Moscow. The location of the cameras can be viewed on the website

A face recognition system has also been launched in the Moscow metro. Cameras are installed in carriages and at turnstiles. In 2021, fare payment by face recognition will be added: when a person passes through the turnstile, the fare will be debited automatically. To use it, you just need to link a bank card to your biometric data.

The goals of deploying recognition systems

In theory, these measures should reduce crime by identifying perpetrators in a split second. The tracking system will identify the most dangerous places in the city and identify illegal migrants. According to the Ministry of Internal Affairs, in two years of test operation the face recognition system found about 100 people who were on the federal wanted list, after which, at the end of 2019, it was decided to connect all city cameras to the system.

The new tracking system is more effective because of how footage is processed. Previously, all footage was saved to a separate video archive and police officers had to run each video through a special program. Now faces are recognized in real time on streams from several thousand cameras simultaneously.

Unfortunately, no system is perfect, and data leaks sometimes occur. Such information is sold on the Internet to anyone for a modest fee. Most often it is bought by private detectives or debt collectors.

How such data is processed

A number of companies have been actively developing human face recognition systems for over 40 years. Among them is even the famous weapons manufacturer Smith & Wesson with its ASID (Automated Suspect Identification System). The police in London are now testing a similar system in cooperation with the Japanese company NEC. But the leader in this area can safely be called the Russian company NtechLab. In 2015, the NtechLab face recognition algorithm was recognized as the best at The MegaFace Benchmark, an international competition organized by the University of Washington… In May 2016, NtechLab became one of three Russian companies admitted to the official testing of biometric technologies conducted by NIST… Admission to the tests alone gave the company the right to participate in government tenders in the United States and a number of other countries.

The algorithm was presented to a wide audience as the FindFace service, which searched for people on VKontakte by photograph. After its launch, the service caused a stir on social networks and provoked several deanonymization scandals. Apparently, the service was a marketing move designed to show the platform’s capabilities to potential buyers of the technology. Soon after, the developers closed the service, and the company began providing services to the government and various business sectors. In particular, it became known that the Moscow mayor’s office paid NtechLab at least $3.2 million for the use of its face recognition technology in the city video surveillance system.

Your own home tracking system based on neural networks

MTCNN – face detection neural network

MTCNN is a cascade of convolutional neural networks. The model uses three networks: P-Net, R-Net and O-Net. Each subsequent network refines the prediction of the previous one.

First, P-Net outputs the coordinates of bounding rectangles for candidate faces. R-Net then discards the less likely face regions and adds a confidence level to those that remain. The third network, O-Net, again discards the lower-confidence rectangles and adds the coordinates of five facial landmarks.

The result of MTCNN’s work
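The cascade idea can be sketched without any neural network at all: each stage sees only the candidates that survived the previous one and drops those below its confidence threshold. In the sketch below the per-stage scores are precomputed placeholders and the thresholds are assumptions chosen for illustration. The real MTCNN also runs on an image pyramid and refines box coordinates at every stage, which is omitted here.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    box: tuple           # (x, y, width, height)
    stage_scores: tuple  # hypothetical confidences from P-Net, R-Net, O-Net


def run_cascade(candidates, thresholds=(0.6, 0.7, 0.7)):
    """Pass candidates through three stages, keeping only confident ones."""
    survivors = list(candidates)
    for stage, threshold in enumerate(thresholds):
        survivors = [c for c in survivors if c.stage_scores[stage] >= threshold]
    return survivors


candidates = [
    Candidate((10, 10, 50, 50), (0.95, 0.97, 0.99)),   # passes all stages
    Candidate((200, 40, 30, 30), (0.70, 0.40, 0.10)),  # dropped at stage 2
    Candidate((90, 120, 20, 20), (0.30, 0.20, 0.10)),  # dropped at stage 1
    Candidate((300, 80, 45, 45), (0.80, 0.85, 0.50)),  # dropped at stage 3
]

faces = run_cascade(candidates)
print([c.box for c in faces])
```

The cheap first stage removes most of the candidates, so the more expensive later stages only have to look at a handful of regions.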

For those who want to experiment, the neural network is packaged in a Python library of the same name. MTCNN… To start, it is enough to create an MTCNN object and call the detect_faces method.

The detector returns a list with one dictionary per detected face. Each dictionary has three keys: 'box' (rectangle coordinates), 'keypoints' (key points), and 'confidence' (confidence level). Sample code from the project’s GitHub is below:

import cv2
from mtcnn import MTCNN

detector = MTCNN()
# OpenCV loads images as BGR, so convert to RGB before detection
image = cv2.cvtColor(cv2.imread("ivan.jpg"), cv2.COLOR_BGR2RGB)
result = detector.detect_faces(image)

# result holds one entry per detected face; take the first one
bounding_box = result[0]['box']  # [x, y, width, height]
keypoints = result[0]['keypoints']

# Draw the bounding rectangle around the face
cv2.rectangle(image,
              (bounding_box[0], bounding_box[1]),
              (bounding_box[0] + bounding_box[2], bounding_box[1] + bounding_box[3]),
              (0, 155, 255), 2)
# Mark the five facial landmarks
for name in ('left_eye', 'right_eye', 'nose', 'mouth_left', 'mouth_right'):
    cv2.circle(image, keypoints[name], 2, (0, 155, 255), 2)

cv2.imwrite("ivan_drawn.jpg", cv2.cvtColor(image, cv2.COLOR_RGB2BGR))
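The sample above reads only the first detection. When a photo contains several people, it may be more convenient to loop over the whole result and skip weak detections. The sketch below uses a hand-written list in the same shape as the detector output, so it runs without the library. The values, the helper name and the 0.9 cut-off are assumptions for illustration.

```python
# Hand-written sample shaped like the detect_faces output; values are made up
sample_result = [
    {
        "box": [277, 90, 48, 63],
        "confidence": 0.998,
        "keypoints": {
            "left_eye": (291, 117), "right_eye": (313, 114),
            "nose": (303, 131),
            "mouth_left": (296, 143), "mouth_right": (313, 141),
        },
    },
    {
        "box": [12, 40, 20, 22],
        "confidence": 0.62,
        "keypoints": {
            "left_eye": (17, 47), "right_eye": (26, 47),
            "nose": (22, 52),
            "mouth_left": (18, 57), "mouth_right": (26, 57),
        },
    },
]


def confident_faces(result, min_confidence=0.9):
    """Return (x1, y1, x2, y2) corners for detections above the threshold."""
    boxes = []
    for face in result:
        if face["confidence"] < min_confidence:
            continue  # skip detections the network is unsure about
        x, y, w, h = face["box"]
        boxes.append((x, y, x + w, y + h))
    return boxes


boxes = confident_faces(sample_result)
print(boxes)
```

The corner format is convenient because it maps directly onto the two points that a rectangle-drawing call expects.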

I ran into problems with package dependencies. If you get errors, copy the following packages into the requirements.txt file:

numpy == 1.16

If tensorflow fails to install, update setuptools with the command:

pip install setuptools --upgrade --ignore-installed


FaceNet – face identification neural network

FaceNet is a Siamese neural network that identifies faces. The word “Siamese” means that it is made up of two identical neural networks with the same weights. During training, FaceNet extracts facial features and maps them to points in Euclidean space, where the distance between two points directly corresponds to how similar the faces are. When two images are compared by the two networks during training, the weights are adjusted to increase the Euclidean distance if the images show different people and to minimize it if they show the same person.
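One common way to express this training goal for a pair of images is the contrastive loss: it grows with distance for the same person and grows as the distance shrinks below a margin for different people. The sketch below is an assumption made for illustration, since the text does not say which loss is used. The FaceNet paper itself trains on triplets of images, which follows the same idea. The two-dimensional embeddings are toy values.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two embedding vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def contrastive_loss(emb_a, emb_b, same_person, margin=1.0):
    """Penalty for one pair of embeddings.

    Same person: the loss is the squared distance, so training pulls the
    points together. Different people: the loss is non-zero only while the
    points are closer than the margin, so training pushes them apart.
    """
    d = euclidean(emb_a, emb_b)
    if same_person:
        return d ** 2
    return max(0.0, margin - d) ** 2


same_close = contrastive_loss((0.0, 0.0), (0.5, 0.0), same_person=True)
diff_close = contrastive_loss((0.0, 0.0), (0.5, 0.0), same_person=False)
diff_far = contrastive_loss((0.0, 0.0), (3.0, 4.0), same_person=False)
print(same_close, diff_close, diff_far)
```

Different people who are already farther apart than the margin contribute no loss, so training effort goes to the pairs that are still confusable.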

After training, the neural network is able to perform identification by comparing the current face with the faces stored in the database.

To carry out identification, you first need to detect the face using any face detection method. Once you have an image of the face in which the eyes and lips are located in approximately the same place, resize it to 96×96 pixels and pass it to FaceNet.

The image is then mapped to a point in Euclidean space and compared with the entries in the face database. If the distance is less than a predefined threshold, the network signals a match.
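The comparison step can be sketched as a nearest-neighbour search with a distance threshold. The three-dimensional embeddings, the names in the database and the 0.6 threshold below are all made up for illustration. Real embeddings have many more dimensions, and the threshold has to be tuned on real data.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two embedding vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def identify(embedding, database, threshold=0.6):
    """Return the name of the closest stored face, or None if none is close."""
    best_name, best_distance = None, float("inf")
    for name, stored in database.items():
        distance = euclidean(embedding, stored)
        if distance < best_distance:
            best_name, best_distance = name, distance
    # Report a match only if the nearest face is within the threshold
    if best_distance < threshold:
        return best_name
    return None


# Hypothetical database: one embedding per known person
database = {
    "vasya": (0.10, 0.80, 0.30),
    "masha": (0.90, 0.20, 0.50),
}

match = identify((0.12, 0.79, 0.33), database)
unknown = identify((0.50, 0.50, 1.50), database)
print(match, unknown)
```

Lowering the threshold reduces false matches at the cost of more missed ones, which is the same trade-off between the two kinds of error described earlier.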

How FaceNet works

Anyone who wants to test the model can take a look at this repository… Weights for the pretrained neural network can be downloaded from Google Drive

These are standard implementations of detection and identification algorithms. Production security systems use finer and more complex tuning, which is, of course, a trade secret.

Weaknesses of such systems

In parallel with the development of tracking systems, methods of evading them are being developed. Glasses with LEDs, special patterns that are not typical of a human face, or ordinary masks are used to protect against surveillance.

Factors such as weather conditions, camera placement, lens direction, light level, and partial covering of the face with a scarf or medical mask affect the recognition accuracy in urban environments.

There are cases when the algorithm makes a mistake and takes a law-abiding person for an offender on the wanted list. The police then detain the person until they are sure that the algorithm was wrong.

Prospects for the development of video surveillance systems

Systems are currently being developed that recognize specific human postures, for example, to detect a fight. Others are learning to identify people by gait, which is unique to each person, and biobanks are being created to store such biometric data.

In the near future, video surveillance systems will be applied far more widely, and not only for security. For example, one of the most promising areas is retail.

