Exam observation

Hello! As you may know, we provide video surveillance for various major events, including the Unified State Exam (USE).

In this post, we want to talk about our video surveillance and video analytics platforms: how exam monitoring works, what difficulties it involves, and how our algorithm helps detect violations during exams.

Exams are usually watched by dedicated people called observers. On the video surveillance portal smotriege.ru, they flag suspicious behavior by USE participants and submit the detected violations to Rosobrnadzor for moderation. If the moderators agree that a violation really took place, it is passed on to the examination point (PES). PES staff check each such report and decide what to do with the offender, for example, remove a participant from the exam for using a phone or a cheat sheet.

2020 was no exception: online observers again followed the state exams. This time, though, they had an assistant, a specially trained algorithm. It analyzes sequences of images coming from video cameras in real time or from archive recordings and finds possible violations among them: the use of cheat sheets, phones and other devices.

The video analytics technology “watched” live video streams from the classrooms during the exams and archived recordings between exams. For comparison, one observer can watch at most four classrooms at the same time, while the algorithm can process video from more than 2000 classrooms in one exam day.

The main goal of this video analytics is to help observers find violations during the exam, draw their attention to suspicious behavior by USE participants, and eliminate the human factor from the observation process.

How the algorithm works

The technology finds a possible violation, records it and sends a signal to the video surveillance portal. Observers review the notification in the Potential Violation section. A notification is an excerpt from a video recording, accompanied by information about the PES and the classroom where the activity was recorded. The part of the frame that needs attention is highlighted with a red square. After reviewing the excerpt, the observer decides whether to accept or reject the violation.
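
To make the shape of such a notification concrete, here is a minimal sketch of how it could be modelled. All class and field names, and the sample values, are hypothetical; the post does not describe the portal's actual data model.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Tuple


class Decision(Enum):
    PENDING = "pending"
    ACCEPTED = "accepted"
    REJECTED = "rejected"


@dataclass
class PotentialViolation:
    """One notification shown to an observer (all field names are hypothetical)."""
    pes_id: str                                # exam point where the clip was recorded
    classroom: str                             # classroom within that exam point
    clip_start_s: float                        # clip boundaries within the recording, seconds
    clip_end_s: float
    highlight_box: Tuple[int, int, int, int]   # x, y, width, height of the red square
    decision: Decision = Decision.PENDING

    def review(self, accept: bool) -> None:
        """Record the observer's verdict after watching the clip."""
        self.decision = Decision.ACCEPTED if accept else Decision.REJECTED


note = PotentialViolation("PES-0001", "12", 754.0, 769.5, (120, 80, 64, 64))
note.review(accept=True)
print(note.decision.value)  # accepted
```

The point of the sketch is that the algorithm only proposes: a notification starts as pending, and only the observer's review turns it into an accepted or rejected violation.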

Training and difficulties

For the algorithm to recognize the behavior of USE participants accurately and record violations correctly, it had to be trained on a large dataset. We built one by collecting videos of violations already registered during the 2018-2019 exams.

The learning process consisted of several stages. In the first stage, the videos were run through a people detection algorithm based on the YOLO neural network. The result was a video with marked regions where people had stayed for a long time. This was needed to filter out teachers walking along the aisles and similar passers-by. Each region with a person was assigned an identifier, and the processed video with the marked regions and identifiers was saved.
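
The filtering step described above can be sketched roughly as follows. This is an illustration under assumptions, not our production code: the per-frame boxes are taken as already produced by some person detector, regions are matched between frames by intersection over union (IoU), and the function names and thresholds are hypothetical.

```python
from typing import Dict, List, Tuple

Box = Tuple[float, float, float, float]  # x1, y1, x2, y2


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ix = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    iy = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = ix * iy
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def seated_regions(frames: List[List[Box]], min_frames: int,
                   min_iou: float = 0.5) -> Dict[int, Box]:
    """Give every region an ID and keep those occupied in at least min_frames frames."""
    regions: Dict[int, Box] = {}  # region id -> first box seen there
    hits: Dict[int, int] = {}     # region id -> frames with a matching detection
    next_id = 0
    for boxes in frames:
        matched = set()           # a region can absorb only one box per frame
        for box in boxes:
            candidates = [rid for rid in regions
                          if rid not in matched and iou(regions[rid], box) >= min_iou]
            if candidates:
                rid = max(candidates, key=lambda r: iou(regions[r], box))
            else:
                rid = next_id     # nobody was here before: open a new region
                next_id += 1
                regions[rid] = box
                hits[rid] = 0
            hits[rid] += 1
            matched.add(rid)
    return {rid: box for rid, box in regions.items() if hits[rid] >= min_frames}


frames = []
for i in range(5):
    student = (10 + i % 2, 10, 50 + i % 2, 90)      # stays at the desk, slight jitter
    teacher = (100 + 60 * i, 0, 140 + 60 * i, 100)  # crosses the frame
    frames.append([student, teacher])

kept = seated_regions(frames, min_frames=4)
print(kept)  # {0: (10, 10, 50, 90)}
```

The seated participant keeps hitting the same region and survives the filter, while the person walking through produces a new short-lived region in every frame and is dropped.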

Then a person watched this video and marked, as accurately as possible, the moments when each violation began and ended (if there was one, of course), as well as the identifiers of the “violators”. Moments without violations were also marked as examples of normal behavior, which are needed for training the algorithm as well. Along the way, we identified the typical violations: using cheat sheets and phones, and photographing materials.
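
One way such manual markup could be turned into training samples is sketched below. The record layout, the label names, the fixed two-second window and the "more than half covered" rule are all assumptions made for illustration, not a description of our actual pipeline.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Annotation:
    person_id: int  # identifier assigned to the region at the detection stage
    start_s: float  # moment the marked behavior begins, seconds
    end_s: float    # moment it ends, seconds
    label: str      # e.g. "cheat_sheet", "phone", "photographing", "normal"


def window_labels(annotations: List[Annotation], person_id: int,
                  video_len_s: float, window_s: float = 2.0) -> List[Tuple[float, str]]:
    """Cut the video into fixed windows and label each one for a given person.

    A window becomes a sample only if a single annotation covers more than
    half of it; ambiguous or unmarked windows are skipped.
    """
    samples = []
    for i in range(int(video_len_s // window_s)):
        start, end = i * window_s, (i + 1) * window_s
        best_label, best_overlap = None, 0.0
        for a in annotations:
            if a.person_id != person_id:
                continue
            overlap = min(end, a.end_s) - max(start, a.start_s)
            if overlap > best_overlap:
                best_label, best_overlap = a.label, overlap
        if best_label is not None and best_overlap > window_s / 2:
            samples.append((start, best_label))
    return samples


marks = [
    Annotation(person_id=3, start_s=0.0, end_s=4.0, label="normal"),
    Annotation(person_id=3, start_s=5.0, end_s=9.0, label="phone"),
    Annotation(person_id=7, start_s=0.0, end_s=10.0, label="normal"),
]
samples = window_labels(marks, person_id=3, video_len_s=10.0)
print(samples)  # [(0.0, 'normal'), (2.0, 'normal'), (6.0, 'phone')]
```

Windows that straddle the start or end of a violation are dropped here rather than guessed, which is one simple way to keep borderline moments out of the training set.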

The open-source library OpenPose helped us a lot. It determines the positions of people in the frame, their poses, and the coordinates of key points corresponding to different parts of the body.
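
As an illustration of what can be derived from such key points, here is a minimal sketch that computes one hand-to-face feature. It assumes OpenPose's BODY_25 layout, where each person is a flat list of (x, y, confidence) triples; the feature itself, the function names and the confidence threshold are hypothetical and are not the features we actually used.

```python
import math
from typing import List, Optional, Tuple

# BODY_25 indices used in this sketch
NOSE, R_SHOULDER, R_WRIST, L_SHOULDER, L_WRIST = 0, 2, 4, 5, 7

Keypoint = Tuple[float, float, float]  # x, y, confidence


def unflatten(flat: List[float]) -> List[Keypoint]:
    """Turn [x0, y0, c0, x1, y1, c1, ...] into a list of (x, y, c) triples."""
    return [(flat[i], flat[i + 1], flat[i + 2]) for i in range(0, len(flat), 3)]


def dist(a: Keypoint, b: Keypoint) -> float:
    return math.hypot(a[0] - b[0], a[1] - b[1])


def hand_to_face(points: List[Keypoint], min_conf: float = 0.3) -> Optional[float]:
    """Smallest wrist-to-nose distance in units of shoulder width.

    Dividing by shoulder width makes the value comparable for people sitting
    at different distances from the camera. Returns None if joints are missing.
    """
    nose, rs, ls = points[NOSE], points[R_SHOULDER], points[L_SHOULDER]
    if min(nose[2], rs[2], ls[2]) < min_conf:
        return None
    scale = dist(rs, ls)
    wrists = [points[i] for i in (R_WRIST, L_WRIST) if points[i][2] >= min_conf]
    if scale == 0 or not wrists:
        return None
    return min(dist(w, nose) for w in wrists) / scale


# A made-up person: right hand raised to the face, left wrist not detected.
flat = [0.0] * (25 * 3)
made_up = {NOSE: (100, 50, 0.9), R_SHOULDER: (80, 80, 0.9),
           L_SHOULDER: (120, 80, 0.9), R_WRIST: (100, 60, 0.8)}
for index, (x, y, c) in made_up.items():
    flat[3 * index: 3 * index + 3] = [x, y, c]

points = unflatten(flat)
print(hand_to_face(points))  # 0.25
```

Returning None for undetected joints matters in practice: on low-resolution or badly angled video many key points are simply not found.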

During training, we faced a number of difficulties from an algorithmic point of view. First, video quality. Many cameras deliver the picture at 320×240 resolution. In such cases, only people sitting close to the camera are distinguishable. People at the back desks turn into a handful of pixels, and it becomes very difficult to analyze behavior, especially its fine details.

Second, the angle. Video is shot from different angles: front, back and side. This complicates training the model, and some violations are barely visible or not visible at all. Third, because there are no labeled datasets in the public domain, we had to collect and label the training data manually. And fourth, there is no clear line between a violation, suspicious behavior and normal behavior, which complicates both manual video labeling and algorithm training.

Results and plans

The first version of the algorithm was based on RandomForest, a classifier trained on the output of OpenPose. But it had a significant drawback: most of the potentially useful data was simply thrown away. For example, some gestures and movements can be identified from key points, but it is impossible to see whether a person has a pen or a cheat sheet in their hand.
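
To show the general idea of a forest voting on pose-derived numbers, here is a toy stand-in written in plain Python: a bag of one-level decision trees, each trained on a bootstrap sample and a randomly chosen feature. It is not our model and not a full random forest, and the two features and all the data are made up for illustration.

```python
import random
from collections import Counter
from typing import List, Tuple

Stump = Tuple[int, float, int]  # feature index, threshold, label predicted when value <= threshold


def fit_stump(rows: List[List[float]], labels: List[int], feature: int) -> Stump:
    """Pick the threshold on one feature that misclassifies the fewest rows."""
    best = None
    for threshold in sorted({r[feature] for r in rows}):
        for left_label in (0, 1):
            errors = sum(
                (left_label if r[feature] <= threshold else 1 - left_label) != y
                for r, y in zip(rows, labels)
            )
            if best is None or errors < best[0]:
                best = (errors, threshold, left_label)
    _, threshold, left_label = best
    return feature, threshold, left_label


def fit_forest(rows: List[List[float]], labels: List[int],
               n_trees: int = 25, seed: int = 0) -> List[Stump]:
    rng = random.Random(seed)
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(len(rows)) for _ in rows]  # bootstrap sample
        feature = rng.randrange(len(rows[0]))           # random feature for this tree
        forest.append(fit_stump([rows[i] for i in idx], [labels[i] for i in idx], feature))
    return forest


def predict(forest: List[Stump], row: List[float]) -> int:
    """Majority vote of all trees: 0 = normal, 1 = suspicious."""
    votes = Counter(left if row[f] <= t else 1 - left for f, t, left in forest)
    return votes.most_common(1)[0][0]


# Made-up features per sample: [hand-to-face distance, head-down score]
X = [[1.8, 0.10], [1.6, 0.20], [2.0, 0.15], [1.7, 0.05],   # normal
     [0.3, 0.80], [0.4, 0.90], [0.2, 0.70], [0.5, 0.85]]   # suspicious
y = [0, 0, 0, 0, 1, 1, 1, 1]

forest = fit_forest(X, y)
print(predict(forest, [0.1, 0.95]), predict(forest, [2.5, 0.0]))
```

Note what the classifier receives: a short list of numbers per person. Whatever is in the person's hand never reaches it, which is exactly the drawback described above.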

Research is currently underway to improve the quality of the algorithm using neural network approaches to Human Activity Recognition, such as the SlowFast, I3D and C3D architectures. In addition, we are working to improve the algorithm’s accuracy, performance and usability. We have already expanded the dataset with data for 2020, which should help us significantly increase the accuracy of the algorithm.

We now face the task of speeding up the algorithm, which we will most likely cover in upcoming posts.
