Video fire detector

Introduction – Why is it needed?

An example of fire detection on video

Today, almost all office, retail, and industrial facilities are equipped with video surveillance systems. Video from the existing cameras can be used to detect fire and thereby further enhance the security of the facility.

In some cases, detecting fire with a camera can be many times faster than with standard systems based on fire detectors, and cameras are now so numerous that they look into almost every corner of a facility.

For example, heat and smoke detectors are located under the ceiling and have high inertia. Flame detectors are much faster in this regard, but due to their higher cost they are not as widely used in office and retail premises as in production facilities. Recognizing fire from cameras allows a fire to be detected at an earlier stage and thereby reduces the possible consequences of an emergency.

Main characteristics of the algorithm – Video Fire Detector

– Analysis of a series of frames

– Minimum fire recognition time – from 5 seconds

– The minimum area of the detected fire – 20×30 pixels

– Video stream rate – from 5 frames/sec

– Resolution – 640×480 and higher, the camera is stationary

How to recognize a fire in a video image?

As a rule, the fire area in the image has a characteristic color and shape, although the shape is not so simple. The color of the fire varies from orange-red to white, with a visible color gradient toward the center of the flame, and it also depends on the lighting and camera settings (white balance).

Examples of areas of fire

The shape of the fire can vary greatly from frame to frame. When a neural network is trained on images with fire, it learns to recognize medium-sized fires (from about 60×60 pixels) well, but difficulties arise with small fire areas. In them the structure of the flame is barely visible, so the network essentially learns to find small orange-red regions of a similar shape, and the image may contain other objects with the same color and shape: flashing beacons, headlights, glare.

To exclude false objects among the recognized areas, it is worth taking into account the dynamics (changes) of each area over a series of frames; LSTM networks will help us here.

Thus, the following approach is used to recognize fire:

– a convolutional network for finding potential fire areas in the frame by color and shape

– an LSTM network for analyzing the dynamics of an area over a series of frames and excluding false objects (flashing beacons, headlights, etc.)

Recognition of the fire area in the frame – YOLOv2

To recognize fire in a frame, a detector based on the YOLOv2 network is used (transfer learning, with resnet18 as the backbone network for feature extraction). The network was trained both on a base of images with fire and on images with false objects.
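
A minimal sketch of how such a detector can be assembled in MATLAB (assuming Deep Learning Toolbox and Computer Vision Toolbox; the feature layer name, anchor boxes, training options and trainingData table are illustrative, not the exact values used in the project):

% Assemble a YOLOv2 detector on top of a pretrained resnet18 backbone (sketch).
inputSize   = [448 448 3];            % illustrative; the article later moves to 896x896
numClasses  = 1;                      % single class: fire
anchorBoxes = [60 60; 30 20; 120 90]; % illustrative anchors, not the project values

baseNet      = resnet18;              % pretrained backbone for feature extraction
featureLayer = 'res4b_relu';          % assumed feature extraction layer in resnet18
lgraph = yolov2Layers(inputSize, numClasses, anchorBoxes, baseNet, featureLayer);

% trainingData: table with columns {imageFilename, fire}, where fire holds [x y w h] boxes
opts = trainingOptions('sgdm', 'InitialLearnRate', 1e-3, 'MaxEpochs', 20, ...
    'MiniBatchSize', 8, 'Shuffle', 'every-epoch');
detector = trainYOLOv2ObjectDetector(trainingData, lgraph, opts);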

Sample video with fire

The base of training videos gradually expanded as the project progressed. During the first iteration of training the detector, it turned out that on the first video base the network learned to find the fire area well, but was poorly resistant to bright false objects of a similar color. This was due to an imbalance in the training video base: it contained only a small number of videos with bright false objects. After that, it was decided to take a walk to the mall and shoot the missing videos: lights, bulbs, signs, shop windows. Some overly similar videos were excluded from training so that the network would not overfit on them.

When training on videos without fire, the training script complained about the lack of labeled fire bboxes, so I had to insert fire into each video with false objects using a video editor. To avoid overfitting when training YOLOv2, data preprocessing was used – augmentation: random image cropping and changes in brightness and saturation.
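
A rough sketch of such an augmentation step (the helper name augmentData, the jitter ranges and the crop size are illustrative; it follows the usual MATLAB pattern of applying transform to the combined training datastore):

% Augmentation applied on the fly to each training sample (sketch).
function dataOut = augmentData(dataIn)
    I      = dataIn{1};              % image
    boxes  = dataIn{2};              % [x y w h] bounding boxes
    labels = dataIn{3};

    % Random brightness / saturation jitter
    I = jitterColorHSV(I, 'Brightness', 0.2, 'Saturation', 0.2);

    % Random crop; keep only the boxes that still overlap the crop window
    targetSize = size(I, [1 2]) - 60;                % illustrative crop size
    win = randomCropWindow2d(size(I), targetSize);
    I   = I(win.YLimits(1):win.YLimits(2), win.XLimits(1):win.XLimits(2), :);
    [boxes, valid] = bboxcrop(boxes, win, 'OverlapThreshold', 0.5);
    labels = labels(valid);

    dataOut = {I, boxes, labels};
end

% Usage: augmentedDs = transform(trainingDs, @augmentData);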

Sample video for training

Initially, the YOLOv2 input image layer was specified with dimensions of 672×672 pixels, but, as training and testing showed, the detector was not confident on small fires, so it was decided to increase the resolution of the input layer to 896×896 pixels. This improved the detection accuracy for small fires, but also reduced the performance of the YOLOv2 network; this question can be revisited later at the stage of optimizing the algorithm for speed.

To automate the video markup process, a script was created: the fire area is marked automatically on each frame within a user-defined ROI, based on a color mask of the fire obtained with the Color Thresholder app in MATLAB. Only every 5th–7th frame of the video is used for training; the output is a folder with video frames and a mat file with the markup: frame number – bbox [x, y, w, h]. The final video base contains 4899 frames from 38 videos.
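
A simplified sketch of such a markup script (createFireMask stands in for the mask function exported from Color Thresholder; the ROI, frame step, file names and output format are illustrative):

% Automatic markup of the fire bbox inside a user-defined ROI (sketch).
v = VideoReader('fire_video.mp4');
roi = [100 200 400 300];                  % [x y w h], illustrative user-defined ROI
frameStep = 5;                            % use every 5th frame
k = 0; markup = [];

for f = 1:frameStep:floor(v.Duration * v.FrameRate)
    frame  = read(v, f);
    roiImg = imcrop(frame, roi);
    mask   = createFireMask(roiImg);      % color mask exported from Color Thresholder
    stats  = regionprops(mask, 'BoundingBox', 'Area');
    if ~isempty(stats)
        [~, idx] = max([stats.Area]);     % take the largest connected fire region
        bbox = stats(idx).BoundingBox;
        bbox(1:2) = bbox(1:2) + roi(1:2); % back to full-frame coordinates
        k = k + 1;
        imwrite(frame, sprintf('frames/frame_%05d.png', f));
        markup(k, :) = [f, bbox];         % frame number + [x y w h]
    end
end
save('markup.mat', 'markup');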

Part of the images used for training YOLOv2

For training, the base was split 90% training / 10% test; on the test set a good detection accuracy of AP = 0.99 was obtained. However, small and relatively weak fires were excluded from training, since the network learns poorly on them and would most likely memorize the surrounding background rather than the characteristics of the fire itself, so in practice the detection accuracy will be lower.
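
A minimal sketch of how such an AP figure can be computed on the held-out test split (standard Computer Vision Toolbox calls; the datastore names testImds and testBlds are illustrative):

% Evaluate the trained detector on the 10% test split (sketch).
detectionResults = detect(detector, testImds, 'MiniBatchSize', 8, 'Threshold', 0.5);
[ap, recall, precision] = evaluateDetectionPrecision(detectionResults, testBlds);
fprintf('Average precision on the test split: %.2f\n', ap);
plot(recall, precision); xlabel('Recall'); ylabel('Precision');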

Detection of small areas of fire on video that were not involved in training

Video – testing fire recognition in the frame with YOLOv2

Testing on our image base revealed cases where YOLOv2 still recognized false objects as fire. For example, the frame on the right below contains both a fire and a flashing beacon. Even for a person it is difficult to tell from a single frame which is the fire and which is the beacon. Therefore, in the next step we add an LSTM network to analyze the dynamics of an area over a series of frames.

On the right frame, a flasher and a fire (on the right)

Video – testing YOLOv2 fire recognition in the frame

Analysis of fire dynamics on a series of frames – LSTM network

As shown above, it is sometimes difficult to distinguish fire from non-fire in a single frame, but over a series of frames this is much easier to do. To analyze the dynamics we will not process every frame of the video: in my opinion this is an unnecessary computational load, since the changes between adjacent frames can be insignificant. Analyzing every 5th frame therefore seems optimal to me (perhaps the interval can later be made even larger). Classic flashing beacons (with an incandescent lamp) have a rotation frequency of about 2 Hz (two flashes per second), and it is desirable that a series of frames captures at least two full beacon cycles. We get the following: if we take a series of 30 frames for analysis, then 30 * 5 = 150 frames of real video pass, or 150 / 30 fps = 5 seconds of real time, which comfortably covers two beacon cycles – this suits us.
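
The frame-interval arithmetic above, as a small sketch (the values are the ones assumed in the text):

% Real time covered by one analysis series (sketch of the arithmetic above).
fps       = 30;                               % assumed camera frame rate
frameStep = 5;                                % every 5th frame is analyzed
seriesLen = 30;                               % frames in one series
seriesSeconds = seriesLen * frameStep / fps   % = 5 seconds of real time per series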

Series of frames for multiple lights
Series of frames for flashing beacons

Since there can be several potential fire candidates in one frame, the algorithm is as follows: the YOLOv2-based detector looks for potential fire bbox areas every 5 frames, and a series of 30 frames is accumulated for each bbox. Once the 30 frames have been collected, the LSTM network runs through all potential bbox areas, analyzing the dynamics in them and filtering out false objects (flashing beacons). (An example of using an LSTM network: Classify Videos Using Deep Learning.) To extract features for the LSTM network from the bbox image, we again used resnet18, plus additionally the height and width of the bbox area. The LSTM layer uses 30 hidden units, matching the number of frames in the series.
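
A sketch of the feature extraction step for one accumulated series: the 'pool5' layer of resnet18 gives a 512-element vector per crop, and appending the bbox height and width gives 514 features per frame (the layer name, variable names bboxCrops and bboxSizes, and the resizing are assumptions):

% Build the feature sequence for one bbox over a series of 30 frames (sketch).
featNet   = resnet18;
featLayer = 'pool5';                       % assumed: 512-dim global pooling output
seriesLen = 30;
seq = zeros(514, seriesLen);               % 512 CNN features + bbox height and width

for t = 1:seriesLen
    crop = imresize(bboxCrops{t}, [224 224]);           % crops saved for this bbox
    f = activations(featNet, crop, featLayer, 'OutputAs', 'columns');
    seq(:, t) = [f; bboxSizes(t, 1); bboxSizes(t, 2)];   % append [height; width]
end
% seq is one observation for the LSTM network (features x time steps)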

Defining LSTM Network Layers
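
A possible layer definition consistent with the description above: 514 input features per time step, 30 hidden units, two output classes fire / non-fire (the exact options are assumptions, not the figure's literal contents):

% LSTM classifier over the per-frame feature sequences (sketch).
numFeatures = 514;     % 512 resnet18 features + bbox height and width
numHidden   = 30;      % hidden units, matching the series length used in the article
numClasses  = 2;       % fire / non-fire

layers = [
    sequenceInputLayer(numFeatures)
    lstmLayer(numHidden, 'OutputMode', 'last')   % one decision per series
    fullyConnectedLayer(numClasses)
    softmaxLayer
    classificationLayer];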

To create data for training the LSTM network, a couple of script files were written that save series of 30 images for the found bboxes. The LSTM network was trained on 115 sequences with fire and 144 without fire; the accuracy on the validation data was 96%.
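
Training then comes down to a standard trainNetwork call on the collected sequences (XTrain as a cell array of 514×30 matrices, YTrain as categorical fire / non-fire labels, XVal / YVal as a validation split; the options shown are illustrative):

% Train the LSTM classifier on the collected sequences (sketch).
opts = trainingOptions('adam', ...
    'MaxEpochs', 60, ...
    'MiniBatchSize', 16, ...
    'ValidationData', {XVal, YVal}, ...
    'Shuffle', 'every-epoch', ...
    'Plots', 'training-progress');
lstmNet = trainNetwork(XTrain, YTrain, layers, opts);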

Validation on test (one non-fire is recognized as fire)

After the first training, a couple of flashing beacons from the test base of videos with false sources were still detected as fire; one of them is shown in the figure below. There had been no similar beacons in the training data before, so it was decided to add them to the training database.

Flashing beacon detected as fire after the first training and not detected after retraining

Video – Testing the video detector on flashing beacons

Algorithm testing

The final video fire detector algorithm consists of two networks: YOLOv2 + LSTM. Every 5th frame is taken from the video stream for analysis and resized to the 896×896-pixel input layer; for each bbox (potential fire area) returned by the detector, a series of 30 frames is accumulated (30 crops of the bbox are saved for each object). Next, resnet18 extracts feature vectors from the series of images, and these go to the input of the LSTM network. The LSTM network analyzes the dynamics, cuts off false objects from the potential areas, and gives the final fire / non-fire answer.
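
A condensed sketch of this two-stage loop, reusing the detector and lstmNet from the sketches above (the bbox association across frames and the helper extractSeriesFeatures are simplified placeholders; a real implementation has to track detections between frames):

% Two-stage video fire detector: YOLOv2 candidates + LSTM verification (sketch).
v = VideoReader('test_video.mp4');
frameIdx = 0;
series = containers.Map('KeyType', 'double', 'ValueType', 'any');  % bbox id -> crops

while hasFrame(v)
    frame = readFrame(v);
    frameIdx = frameIdx + 1;
    if mod(frameIdx, 5) ~= 0, continue; end           % analyze every 5th frame

    in = imresize(frame, [896 896]);                  % detector input size
    [bboxes, scores] = detect(detector, in, 'Threshold', 0.4);

    % Accumulate a crop per candidate bbox (simplified: no cross-frame matching)
    for k = 1:size(bboxes, 1)
        id = k;                                       % placeholder bbox association
        crop = imcrop(in, bboxes(k, :));
        if ~isKey(series, id), series(id) = {}; end
        series(id) = [series(id), {crop}];

        if numel(series(id)) == 30                    % full series collected
            seq = extractSeriesFeatures(series(id), bboxes(k, 3:4)); % resnet18 + h, w
            label = classify(lstmNet, seq);           % fire / non-fire verdict
            fprintf('Frame %d, bbox %d: %s\n', frameIdx, k, string(label));
            series(id) = {};                          % start accumulating again
        end
    end
end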

Video – testing the video detector on videos that did not participate in training

Application for testing the video fire detector algorithm

To test the algorithm locally on a PC, you can download the Video fire detector application (we will add the link now …) from the product description page; you will also need to download and install the MATLAB Runtime v9.11 library.

Web version of the Video fire detector application

To let users test the algorithm on the web, we used MATLAB Web App Server and launched the web version of the video fire detector app there.

Web version of the Video fire detector application

In the web version, however, video display is not as fast as in the local version of the application on my laptop. To optimize the web version, only every 2nd analyzed frame is displayed, and the displayed picture is reduced to a resolution of 600×600 pixels. Videos uploaded by users are logged for further training of the algorithm. You can upload your own video file or immediately click the Run button to use a test video.

Further steps of the project development

Next, we can try training other network types such as YOLOv3 and YOLOv4 and compare them with the current implementation, work further on optimizing the algorithm for speed, and look for places where the input dimensions can be reduced with a small loss of accuracy (earlier, YOLOv3 showed results similar to v2, although more was expected of it; more tests are needed). Later, once a base of videos with smoke has been collected, a smoke recognition algorithm can be added: situations often arise when there is no open flame yet, but smoke is already in the frame.

The developed algorithm can be deployed as a software module on a video analytics server. Another option is porting the algorithm to embedded NVIDIA Jetson platforms and using it as a complete device – a “video fire detector” smart camera.

For specific conditions, this algorithm can be modified to the customer's requirements and retrained on new videos with fire. For questions about this project, you can write to fire@exponenta.ru

Links

Description of Video fire detector on our website (adding …)

Useful information on the topic, neural networks https://exponenta.ru/ai

Computer vision https://exponenta.ru/comp-vision

Our YouTube channel https://www.youtube.com/user/MATLABinRussia

I’m on www.linkedin.com/in/alexn-vorobyev
