Using perceptual hashes to speed up the search for frames in the “VideoColor” database

Introduction

What are perceptual hashes?

Perceptual hashing is the use of an algorithm that produces a snippet, or fingerprint, of various forms of media (source).

There is a good article on this topic on Habr.

What can perceptual hashes be used for?

Like regular database indexes, they can be used to speed up the search for records, sometimes dramatically. That is exactly what we will do here.

A few words about the database of video frames “VideoColor”

The “VideoColor” video frame database is designed for quickly finding the title of a video, as well as the position of a frame within that video. Its creators plan to eventually bring the number of indexed videos up to 1,000,000 files.

The structure of records in the “VideoColor” database

The picture shows that all records are sorted by index hash value, forming groups (blue). Within each group there are subgroups (highlighted in green) whose entries share the same perceptual hash value.
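To make this layout concrete, here is a minimal Python sketch of a table sorted first by index hash and then by perceptual hash, where a subgroup is located with two binary searches. The record fields (video_id, frame_pos) and the names FrameRecord and FrameTable are hypothetical; the article does not describe the actual storage format of “VideoColor”.

```python
from bisect import bisect_left, bisect_right
from typing import List, NamedTuple


class FrameRecord(NamedTuple):
    index_hash: int       # selects the group (blue in the picture)
    perceptual_hash: int  # selects the subgroup inside the group (green)
    video_id: int         # hypothetical payload fields
    frame_pos: int


class FrameTable:
    """Records sorted by (index_hash, perceptual_hash)."""

    def __init__(self, records: List[FrameRecord]):
        self.records = sorted(records, key=lambda r: (r.index_hash, r.perceptual_hash))
        # Parallel list of sort keys so bisect can compare tuples directly.
        self.keys = [(r.index_hash, r.perceptual_hash) for r in self.records]

    def subgroup(self, index_hash: int, perceptual_hash: int) -> List[FrameRecord]:
        """All records whose index hash and perceptual hash both match."""
        key = (index_hash, perceptual_hash)
        return self.records[bisect_left(self.keys, key):bisect_right(self.keys, key)]


table = FrameTable([
    FrameRecord(7, 0xA1, 1, 100),
    FrameRecord(3, 0x10, 2, 5),
    FrameRecord(7, 0x05, 3, 42),
    FrameRecord(7, 0xA1, 4, 900),
])
print([r.video_id for r in table.subgroup(7, 0xA1)])  # [1, 4]
```

Because the whole table is ordered by the pair of hashes, a lookup costs two binary searches instead of a scan of the entire index-hash group.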

Practical use

The image below shows 32 central cells selected from an 11×7 grid (highlighted in gray). They can be used to build a 4-byte (32-bit) perceptual hash, one bit per cell. In some cases this is sufficient. Of course, any other scheme can be used as well.

So, for each selected cell we compute the average of R + G + B, then compute S, the mean of these values over all selected cells. If a cell's value is greater than S, the corresponding bit is set to 1; otherwise it is 0.
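The hashing step can be sketched in Python as follows, assuming a frame given as a list of rows of (r, g, b) tuples. The grid is split into 11 columns × 7 rows as in the text; the exact 32 cells are only shown in the figure, so SELECTED_CELLS below is a hypothetical placeholder (a 4×8 block), and the bit order (first selected cell = most significant bit) is also an assumption.

```python
GRID_COLS, GRID_ROWS = 11, 7

# Hypothetical choice of the 32 cells (rows 1-4, columns 1-8); the article's
# figure shows the real selection, which may differ.
SELECTED_CELLS = [(row, col) for row in range(1, 5) for col in range(1, 9)]


def cell_values(frame, cols=GRID_COLS, rows=GRID_ROWS):
    """Average of R + G + B for every cell of a cols x rows grid.

    `frame` is a list of rows of (r, g, b) tuples, at least cols wide and rows high.
    """
    height, width = len(frame), len(frame[0])
    values = {}
    for row in range(rows):
        y0, y1 = row * height // rows, (row + 1) * height // rows
        for col in range(cols):
            x0, x1 = col * width // cols, (col + 1) * width // cols
            total = sum(r + g + b
                        for y in range(y0, y1)
                        for (r, g, b) in frame[y][x0:x1])
            values[(row, col)] = total / ((y1 - y0) * (x1 - x0))
    return values


def perceptual_hash(values, cells=SELECTED_CELLS):
    """32-bit hash: bit = 1 if the cell value is greater than the mean S."""
    selected = [values[cell] for cell in cells]
    s = sum(selected) / len(selected)
    h = 0
    for v in selected:  # first selected cell becomes the most significant bit
        h = (h << 1) | int(v > s)
    return h


# Synthetic 22x14 frame: dark left part, bright right part.
frame = [[(0, 0, 0) if x < 11 else (255, 255, 255) for x in range(22)]
         for _ in range(14)]
print(hex(perceptual_hash(cell_values(frame))))  # 0xf0f0f0f
```

Note that a completely uniform frame yields 0, since no cell is strictly greater than S.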

The problem of using perceptual hashes

Perceptual hashes suffer from ambiguity: at some positions, a cell's value lies on the border, very close to the threshold S.

Value Positions

When computing the hash, each cell's average must be converted to either 0 or 1, and this is where we can get into trouble.

When a video is transcoded to a different resolution or bitrate, the decoded frames differ slightly from the original. As a result, the value at some position may land on different sides of the threshold for two otherwise similar images. This produces different perceptual hash values, and an exact lookup is then guaranteed to miss the desired frame.
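A toy example with 4 cells instead of 32 shows how little it takes: two value lists that differ by at most 2 units (on a 0 to 765 scale for R + G + B) produce hashes that differ in one bit. The numbers are made up purely for illustration, and hash_from_values is a hypothetical helper implementing the rule described above.

```python
def hash_from_values(values):
    """Bit = 1 if the cell value exceeds the mean S of all cells."""
    s = sum(values) / len(values)
    h = 0
    for v in values:
        h = (h << 1) | int(v > s)
    return h


# Hypothetical cell values for an original frame and its transcoded copy.
original = [100.0, 400.0, 251.0, 249.0]    # S = 250.0
transcoded = [101.0, 400.0, 249.0, 249.0]  # S = 249.75

h1 = hash_from_values(original)
h2 = hash_from_values(transcoded)
print(bin(h1), bin(h2))         # 0b110 0b100
print(bin(h1 ^ h2).count("1"))  # 1 differing bit
```

The third cell sits just above S in the original and just below it in the copy, which is exactly the border case described above.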

Everything is lost!

Generating an array of hashes

To solve the boundary value problem, we proceed as follows. Whenever a value falls into the boundary zone, we double the set of hashes: the first half gets 0 in that bit position and the second half gets 1. This means more search operations, but there is no other way around it. On the downside, there can be up to 32 such doublings (2^32 hashes in the worst case). On the upside, this does not happen often: on average there are 2 or 3 doublings, i.e., 4 or 8 hashes. Cases with too many doublings can be treated as exceptional, and their number forcibly capped at 6 or 7 (64 or 128 hashes).
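A minimal Python sketch of this doubling, under stated assumptions: the width of the boundary zone is not given in the article, so eps is a hypothetical parameter, and when too many bits are ambiguous the sketch keeps only the max_ambiguous bits closest to S (default 6), leaving the rest at their computed values. Each ambiguous bit doubles the set, so k such bits give 2**k hashes.

```python
from itertools import product


def hash_variants(values, eps, max_ambiguous=6):
    """All hashes obtained by trying both 0 and 1 for bits near the threshold S."""
    n = len(values)
    s = sum(values) / n
    base = 0
    for v in values:  # value i maps to bit (n - 1 - i)
        base = (base << 1) | int(v > s)

    # Bits whose value lies within eps of S are ambiguous.
    ambiguous = [i for i, v in enumerate(values) if abs(v - s) <= eps]
    # Emergency cap: keep only the bits closest to S.
    ambiguous.sort(key=lambda i: abs(values[i] - s))
    ambiguous = ambiguous[:max_ambiguous]
    masks = [1 << (n - 1 - i) for i in ambiguous]

    variants = set()
    for choice in product((0, 1), repeat=len(masks)):
        h = base
        for mask, bit in zip(masks, choice):
            h = (h | mask) if bit else (h & ~mask)  # force this bit to 1 or 0
        variants.add(h)
    return sorted(variants)


print(hash_variants([101.0, 400.0, 249.0, 249.0], eps=2.0))  # [4, 5, 6, 7]
```

With the toy values of the transcoded copy, the variant list [4, 5, 6, 7] contains 6, the hash of the original frame, so the query still reaches the original frame's subgroup. With ten identical values and the default cap, the sketch returns 2**6 = 64 hashes instead of 2**10.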

Searching by a range of hashes when there are many of them

Another trick lets us use perceptual hashes more economically: instead of searching for a specific value, we can search for a range of values (min, max). If the range is narrow, i.e., the difference between min and max is small, the group of records to compare with the sample stays small. This way we can completely avoid doubling the array of hashes for the least significant bits (covering the last 8 bits this way is quite feasible).
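A sketch of the range idea in Python, assuming the perceptual hashes of one index-hash group are kept in a sorted list. The low low_bits bits (8 by default, as suggested in the text) are replaced by the full span from all zeros to all ones, so ambiguity in those bits needs no doubling, and variants that differ only in those bits collapse into one range. The function names are hypothetical.

```python
from bisect import bisect_left, bisect_right


def hash_range(h, low_bits=8):
    """Return (min, max) covering every value of the low `low_bits` bits of h."""
    low_mask = (1 << low_bits) - 1
    return h & ~low_mask, h | low_mask


def collapse_to_ranges(variants, low_bits=8):
    """Variants that differ only in the low bits map to the same range."""
    return sorted({hash_range(v, low_bits) for v in variants})


def range_candidates(sorted_hashes, h, low_bits=8):
    """Stored hashes within [min, max]; these still need comparing with the sample."""
    lo, hi = hash_range(h, low_bits)
    return sorted_hashes[bisect_left(sorted_hashes, lo):bisect_right(sorted_hashes, hi)]


group = [0x12345500, 0x12345600, 0x12345680, 0x123456FF, 0x12345700]
print([hex(x) for x in range_candidates(group, 0x12345678)])
# ['0x12345600', '0x12345680', '0x123456ff']
print(len(collapse_to_ranges([0x12345601, 0x12345602, 0x12345603, 0x22345600])))  # 2
```

The trade-off is the one described above: a wider range means fewer hash variants but more candidates to compare with the sample, so the number of covered low bits should stay small.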


We used perceptual hashes to speed up the search for a frame in the database, specifically the search within a group that shares a given index hash. When a group contains many records, an additional search by perceptual hash speeds up the process significantly.


