The result of the Depix program (source)
Hollywood films like to exaggerate. They zoom into a photo a million times and pull numbers out of a single pixel.
Although this looks implausible, scientific research in this area has been going on for a long time. Back in the 90s, theoretical papers and proofs of concept (PoC) were published on restoring text from blurred images. In 2012, Vladimir Yuzhikov wrote on Habr about SmartDeblur, his program for restoring blurred and defocused images.
Although the field is fairly well developed, until now there has been no specialized tool for recovering passwords (text) after pixelation. Depix is the first such tool.
In 2019, Dmitry Vatolin, head of the video group of the Computer Graphics and Multimedia Laboratory at Moscow State University, spoke about the current state of science in terms of sharpening photos. He said that the Russian police constantly ask him for help, although they do not understand the complexity of the problem:
The questions are always the same. “We have a video with the suspect, please help to restore the face” … “Help increase the number from the DVR” … “You can’t see the person’s hands here, please help zoom in” … And so on in the same spirit.
To make it clear what this is about, here is a real example: a heavily compressed video that was sent in with a request to restore a blurred face (about 8 pixels in size).
In general, the problem of sharpening is really relevant. Everyone wants to find information in the frame that is not there.
What is Pixelation?
First, what is pixelation? It is the process of reducing the resolution of part of an image in order to hide information. It is used in many fields. Many companies use pixelation to hide passwords and other sensitive data in internal documents.
Depix attacks (that is, attempts to reverse the output of) a generic linear box filter. This filter takes a block of pixels and overwrites every pixel in it with the average of all pixels in the block. The implementation is simple and fast, since several blocks can be processed in parallel.
The figure below shows an example of such a linear filter. The smiley image is divided into four blocks. For each block the average color is calculated and written in place of the original pixel colors, which produces the final pixelation. The filter cannot be reversed directly, since the original information is lost.
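To make the filter concrete, here is a minimal Python sketch of block averaging on a grayscale image stored as a list of rows. The function name `pixelate` and the sample images are ours for illustration and are not taken from Depix. The two originals are chosen so that they pixelate to the same result, which shows why the filter has no direct inverse.

```python
def pixelate(image, block):
    """Replace every block x block tile of a grayscale image with its mean.

    `image` is a list of rows of numbers. Partial tiles at the right and
    bottom edges are averaged over the pixels they actually contain.
    """
    height, width = len(image), len(image[0])
    out = [row[:] for row in image]
    for top in range(0, height, block):
        for left in range(0, width, block):
            ys = range(top, min(top + block, height))
            xs = range(left, min(left + block, width))
            mean = sum(image[y][x] for y in ys for x in xs) / (len(ys) * len(xs))
            for y in ys:
                for x in xs:
                    out[y][x] = mean
    return out


# Two different 4x4 originals ...
a = [[0, 255, 0, 0],
     [255, 0, 0, 0],
     [9, 9, 1, 2],
     [9, 9, 3, 4]]
b = [[255, 0, 0, 0],
     [0, 255, 0, 0],
     [9, 9, 4, 3],
     [9, 9, 2, 1]]

# ... collapse to the same four 2x2 blocks after pixelation.
print(a != b, pixelate(a, 2) == pixelate(b, 2))
```

Because many originals map to one pixelated block, any recovery method has to guess candidates and check them rather than compute an inverse.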
Anti-blur tools, history and research
Images can be blurred in many ways. Linear filter pixelation for pixel blocks is just one option. Most blur algorithms tend to blend / stretch pixels as they try to simulate natural blur that is caused by camera movement or defocusing.
There are plenty of tools for common tasks such as sharpening photographs. Unfortunately, passwords need a different approach. Here the characters are only a couple of blocks high, so there is no point in simply sharpening, writes the author of Depix.
Above, we have provided links to some of the tools and studies published on Habr since 2012.
Recent developments in artificial intelligence have spawned bizarre headlines like “Researchers Have Created a Tool That Perfectly Sharpens Faces.” The illustration below shows examples from the scientific article by researchers at Duke University (USA) that describes the PULSE algorithm.
Operation of the PULSE algorithm from Duke University
But in fact, the AI here does not restore photos. It generates new images that blur down to the same pixels. These works build on the RAISR algorithm from 2016. The AI generates faces that, when blurred, give the image supplied as input. It is important to understand that the generated face is not the original face from which the blurred image was obtained.
Algorithms like PULSE seem new, but they continue a very long line of blur removal tools. Back in 1994 (!), Mark Bouyer from the Southwest Research Institute (USA) wrote a program that generated candidate “Plutos”, blurred the pictures, and compared them with real photographs from the Hubble telescope.
A well-known 2006 article by Dhira Venkatraman explains how to recover a pixelated credit card number. The idea is simple: generate all possible credit card numbers, pixelate them, and compare each result with the pixelated number.
For example, we see on the Internet a photograph of a check or bank card with a blurred number. As you can see, a linear filter over 8×8 pixel blocks was used for blurring here:
How can these numbers be restored?
1. Take a sample of a blank check form.
2. Use a script to generate images of all possible numbers.
3. Blur each image in the same way as the original image.
4. Determine the brightness vector of each image. The vector contains the brightness value of each block.
Here the check number 0000001 corresponds to the vector
We also determine the brightness vector of the sample …
5. Find the vector with the minimum distance from the original (after normalization).
d(0000001) = 1.9363
d(0000002) = 1.9373
...
d(1124587) = 0.12566
d(1124588) = 0.00000
So we find the check number: 1124588…
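The five steps above can be sketched in a few lines of Python. This is a toy illustration and not the code from the 2006 article. The 3x5 bitmap font, the 2x2 block size, and all function names are assumptions made so that the example runs without an image library, and the "photo" is simply a rendered number. The normalization used here (subtract the mean, divide by the vector length) is also just one possible choice.

```python
from itertools import product

# Hypothetical 3x5 bitmap font, used only to keep the sketch self-contained.
FONT = {
    "0": ("111", "101", "101", "101", "111"),
    "1": ("010", "110", "010", "010", "111"),
    "2": ("111", "001", "111", "100", "111"),
    "3": ("111", "001", "111", "001", "111"),
    "4": ("101", "101", "111", "001", "001"),
    "5": ("111", "100", "111", "001", "111"),
    "6": ("111", "100", "111", "101", "111"),
    "7": ("111", "001", "001", "001", "001"),
    "8": ("111", "101", "111", "101", "111"),
    "9": ("111", "101", "111", "001", "111"),
}


def render(number):
    """Step 2: draw the digits with one blank column after each glyph."""
    rows = [[] for _ in range(5)]
    for digit in number:
        for y, bits in enumerate(FONT[digit]):
            rows[y].extend(255 * int(bit) for bit in bits)
            rows[y].append(0)
    return rows


def brightness_vector(image, block=2):
    """Steps 3 and 4: mean brightness of every block, row by row."""
    height, width = len(image), len(image[0])
    vector = []
    for top in range(0, height, block):
        for left in range(0, width, block):
            tile = [image[y][x]
                    for y in range(top, min(top + block, height))
                    for x in range(left, min(left + block, width))]
            vector.append(sum(tile) / len(tile))
    return vector


def normalize(vector):
    """Remove the brightness offset and scale to unit length."""
    mean = sum(vector) / len(vector)
    centered = [v - mean for v in vector]
    norm = sum(c * c for c in centered) ** 0.5 or 1.0
    return [c / norm for c in centered]


def distance(u, v):
    return sum((a - b) ** 2 for a, b in zip(u, v)) ** 0.5


def recover(target_vector, digits):
    """Step 5: return the minimum distance and every candidate that reaches it."""
    target = normalize(target_vector)
    scored = []
    for combo in product("0123456789", repeat=digits):
        candidate = "".join(combo)
        vector = normalize(brightness_vector(render(candidate)))
        scored.append((distance(vector, target), candidate))
    best = min(scored)[0]
    return best, [c for d, c in scored if d == best]


secret = "4271"
observed = brightness_vector(render(secret))  # stands in for the blurred photo
best, matches = recover(observed, digits=len(secret))
print(best, matches)
```

Different numbers can pixelate to the same blocks, so `recover` returns every candidate at the minimum distance instead of a single answer. With a real photograph the best distance will be small rather than exactly zero, as in the listing above.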
In 2019, Somdev Sangwan described an interesting method for restoring blurred faces in OSINT investigations. The method is as follows: first the resolution of the photo is increased in Photoshop, which leaves it blurred:
Then an image search is run on Yandex Images (more advanced than Google Images). In effect, Yandex “brute-forces” the face in the image:
It is easy to see that all the described methods have something in common. If there is not enough information to restore the image correctly, we pixelate similar candidate data and check whether the results match.
This is the basis for our algorithm for recovering passwords from screenshots.
Description of the password recovery algorithm
A linear box filter is a deterministic algorithm, so pixelating the same values always produces the same block. You can therefore try to restore the text in roughly the same way as the numbers in the example above. Each block, or combination of blocks, can be treated as a subproblem.
The algorithm has certain limitations. It requires text of the same size and color on the same background. Modern text editors also add hue and saturation, which makes for a huge number of possible font variations in a screenshot.
Here is a fairly simple solution. We take a de Bruijn sequence of the expected characters, paste it into the same editor, and take a screenshot. This screenshot is used as the search image in which similar blocks are looked up:
This sequence includes all two-character combinations. It is important to use two-character combinations because some pixel blocks cover more than one character.
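For readers who want to build such a sequence themselves, here is a short Python sketch using the standard recursive construction from Lyndon words. The function names `de_bruijn` and `linear_text` and the chosen alphabet are our assumptions, not part of Depix. A de Bruijn sequence is cyclic, so the first n-1 characters are appended to make every combination appear in plain left-to-right text.

```python
import string


def de_bruijn(alphabet, n):
    """Cyclic sequence in which every length-n string over `alphabet` occurs once."""
    k = len(alphabet)
    a = [0] * (k * n)
    sequence = []

    def db(t, p):
        if t > n:
            if n % p == 0:
                sequence.extend(a[1:p + 1])
        else:
            a[t] = a[t - p]
            db(t + 1, p)
            for j in range(a[t - p] + 1, k):
                a[t] = j
                db(t + 1, t)

    db(1, 1)
    return "".join(alphabet[i] for i in sequence)


def linear_text(alphabet, n):
    """Unroll the cycle so that the wrap-around combinations are also visible."""
    cycle = de_bruijn(alphabet, n)
    return cycle + cycle[:n - 1]


print(linear_text("abc", 2))  # aabacbbcca

# Text to paste into the editor for lowercase letters and digits.
expected = string.ascii_lowercase + string.digits
text = linear_text(expected, 2)
print(len(text))  # 36 * 36 + 1 = 1297
```

The unrolled text for 36 characters is only 1297 characters long, which keeps the screenshot small while still containing every pair of neighboring characters.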
For the search to work, a block with exactly the same pixel configuration is required. For example, in the test image, the algorithm could not find a part of the letter ‘o’, because in the generated image this block also included a part of the next letter, but in the original image it was clean.
Creating a de Bruijn sequence with spaces around the characters obviously creates the opposite problem: the algorithm cannot find the correct blocks where an adjacent letter reaches into the block. You can generate an image that contains all the letter combinations as well as versions with empty space around them. Searching such a picture takes longer but gives better results.
For most pixelated images, the tool finds blocks that have a single match. It then verifies that the matches of the surrounding blocks lie at the same geometric distance from each other as in the pixelated image.
After going through all the blocks, the program outputs all the correct blocks directly, and for blocks with multiple matches, it outputs the average. The output isn’t perfect, but it works pretty well. The figure shows a test image with random characters. Most of the symbols are readable.
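The matching procedure described above can be sketched as follows. This is a simplified illustration under our own assumptions and not the Depix source: images are grayscale lists of rows, the image size is a multiple of the block size, blocks match only on exactly equal averages, and all function names are hypothetical. The tiny reference image is built so that one block has two candidate matches, which the neighbor check then resolves.

```python
from collections import defaultdict


def tile(image, left, top, block):
    return [row[left:left + block] for row in image[top:top + block]]


def mean(t):
    values = [v for row in t for v in row]
    return sum(values) / len(values)


def pixelate(image, block):
    out = [row[:] for row in image]
    for top in range(0, len(image), block):
        for left in range(0, len(image[0]), block):
            m = mean(tile(image, left, top, block))
            for dy in range(block):
                for dx in range(block):
                    out[top + dy][left + dx] = m
    return out


def index_reference(reference, block):
    """Map each block average to every window position that produces it."""
    index = defaultdict(list)
    for top in range(len(reference) - block + 1):
        for left in range(len(reference[0]) - block + 1):
            index[mean(tile(reference, left, top, block))].append((left, top))
    return index


def depixelate(pixelated, reference, block):
    index = index_reference(reference, block)
    # Pass 1: candidate positions in the reference for every pixelated block.
    candidates = {}
    for top in range(0, len(pixelated), block):
        for left in range(0, len(pixelated[0]), block):
            candidates[(left, top)] = index.get(pixelated[top][left], [])
    # Pass 2: for an ambiguous block, prefer the candidate that lies at the
    # same offset from a single-match neighbor as in the pixelated image.
    resolved = {}
    for (left, top), found in candidates.items():
        if len(found) > 1:
            for dx, dy in ((-block, 0), (block, 0), (0, -block), (0, block)):
                anchor = candidates.get((left + dx, top + dy), [])
                if len(anchor) == 1:
                    expected = (anchor[0][0] - dx, anchor[0][1] - dy)
                    if expected in found:
                        found = [expected]
                        break
        resolved[(left, top)] = found
    # Pass 3: copy single matches, average what is still ambiguous,
    # and keep the pixelated color where nothing matched.
    out = [row[:] for row in pixelated]
    for (left, top), found in resolved.items():
        if not found:
            continue
        tiles = [tile(reference, x, y, block) for x, y in found]
        for dy in range(block):
            for dx in range(block):
                out[top + dy][left + dx] = sum(t[dy][dx] for t in tiles) / len(tiles)
    return out


# Stand-in for the screenshot of the de Bruijn sequence.
reference = [[0, 10, 40, 90, 160, 250, 130, 130],
             [0, 20, 50, 100, 170, 240, 130, 130]]
# The hidden content is columns 1..4 of the reference.
secret = [[10, 40, 90, 160],
          [20, 50, 100, 170]]
restored = depixelate(pixelate(secret, 2), reference, 2)
print(restored == secret)
```

In this example the second block matches two places in the reference, and only one of them sits directly to the right of the first block's match, so it is the one that is kept. Real screenshots need a tolerance for color differences and a much larger reference image.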
The Depix source code is published on GitHub…
By the way, the described technique has something in common with some well-known cryptographic attacks. For example, it is similar to cracking hashes, to an attack on a block cipher in ECB mode, and to a known-plaintext attack (KPA).