3 ways to visually extract data with JavaScript

Ahead of the start of our Frontend development course, we decided to share a translation of a short overview of visual attacks that let a page obtain non-public information about the user, bypassing the same-origin policy. Some of the vulnerabilities discussed have been fixed, but advances in web technology keep giving rise to new variations of old attacks.


What a page can show but cannot see

Have you noticed that you cannot set the height of visited links? A CSS rule targeting these links can set some properties, but not all. Note that both links are the same height, although the bottom one should be taller:

The computed style also reports the height of an unvisited link:

getComputedStyle(visitedLink).height; // 30px

For visited links, it is possible to set several properties, for example, text color:

a:visited {
  color: red;
}

But even though the link looks different, JavaScript only sees the normal link style:

getComputedStyle(visitedLink).color; // rgb(0, 0, 238) (blue)

This is one example where a page can show something but cannot see it. A link can be styled differently depending on whether it has been visited, but the browser hides this information from the page.
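To make the distinction concrete, here is a toy model in plain JavaScript that runs in Node without a DOM. It is not how any browser is implemented: `ALLOWED_VISITED_PROPERTIES`, `renderedStyle` and `computedStyle` are hypothetical names, and the property list is a simplified subset of the color properties that browsers let `:visited` rules change.

```javascript
// Toy model only: real browsers enforce this inside the style engine.
// A simplified subset of properties that :visited rules are allowed to change.
const ALLOWED_VISITED_PROPERTIES = new Set([
  "color",
  "background-color",
  "border-color",
  "outline-color",
]);

// What the user sees: the unvisited style plus any *allowed* :visited overrides.
function renderedStyle(link, baseStyle, visitedRules) {
  const style = { ...baseStyle };
  if (link.visited) {
    for (const [prop, value] of Object.entries(visitedRules)) {
      // Disallowed properties such as height are silently ignored.
      if (ALLOWED_VISITED_PROPERTIES.has(prop)) style[prop] = value;
    }
  }
  return style;
}

// What the page sees (the role getComputedStyle plays): always the unvisited style.
// link and visitedRules are deliberately ignored.
function computedStyle(link, baseStyle, visitedRules) {
  return { ...baseStyle };
}

const base = { color: "rgb(0, 0, 238)", height: "30px" };
const visitedRules = { color: "red", height: "60px" };
const link = { href: "https://example.com/", visited: true };

console.log(renderedStyle(link, base, visitedRules)); // { color: 'red', height: '30px' }
console.log(computedStyle(link, base, visitedRules)); // { color: 'rgb(0, 0, 238)', height: '30px' }
```

The user sees a red link, while the page only ever gets the unvisited blue color and the unchanged 30px height.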

Protected visual information

A more common example is images loaded from other origins. They can be displayed on the page, but the browser imposes all sorts of restrictions on them.

I’ve seen people wonder why the web works the way it does, especially people who came to web development from other languages. In Java or C++, a program that wants to display an image needs access to its bytes; without that access, it cannot display the image at all.

But JavaScript and the web work differently. In HTML, a simple <img> tag shows an image without the page having to access its bytes first. This makes it possible to have separate permissions for showing something and for accessing it.

Visited and unvisited links are a great example. The feature dates back to the earliest browsers and is still well supported. Users can see which links point to pages they have already opened. This is great for browsing Wikipedia, for example, as you can immediately tell apart the links that may lead to new information.

But while this is great for UX, it opens a gaping security hole that’s hard to close. If a web page can determine whether a link has been visited, it can access information that should not be available to it. For example, it can check Google search URLs to see whether the user has searched for certain terms. Search terms can contain confidential information, and by checking a large number of such queries, a web page can de-anonymize the user.

Security issues can arise from elements other than links whenever those elements leak information to the site: an image loaded from another domain may contain confidential information. For example, a dynamic image can change depending on how many unread notifications you have:

This works because the browser sends cookies with the image request, including session information that identifies the user to Facebook. If the site could read the returned image, it could retrieve information about the user’s Facebook activity. For this reason, a page cannot export the contents of a canvas after drawing a cross-origin image on it; this is called a tainted canvas.
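The tainting rule can be sketched as a small state machine. This is a toy model, not the real canvas API: the `ToyCanvas` class and its methods are hypothetical. The behavior mirrors what the HTML specification describes, where drawing a cross-origin image without CORS approval clears the canvas’s origin-clean flag, after which export calls such as `toDataURL()` throw a `SecurityError`.

```javascript
// Toy model of canvas tainting; not the browser's CanvasRenderingContext2D.
class ToyCanvas {
  constructor(pageOrigin) {
    this.pageOrigin = pageOrigin;
    this.originClean = true; // mirrors the spec's "origin-clean" flag
    this.pixels = [];
  }

  drawImage(image) {
    // Drawing always succeeds: the user is allowed to see the image.
    this.pixels.push(...image.pixels);
    const sameOrigin = new URL(image.src).origin === this.pageOrigin;
    if (!sameOrigin && !image.corsApproved) {
      this.originClean = false; // one cross-origin draw taints the canvas for good
    }
  }

  toDataURL() {
    // Reading the pixels back is what the page is not allowed to do once tainted.
    if (!this.originClean) {
      throw new Error("SecurityError: the canvas has been tainted by cross-origin data");
    }
    return "data:toy," + this.pixels.join(",");
  }
}

const canvas = new ToyCanvas("https://attacker.example");
canvas.drawImage({ src: "https://attacker.example/logo.png", pixels: [1, 2], corsApproved: false });
console.log(canvas.toDataURL()); // data:toy,1,2

canvas.drawImage({ src: "https://social.example/notifications.png", pixels: [7], corsApproved: false });
try {
  canvas.toDataURL();
} catch (e) {
  console.log(e.message); // SecurityError: the canvas has been tainted by cross-origin data
}
```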

And then there is the elephant in the room: IFrames. A page can be embedded in another page, complete with the user’s login session, unless this is explicitly prohibited with X-Frame-Options or Content Security Policy. If a web page could access the pages it embeds, it would have complete freedom to read and manipulate the data they display.
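On the embedded site’s side, opting out of framing is a matter of response headers. Below is a minimal Node.js sketch using the built-in `http` module; `applyAntiFramingHeaders`, `handler` and `createAppServer` are hypothetical names. The header values are standard: `X-Frame-Options: DENY` and the CSP directive `frame-ancestors 'none'` both forbid any page from framing the response.

```javascript
const http = require("node:http");

// Hypothetical helper: forbid every other page from embedding this response in a frame.
function applyAntiFramingHeaders(res) {
  res.setHeader("X-Frame-Options", "DENY"); // older header, still widely honored
  res.setHeader("Content-Security-Policy", "frame-ancestors 'none'"); // CSP equivalent
}

function handler(_req, res) {
  applyAntiFramingHeaders(res);
  res.setHeader("Content-Type", "text/html; charset=utf-8");
  res.end("<p>Account settings</p>");
}

// Usage: createAppServer().listen(8080);
function createAppServer() {
  return http.createServer(handler);
}
```

Browsers that honor either header refuse to render the page inside another site’s IFrame. Sending both covers browsers that only understand `X-Frame-Options`.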

Visual attacks

Browsers do their best to protect information that is meant for the user to see but not for the web page to read. Sometimes, however, they fail, and bugs turn into vulnerabilities. Let’s look at the most interesting ones!

1. Visited links

The first vulnerability concerns visited links, which were discussed above. Unsurprisingly, browsers implement measures to block this kind of information retrieval. The problem is even mentioned in the CSS 2.1 specification:

UAs may therefore treat all links as unvisited links, or implement other measures to preserve the user’s privacy while rendering visited and unvisited links differently.

This means that browsers restrict which styles a visited link can have beyond those of an unvisited link. For example, because the height of a visited link cannot be changed, a web page cannot check the position of an element below the link to see whether the link points to a visited URL.
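The reasoning behind the restriction can be shown with a toy layout model: if a `:visited` rule could change a link’s height, every element below the link would move, and the page could read that movement through ordinary layout APIs such as `offsetTop`. The functions below (`layoutColumn`, `linkHeight`, `sentinelTop`) are hypothetical and stack boxes vertically; this is not how a browser lays out a page.

```javascript
// Toy vertical layout: each box starts where the previous one ended.
function layoutColumn(heights) {
  const tops = [];
  let y = 0;
  for (const h of heights) {
    tops.push(y);
    y += h;
  }
  return tops;
}

// Height of a link as the layout engine would use it.
// allowHeightOnVisited = true models a browser WITHOUT the restriction.
function linkHeight(isVisited, allowHeightOnVisited) {
  const unvisitedHeight = 30;
  const visitedHeight = 60; // what a hostile a:visited { height: 60px } rule asks for
  return isVisited && allowHeightOnVisited ? visitedHeight : unvisitedHeight;
}

// The attacker reads the top of a 10px "sentinel" element placed right after the link.
function sentinelTop(isVisited, allowHeightOnVisited) {
  const tops = layoutColumn([linkHeight(isVisited, allowHeightOnVisited), 10]);
  return tops[1];
}

// Without the restriction, the sentinel's position leaks the visited state...
console.log(sentinelTop(true, true), sentinelTop(false, true)); // 60 30
// ...with it, both cases look identical to the page.
console.log(sentinelTop(true, false), sentinelTop(false, false)); // 30 30
```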

But as new capabilities appear, vulnerabilities keep reappearing. By calling getComputedStyle, JavaScript can read the current style of an element, and until this was fixed, a site could read a link’s color to see whether it had been visited. The vulnerability was discovered in 2006 and resurfaced ten years later.
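Before the fix, the attack boiled down to a loop over candidate URLs. In the sketch below, the browser is replaced by a hypothetical `styleOracle` function so the logic runs in Node; in a real page, the oracle would create an `<a href>` element with an `a:visited { color: red }` rule in place and call `getComputedStyle(link).color`. The function names and candidate URLs are made up for illustration.

```javascript
const VISITED_COLOR = "rgb(255, 0, 0)"; // from a:visited { color: red }

// Returns the candidate URLs whose links report the :visited color.
function sniffHistory(candidateUrls, styleOracle) {
  return candidateUrls.filter((url) => styleOracle(url) === VISITED_COLOR);
}

// Hypothetical oracles standing in for getComputedStyle in an unpatched
// browser (patched = false) and a patched one (patched = true).
function makeOracle(history, patched) {
  return (url) => (!patched && history.has(url) ? VISITED_COLOR : "rgb(0, 0, 238)");
}

const history = new Set(["https://www.google.com/search?q=job+search"]);
const candidates = [
  "https://www.google.com/search?q=job+search",
  "https://www.google.com/search?q=cheap+flights",
];

console.log(sniffHistory(candidates, makeOracle(history, false))); // leaks the job-search URL
console.log(sniffHistory(candidates, makeOracle(history, true))); // []
```

The attack scales with the size of the candidate list, which is why checking many search URLs was enough to learn a lot about a user.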

2. CSS Blending Modes

This vulnerability is a great example of extracting visual information pixel by pixel from an IFrame or another protected resource. A post on Ruslan Habalov’s blog explains the details perfectly. The gist of it lies in how blend modes were implemented.

Blend modes let a page define how overlapping elements are combined visually. This image shows several examples:

Notice how the center area changes depending on the blend mode and pixel colors of the two layers.
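For the separable blend modes, the math is applied per color channel. The formulas below for `multiply`, `screen` and `difference` follow the W3C Compositing and Blending specification, with channels normalized to [0, 1] (`cb` is the backdrop, `cs` the source); `BLEND_FUNCTIONS` and `blend` are hypothetical names for illustration.

```javascript
// Separable blend functions: each RGB channel is blended independently.
const BLEND_FUNCTIONS = {
  multiply: (cb, cs) => cb * cs,
  screen: (cb, cs) => cb + cs - cb * cs,
  difference: (cb, cs) => Math.abs(cb - cs),
};

// backdrop and source are [r, g, b] arrays with channels in [0, 1].
function blend(mode, backdrop, source) {
  const f = BLEND_FUNCTIONS[mode];
  if (!f) throw new Error(`unsupported blend mode: ${mode}`);
  return backdrop.map((cb, i) => f(cb, source[i]));
}

const red = [1, 0, 0];
const grey = [0.5, 0.5, 0.5];
console.log(blend("multiply", red, grey)); // [ 0.5, 0, 0 ]
console.log(blend("screen", red, grey)); // [ 1, 0.5, 0.5 ]
console.log(blend("difference", red, grey)); // [ 0.5, 0.5, 0.5 ]
```

The result mixes both layers: the page controls one of them, while the other can be content it is not allowed to read.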

Although the page cannot access what the IFrame looks like (or an image from another site, or a visited link), it can freely place it on the page, even under other elements. This lets blend modes display pixels of different colors depending on what those elements look like.

This alone shouldn’t lead to a vulnerability: the page cannot read the resulting pixel colors; it can only tell the browser how to display them. At least, not directly.

However, the browser code that computed blend modes was implemented so that different branches ran for different input colors.

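// Adapted from the pseudocode at https://www.evonide.com/side-channel-attacking-browsers-through-css3-features/
// (SetSat from the W3C Compositing and Blending spec). C is { r, g, b } with channels in [0, 1].
function setSat(C, s) {
  // Find which channels hold the minimum, middle and maximum values
  const [min, mid, max] = ["r", "g", "b"].sort((a, b) => C[a] - C[b]);
  const out = { ...C };
  if (C[max] > C[min]) {
    // Data-dependent branch: taken only when the color is not a shade of grey
    out[mid] = ((C[mid] - C[min]) * s) / (C[max] - C[min]);
    out[max] = s;
  } else {
    out[mid] = out[max] = 0;
  }
  out[min] = 0;
  return out;
}
// The saturation blend mode combines it with SetLum, Lum and Sat (not shown):
// Saturation(Cb, Cs) = SetLum(SetSat(Cs, Sat(Cb)), Lum(Cb))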

Because these branches take different amounts of time, and the page controls one of the two input layers, it can try many variations and measure the difference in rendering time. This leaks information about the other input layer, namely the protected content.
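The leak can be illustrated without a browser by counting work instead of measuring time. This is a toy model: `blendChannel` is a hypothetical blend step with a clipping branch, its `ops` count stands in for rendering time, and `hiddenPixel` plays the role of the cross-origin content. A real attack would have to measure rendering time and repeat each measurement many times to average out noise.

```javascript
// Toy blend step with a data-dependent branch, loosely modeled on the
// clipping branches in blend-mode math. ops stands in for rendering time.
function blendChannel(attackerValue, hiddenValue) {
  let value = attackerValue + hiddenValue;
  let ops = 1;
  if (value > 1) {
    value = 1; // clip to the valid range...
    ops += 5; // ...which costs extra work only on this path: the side channel
  }
  return { value, ops };
}

// The attacker cannot read hiddenValue, only the cost of each trial.
// It sweeps its own input in small steps and looks for the point where the cost jumps.
function inferHiddenValue(measureCost, steps = 100) {
  for (let i = 0; i <= steps; i++) {
    const attackerValue = i / steps;
    if (measureCost(attackerValue) > 1) {
      // attackerValue + hidden > 1 first happens here, so hidden ≈ 1 - attackerValue
      return 1 - attackerValue;
    }
  }
  return 0; // never clipped, even with attackerValue = 1: the hidden value is 0
}

const hiddenPixel = 0.73; // e.g. one channel of a pixel inside a cross-origin IFrame
const estimate = inferHiddenValue((a) => blendChannel(a, hiddenPixel).ops);
console.log(estimate.toFixed(2));
```

The printed estimate lands within one sweep step (0.01) of the hidden value, even though the attacker never reads `value`.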

The vulnerability allowed data to be extracted one pixel at a time, bypassing all of the browser’s cross-origin protections. It was fixed by removing the branches from the blend mode implementation, so that the algorithm runs in constant time regardless of the input colors.
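The idea behind the fix can be illustrated by rewriting the `if`/`else` from `SetSat` as arithmetic, so that the same operations run for every input. This is only a sketch of the technique, not the code browsers shipped: `setSatSorted` is a hypothetical name, it takes channel values that are already sorted, and JavaScript engines give no constant-time guarantees, so it shows the idea rather than a secure implementation.

```javascript
// Branch-free version of the SetSat core. Inputs are the channel values
// already sorted as cmin <= cmid <= cmax, and the target saturation s.
function setSatSorted(cmin, cmid, cmax, s) {
  const spread = cmax - cmin;
  const hasSpread = Number(spread > 0); // 1 or 0, computed without an if
  const safeSpread = spread + (1 - hasSpread); // avoids dividing by zero for greys
  const newMid = hasSpread * (((cmid - cmin) * s) / safeSpread);
  const newMax = hasSpread * s;
  return [0, newMid, newMax]; // the new minimum is always 0
}

// Same results as the branching version for a colorful and a grey input.
console.log(setSatSorted(0.25, 0.5, 0.75, 0.5)); // [ 0, 0.25, 0.5 ]
console.log(setSatSorted(0.5, 0.5, 0.5, 0.5)); // [ 0, 0, 0 ]
```

Both paths of the original `if` are folded into one sequence of multiplications, and the `hasSpread` factor selects which result survives.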

3. Malicious CAPTCHA

This attack exploits the weakest point of every IT security system: the user. It is an ingenious way to extract information from another website: since the user actively participates in the attack, no standard protects against it.

A CAPTCHA is a way to protect a site (or part of it) from bots. The task in a CAPTCHA should be easy for humans but difficult for machines, for example, reading characters from an image. CAPTCHAs are used to prevent automated spam in comment sections or contact forms. A typical one looks like this:

In their article, Netanel Gelernter and Amir Herzberg show a way to use the user’s CAPTCHA solution to extract information. In their implementation, they loaded the data in a slightly distorted form and asked the user to type it into a text field. For example, Gmail’s cache manifest contained the user’s email address:

Note that the CAPTCHA is just a modified version of the first 15 characters of the email address (victim1813@gmai). It looks like an innocent, ordinary CAPTCHA, but solving it hands this information to the site.

It is no longer possible to extract the user’s Gmail address from the cache manifest file, but it is still possible to embed a Facebook comment field in any site, and that field contains the user’s real name:

Note that the text contains the name “Inno Cent”. By typing it, the user inadvertently reveals their real name if they are logged in to Facebook.

This attack also opens the door to extracting all kinds of other information. The authors of the article used Bing’s personalized autocomplete feature, which revealed the user’s search history. The image on the left shows a template with 4 areas for extracting information. The image on the right shows what the final CAPTCHA looks like; in this case, it means that the user had searched for 4 words:

This example used a privacy bug in Bing, but it’s not hard to imagine the same approach checking whether a link has been visited: just style unvisited links to match the background. If the user sees a link’s text (and types it into the text field), then that link has been visited.
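The visited-link variant could work roughly like this: each candidate URL gets its own code word, and unvisited links are colored like the background, so only the words of visited links are visible. Whatever the user types back reveals which links were visible. The sketch below covers only the page-side encoding and decoding in plain JavaScript; the function names, code words and URLs are hypothetical.

```javascript
// Map each candidate URL to a distinct code word the user might be asked to type.
function buildChallenge(candidateUrls, words) {
  if (words.length < candidateUrls.length) throw new Error("not enough code words");
  return candidateUrls.map((url, i) => ({ url, word: words[i] }));
}

// What the user actually sees: only visited links are visible, because
// unvisited ones are colored like the background (e.g. a:link { color: white }).
function visibleText(challenge, isVisited) {
  return challenge.filter((c) => isVisited(c.url)).map((c) => c.word).join(" ");
}

// Decode the user's answer back into the URLs they must have visited.
function decodeAnswer(challenge, answer) {
  const typed = new Set(answer.trim().toLowerCase().split(/\s+/));
  return challenge.filter((c) => typed.has(c.word)).map((c) => c.url);
}

const challenge = buildChallenge(
  ["https://bank-a.example/login", "https://bank-b.example/login", "https://clinic.example/"],
  ["maple", "river", "stone"],
);
const userHistory = new Set(["https://bank-b.example/login"]);
const shown = visibleText(challenge, (url) => userHistory.has(url));
console.log(shown); // river
console.log(decodeAnswer(challenge, shown)); // [ 'https://bank-b.example/login' ]
```

Each code word carries one bit (visited or not), which is also why the amount of data a single CAPTCHA can extract is small.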

The beauty of this attack is that it is almost impossible to prevent with a technical solution. Fortunately, its applicability is limited: it can extract only text, and only a few times before the user gets bored and leaves the site.

The web is developing at a rapid pace and is in great demand, and today’s browsers are comparable in complexity to operating systems. Writing a modern browser completely from scratch is a huge undertaking.

Over time, the number of vulnerabilities will only grow, which means we need people who can work with the web professionally and carefully. If you want to start a career in web development, take a look at our Frontend development course, where you will get a solid foundation from scratch to work confidently on the frontend.
