Hello, Habr! My name is Maxim Gorshkov, and I work in the information security department at Cloud4Y. Today I will show you how much user data leaks during ordinary web browsing.
Information has long been the “new gold”, and this is especially true of information about users. In the era of electronic services and online commerce, the goal of any marketer is to find a customer who is ready to buy. However, this has to be done under constraints, without any direct contact with that customer. In fact, sellers have only one thing to work with: the user’s digital fingerprint.
You may have noticed that soon after you read an article about a new phone model, every ad block on the Internet starts offering that phone at a bargain price. IT giants have long understood the value of collecting user data, but society sounded the alarm only relatively recently. As a result, websites began showing notices asking users to consent to the processing of their personal data. In the Russian Federation, this is a mandatory requirement enshrined in Federal Law No. 152-FZ “On Personal Data” of July 27, 2006.
However, the question remains: what data can sites actually collect? You can find plenty of material on this topic online, but I decided to conduct an independent investigation and am sharing the results with you.
What sites know about us
As a test bench, we use a virtual machine running in the Cloud4Y cloud in demo mode (free). The operating system is Parrot OS, from the Linux family. As additional software you will need Ngrok and the Bigbro module from the OsintSAN framework…
With the help of Bigbro, we bring up a ready-made web page from the list:
Then, with the help of Ngrok, we forward local port 8080 to an external address. This makes our local web service reachable from the Internet.
Now open the address shown in the Ngrok command-line output from your own device (the generated address is long and hard to type, but any link shortener will simplify the input).
The site is open. We try to interact with it. As soon as any button is pressed, the browser warns us that the page is requesting access to our location. Few people are surprised by such a request. We agree. Then we switch to the virtual machine and go to the command line where the Bigbro module is running:
That’s how much data we have! It might seem that information about the configuration of our device gives the site owner nothing. Not so. This data makes it possible to identify the user even behind a VPN or when working from another browser.
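To illustrate why device configuration survives a change of IP address, here is a minimal sketch in Python. It is not how Bigbro or any particular tracker works: the attribute names, their values and the `fingerprint_id` helper are all assumptions made up for this example. The idea is only that a site can hash the attributes it sees into a stable identifier that does not depend on the IP address at all.

```python
import hashlib
import json


def fingerprint_id(attributes: dict) -> str:
    """Return a stable hex identifier for a set of device attributes."""
    # Sort keys so the same attributes always serialize to the same string.
    canonical = json.dumps(attributes, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()[:16]


# Hypothetical attribute values, chosen only for illustration.
device = {
    "user_agent": "Mozilla/5.0 (X11; Linux x86_64) Firefox/91.0",
    "screen": "1920x1080",
    "timezone": "Europe/Moscow",
    "language": "ru-RU",
    "cpu_cores": 8,
}

# The same device seen twice: once directly, once through a VPN.
visit_home = {"ip": "203.0.113.10", "device": device}
visit_vpn = {"ip": "198.51.100.77", "device": dict(device)}

# The IP address is not part of the hash, so both visits get the same ID.
same_device = fingerprint_id(visit_home["device"]) == fingerprint_id(visit_vpn["device"])
print(same_device)
```

Real fingerprinting scripts tend to use many more signals and tolerate small changes between visits, so an exact hash like this one is only the simplest possible variant.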
Someone might say that many users have similar configurations. This is not entirely true. Besides, modern algorithms can identify users by their activity. If, for example, you like a YouTube video from two accounts and from different browsers, the algorithm will in most cases count only one like, so do not rely on anonymity in such a case.
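The point about similar configurations can be checked with some simple arithmetic. The sketch below is an assumption-laden illustration: the shares of visitors are invented, and the attributes are treated as statistically independent, which is not quite true in reality. Each attribute on its own is common, but the identifying information, measured in bits, adds up.

```python
import math

# Hypothetical shares of visitors who have the same value as our device.
attribute_share = {
    "screen resolution": 1 / 4,
    "time zone": 1 / 8,
    "browser language": 1 / 16,
    "installed fonts": 1 / 64,
}


def identifying_bits(share: float) -> float:
    """Bits of identifying information carried by a value with this share."""
    return -math.log2(share)


# Assuming independence, the bits of the separate attributes simply add up.
total_bits = sum(identifying_bits(share) for share in attribute_share.values())
one_in = 2 ** total_bits

print(f"{total_bits:.0f} bits, about one visitor in {one_in:.0f}")
```

With these made-up numbers, four unremarkable attributes are already enough to single out roughly one visitor in tens of thousands.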
Next, we have the location parameter. We see that the place from which we went online is determined with an accuracy of 25 meters. When I followed the Google Maps link, I saw that the location of the device had indeed been determined accurately. This makes you easy to identify, especially if you are in an area with a low population density (for example, in a village).
And this is the data that can be obtained with the new version of the tested program:
So it is already clear that data about us can be collected with a single click on a website. In addition, you have probably come across a pop-up notification that a browser is using the webcam and microphone on your device. What is that for? It’s not hard to guess. Surely, after talking near a laptop or phone about your intention to buy a car or something else, you have seen an advertisement for the very product you discussed. Only the most naive or inexperienced user can think that information about their preferences, data, tastes and desires is of no use to anyone and is not collected by anyone.
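To get a feel for what an accuracy of 25 meters means, you can estimate how many people live inside a circle of that radius. This is a rough sketch: the population densities below are assumptions picked for illustration, not measurements, and the function name is made up.

```python
import math


def people_within_accuracy(radius_m: float, density_per_km2: float) -> float:
    """Expected number of residents inside the accuracy circle."""
    area_km2 = math.pi * radius_m ** 2 / 1_000_000  # m^2 converted to km^2
    return area_km2 * density_per_km2


# Hypothetical densities: a dense city block versus a village.
city = people_within_accuracy(25, 5000)
village = people_within_accuracy(25, 50)

print(f"city: {city:.1f}, village: {village:.2f}")
```

Under these assumptions the circle covers about ten people in the city and far fewer than one in the village, so in the village the coordinates effectively point at a single household.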
The question arises: what can be done about it? After all, there are special browsers that promise anonymity, browser plugins and the public’s favorite, anonymous VPNs. But since collecting users’ personal data has become commonplace, and you will not find a single online store that does not collect it, anonymization technologies produce an interesting side effect.
The fact is that the overwhelming majority of Internet users run a standard browser with standard settings, and the data of such users “dissolves” in the general mass of site visitors. Now imagine that you decide to hide your IP address behind a private VPN. You install it, go to the same page and … nothing has changed! Why, when your address now looks Swiss?
The secret is that algorithms have learned to evaluate user data as a whole. Advanced algorithms will never “believe” a changed IP address if the other behavioral characteristics remain the same and the session cookies have not changed. In addition, your “browser fingerprint” is generated the first time you visit a site. It turns out that if you take measures against fingerprinting, you only attract attention with your unusual footprint.
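Here is a toy sketch of what evaluating several signals together could look like. It is not any real tracker's algorithm: the `Visit` fields, the weights and the threshold are hypothetical and chosen only to show that a changed IP address alone barely moves the result.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Visit:
    ip: str
    cookie: str
    fingerprint: str


# Hypothetical weights: how strongly each matching signal suggests the same user.
WEIGHTS = {"cookie": 6, "fingerprint": 3, "ip": 1}
THRESHOLD = 5  # hypothetical cut-off for "probably the same user"


def same_user_score(a: Visit, b: Visit) -> int:
    """Sum the weights of all signals that match between two visits."""
    score = 0
    if a.cookie == b.cookie:
        score += WEIGHTS["cookie"]
    if a.fingerprint == b.fingerprint:
        score += WEIGHTS["fingerprint"]
    if a.ip == b.ip:
        score += WEIGHTS["ip"]
    return score


def looks_like_same_user(a: Visit, b: Visit) -> bool:
    return same_user_score(a, b) >= THRESHOLD


before = Visit(ip="203.0.113.10", cookie="sess-abc", fingerprint="f1")
after_vpn = Visit(ip="198.51.100.77", cookie="sess-abc", fingerprint="f1")
stranger = Visit(ip="203.0.113.10", cookie="sess-xyz", fingerprint="f2")

print(looks_like_same_user(before, after_vpn))  # only the IP differs
print(looks_like_same_user(before, stranger))   # only the IP matches
```

In this sketch the visit through the VPN is still linked to the earlier one, while a different person behind the same IP address is not.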
Add to this the inconvenience of working with sites, since anonymization tools can both slow services down and cut back their functionality.
For example, if you disable JavaScript in your browser, the Bigbro application and similar services will not receive information about you. But the page will also become hard to use, because many necessary functions will be disabled, for example adapting the page to the screen resolution. In general, almost all page functionality is written in JS.
In other words, the more anonymity you want, the less comfortable it is to surf the Internet.
What to do
So what should you do? I will answer this way: it is currently impossible to become completely anonymous, so you need to draw a reasonable line.
Do you need to hide your “fingerprint” on the Internet? Of course, yes. Think of it as your digital passport. You need a passport, but you should not show it to the first person you meet, let alone allow it to be copied.
How do you hide it? Giving up technologies such as JavaScript is a bad idea. It is better to take a different path and substitute your fingerprint. This can be done using browsers (e.g. Sphere) and plugins (Privacy Badger or Ghostery). You can spoof your IP address using a VPN. And be sure to clear your browser cookies.
It is also imperative to cut off the most intrusive data sources: browser access to the webcam and microphone. These permissions are rarely needed and should be disabled by default.
If you decide to take protecting your fingerprint seriously, I recommend starting from scratch. Existing traces are difficult to clean up. A complete reinstallation of the system can help. Or, if you use virtual desktops and other cloud provider services, you can deploy a new virtual machine.
By the way, to check the fingerprints your browser leaves, you do not have to deploy a test bench as I did. You can use the following services:
Instead of a conclusion
Over decades of development, the Internet has absorbed all social processes. With its help, we can receive most government services and provide ourselves with everything we need, from ordering goods to getting an education.
In such conditions, you simply have to look after the safety of your personal data so as not to become a victim of marketing or, much worse, of attackers. It is important to observe “digital hygiene”: do not visit questionable sites, and control the information that web resources can receive from you.
It is appropriate to recall here the words of Erik Qualman, professor at Boston University and author of What Happens in Vegas Stays on YouTube: because so much persistent data about users exists, you need to behave online as if your mom, boss, coach and enemies are watching you.
Thank you for your attention.