It turns out there is a way to track individual users who are not logged in without using cookies. I implemented it myself, and now I will show you how.
For clarity, I created a demo site. Here it is.
Click each of the three page buttons → The same identifier is shown on all three pages.
Close the browser window and open the site again → The identifier has not changed.
Turn off the computer and go to this web page tomorrow → The identifier is still the same.
Check your cookies → The demo site does not record cookies or read them.
Check the URL → There are no questionable query strings.
At EDISON, we are constantly confronted with all these nuances of privacy.
In our recently completed project, “Resident Taxi”, we linked together a taxi aggregator website, a CRM system, a driver and car control unit, and mobile applications for iOS and Android.
However, we always keep in mind that the security of each user is sacred 😉
Cookies are used less and less.
If you are a fairly active Internet user, you must have come across the endless discussions about cookies and how they are used. Browsers are increasingly restricting cookies, all the more so since their use is strictly regulated by privacy rules such as GDPR or CCPA. This is certainly progress, as it is an important step towards a more privacy-oriented Internet. On the other hand, it also does enormous damage to the basic functionality of most websites, their UX, the economic structure of the Internet and the digital analytics industry. Technically, a cookie is a very reliable way for the browser to identify a returning user, but it is not the only one: there are other web technologies based on storing information on the local computer.
One of them is the cache. Essentially, web caching means storing data from the Internet on your device, so that the browser can reuse this data later when the same resource is requested again. For example, when a user loads a web page for the first time, the server sends the complete page to the browser. If the page is cached and the user requests the same page again the next day, the browser remembers it, the server does not need to send it again, and the page can be displayed immediately from the cache. This is much faster and saves bandwidth. In general, caching significantly increases the speed of delivery of web content and significantly reduces the amount of work performed on the server side.
Cached copies can be validated using ETags. An ETag is an identifier that the server attaches to a resource it provides (for example, a web page or image). It lets the server find out whether the user has cached the latest version of the resource. When a resource on the server changes, a new ETag is created for that resource.
The user visits the website for the first time → The request does not contain an ETag → The page is sent to the browser with ETag 123 → The page is stored (cached) on the local device.
The user goes back to the same site again → ETag 123 is included in the outgoing request (in the If-None-Match header) → The server checks whether the resource has changed (“Is the ETag still the same?”) → If the ETag has not changed, the server replies 304 Not Modified, which tells the browser: just use the copy that was delivered and cached on the first visit → There is no need to resend the web resource, so time and traffic are saved. Profit.
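The two flows above can be sketched in a few lines of Python. This is only an illustration, not the code of any particular web server: the function names `make_etag` and `respond` are made up for this example, the ETag is assumed to be derived from a hash of the content (one common choice), and the comparison is simplified to an exact match of a single value.

```python
import hashlib


def make_etag(body: bytes) -> str:
    """Derive an ETag from the content, so it changes whenever the content changes."""
    return '"' + hashlib.sha256(body).hexdigest()[:16] + '"'


def respond(body: bytes, request_headers: dict) -> tuple:
    """Return (status, response_headers, payload) for a GET request.

    Simplification: only an exact match of one ETag value is handled.
    """
    etag = make_etag(body)
    if request_headers.get("If-None-Match") == etag:
        # The browser already holds the current version: send no body.
        return 304, {"ETag": etag}, b""
    # First visit, or the cached copy is outdated: send the full resource.
    return 200, {"ETag": etag}, body


page = b"<html>hello</html>"

# First visit: no ETag in the request, the full page is delivered.
status, headers, payload = respond(page, {})

# Second visit: the browser presents the ETag it stored earlier.
status_again, _, payload_again = respond(page, {"If-None-Match": headers["ETag"]})
```

On the second call the status is 304 and the payload is empty, which is exactly the saving described above. If the page content changes, the hash changes, the comparison fails, and the full page is sent again.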
Using caching technology to track and identify users
Although ETag is designed specifically for caching, this feature can also be abused and deliberately used to track users.
Here is how I did it in my example:
- A simple site with three pages is created.
- The same iFrame is inserted on each page. This iFrame is just a 1×1 white pixel that is invisible to the user.
- When this iFrame resource is requested, a random identifier is generated on the server side using PHP. I use this identifier to override the iFrame's ETag, which is normally issued automatically by the server.
- Each time a user requests one of the three pages (and therefore requests the iFrame), my ETag ID is included in the request. The server then checks whether the request carries this identifier or whether it is a first request without an ETag.
- → If an ETag exists: this is a returning visitor. The ETag value can be recorded as the visitor's identifier.
- → If the ETag does not exist: this is a new visitor, who gets a new ID. From now on, this identifier will be included in the request headers every time this user's device requests the iFrame.
- As a final step, here is how this ETag ID gets into analytics:
Finding the iFrame's ETag ID with Chrome DevTools.
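The server-side logic from the list above can be sketched as follows. The demo itself uses PHP; this is a rough Python equivalent written for illustration only, not the demo's actual code. The class name `EtagTracker`, the method `handle_pixel_request`, the in-memory `visits` counter and the `Cache-Control` value are all assumptions of this sketch.

```python
import secrets


class EtagTracker:
    """Toy model of a tracking iFrame endpoint that misuses ETag as a visitor ID."""

    def __init__(self):
        # Hypothetical in-memory store: visitor ID -> number of requests seen.
        self.visits = {}

    def handle_pixel_request(self, request_headers: dict) -> tuple:
        """Return (status, response_headers, visitor_id, is_returning)."""
        presented = request_headers.get("If-None-Match", "").strip('"')
        if presented:
            # The browser sent back an ETag: treat it as a returning visitor.
            self.visits[presented] = self.visits.get(presented, 0) + 1
            return 304, {"ETag": f'"{presented}"'}, presented, True

        # No ETag in the request: new visitor, issue a random identifier as the ETag.
        visitor_id = secrets.token_hex(8)
        self.visits[visitor_id] = 1
        response_headers = {
            "ETag": f'"{visitor_id}"',
            # Assumption: ask the browser to store the resource but revalidate it each time.
            "Cache-Control": "private, no-cache",
        }
        return 200, response_headers, visitor_id, False


tracker = EtagTracker()

# First request: no If-None-Match header, so a fresh ID is issued.
_, first_headers, first_id, _ = tracker.handle_pixel_request({})

# Later request: the browser revalidates and presents the stored ETag.
_, _, second_id, is_returning = tracker.handle_pixel_request(
    {"If-None-Match": first_headers["ETag"]}
)
```

Here `second_id` equals `first_id`, and no cookie was involved at any point. The identifier travels only in the ETag response header and the If-None-Match request header.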
How to prevent tracking with ETag
Users have several options for protecting themselves against ETag tracking:
- Disable caching in your browser settings.
Care should be taken here – as explained above, caching can be very useful and has many advantages.
- Modify request headers with a browser add-on.
Although most browsers do not provide the ability to modify headers out of the box, there are many browser extensions available for this task, such as ModHeader. Why does it work? ETag functionality relies on request and response headers to exchange the identifier. For example, if a user overwrites the If-None-Match header so that it is blank for every request, the server generates a new ETag value for every page request. This prevents the user's device from being identified.
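The effect of blanking the header can be shown with a small sketch. Both functions are hypothetical and exist only for this illustration: `strip_if_none_match` stands in for what a header-modifying add-on does (here it removes the header rather than blanking it, which the toy server treats the same way), and `toy_tracking_server` is a stand-in for a tracker that reuses a presented ETag and otherwise issues a new one.

```python
import secrets


def strip_if_none_match(request_headers: dict) -> dict:
    """Return a copy of the headers without If-None-Match (case-insensitive)."""
    return {
        name: value
        for name, value in request_headers.items()
        if name.lower() != "if-none-match"
    }


def toy_tracking_server(request_headers: dict) -> str:
    """Hypothetical tracker: echo the presented ETag, or issue a fresh random one."""
    presented = request_headers.get("If-None-Match")
    if presented:
        return presented
    return '"' + secrets.token_hex(8) + '"'


# Without protection: the stored ETag is sent back and the same ID is returned.
issued = toy_tracking_server({})
unprotected = toy_tracking_server({"If-None-Match": issued})

# With the header stripped: the server cannot recognize the device and issues a new ID.
protected = toy_tracking_server(strip_if_none_match({"If-None-Match": issued}))
```

In this sketch `unprotected` equals `issued`, while `protected` is a different value each time, so successive requests can no longer be linked to each other through the ETag.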
Why it is important
Why am I checking these things? Why did I write this article? Of course, I do not intend to use this on a large scale. But although ETag can be used by bad people, this example demonstrates an important point: like most other technologies, ETag is not harmful by default. It all depends on the purpose it is used for.
I believe that it is important for everyone to know that such methods exist and that they can be used. There have been quite a few cases where sites have used ETag illegally. Some of these incidents were even settled in court. And it is likely that similar methods will be increasingly used by the frightened advertising industry, which is watching one of its pillars collapse: the cookie.
ETag can generate unique tracking values even if the user blocks HTTP, Flash, and / or HTML5 cookies.
I believe in an open and transparent transfer of knowledge in the industry: among analytics providers, publishers, advertisers and Internet users. IMHO, the lack of openness is one of the main reasons why we were all drawn into this dirty war over cookies: the Internet ecosystem has always suffered from a lack of transparency, technology is developing too fast for legislation to keep pace with it, and people don’t understand the many subtleties of web technologies like cookies. And when technology is used improperly, the user understandably feels hurt. But prohibiting a technology turns out to be a classic case of treating the symptoms, not the cause. The fact that many technology companies abuse technologies such as cookies creates an unfair public attitude towards the technology itself. This, in turn, leads to disproportionate measures on the part of browser developers and legislators. Although these measures are aimed at ensuring privacy, they also harm good and meaningful innovation.
There are always nuances. I strongly believe in the legitimacy and importance of serious digital analytics – as long as it is carried out with the proper level of confidentiality. What happens after a store legally identifies a visitor? ETag, of course, can be used for different purposes. But one thing is certain: this topic will never become boring.