About user identification
Today, people spend a large share of their time on the Internet, so companies are ready to invest huge amounts of money in online advertising. For online advertising to be effective, it must be possible to identify the same user across different sites, browsers, and devices. Without identification, it is impossible to answer such important advertising questions as:
Whether the user clicked on the ad they just viewed;
Whether the user who viewed the product today is the same user who saw the ad a couple of days ago;
What the interests are of the user who is currently at the computer watching ads;
How many times we have shown ads to the same user.
In this article, I will describe the main methods used today to identify a user in the vast space of the Internet, along with the advantages and disadvantages of each.
Basic User Identification Methods
Most of the existing user IDs are either device-level or browser-level IDs. This means that they do not identify the person himself, but the device or browser through which he surfs the Internet. This is logical, since it is easier to “reach out” to a device or browser from the Internet than to the person himself.
The main device-level and browser-level identifiers that are currently used by advertising platforms to improve the effectiveness of advertising:
Cookies (first party and third party)
Device and browser specifications
HTTP entity tag (ETag)
Mobile Advertising Identifiers (IDFA, Google’s AID, Microsoft’s Advertising ID, etc.)
In addition to browser- and device-level identifiers, companies can build their own person-level identifiers.
Let’s take a closer look at each of the browser- and device-level identifiers, as well as the two main approaches used by advertising agencies to build person-level identifiers.
Cookies
When you visit a website, you are often asked whether you accept cookies. A cookie is one of the most popular browser-level user identifiers: if you accept cookies, the website you visit remembers you.
Technically, it works like this:
When you accept cookies, the web server sends a small piece of data, the cookie, in its response, and your browser stores it on your device. The cookie may contain a user ID as well as some of the user's characteristics and settings.
After some time, you visit the same site again.
On that visit, your browser sends the stored cookie back to the web server. The server reads the user ID it set earlier and recognizes you as a user who has visited the site before.
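The steps above can be sketched as a tiny server-side handler. This is a minimal sketch, assuming a hypothetical cookie named `uid` and an in-memory dictionary in place of a real user database; it uses only Python's standard `http.cookies` and `uuid` modules and does not reflect how any particular site implements cookies.

```python
import uuid
from http.cookies import SimpleCookie
from typing import Optional

COOKIE_NAME = "uid"  # hypothetical cookie name
ONE_YEAR = 60 * 60 * 24 * 365  # cookie lifetime in seconds

# In-memory stand-in for the server's user database: user ID -> visit count.
known_users: dict[str, int] = {}


def handle_request(cookie_header: Optional[str]) -> tuple[str, bool, Optional[str]]:
    """Return (user_id, is_returning_user, Set-Cookie value or None)."""
    incoming = SimpleCookie()
    if cookie_header:
        incoming.load(cookie_header)  # parse the browser's Cookie header

    # The browser sent back an ID this server issued earlier: returning user.
    if COOKIE_NAME in incoming and incoming[COOKIE_NAME].value in known_users:
        user_id = incoming[COOKIE_NAME].value
        known_users[user_id] += 1
        return user_id, True, None

    # No cookie, or an ID we never issued: create a new ID for the browser to store.
    user_id = uuid.uuid4().hex
    known_users[user_id] = 1
    outgoing = SimpleCookie()
    outgoing[COOKIE_NAME] = user_id
    outgoing[COOKIE_NAME]["max-age"] = ONE_YEAR
    outgoing[COOKIE_NAME]["path"] = "/"
    return user_id, False, outgoing[COOKIE_NAME].OutputString()


# First visit: the browser has no cookie yet.
uid, returning, set_cookie = handle_request(None)
# Later visit: the browser sends back the name=value part of the cookie.
uid_again, returning_again, _ = handle_request(set_cookie.split(";")[0])
print(returning, returning_again, uid == uid_again)  # False True True
```

A real browser strips the attributes (`Max-Age`, `Path`) and sends only `uid=...` on later requests, which is what the `split(";")[0]` call imitates.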
There are two types of cookies – first-party and third-party.
First party cookies
The difference between first-party cookies and third-party cookies is who creates them and how they are used.
First-party cookies are created by the website the user is visiting. For example, if you visit edx.com, this site places a first-party cookie on your device.
First-party cookies are used to improve the user experience: the site remembers the user in order to keep their settings. When you visit the site a second time, it shows the language you chose earlier, and your shopping cart still contains the items you added.
Third party cookies
Third-party cookies are placed by a domain that is different from the site the user is visiting.
For example, you visit bbc.com, and this site shows you ads served by bestdsp.com. Then bbc.com places a first-party cookie on your device, and bestdsp.com places a third-party cookie.
Third-party cookies are used to improve the quality of online advertising: they help identify the same user across different web domains.
Suppose the same user visits bbc.com first and then edx.com, and bestdsp.com serves ads on both sites. The third-party cookie belongs to bestdsp.com rather than to either site, so the browser sends the same bestdsp.com cookie with the ad requests on both sites. As a result, bestdsp.com can tell that the same user visited both web domains.
First-party cookies cannot do this: a first-party cookie is sent only to the domain that created it, so the user can later be identified only on that domain's sites.
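The difference can be illustrated with a toy simulation in which the browser keeps cookies per domain and attaches to each request only the cookie of the domain being requested. The `Browser` and `Server` classes are hypothetical, and the `page_site` argument stands in for however a real ad server learns which page embedded the ad.

```python
import uuid
from collections import defaultdict
from typing import Optional


class Server:
    """A web or ad server that identifies browsers by its own cookie."""

    def __init__(self, domain: str) -> None:
        self.domain = domain
        # user ID -> sites on which this server saw that user
        self.seen_on: dict[str, set[str]] = defaultdict(set)

    def handle(self, cookie: Optional[str], page_site: str) -> Optional[str]:
        user_id = cookie or uuid.uuid4().hex
        self.seen_on[user_id].add(page_site)
        return None if cookie else user_id  # new ID means "please store this cookie"


class Browser:
    """Toy cookie jar: cookies are stored per domain that set them."""

    def __init__(self) -> None:
        self.jar: dict[str, str] = {}

    def request(self, server: Server, page_site: str) -> None:
        # Only the cookie belonging to the requested server's domain is sent.
        new_cookie = server.handle(self.jar.get(server.domain), page_site)
        if new_cookie is not None:
            self.jar[server.domain] = new_cookie


def visit(browser: Browser, site: Server, ad_server: Server) -> None:
    browser.request(site, site.domain)       # page request: first-party cookie
    browser.request(ad_server, site.domain)  # embedded ad: third-party cookie


bbc, edx, dsp = Server("bbc.com"), Server("edx.com"), Server("bestdsp.com")
browser = Browser()
visit(browser, bbc, dsp)
visit(browser, edx, dsp)
print(len(dsp.seen_on), sorted(next(iter(dsp.seen_on.values()))))  # 1 ['bbc.com', 'edx.com']
```

The ad server ends up with one user ID seen on both sites, while bbc.com and edx.com each only know that the user visited them.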
Synchronization of third-party cookies
In order for the same user to be recognized on even more domains, third-party cookie synchronization is used.
Suppose there are two ad-tech platforms, dsp1 and dmp1. Each platform has placed its own third-party cookies, holding its own user IDs and user characteristics, on many different devices.
To identify users across even more domains, the two platforms can agree to synchronize their user IDs and characteristics. The result of such synchronization is a table that maps each dsp1 user ID and its characteristics to the corresponding dmp1 user ID and characteristics.
Now, if dsp1 encounters a user it knows little about but dmp1 knows well, dsp1 can get information about this user from the table of synchronized data.
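A minimal sketch of the synchronized table and the lookup dsp1 performs, assuming a plain dictionary for the table; the IDs and profile values are hypothetical. In practice, pairs are often collected through a redirect in which one platform passes its ID for the browser to the other platform, which reads its own cookie from the same browser and records the pair; here that step is reduced to `record_match`.

```python
from typing import Optional

# Synchronized table: dsp1's user ID -> dmp1's user ID for the same browser.
match_table: dict[str, str] = {}

# Hypothetical characteristics dmp1 holds, keyed by dmp1's own user IDs.
dmp1_profiles: dict[str, dict[str, str]] = {
    "m-42": {"interest": "travel", "city": "Berlin"},
}


def record_match(dsp1_id: str, dmp1_id: str) -> None:
    """Sync step: both IDs were observed for the same browser."""
    # Often done via a redirect: dsp1 sends the browser to dmp1 with dsp1's ID
    # in the URL, and dmp1 reads its own cookie from that same browser.
    match_table[dsp1_id] = dmp1_id


def lookup_in_dmp1(dsp1_id: str) -> Optional[dict[str, str]]:
    """What dmp1 knows about a user that dsp1 identifies by its own ID."""
    dmp1_id = match_table.get(dsp1_id)
    if dmp1_id is None:
        return None  # this browser was never synced
    return dmp1_profiles.get(dmp1_id)


print(lookup_in_dmp1("d-7"))  # None: not synced yet
record_match("d-7", "m-42")
print(lookup_in_dmp1("d-7"))  # {'interest': 'travel', 'city': 'Berlin'}
```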
The main drawback of third-party cookies is that browsers increasingly block them. Safari started blocking third-party cookies by default a few years ago, Firefox also blocks them by default, and Google has promised to phase out third-party cookies in Chrome in the second half of 2024.
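Device and browser characteristics
A user can also be identified by the characteristics and settings of their device and browser. Some of these are sent in HTTP request headers (for example, the User-Agent header), and others, such as the time zone, can be read by a script running in the browser.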
Characteristics of the device that are usually available:
Operating system and version
Browser and browser version
Time zone, etc.
The main disadvantage of identifying a user by device characteristics is that, although many combinations of these characteristics are possible, there are often far more users than combinations. As a result, many users may have exactly the same device characteristics and settings.
Another disadvantage is that device characteristics change often, for example whenever the browser or operating system is updated.
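Identification by device characteristics, and both of its weaknesses, can be shown by hashing the characteristics into a single fingerprint. This is a simplified sketch with hypothetical trait names and values; real systems typically combine many more signals.

```python
import hashlib


def device_fingerprint(traits: dict[str, str]) -> str:
    """Combine device/browser traits into one short ID (order-independent)."""
    canonical = "|".join(f"{name}={traits[name]}" for name in sorted(traits))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()[:16]


# Hypothetical traits of two different people with identical setups.
alice = {"os": "Windows 11", "browser": "Chrome 120", "timezone": "Europe/Berlin"}
bob = {"os": "Windows 11", "browser": "Chrome 120", "timezone": "Europe/Berlin"}
# Alice's browser after an automatic update.
alice_updated = {**alice, "browser": "Chrome 121"}

print(device_fingerprint(alice) == device_fingerprint(bob))            # True: two users, one ID
print(device_fingerprint(alice) == device_fingerprint(alice_updated))  # False: same user, new ID
```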
HTTP entity tag (ETag)
Another way to identify a user is the ETag. An ETag is an HTTP response header that identifies a specific version of a resource; browsers use it to check whether their cached copy is still current, which speeds up page loading.
While a web page loads, the browser may send requests to different web servers. If the browser already has a cached copy of some object, for example an image in an ad, it sends that copy's ETag back to the server in the If-None-Match request header. The web server compares this ETag with the ETag of the current version of the object. If they match, the image has not changed: the server replies with 304 Not Modified, and the browser loads the image from its cache.
Advertising platforms can use ETags to identify a user: if the server gives each browser its own unique ETag for the same object, then the ETag the browser sends back later tells the server which browser it is talking to.
The downside of ETags is that they are short-lived and have limited reach: they are deleted every time the user clears the browser cache.
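The tracking trick can be sketched as a toy server that gives every browser its own ETag for the same ad image. The `ETagTracker` class is hypothetical and only models the status codes and headers involved (`200`, `304 Not Modified`, `ETag`, `If-None-Match`); it is not a real web server.

```python
import uuid
from typing import Optional


class ETagTracker:
    """Toy server that issues a unique ETag per browser for one ad image."""

    def __init__(self) -> None:
        self.visits: dict[str, int] = {}  # ETag (acting as user ID) -> visit count

    def serve_image(self, if_none_match: Optional[str]) -> tuple[int, str]:
        """Return (HTTP status, ETag) for a request to the image."""
        if if_none_match is not None and if_none_match in self.visits:
            # Browser revalidates its cached copy: we recognize a returning browser.
            self.visits[if_none_match] += 1
            return 304, if_none_match  # 304 Not Modified, no body sent
        # No cached copy (first visit or cleared cache): mint a new unique ETag.
        etag = f'"{uuid.uuid4().hex}"'  # ETag values are quoted strings
        self.visits[etag] = 1
        return 200, etag


tracker = ETagTracker()
status, etag = tracker.serve_image(None)    # first visit: 200 + new ETag
status2, _ = tracker.serve_image(etag)      # browser revalidates its cache: 304
status3, etag3 = tracker.serve_image(None)  # cache cleared: looks like a new user
print(status, status2, status3, etag3 != etag)  # 200 304 200 True
```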
In-app advertising identifiers
Above, we described how a user can be identified in a browser. These methods apply not only to browsers on computers but also to browsers on mobile devices.
Now let’s discuss identifiers that are used in mobile applications:
IDFA (Apple’s advertising id)
AID (Google’s advertising id)
Microsoft’s Advertising ID
A big advantage of mobile advertising IDs is that they are more durable than any of the identifiers used in web browsers, because they are difficult to turn off or erase the way you can with cookies or ETags. They can be reset, but users rarely do this.
The exception is IDFA: on iOS, the user must give permission before an app can access the IDFA and pass it to advertising platforms. You have most likely seen this on an iPhone: when you open a newly downloaded app, a prompt asks whether to allow the app to track your activity, with the options “Ask App Not to Track” and “Allow”. If you choose “Allow”, you let the app access your IDFA and share it with advertising platforms.
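On the receiving side, an advertising platform has to treat an IDFA that is missing or all zeros as unusable: when tracking is not authorized, iOS gives the app the all-zero identifier `00000000-0000-0000-0000-000000000000` instead of the real IDFA. A minimal sketch of such a check, assuming ad IDs arrive as UUID strings; the function name and the uppercase normalization are illustrative choices.

```python
import uuid
from typing import Optional

# iOS returns this all-zero IDFA when the app is not authorized to track.
ZERO_ID = uuid.UUID(int=0)


def usable_ad_id(raw: Optional[str]) -> Optional[str]:
    """Return a normalized ad ID, or None if it cannot identify the user."""
    if not raw:
        return None  # ID not sent at all
    try:
        parsed = uuid.UUID(raw)
    except ValueError:
        return None  # malformed value
    if parsed == ZERO_ID:
        return None  # tracking not authorized (e.g. "Ask App Not to Track")
    return str(parsed).upper()  # one canonical spelling for matching


print(usable_ad_id("00000000-0000-0000-0000-000000000000"))  # None
```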
Person-level identifiers
All of the user IDs listed above are either device-level or browser-level. This means that a person who uses several devices or browsers will be counted several times. For most advertising campaigns this is not a serious problem; however, campaigns work more effectively when they identify a person rather than a device or browser.
There are different ways to identify a user at the person level. They can be divided into deterministic and probabilistic methods.
Deterministic person-level identification relies on personal characteristics that are usually unique to one person and do not change from device to device, such as:
Phone number
Email address
Many sites collect this information from users. For example, to log in to a social network you need to enter a phone number, and to read an article on some sites you need to enter your email address.
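Deterministic matching can be sketched as grouping device-level IDs by a normalized, hashed email address. This assumes a hypothetical log of `(device_id, email)` login events; lowercasing the whole address is a common simplification rather than a rule, and hashing with SHA-256 means the table stores hashes rather than raw email addresses.

```python
import hashlib
from collections import defaultdict


def person_key(email: str) -> str:
    """Normalize an email address and hash it into a person-level key."""
    normalized = email.strip().lower()
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


def link_devices(logins: list[tuple[str, str]]) -> dict[str, set[str]]:
    """Group device-level IDs that logged in with the same email address."""
    people: dict[str, set[str]] = defaultdict(set)
    for device_id, email in logins:
        people[person_key(email)].add(device_id)
    return dict(people)


# Hypothetical login events: (device-level ID, email entered at login).
logins = [
    ("cookie:laptop-1", "Ann@Example.com"),
    ("idfa:phone-9", " ann@example.com "),
    ("cookie:tablet-3", "bob@example.com"),
]
people = link_devices(logins)
print(len(people))  # 2 people behind 3 devices
```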
The disadvantage of deterministic identification is that relatively few sites collect personal characteristics, so this method is difficult to scale.
Probabilistic identification combines many different pieces of information about a user and compares them using various probabilistic models.
To identify a user at the person level probabilistically, information such as IP address, country, city, interests, and behavior can be used.
For example, if two devices, a phone and a laptop, share the same IP address and have similar search histories, we can assume with some probability that they belong to the same person.
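A probabilistic model can be as simple as a logistic score over a few signals. The sketch below combines a same-IP indicator with the Jaccard similarity of two search histories; the weights, the bias, and the device data are made-up assumptions for illustration, whereas a real system would fit its parameters on labeled data and use many more signals.

```python
import math

# Hypothetical model parameters; a real system would learn these from data.
W_SAME_IP = 2.0
W_HISTORY = 4.0
BIAS = -3.0


def jaccard(a: set[str], b: set[str]) -> float:
    """Share of distinct items the two sets have in common (0.0 to 1.0)."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def same_person_probability(dev_a: dict, dev_b: dict) -> float:
    """Logistic score that two devices belong to the same person."""
    z = BIAS
    z += W_SAME_IP * (dev_a["ip"] == dev_b["ip"])
    z += W_HISTORY * jaccard(dev_a["searches"], dev_b["searches"])
    return 1.0 / (1.0 + math.exp(-z))


phone = {"ip": "203.0.113.7", "searches": {"flights to rome", "hotel rome", "pasta recipe"}}
laptop = {"ip": "203.0.113.7", "searches": {"flights to rome", "hotel rome", "rome museums"}}
stranger = {"ip": "198.51.100.4", "searches": {"football scores"}}

print(round(same_person_probability(phone, laptop), 2))    # 0.73
print(round(same_person_probability(phone, stranger), 2))  # 0.05
```

The phone and laptop share the IP and 2 of 4 distinct searches (Jaccard 0.5), so the score is the logistic of 1.0; the stranger shares neither, so the score stays near the bias.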
As one might expect, the main drawback of probabilistic identification is that its conclusions can be inaccurate. However, the information it needs is much more readily available than the information needed for deterministic identification.
Conclusion
In this article, we examined the most popular ways to identify a user on the Internet.
Each of these methods has pros and cons, so advertising agencies often combine these identifiers to take advantage of the strengths of each.