How to sabotage the data tech giants use to spy on you

Algorithms don’t work without quality data. The public can use that fact to demand change.

Every day, you leave a trail of digital breadcrumbs that big tech companies use to track your every move. You send an email, order food, watch a show on a streaming service. In return, you hand over valuable packets of data that companies use to better understand your preferences. This data is fed into machine learning algorithms, which then target you with advertisements and recommendations. Google alone generates $120 billion in ad revenue per year from personal information.

Increasingly, we cannot opt out of this arrangement with corporations. In 2019, Gizmodo reporter Kashmir Hill tried to cut the five largest tech giants out of her life. For six weeks she was miserable, struggling to perform the most basic digital tasks. The tech giants, meanwhile, felt nothing.

Now researchers at Northwestern University are proposing a way to correct the power imbalance. They view our collective information as leverage. Tech giants have advanced algorithms at their disposal, but those algorithms are useless without the right training data.

The new study was presented at the Association for Computing Machinery’s Fairness, Accountability, and Transparency conference. Its authors, including graduate students Nicholas Vincent and Hanlin Lee, propose three ways the public can use this leverage to advance its interests:

Data strikes (data boycotts), inspired by labor strikes. These involve withholding or deleting your personal information so that a tech company cannot use it. You can leave the platform or install privacy protection tools.
Data poisoning, which involves contributing meaningless or harmful data. For example, you can use the AdNauseam browser extension. It clicks on every ad shown to you and thereby confuses Google’s ad-targeting algorithms.
Conscious data contribution, which involves giving meaningful data to a competitor of the platform you want to protest. Upload your photos to Tumblr instead of Facebook, for example.

Users already employ many of these tactics to protect their privacy. If you’ve ever turned on an ad blocker or another browser extension that modifies search results to exclude certain websites, you’ve already taken part in a data boycott. That is, you tried to take back some control over your personal information. However, as Hill found, such sporadic, individual actions do not force tech giants to change their behavior.

But what happens if millions of people coordinate their actions and “poison” the data pool of one company? That could give them the leverage to press their demands.

There may already have been a few examples of this. In January, millions of users deleted their WhatsApp accounts and moved to competitors, including Signal and Telegram. This happened after Facebook, which owns the messaging app, announced that it would begin sharing WhatsApp data with the rest of the company. The mass exodus forced Facebook to postpone the policy change.

Just this week, Google also announced that it would stop tracking individuals across the web and targeting ads at them. It is not yet clear whether this is a real change or just a rebranding. Vincent notes that the widespread use of tools like AdNauseam could have influenced the decision by reducing the effectiveness of the company’s algorithms. (Of course, it’s hard to say for sure. “Only the technology company really knows how effectively this use of data has affected its system,” says the researcher.)

Vincent and Lee believe that sabotage campaigns can complement other strategies, such as campaigning for policy change and organizing workers into a movement to counter the tech giants.

“It’s great to see such a study,” says Ali Alhatib, a research fellow at the Center for Applied Data Ethics at the University of San Francisco, who was not involved in the research. “It is interesting to see the authors turning to a collective or holistic approach. We can spoil data on a massive scale and make demands by threatening to do so. Because this is our information, and all together it forms a common pool.”

There is still a lot of work to be done to roll out these campaigns more widely. Scientists can play an important role by creating more tools like AdNauseam, which would help lower the barrier to participation. Policymakers can help, too. A data boycott is most effective when backed by strong privacy laws, such as the European Union’s General Data Protection Regulation (GDPR), which gives users the right to request that their information be deleted. Without regulation, it is harder to ensure that a tech company actually lets you erase your digital footprint, even if you delete your account.

Some questions remain to be answered. How many people need to take part in a data boycott to harm a company’s algorithms? And what kind of data would be most effective at damaging a particular system? For example, in a simulation of a movie recommendation algorithm, the researchers found that if 30% of users boycotted the system, its accuracy would fall by 50%. But every machine learning system is different, and companies are constantly updating them. The researchers hope that members of the machine learning community will run similar simulations of systems from different companies and identify their vulnerabilities.
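To give a sense of what such a simulation might look like, here is a minimal sketch in Python. It does not reproduce the researchers’ experiment or their 30% and 50% figures. The synthetic ratings, the deliberately simple “recommender” that predicts each movie’s average rating, and every function name below are assumptions made purely for illustration.

```python
import random
import statistics


def make_ratings(n_users=300, n_items=50, per_user=10, seed=0):
    """Synthetic (user, item, rating) triples: hidden item quality plus noise."""
    rng = random.Random(seed)
    quality = [rng.uniform(1.5, 4.5) for _ in range(n_items)]
    ratings = []
    for user in range(n_users):
        for item in rng.sample(range(n_items), per_user):
            noisy = quality[item] + rng.gauss(0, 1.0)
            ratings.append((user, item, min(5.0, max(1.0, noisy))))
    return ratings


def split_ratings(ratings, test_share=0.2, seed=0):
    """Shuffle the ratings and hold out a fixed share as a test set."""
    rng = random.Random(seed)
    shuffled = ratings[:]
    rng.shuffle(shuffled)
    cut = int(len(shuffled) * test_share)
    return shuffled[cut:], shuffled[:cut]  # (train, test)


def train_item_means(train):
    """Toy 'recommender' (an assumption): predict each item's mean rating."""
    by_item = {}
    for _, item, rating in train:
        by_item.setdefault(item, []).append(rating)
    global_mean = statistics.fmean(r for _, _, r in train)
    return {i: statistics.fmean(rs) for i, rs in by_item.items()}, global_mean


def rmse(model, test):
    """Root-mean-square error; unseen items fall back to the global mean."""
    item_means, global_mean = model
    errors = [(item_means.get(i, global_mean) - r) ** 2 for _, i, r in test]
    return (sum(errors) / len(errors)) ** 0.5


def simulate_strike(strike_fraction, train, test, seed=1):
    """Remove all training data from a random share of users, then retrain.

    strike_fraction must be below 1.0 so that some training data remains.
    """
    rng = random.Random(seed)
    users = sorted({u for u, _, _ in train})
    strikers = set(rng.sample(users, int(strike_fraction * len(users))))
    remaining = [t for t in train if t[0] not in strikers]
    return rmse(train_item_means(remaining), test)


if __name__ == "__main__":
    train, test = split_ratings(make_ratings())
    for fraction in (0.0, 0.3, 0.6, 0.9):
        print(f"{fraction:.0%} of users on strike: "
              f"RMSE = {simulate_strike(fraction, train, test):.3f}")
```

A model this simple degrades slowly, because an average needs only a handful of ratings per movie. How quickly a real system breaks down depends on its algorithm and data, which is exactly the open question the researchers raise.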

Alhatib suggests more research is needed on how to encourage collective data action online. “Collective action is really difficult,” he says. “One of the problems is getting people to act consistently. And then another arises: how do you keep a fickle group, in this case people who use a search engine for five seconds, seeing themselves as part of a larger community?”

He adds that these tactics have consequences that require careful study. Will the sabotage end up just creating more work for content moderators and other people tasked with cleaning and labeling the algorithms’ training data?

Overall, Vincent, Lee and Alhatib are optimistic. They believe that collective data action can effectively influence technology giants and shape how our information and our privacy are treated.

“Artificial intelligence systems depend on data. It’s just a fact about how they work,” says Vincent. “Ultimately, this is how society can gain power.”
