Problems of personal data protection in the world of artificial intelligence

Artificial intelligence is now practically everywhere. In almost every industry we are told that neural networks, machine learning, and other branches of AI are in use. Systems that process users' personal data are no exception. In this article, we will look at how artificial intelligence and personal data protection intersect.

Risks in AI data processing

The operation of artificial intelligence algorithms consists of two main stages: training and use. During the training stage, an algorithm is fitted on dedicated data sets, which allows it to identify patterns and relationships between data points. At this stage, a neural network essentially learns to recognize the entities it needs, whether that means objects in photographs or structures in text.

At the use (inference) stage, the model has already been trained and should be able to recognize the required images or analyze text in the required way.

It is quite obvious that the key ingredient for both stages is data. The input may be millions of images on which the neural network is trained, or thousands of pages of text.
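To make the two stages concrete, here is a minimal sketch using scikit-learn's small digits data set as a stand-in for a real image corpus (the model choice and parameters are illustrative assumptions, not a recommendation):

```python
# A minimal sketch of the two stages: fit (training) and predict (use).
from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier

X, y = load_digits(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Stage 1: training - the model learns patterns from the data set.
model = MLPClassifier(hidden_layer_sizes=(64,), max_iter=500, random_state=0)
model.fit(X_train, y_train)

# Stage 2: use (inference) - the trained model labels data it has not seen.
print("accuracy on unseen images:", model.score(X_test, y_test))
```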

It is important to understand that no powerful neural network can do without “data vacuum cleaners”, since truly huge amounts of data are required for its training.

And this is where questions from the area of personal data arise.

Restrictions on processing personal data

Regulatory acts in the field of personal data protection impose certain requirements on the processing of personal data (PDn). We must obtain the subject's consent to process their personal data. The purposes and time limits of processing must also be clearly defined. All this works well when we have some application or database of clients, partners, or employees, and we control how data is processed in that system.

But what happens when personal data ends up in the hands of AI-powered systems for one reason or another? The purpose limitation principle states that data subjects must be informed about the purposes for which their data is collected and processed, which allows them to choose whether to consent. However, machine learning systems sometimes use information that is a by-product of the original data (for example, using social media activity to calculate a user metric). This can lead to hidden or indirect processing of personal data, such as inferring new information about users of the very social network whose metrics the neural network analyzed.

Image analysis is a separate story. In theory, photographs of people should not make it possible to identify the subjects without their consent. In practice, however, AI can be trained to guess fairly accurately who is depicted in a photograph, thereby allowing personal data subjects to be identified.

Main risks

Various artificial intelligence systems use data to assess certain human qualities, such as productivity, health, and personal preferences. For what purposes the collected data may then be used is a big question: it may well end up with government agencies and intelligence services, as well as with commercial organizations such as banks and marketing agencies. Crime should not be discounted either: sophisticated fraudsters may well exploit the results of AI work if they gain access to them.

In addition, artificial intelligence systems are a black box, not only for those who use them but also for the developers themselves. Developers can, of course, tweak some parameters of the algorithms, but a neural network effectively writes its own decision logic during training, so how it reaches its decisions is often a mystery even to its creators.

Discrimination is also mentioned among the risks of using AI in many publications, especially English-language ones. Machine learning algorithms can build stereotypes around certain human traits, most often gender, race, and age. If decisions about issuing a loan or hiring are made automatically by such algorithms, this can lead to discrimination: decisions are based not on objective data but on distorted notions baked in by flawed training of the neural network.
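As one illustration, here is a minimal sketch of a common sanity check on automated decisions: comparing approval rates across groups. The data and the 80% threshold (the "four-fifths rule" heuristic) are illustrative assumptions, not a universal legal standard:

```python
# Compare approval rates per protected group and flag large disparities.
from collections import defaultdict

# Hypothetical (group, decision) pairs: 1 = approved, 0 = denied.
decisions = [("male", 1), ("male", 1), ("male", 0), ("female", 1),
             ("female", 0), ("female", 0), ("female", 0), ("male", 1)]

totals, approved = defaultdict(int), defaultdict(int)
for group, outcome in decisions:
    totals[group] += 1
    approved[group] += outcome

rates = {g: approved[g] / totals[g] for g in totals}
print("approval rates:", rates)
ratio = min(rates.values()) / max(rates.values())
print("disparate impact ratio:", round(ratio, 2), "(flag if < 0.8)")
```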

How can you protect yourself?

First of all, it is necessary to reduce the volume of processed data to a minimum. When designing AI training pipelines, engineers must determine which data, and how much of it, is needed to complete the task. It is important to process only the minimum necessary amount of personal data, and to anonymize as much as possible any information that does not have to be processed as PDn.
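A minimal sketch of such minimization before training, assuming records arrive as Python dicts (all field names here are hypothetical):

```python
# Drop direct identifiers, pseudonymize the user ID with a salted hash,
# and keep only the fields the model actually needs.
import hashlib

SALT = b"rotate-me-and-store-separately"  # assumption: salt managed elsewhere
NEEDED_FIELDS = {"age_band", "region", "purchase_count"}  # task-specific minimum

def minimize(record: dict) -> dict:
    out = {k: v for k, v in record.items() if k in NEEDED_FIELDS}
    # Keep a pseudonymous key so records can be linked without the raw ID.
    out["pseudo_id"] = hashlib.sha256(SALT + record["user_id"].encode()).hexdigest()[:16]
    return out

raw = {"user_id": "u-1024", "full_name": "Jane Doe", "email": "jane@example.com",
       "age_band": "25-34", "region": "EU", "purchase_count": 7}
print(minimize(raw))  # no name, no email, no raw user_id
```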

Data subjects have the right to decide for themselves which of their data operators may use. This means that the operator must be open and transparent about why it collects the data and what it plans to do with it. As mentioned, artificial intelligence is a "black box", and it is not always clear how a model makes decisions, which makes this transparency harder to achieve, especially in complex applications.

Generative Adversarial Networks

Generative Adversarial Networks (GANs) can be used as one of the algorithms when working with personal data in ML.

The current trend in ML is to use less data more efficiently. A GAN can reduce the need for real training data by generating synthetic data: once the generator has learned what realistic samples look like, its artificial output can stand in for real records when training other models.

This method uses two neural networks, a "generator" and a "discriminator." The generator learns to produce samples that resemble the real data, while the discriminator learns to tell real data apart from artificially generated data. The downside is that GANs do not eliminate the need for training data entirely: they still require a lot of data to train properly. Nevertheless, this method lets us reduce the amount of personal data needed for the algorithm to work.
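For illustration, here is a minimal GAN sketch in PyTorch, using a toy two-dimensional distribution as a stand-in for real records (the architecture and hyperparameters are assumptions chosen for brevity):

```python
# Generator maps noise to synthetic samples; discriminator scores real vs. fake.
import torch
import torch.nn as nn

G = nn.Sequential(nn.Linear(8, 32), nn.ReLU(), nn.Linear(32, 2))  # noise -> sample
D = nn.Sequential(nn.Linear(2, 32), nn.ReLU(), nn.Linear(32, 1))  # sample -> logit
opt_g = torch.optim.Adam(G.parameters(), lr=1e-3)
opt_d = torch.optim.Adam(D.parameters(), lr=1e-3)
loss_fn = nn.BCEWithLogitsLoss()

real_data = torch.randn(512, 2) * 0.5 + torch.tensor([2.0, -1.0])  # toy "real" set

for step in range(2000):
    real = real_data[torch.randint(0, 512, (64,))]
    fake = G(torch.randn(64, 8))

    # Discriminator step: push real toward 1, generated toward 0.
    opt_d.zero_grad()
    d_loss = loss_fn(D(real), torch.ones(64, 1)) + \
             loss_fn(D(fake.detach()), torch.zeros(64, 1))
    d_loss.backward()
    opt_d.step()

    # Generator step: fool the discriminator into scoring fakes as real.
    opt_g.zero_grad()
    g_loss = loss_fn(D(fake), torch.ones(64, 1))
    g_loss.backward()
    opt_g.step()

# Synthetic samples that mimic the real distribution, usable as training data.
print(G(torch.randn(5, 8)))
```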

Federated learning

Federated learning uses personal data, but the data never actually leaves the system where it is stored. It is never collected or uploaded to the backend of an AI or ML system. Instead, the model is trained locally on the system where the data resides, and the local results are later merged into the global model.

Although federated learning avoids some of the problems associated with personal data, a model trained locally faces greater limitations than a model trained centrally on pooled data.
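A minimal sketch of the idea, in the spirit of federated averaging (FedAvg), with a linear model and NumPy (the client data and training schedule are illustrative assumptions):

```python
# Each "client" holds private local data that never leaves it; only model
# weights travel to the server, where they are averaged each round.
import numpy as np

rng = np.random.default_rng(0)
true_w = np.array([2.0, -3.0])

# Three clients, each with its own private local data set.
clients = [(X, X @ true_w + rng.normal(0, 0.1, 50))
           for X in (rng.normal(0, 1, (50, 2)) for _ in range(3))]

def local_sgd(w, X, y, lr=0.05, epochs=5):
    for _ in range(epochs):
        grad = 2 * X.T @ (X @ w - y) / len(y)  # squared-error gradient
        w = w - lr * grad
    return w

global_w = np.zeros(2)
for round_ in range(20):
    # Each client trains locally, starting from the current global model...
    local_models = [local_sgd(global_w.copy(), X, y) for X, y in clients]
    # ...and only the resulting weights are merged on the server.
    global_w = np.mean(local_models, axis=0)

print("recovered weights:", global_w.round(2))  # close to true_w
```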

Conclusion

We have considered the main problems associated with the use of artificial intelligence when working with personal data. Of course, some of the problems presented are inherent not only to personal data, but to any information processed by AI.

But it is important to understand that there is no perfect solution to the problems of personal data protection. The training methods presented have a number of limitations that must be taken into account when designing such systems.

Continuing the topic: this evening (September 17), Otus will host an open lesson dedicated to the collection of personal data in games. Participants will go over the related issues and analyze real cases of PDn handling violations in games. You can sign up for the lesson on the course page.
