Knights of Injustice. Data Scientists vs. Data Bias
Many of you have probably seen the movie "Knights of Justice," in which data scientists, working from a set of facts about a terrorist attack, nearly solve the crime but make a mistake whose probability seemed critically small. Let's discuss some aspects of data bias that can lead to dangerous decisions.
Data bias reasons
The first and most obvious reason biases arise is that the data selected for training the model is itself biased: it either does not reflect objective reality at all, or reflects it with some fixed distortion (for example, data on children's ability to learn that, for some reason, was collected before the introduction of universal education).
In theory, this type of bias should become visible while a specialist is working with the data: a data scientist looks at the features and asks uncomfortable questions. Why were these particular features collected in the dataset? Can it be expanded with more features that describe the situation more fully? And so on. If the answer the model is supposed to give was previously a human value judgment, made on the basis of intuition and personal impression (for example, an insurance agent interviews potentially high-risk clients and decides whether to approve their insurance and at what price), it may turn out that this answer simply cannot be automated.
Recall Amazon's experiment with automating a recruiter's work. It turned out that the artificial intelligence favored male candidates, either simply because historically more male resumes were available, or because of the preferences of the human recruiters who had done the job before. If a headhunter believed a male candidate was more promising, the model has no way of learning, during training, whether there is anything dubious in that judgment. Clearly, in a system where we want AI to select candidates based on their personal qualities, such a result is undesirable.
If we continued working with this data, we would consider excluding the gender feature, since we can see a bias in it. We might suspect, for example, that the sample was imbalanced because the data had been collected over too long a period: there were more good male candidates in the past, so we need to wait a little longer and gather more data, or balance the sample in other ways.
In general, many such research questions can be resolved by a data scientist who examines the data and looks for patterns in it.
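As a minimal sketch of that kind of examination, here is how one might check whether a class imbalance in a hypothetical resume dataset is concentrated in older data (the column names, years, and values are all invented for illustration):

```python
import pandas as pd

# Hypothetical resume dataset: candidate gender and hiring outcome over time
df = pd.DataFrame({
    "year":   [2010, 2010, 2012, 2014, 2016, 2018, 2018, 2020],
    "gender": ["M", "M", "M", "F", "M", "F", "F", "F"],
    "hired":  [1, 1, 0, 1, 1, 1, 0, 1],
})

# Overall class shares: does one gender dominate the whole sample?
share = df["gender"].value_counts(normalize=True)
print(share)

# Shares by period: is the imbalance concentrated in the older data?
by_period = df.assign(period=pd.cut(df["year"], bins=[2009, 2014, 2020],
                                    labels=["early", "late"]))
print(by_period.groupby("period", observed=True)["gender"]
      .value_counts(normalize=True))
```

In this toy data the overall shares are balanced, but the early period is dominated by male candidates, which is exactly the "collected for too long" pattern described above.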
There is another side to data bias. It would seem that the data on which the model was trained is fine in every respect: it describes the situation objectively and fully, the model is trained on it... But then something changes. Such situations often arise in production. Here are some example causes:
sensors that collect data are replaced with similar ones, but from a different manufacturer and with a different measurement error;
existing sensors are readjusted or recalibrated;
a sensor that always measured in centimeters starts returning measurements in meters;
a sensor actually stops working, but transmits a constant instead of missing values; and so on.
To track such problems, data monitoring tools are used. They verify that new data does not differ critically from the data used for model training. If we start receiving a lot of data that falls, for example, outside the training ranges, or the data begins to differ in its statistical characteristics from the training data (for example, a sensor somewhere starts failing frequently and returning the value 300 instead of the usual 400-700), the monitoring system alerts the specialists, and people then figure out what happened and whether the process has changed so much that a different model is now needed.
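The two checks just described can be sketched in a few lines. This is not any particular monitoring product, just an illustration with an invented sensor range of 400-700 and a crude mean-shift test; real tools use richer statistics:

```python
import numpy as np

def check_drift(train_values, new_values, low, high, z_threshold=2.5):
    """Flag incoming sensor data that leaves the training range
    or whose batch mean drifts away from the training distribution."""
    train = np.asarray(train_values, dtype=float)
    new = np.asarray(new_values, dtype=float)
    # Share of new readings outside the range seen during training
    out_of_range = float(np.mean((new < low) | (new > high)))
    # Crude statistical check: how many training standard deviations
    # the new batch mean sits from the training mean
    z = abs(new.mean() - train.mean()) / (train.std() + 1e-12)
    return {"out_of_range_share": out_of_range,
            "mean_shift_detected": bool(z > z_threshold)}

rng = np.random.default_rng(0)
train = rng.uniform(400, 700, size=1000)   # sensor's normal training range
stuck = np.full(200, 300.0)                # failing sensor: constant 300
alert = check_drift(train, stuck, low=400, high=700)
print(alert)
```

For the "stuck at 300" scenario above, every new reading is out of range and the batch mean is far from the training mean, so both alarms fire.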
Another rather interesting type of data bias that appears after we put a model into production is result bias, which arises because the model itself affects the situation. If the model recommends something, the actions of the person receiving the recommendation change on that basis.
Over time, our data can drift farther and farther away from the dataset on which the model was originally trained. For example, if we are offered movies that suit our tastes but never shown controversial options, we lock into the model behavior that filters out films we might have found interesting but that did not make it into the results, precisely because the top 5 was chosen so well. It sounds strange, but because the model works well, it starts to work badly: it recommends things inside our "bubble" while avoiding cool new things that would also interest and attract us.
To reduce the impact of this bias, tests are run from time to time: part of the audience is allocated to a control group, and new variants of the recommendation model are tried on it. This way you can find out whether users will choose more diverse content or respond better to offers that the previous version of the model, already obsolete because the audience has changed a little under its influence, would never have shown them.
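Allocating such a control group is typically done deterministically, so the same user always lands in the same group. A minimal sketch, where the experiment name, user IDs, and the 10% control share are all assumptions:

```python
import hashlib

def assign_group(user_id: str, experiment: str,
                 control_share: float = 0.10) -> str:
    """Hash the user into a stable bucket in [0, 1) so assignment is
    deterministic per user and independent across experiments."""
    digest = hashlib.sha256(f"{experiment}:{user_id}".encode()).hexdigest()
    bucket = int(digest, 16) % 10_000 / 10_000
    return "control" if bucket < control_share else "treatment"

groups = [assign_group(f"user{i}", "recsys-v2") for i in range(1000)]
print(groups.count("control"))  # roughly 100 of 1000 users
```

Hashing, rather than random assignment at request time, keeps a user's experience consistent across sessions, which matters when you are measuring long-term changes in behavior.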
Trust but verify
Can artificial intelligence solutions be fully trusted? A logical question, even after just a couple of simple examples of how changes in data make models less than reliable.
The logical answer is that full trust is impossible, at least for now. It is not clear to what extent that "for now" can be overcome in the future: we do not trust anything in the world without reservations; even our own decisions (and whom could we be surer of than ourselves?) get questioned. Take airplanes as an example: information systems for controlling aircraft in autopilot mode were developed long ago and work very efficiently. As you probably know, most of today's flying is automatic. Nevertheless, it is impossible to be one hundred percent sure that an emergency will not occur, or to claim that if one does, a single pilot will suffice and a backup pilot can be dispensed with, despite all the strict regulation and the long history of aviation. The cost of a system error is hundreds of lives, so all situations are duplicated both at the software level and in the instructions for people.
Yes, there are neural networks and other models that are trained very well for specific tasks. But conditions can change dramatically, and there will always be some share of errors arising from factors unaccounted for during training, or simply because the model could not be made sufficiently universal. Scientists therefore continue to debate many applied questions of data science: what level of trust in a model is acceptable in a particular area; what abilities a person should have to intervene in its work; what happens if an intervention makes things worse because the model was actually working well but seemed to a person to be working badly; and so on. And since we cannot answer all these questions unambiguously even at the level of person-to-person trust (for example, in controversial cases we always ask a doctor for a second opinion), it is obvious that we will not be ready to answer them for artificial intelligence solutions for a long time.
In the next post, I'll share my thoughts on the ethics of AI and the important implications of data bias. Stay tuned 🙂
Alexandra Tsareva, Machine Learning Specialist, Jet Infosystems