Big Data Myths and Digital Culture


We continue to publish the most interesting talks from RAIF, the annual artificial intelligence forum organized by Jet Infosystems. Today we want to share the talk given by Boris Asenovich Novikov, Doctor of Physics and Mathematics, Professor at the HSE Department of Informatics.

The word "big" in our case applies more to the myths than to the data, so I will talk mainly about the former, but in the context of the latter. Since I have been pretending to work in the scientific community for several decades, I will start with a definition, so that all this looks like exact knowledge.

Myths are an integral part of a society's culture. They have always existed and they continue to appear in the modern world. Here are some examples:

Older members of the audience will remember the fuss around the year 2000 problem, which was in fact just one of the 400 relatively honest ways of extracting money from the customer, nothing more. Of course, no disaster happened.

A lot of myths arise around software engineering – there are many different points of view, and I will not concentrate on this topic now.

This talk was prompted by an initiative from above: at the university where I worked, it was decided that digital literacy had to be taught to everyone, from kindergarten to graduate school. No one knew what that meant, and I rashly admitted to the management that I roughly understood how to do it ... and got caught. Students of very different specialties had to be taught within a single program.

My main contribution to the matter was that I renamed this course from Digital Literacy to Digital Culture.

At an international conference I once heard this advice: to attract the audience's attention, you need to add at least a hint of sexuality to your talk. So here it is. A few years ago the press (in Russia in particular) widely discussed a case in which an American schoolgirl began receiving advertisements aimed at pregnant women (the sexual context of the story ends here). The family filed a lawsuit, but in the end it had to be withdrawn ... because the girl really did turn out to be pregnant. The story made a lot of noise: these analysts, people said, know more about us than we do ourselves (which is unlikely)! All this is very dangerous, and defenses must be strengthened. And so the myths were born:

  1. Big data is extremely dangerous.
  2. They know more about us than we do ourselves.
  3. Additional security measures are required.

Do not get me wrong: security is important, but let's see how this case should be evaluated professionally.

What conclusion can be drawn? Analysis can SOMETIMES produce correct results, and we can equally say that sometimes we know nothing.

My friends and colleagues point out that a random mailing also sometimes gives correct results, and that we cannot say anything about the quality of a mailing until we measure some quantitative indicators. First of all, we need to evaluate completeness and accuracy (recall and precision).
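To make this point concrete, here is a minimal sketch of how a mailing could be compared with a random baseline. Everything in it is an assumption made for illustration: the population size, the number of relevant customers and the composition of the "analytic" mailing are invented toy numbers, not data from the case discussed above.

```python
import random


def precision_recall(sent, relevant):
    """Return (precision, recall) of a mailing against the truly relevant set."""
    hits = len(sent & relevant)
    precision = hits / len(sent) if sent else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical toy population: 1000 customers, the ad is relevant to 50 of them.
customers = set(range(1000))
relevant = set(range(50))

# Hypothetical "analytic" mailing: 100 recipients, 20 of whom are relevant.
targeted = set(range(20)) | set(range(500, 580))

# Baseline: a random mailing of the same size (seeded for repeatability).
rng = random.Random(0)
random_mailing = set(rng.sample(sorted(customers), len(targeted)))

targeted_pr = precision_recall(targeted, relevant)
random_pr = precision_recall(random_mailing, relevant)

print("targeted (precision, recall):", targeted_pr)
print("random   (precision, recall):", random_pr)
```

A single correct hit, like the one in the story, says nothing by itself. Only the comparison of these two pairs of numbers shows whether the analysis does better than chance.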

The next kind of myth I borrowed from abroad. At SIGMOD 2019, one of the top conferences on data processing, there was a panel discussion (or, as we say, a round table) on the topic "Responsible Data Science". It covered examples of irresponsible use of data analysis, machine learning and so on. One of them was the story of determining a person's sex from photographs of the eyes. People worked on this for several years and reached an accuracy of as much as 80%, until one skeptic found out that what they were really detecting was the presence or absence of cosmetics.

That one is a curiosity, but here is an example where the danger is entirely real: the use of machine learning to identify criminals from photographs. As it turned out, the very principle behind this learning system has problems with political correctness. First, it gave false-positive answers at different rates depending on race. Second, as was discovered later, it was in fact detecting the presence or absence of a smile in the photos, nothing more. Nevertheless, there were attempts to put this system to use, and officers who disagreed with its results had to submit a written explanation of why they disagreed. This is an example of how myths can become dangerous to society.
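The first of these problems, different false-positive rates for different groups, is something that can be checked directly. Below is a minimal sketch of such a check. The group labels "A" and "B" and all the counts are invented purely for illustration and do not describe the system mentioned above.

```python
from collections import defaultdict


def false_positive_rate_by_group(records):
    """records: iterable of (group, actual, predicted) with boolean labels.

    Returns {group: FP / (FP + TN)}, computed over the actual negatives
    of each group. Groups with no actual negatives are omitted.
    """
    false_positives = defaultdict(int)
    negatives = defaultdict(int)
    for group, actual, predicted in records:
        if not actual:
            negatives[group] += 1
            if predicted:
                false_positives[group] += 1
    return {g: false_positives[g] / n for g, n in negatives.items()}


# Invented toy data: each group has 100 actual negatives and 10 actual positives.
records = (
    [("A", False, True)] * 5 + [("A", False, False)] * 95 + [("A", True, True)] * 10
    + [("B", False, True)] * 20 + [("B", False, False)] * 80 + [("B", True, True)] * 10
)

rates = false_positive_rate_by_group(records)
print(rates)
```

In this toy data, innocent members of group B are flagged four times as often as those of group A, a difference that a single overall accuracy figure would hide.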

For some reason we say Data Science even when we are talking about industrial applications. In all other areas the distinction is made: Computer Science, but ... Software Engineering; the equations of mathematical physics, but bridge building or something of that kind. Colleagues, scientists cannot be trusted! I would like to think that Data Science belongs under "Science", but unfortunately the term Data Engineering is already taken by another concept.

Let me return to the story of designing a course for the entire university, regardless of background and specialty. The picture on the right (the swan, the crayfish and the pike) shows how the team assembled from representatives of all university departments worked.

Still, we tried to do something reasonable. The idea was to show simple things that any researcher can do on their own, whatever field they work in, and, most importantly, to help them understand at what point they need to turn to data processing professionals. I tried to avoid recipes for beginners such as "Make addition a popular, but not practical guide", but little came of it.

So, myths are inevitable, and we must understand that we will still have to deal with them. Myths are the source of many mistakes, failures and problems, and sometimes they can even be dangerous: the thoughtless use of mythical "knowledge" can have negative consequences.

Besides developing technologies, we need to educate society, and this is a permanent task that will never be fully completed, because humanity in general does not develop as fast as technology. Training people is much harder than training artificial intelligence, which is itself one of the sources of myths. We need to learn to work and live with it in a way that avoids the greatest dangers.

