The saddest equation in Data Science
Stock up on handkerchiefs! Now I will tell you the whole truth about statistics and data science. You will have tears in your eyes, I promise you.
CONCLUSION = DATA + ASSUMPTIONS. In other words, statistics does not give you the truth.
Often you can hear the following misconceptions:
- “If I find the right equations, I can find out what no one knows right now.”
- “If I add maths to my data, I can reduce the uncertainty.”
- “Statistics can turn data into truth!”
It all sounds like a fairy tale, right? That is because it is one.
There is no magic in the world that can create something out of nothing. Forget about it. That is not what statistics is about. Take a statistician's word for it. (As a bonus, this article will save you a ton of the time you would have spent pursuing this pipe dream.)
Unfortunately, many charlatans will try to convince you otherwise. Their standard technique is intimidation: “You don’t understand the equations I just threw at you, so acknowledge my superiority and do as I say!”
Do not fall for the words of these posers.
About the author: Cassie Kozyrkov is a South African data and statistics specialist. She founded Decision Intelligence at Google, where she is Principal Researcher.
Do not repeat the fate of Icarus
Think of statistical inference (“statistics” for short) as a leap from what we know (our sample data) to what we don’t know (our population parameter).
In statistics, what you know is not the same as what you would like to know.
Maybe you want the facts about tomorrow, but you can draw conclusions only on the basis of yesterday. (So annoying when we don’t remember the future, right?) Maybe you want to know what all your potential users think about your product, but you can only interview a hundred. Then you get uncertainty!
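To make that uncertainty concrete, here is a minimal Python sketch. Everything in it is a made-up assumption for illustration: a hypothetical population of 10,000 potential users of whom 60% like the product, the function name `run_surveys`, and the fixed seed. It repeats the hundred-person survey many times and shows that each survey gives a slightly different answer, even though the underlying population never changes.

```python
import random


def run_surveys(true_share=0.6, population_size=10_000,
                sample_size=100, n_surveys=200, seed=0):
    """Simulate repeated surveys of a hypothetical user population.

    Returns the share of satisfied users found in each survey.
    All numbers are illustrative assumptions, not real data.
    """
    rng = random.Random(seed)
    n_satisfied = round(true_share * population_size)
    # 1 = likes the product, 0 = does not (the "truth" we normally never see).
    population = [1] * n_satisfied + [0] * (population_size - n_satisfied)
    estimates = []
    for _ in range(n_surveys):
        # Interview `sample_size` distinct users chosen at random.
        sample = rng.sample(population, sample_size)
        estimates.append(sum(sample) / sample_size)
    return estimates


if __name__ == "__main__":
    estimates = run_surveys()
    print(f"lowest estimate:  {min(estimates):.2f}")
    print(f"highest estimate: {max(estimates):.2f}")
    print(f"average estimate: {sum(estimates) / len(estimates):.2f}")
```

In real life you only get to run one of these surveys, so you cannot see the spread directly. That is exactly the gap between what you know and what you would like to know.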
This is not magic, this is assumptions
How can you jump from what you know to what you don’t know? You need a bridge across this chasm. And the name of this bridge is assumptions. Let me remind you of the most painful equation in data science:
DATA + ASSUMPTIONS = PREDICTION.
(You can easily replace the word “prediction” with “conclusions” or “forecasts” if it’s more convenient for you. All of this is about the same thing: a statement about something that you don’t know for sure.)
What is an assumption?
If we knew all the facts (and were sure that they really were undeniable facts), we would not need assumptions (or statistics). Assumptions are the ugly pieces you use to build a bridge between what you know and what you would like to know. They are the hacks you resort to when you need the numbers to work out but do not have enough data.
Assumptions are the ugly patches that you apply in places where there is no information.
To put it bluntly: an assumption is not a fact, it is nonsense that you make up because you do not have enough information. If you are in the habit of browbeating people with your super-precise intervals, remember that it is reckless to call anything based on assumptions the truth. It is better to treat statistics as a tool for making decisions. The tool is not perfect, but it is still better than nothing (in certain situations).
Statistics is your attempt to do everything in your power in a world of uncertainty.
Assumptions are assumptions, wherever you go. A wave of a magic wand will not turn them into facts.
Assumptions are part of decision making
Show me any decision made without assumptions. I can easily list many implicit assumptions that you make in real life without even thinking.
Examples: When you read a newspaper, do you assume that all the facts have been verified? When you made plans for 2020, did you expect a global pandemic? If you analyzed data, did you assume that it was recorded without error? Did you expect your random number generator to give truly random results? (Usually it is only pseudo-random.) When you decide to buy something online, do you assume that you will be charged the correct amount? What about your last snack? Did you assume that it was not poisoned? When you took your medicine, did you *know* its long-term effects, or did you just assume?
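The aside about random number generators is easy to check for yourself. The sketch below is only an illustration, and the helper name `draw` is made up for it. It uses Python's standard `random` module, which is a deterministic pseudo-random generator: start it from the same seed and it replays exactly the same "random" dice rolls.

```python
import random


def draw(seed, n=5):
    """Return n pseudo-random dice rolls from a generator started with `seed`."""
    rng = random.Random(seed)
    return [rng.randint(1, 6) for _ in range(n)]


if __name__ == "__main__":
    print(draw(seed=42))
    print(draw(seed=42))  # same seed, so the same rolls as the line above
    print(draw(seed=43))  # a different seed starts a different sequence
```

Treating such numbers as random is itself an assumption, usually a harmless one, but an assumption all the same.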
Whether you like it or not, assumptions are always part of decision making. Any real-world data project should come with a long list of written-down assumptions, in which the data scientist comes clean about every corner they had to cut.
Even if you decide to dispense with statistics, you probably use assumptions to decide how to proceed. For your own safety, you must be aware of the assumptions your decisions are based on.
How the “magic” of statistics works
In the field of statistics, there are many tools that allow you to formulate assumptions and combine them with evidence. Thus intelligent decisions are born. (Here you can see my 8-minute introduction to statistics.)
Yes, this is how statistical magic works. You choose which assumptions you are willing to live with, then combine them with the data. On the basis of this unholy union, you make a reasonable decision. That is all there is to statistics.
That is why analysis that involves uncertainty and probability can never become a source of Truth with a capital “T”. It is absurd to expect otherwise. There is no secret dark magic that does this for you.
For the same reason, two people can come to completely different conclusions based on the same data! It is enough for them to make different assumptions. Statistics gives you a tool for making decisions more deliberately, but there is no single right way to use it. It is a personal decision-making tool.
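Here is a minimal sketch of how that can happen. The survey numbers and the function `estimate_share` are invented for illustration and do not come from any real study. Two analysts get the same data: 400 users were invited to a survey, 100 answered, and 62 of them said they like the product. The analysts differ only in what they assume about the 300 people who stayed silent.

```python
def estimate_share(likes, respondents, invited, assumption):
    """Estimate the share of all invited users who like the product.

    The data (likes, respondents, invited) is identical for every analyst;
    only the assumption about the people who did not answer differs.
    """
    silent = invited - respondents
    if assumption == "silent_like_respondents":
        # Assume non-respondents feel the same way as respondents.
        silent_likes = silent * likes / respondents
    elif assumption == "silent_dislike":
        # Assume everyone who stayed silent dislikes the product.
        silent_likes = 0
    else:
        raise ValueError(f"unknown assumption: {assumption}")
    return (likes + silent_likes) / invited


if __name__ == "__main__":
    # Hypothetical survey: 400 invited, 100 answered, 62 said "I like it".
    data = dict(likes=62, respondents=100, invited=400)
    for assumption in ("silent_like_respondents", "silent_dislike"):
        share = estimate_share(**data, assumption=assumption)
        decision = "launch" if share > 0.5 else "do not launch"
        print(f"{assumption}: estimated share {share:.3f} -> {decision}")
```

The first assumption puts the estimate at 62% and the second at 15.5%, so with a 50% threshold one analyst launches and the other does not. Neither result is the truth; each is the data plus an assumption that the data alone cannot confirm.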
Your analysis is only as good as the assumptions you make.
What about science?
What happens when a scientist uses statistics to draw conclusions? They simply form an opinion and decide to share it with the whole world. There is nothing wrong with that: willingly or not, scientists regularly have to draw conclusions this way, because that is their job. I would suggest that these conclusions are sometimes worth listening to.
I enjoy listening to the advice of people who have more information and experience than I do, but I never allow myself to confuse opinions with facts. There are scientists who are well versed in probability and work with it skillfully. Nevertheless, I have also met scientists who made so many statistical errors that I could not untangle them in a lifetime. An opinion cannot (and should not) bind people who are not prepared to make the same assumptions themselves. Such opinions are a combination of evidence and unverified assumptions, so they cannot be treated as authoritative.
Think of statistics as a science that can help you make decisions when you are unsure of something. This is a framework that helps you make informed decisions with a lack of information. There is no one true way to use statistics.
No, it does not give you the facts you need. It gives you what you need to cope with a lack of facts. The point of statistics is to help you do everything in your power in a world of uncertainty.
All you have to do is make assumptions.
Translation: Diana Sheremyeva