How to figure out whether you should overpay for beer, or the basics of applied statistics

I love beer, but I'm not some kind of expert, a rabid craft fanatic, or, God forbid, a beer sommelier. I'm a simple, hard-working, intellectually-challenged person who sometimes wants a cool, heady drink on a Friday. And on my way home on a Friday, I have plenty of options for where to stop for a glass: cheap draft-beer bars, taprooms at craft breweries, or just a shop, sometimes a cheap one, sometimes a pricier one.

But there's just one thing I can't understand. Sometimes you come to a bar and ask for beer. And they even bring it to you in a nice mug, and it seems tasty, but why 500 rubles for 0.5 liters? If I buy a can in a store for 80 rubles, will it be noticeably worse?

Well, I thought, I didn't study for nothing, did I? Let's get out pen and paper and find out whether the overpayment is justified for me personally. And at the same time, let's get acquainted with the basics of mathematical statistics, perhaps one of the most important disciplines in all of science.

What would a text about beer be without Mads Mikkelsen?


Disclaimer for the meticulous: although the calculations themselves are sound, the small number of observations and a bunch of unaccounted-for factors make this experiment more humorous and illustrative than rigorous. Still, for an ordinary, very busy person who is not familiar with statistics, this is probably the most accessible way to answer the question “do I feel a difference?” with at least some reliability.

Formulation of the problem

First, it is important to decide what we want to find out. I didn't reach for the stars and started small. There are two options that are very convenient for me: dark beer from Perekrestok and a craft brewery shop with an excellent stout. The difference in price is obvious. Taste is trickier. Craft seems a little tastier, but perhaps the taste of greed plays the decisive role here? To find out, we can run a blind experiment and see whether I can tell supermarket beer from craft.

The idea seems fine, and it goes well with beer. Now it's time to move from lyrics to statistics. Statistics is a branch of mathematics that studies patterns in large piles of homogeneous data.

You can't live on beer alone, can you?! Here's another unexpected example! Let's say we want to understand whether smoking affects human health. I might be revealing a terrible secret, but the fact that Baba Sraka smoked like a chimney her whole life and lived to be a hundred doesn't give much information. Everyone's health is different. Maybe she was just an athlete her whole life, worked in the fresh air and wasn't stressed. Or, even simpler, she won the genetic lottery. There are millions of factors! So what should we do? How can we understand whether people are dying from smoking, or whether this is a lie from the snus lobby?

We can find out. Not with 100% certainty, but we can. All you need to do is collect a huge amount of data on a wide variety of smokers and non-smokers. If smokers get sick noticeably more often, then there is probably harm! Here the question should fly into your head like a bullet: how much is “noticeably”? This is where mathematics comes to the rescue, namely mathematical statistics. It lets us calculate, from our data, how probable a given outcome would be if our assumption were correct.

What is probability? In everyday life, the probability of an event occurring in an experiment can be defined as the frequency of occurrence of this event. That is, if, with a huge number of coin tosses, heads come up in about half the cases, then the probability of heads coming up is approximately equal to 1/2.
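
To see the frequency interpretation at work, a small Python simulation helps. This is an illustrative sketch, not something from the original article: the function name `heads_frequency` and the fixed seed are arbitrary choices. As the number of tosses grows, the share of heads settles near 1/2.

```python
import random


def heads_frequency(n_tosses: int, seed: int = 0) -> float:
    """Toss a fair virtual coin n_tosses times and return the share of heads."""
    rng = random.Random(seed)  # fixed seed so the run is reproducible
    heads = sum(rng.random() < 0.5 for _ in range(n_tosses))
    return heads / n_tosses


# Small samples wander; large samples hug 0.5.
for n in (10, 1_000, 100_000):
    print(n, heads_frequency(n))
```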

By calculating the incidence of various diseases among smokers and non-smokers, we can estimate who gets sick more often.

And what about beer? It's the same with beer! If you simply compare two glasses of beer, a lot of factors can affect the sensation. Especially greed! It's impossible to take all the factors into account, so you have to resort to statistics. So, let's try to figure out if I can tell one beer from another by taste.

Table. How do we use it for our task? The first column gives the size of the first sample, the second column the size of the second. The remaining columns correspond to different significance levels. Pick the row with your sample sizes and the column with the desired significance level; the number at their intersection is the one we need.


Basic assumption

As the main assumption of the experiment (statisticians call it the null hypothesis), let's take that I can't tell craft beer from store-bought beer. Without going into details, I'll just say that an assumption phrased this way is easier to test. Testing an assumption means checking whether our data are consistent with it. We can either accept or reject it, and the two outcomes are not equal. If we accept it, we can later test it again in some other way or collect more data, while a confident rejection means that the data simply do not have this property. Asymmetry and inequality? Here they are, our old friends!

Here we are like children playing with a fragile crystal vase. If the vase is intact, we can continue to play with it until we break it. But as soon as the vase shatters into pieces, playing with it immediately ends.

In our example, this means that if I don't notice a difference, I can run other experiments and drink another evening. And if I do notice it, no further experiments are needed: the difference is already evident from the data at hand.

Experiment

To test the hypothesis, let's conduct the following experiment. We'll divide two cans of beer into 12 servings, 6 of each drink. (At last, a use for that 12-person tableware set.) My d̶r̶i̶n̶k̶i̶n̶g̶ ̶b̶u̶d̶d̶y̶ assistant hands me the servings so that I don't know which beer I'm drinking. I have to rank the servings from best to worst: the worst serving gets 1 point, and the best gets 12.

In our experiment, I got the following ranking. Craft servings took 1st, 2nd, 5th, 6th, 9th, and 11th places in my personal rating, so they received 12, 11, 8, 7, 4, and 2 points, respectively.

Store-bought beer was in 3rd, 4th, 7th, 8th, 10th and 12th place. Its servings got 10, 9, 6, 5, 3 and 1 points respectively.

Or from best to worst (K – craft, M – store):
K, K, M, M, K, K, M, M, K, M, K, M
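
The points and sums can be recomputed mechanically from the best-to-worst string above. A minimal Python sketch; the helper `points_by_beer` is a hypothetical name introduced only for this illustration.

```python
# Ranking from best to worst: K = craft, M = store-bought (copied from the text)
ranking = "KKMMKKMMKMKM"


def points_by_beer(order: str) -> dict[str, list[int]]:
    """Give the best serving len(order) points and the worst 1 point."""
    n = len(order)
    points: dict[str, list[int]] = {}
    for place, beer in enumerate(order):  # place 0 is the best serving
        points.setdefault(beer, []).append(n - place)
    return points


points = points_by_beer(ranking)
print(points)                                  # {'K': [12, 11, 8, 7, 4, 2], 'M': [10, 9, 6, 5, 3, 1]}
print({b: sum(p) for b, p in points.items()})  # {'K': 44, 'M': 34}
```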

Statistical criterion

Why such an elaborate procedure? Here is the reasoning. Let's add up the points for each drink: craft gets 44 points and store-bought gets 34. The store-bought one seems worse, but is it much worse?

Note that if I couldn't tell the beers apart at all, then over a very large number of servings craft would, on average, rank no higher than store-bought. I would simply be picking the better beer at random.

So, in our experiment, the most likely sum of points would be the same for both drinks, namely 0.5 * (1 + 2 + … + 12) = 39. (The most attentive and rigorous readers, retirees and schoolchildren according to Vavilov, may want to work out in more detail why this is so, and also why |44 - 39| = |34 - 39| = 5 holds and why this is far from accidental.)
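
For readers who would rather check than take it on faith, here is a brute-force Python sketch (an illustration, not part of the original experiment). It lists every way the 12 places could be split between the two beers, confirms that the average craft score is exactly 39, and checks the symmetry around 39 for every split. Variable names are illustrative.

```python
from fractions import Fraction
from itertools import combinations

POINTS = list(range(1, 13))  # 12 servings scored 1 (worst) .. 12 (best)

# If I can't tell the beers apart, every choice of which 6 places go to craft
# is equally likely. There are C(12, 6) = 924 such choices.
craft_sums = []
for craft in combinations(POINTS, 6):
    store = [p for p in POINTS if p not in craft]
    # The two sums always add up to 78, so they sit equally far from 39.
    assert abs(sum(craft) - 39) == abs(sum(store) - 39)
    craft_sums.append(sum(craft))

mean_craft = Fraction(sum(craft_sums), len(craft_sums))
print(len(craft_sums), mean_craft)  # 924 39
```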

Suppose that, although there is no difference, I happened to separate craft from store-bought correctly all 6 times in a row. This is like drawing 6 black balls in a row from a bag with 6 white and 6 black balls. It goes without saying that such an event is extremely unlikely: its probability is (6/12)(5/11)(4/10)(3/9)(2/8)(1/7) = 1/924, roughly 1 in 1000!
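
The 1-in-1000 figure can be checked exactly with Python's `fractions` module by multiplying the chances of drawing a black ball at each of the six draws without replacement. A sketch for illustration only.

```python
from fractions import Fraction
from math import comb

# Draw 6 balls without replacement from a bag of 6 black and 6 white;
# the event is "all six are black".
black, total = 6, 12
p = Fraction(1)
for _ in range(6):
    p *= Fraction(black, total)  # chance the next ball is black
    black -= 1
    total -= 1

print(p)                               # 1/924
print(p == Fraction(1, comb(12, 6)))   # True: one pattern out of C(12, 6)
```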

Let's imagine all the rankings I could possibly make. For each one, we can calculate the smaller of the two drinks' scores. Note that the further this score is from 39, the less likely the result, provided that I can't tell craft from store-bought. So if my data show too low a score for one of the drinks, we can confidently say that there is a difference!

Mistakes

But how confident is “confidently”? Can we measure it somehow? Yes, we can! We will measure confidence by estimating the probability of making a mistake. What mistakes can we make based on the experiment?

We can conclude that we don't distinguish store-bought from craft beer when in fact we do feel the difference. That is, we declare the assumption true when it is false. In this case, we can continue the b̶i̶n̶g̶e̶ other experiments and, for example, notice the difference later. Such an error is usually called a false negative.

Or we can see differences where there are none. Then, with a smug grin, we will reject the original assumption and spend our whole lives paying extra for an imaginary hint of greed, the bitterness of avarice, and the aftertaste of overpayment. Of course, we can conduct other experiments, but then we may end up with two contradictory results. And how do we know who is right and who is left? Such an error is usually called a false positive.

Minimizing the probability of the first error can lead to us always seeing some difference. Indeed, if we always reject the initial assumption, we can never wrongly accept it.

Minimizing the probability of the second error will mean that we simply never feel the difference between craft and store-bought beer. If we always accept the assumption, we can never wrongly reject it.

Well, who needs such experiments? We need balance! It can be found in the following way.

The second error is equivalent to accidentally breaking the vase, so we will try to keep it in check. We require the probability of the second error to be small and not to exceed a value fixed in advance. For example, if we allow 1 error in 10 experiments, the probability of the second error must be at most 1/10. This probability is so important that it even has its own name: the significance level.

Critical value

Next, we find a number such that if the smaller of the two sums of points does not exceed it, we boldly reject the assumption that there is no difference between the drinks. We choose this number so that the probability of making the second mistake is guaranteed to be at most 1/10. We call it the critical value. (We have set up the experiment so that, for the main question of this note, what matters is whether the score of store-bought beer exceeds this number.)

Finding the critical value is quite tedious. It involves going through all possible rankings and calculating the smaller score in each. Then, for each possible score, we determine how likely it is if there is no difference between the drinks. In other words, this is the probability that I get a ranking with exactly this sum if I am always just guessing which beer is store-bought and which is craft. Equivalently, it is the share of rankings with a given store-bought score among all possible rankings under these conditions.

For example, as shown above, guessing correctly every time which beer is craft and which is store-bought is unlikely. In that case the smaller of the two sums equals the minimum possible sum of points: 1 + 2 + 3 + 4 + 5 + 6 = 21. Only one ranking out of all possible ones gives this sum, so its probability is 1 divided by the total number of rankings, that is, 1/924.
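
The whole distribution of the store-bought score can be computed by brute force, since there are only C(12, 6) = 924 equally likely rankings. A Python sketch assuming no ties; `sum_counts` and `prob_of_sum` are illustrative names, not from the original.

```python
from collections import Counter
from fractions import Fraction
from itertools import combinations

# Each way of choosing which 6 of the scores 1..12 went to store-bought beer
# is equally likely when the rankings are pure guesswork (no ties assumed).
sum_counts = Counter(sum(c) for c in combinations(range(1, 13), 6))
total = sum(sum_counts.values())  # 924 rankings in all


def prob_of_sum(s: int) -> Fraction:
    """Probability that store-bought beer scores exactly s points by chance."""
    return Fraction(sum_counts[s], total)


print(total, min(sum_counts), max(sum_counts))  # 924 21 57
print(prob_of_sum(21))  # 1/924: only 1 + 2 + 3 + 4 + 5 + 6 gives 21
print(prob_of_sum(39))  # 29/462, the most likely score
```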

Thus, the probability that the store-bought score obtained in the experiment (the smaller of the two sums) does not exceed our number is exactly the probability of making the second mistake. That is, if we reject the assumption whenever the score does not exceed this number, the probability of seeing a difference where there is none stays below 1/10.
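
One way to convince yourself that this works is to simulate a taster who genuinely can't tell the beers apart and count how often the rule "reject if the store-bought score is at most 30" (30 being the critical value the article gives further on) fires anyway. This is a Monte Carlo sketch with an arbitrary seed and trial count, not the author's procedure; the share of false alarms should come out close to 83/924, just under 1/10.

```python
import random

CRITICAL = 30        # critical value quoted in the text (6 + 6 servings, level 1/10)
N_TRIALS = 100_000   # number of simulated blind tastings (arbitrary)
rng = random.Random(42)  # fixed seed so the run is reproducible

servings = ["K"] * 6 + ["M"] * 6
false_alarms = 0
for _ in range(N_TRIALS):
    rng.shuffle(servings)  # a taster who ranks completely at random
    # index 0 is the best serving and earns 12 points, index 11 earns 1
    store_sum = sum(12 - i for i, beer in enumerate(servings) if beer == "M")
    if store_sum <= CRITICAL:
        false_alarms += 1  # a difference "detected" where there is none

rate = false_alarms / N_TRIALS
print(rate)  # should land near 83/924, about 0.09
```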

This number is usually found painstakingly by hand: possible sums, their probabilities, and so on and so forth. Fortunately, all of this was calculated long ago, and you can simply look up the required number in a table (shown in the attached pictures).

For our experiment, with 6 servings of one beer, 6 of the other, and a significance level of 1/10, this number is 30. The smaller score is 34, which is greater than 30, so I can say with a clear conscience that I don't notice any difference between beer for 450 rubles and beer for 100 rubles. So there's probably no point in overpaying.
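
If there is no table at hand, the critical value can also be computed exactly by enumerating all rankings. The Python sketch below is an illustrative reconstruction rather than the author's method; the function names are hypothetical and it assumes no ties. Read as a one-sided test (asking only whether store-bought beer scores too low), it reproduces the tabulated 30 and also gives the exact probability of a score of 34 or less arising by pure chance.

```python
from fractions import Fraction
from itertools import combinations
from typing import Optional


def null_sums(n_a: int, n_b: int) -> list[int]:
    """All possible rank sums of group b when places are assigned at random."""
    return [sum(c) for c in combinations(range(1, n_a + n_b + 1), n_b)]


def lower_tail(observed: int, n_a: int, n_b: int) -> Fraction:
    """Exact P(rank sum of group b <= observed) if there is no difference."""
    sums = null_sums(n_a, n_b)
    return Fraction(sum(s <= observed for s in sums), len(sums))


def critical_value(n_a: int, n_b: int, alpha: Fraction) -> Optional[int]:
    """Largest c with P(rank sum <= c) <= alpha (one-sided test, no ties)."""
    sums = null_sums(n_a, n_b)
    best = None
    for c in range(min(sums), max(sums) + 1):
        if lower_tail(c, n_a, n_b) > alpha:
            break
        best = c
    return best


crit = critical_value(6, 6, Fraction(1, 10))
store_sum = 34  # store-bought score from the experiment
print(crit)                          # 30
print(store_sum <= crit)             # False, so the assumption is not rejected
print(lower_tail(store_sum, 6, 6))   # 8/33, roughly 0.24
```

Changing `alpha` to `Fraction(1, 20)` gives 28, which shows how a stricter significance level pushes the cutoff down.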

A bit off-topic, but during the investigation I came across a pearl from the Russian Science Citation Index.


That's all

In fact, the reasoning above applies far beyond beer experiments. This procedure is called the Wilcoxon rank-sum test (named after the guy who proposed it, Frank Wilcoxon). The test is suitable for checking the assumption that two groups of observations do not differ from each other when we know almost nothing about how the data are distributed. As a rule, serious researchers approach it more thoroughly: they may transform the data somehow or examine its other properties. But for now we have chosen the simplest path, which does not require diving deep into mathematics.

If the topic is interesting to readers, then it is quite possible to conduct a more serious study on the topic of beer, with the collection and preparation of data and with regression analysis, of course. All for real! It is long, it is difficult, but it is interesting! Well, shall we look for the best beer in Baku on the Neva?

You can build a similar experiment with any product at your leisure. For example, take Princess Nuri and Greenfield black tea. Fun fact: both brands belong to the Orimi Trade group of companies. So maybe, for you personally, the tea bags differ only in the label?!

Another problem solved! You can sleep peacefully. Drink your foamy beer in moderation, live an excellent life, and don't forget about common sense! Beaver to all!

Author: Liza Ivanova

Original

Sources

1) Van der Waerden, B. L. Mathematical Statistics, p. 336.
2) Bure, V. M., Parilina, E. M. Probability Theory and Mathematical Statistics, p. 328.
3) Wilcoxon, F. (1945). Individual comparisons by ranking methods. Biometrics Bulletin, 1(6), 80-83.
