Average errors and their squares

I'm currently taking a basic course in machine learning. In the second lesson of the ML block, in a video lecture, the instructor shows the formula

  \text{L} = (y_i - \hat{y}_i)^2

He says that this is the loss, that it is also the standard deviation, and that MSE is its average, so MSE is the standard deviation.

  \text{MSE} = \frac{1}{n} \sum_{i=1}^{n} (y_i - \hat{y}_i)^2
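As a quick sanity check of the formula, here is a minimal MSE computation in Python. The numbers are made up for illustration, not taken from the lecture:

```python
# Per-object squared losses and their mean (MSE), with illustrative data.
y_true = [3.0, 5.0, 2.0, 7.0]   # true targets y_i
y_pred = [2.5, 5.0, 4.0, 8.0]   # model predictions ŷ_i

# each object gets its own loss (y_i - ŷ_i)^2 ...
losses = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
# ... and MSE is simply their arithmetic mean
mse = sum(losses) / len(losses)

print(losses)  # [0.25, 0.0, 4.0, 1.0]
print(mse)     # 1.3125
```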

This is where I got lost, because I remember clearly from physics that the standard deviation is the square root of a very similar formula. Let's figure it out.

There is a formula for the variance of a random variable:

  \sigma^2 = \frac{1}{n} \sum_{i=1}^{n} (x_i - \bar{x})^2
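To make the structural difference concrete, here is a small variance computation, again with made-up numbers. Note that every term is a deviation from one shared mean, whereas in MSE every term has its own prediction:

```python
# Variance: the spread of one variable around its own single mean x̄,
# unlike MSE, where each y_i is compared with its own prediction ŷ_i.
xs = [1.0, 2.0, 2.0, 3.0, 7.0]

mean = sum(xs) / len(xs)                                 # x̄ = 3.0
variance = sum((x - mean) ** 2 for x in xs) / len(xs)    # σ² = 4.4

print(mean, variance)
```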

MSE does look very similar. But there must be reasons why they have different names. First, MSE is not about a random variable. Second, note \bar{x} versus \hat{y}: the point, of course, is not that one is x and the other is y. A bar over a variable denotes its mathematical expectation, while y wears a hat because it is not the real value but an estimated, predicted one. Statistics has other notations too:

\tilde{x} for the median
x^* for an optimal value

And probably many others that I don't know about.

In short: in the case of the variance of a random variable, we are dealing with the mathematical expectation of that variable (a single number) and many outcomes scattered around it. In the case of MSE, we are dealing with many predictions and many true values. Each prediction has its own target to hit, not one target shared by all. At least for now, we are solving the problem of measuring the size of the miss, not building a model that hits the target with every kind of dart, spear, and arrow.

Below the cut are the basics of mathematical expectation; you most likely don't need them.

Hidden text

The mathematical expectation is the value whose average distance to the outcomes of a random variable is the smallest. For a discrete variable, say a die roll, the expectation of the result is (1+2+3+4+5+6)/6 = 3.5. We will never roll such a number, but on average the distance from the outcomes to it is minimal. If you draw a third dot on a face with a 2, the probability of rolling a 3 doubles and the 2 disappears. We can either roll the die many times and compute the arithmetic mean, which will converge to the expectation, or use a formula that accounts for the different probabilities:

\bar{x} = \sum_i p_i x_i, where p_i is the probability of getting a number and x_i are the numbers themselves.

(1/6)*(1+4+5+6) + (2/6)*3 = 3.66… Indeed, slightly more than 3.5.
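The two dice from the example above can be checked in a few lines of Python, representing each die as a mapping from face value to probability:

```python
# Expected value of a fair die versus the modified die from the text
# (one face with a 2 repainted into a 3, so the 2 disappears).
fair = {x: 1 / 6 for x in range(1, 7)}
mod = {1: 1 / 6, 3: 2 / 6, 4: 1 / 6, 5: 1 / 6, 6: 1 / 6}

def expectation(dist):
    # x̄ = Σ p_i * x_i over all faces
    return sum(p * x for x, p in dist.items())

print(expectation(fair))  # ≈ 3.5
print(expectation(mod))   # ≈ 3.667
```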

In connection with expected value, you can also read about the Martingale betting system and marvel at the resourcefulness of casinos in attracting regular customers.

As stated above, there is no common expectation for the predictions, and there cannot be one. You can think of the true target values as a set of mathematical expectations, one per object (row of the table under study) separately (this is already an assumption), but even that does not make the variance equal to MSE: variance is the spread around a mean, while MSE is the average error of the model on the data. Their formulas use different differences. If you want to call MSE something else, it is not the variance or the standard deviation; it is the mean squared difference.

Now for the standard deviation, also known as the root-mean-square deviation or RMS deviation (in Russian, SKO).

These are all aliases for the square root of the variance. Of the variance of a random variable, not of MSE. If you take the square root of MSE, you get RMSE, not RMS, even though both are obtained by taking the root of seemingly similar functions. RMSE comes in handy when your absolute values are of a large order and MSE, as befits a quadratic function, flies into space. Unlike SKO, which only causes confusion in ML.
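To see why RMSE is more readable on large-magnitude targets, here is a sketch with illustrative numbers: MSE lands in squared units and blows up, while RMSE is back on the scale of y:

```python
import math

# Large-magnitude targets: MSE is in squared units of y and gets huge,
# RMSE = sqrt(MSE) returns the error to the original units.
y_true = [1000.0, 2000.0, 3000.0]
y_pred = [900.0, 2100.0, 3300.0]

mse = sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)
rmse = math.sqrt(mse)

print(mse)   # ≈ 36666.67, hard to relate to the data
print(rmse)  # ≈ 191.5, comparable to the actual misses of 100-300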

P.S.: If I'm wrong somewhere, I'm sure Habr will correct me; I want the site to have a short article that settles the question of squared differences, their sums, and their roots.
