Least squares method

1. The derivative shows the speed of change of a function's values as the independent variable increases.
2. I know in advance what the error function should equal: zero. So I look for the coefficients where the function is either zero or minimal. That is the point where the derivative with respect to k_i is negative on the left (increasing k_i decreases the error function) and positive on the right (increasing k_i increases the error), and the derivative itself is something between a negative and a positive value: "0".

  • Often you have to take derivatives of standard error functions, somehow plug X_train and y_train into them, and somehow average the result.
    In ML they love the squared error.

  • What for?

    It penalizes one error of absolute size q more than two errors of size q/2. And it definitely has a derivative at zero, unlike the absolute error: the derivative of |x| behaves like x/|x|, which tends to -1 on one side of zero and +1 on the other, so MAE can cause at least some internal discomfort. And there is no point in considering an error without squares or absolute values at all: if on some data you are off by an average of +9000 and on other data by -9000, you get zero error and zero predictive power (a quick numeric check is sketched right below).
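
    To make that concrete, here is a minimal sketch with made-up numbers (just the ±9000 and q from the paragraph above, not real data):

    import numpy as np

    errors = np.array([9000.0, -9000.0])   # off by +9000 on some data, -9000 on other data
    print(errors.mean())                   # 0.0 -- "zero error", zero usefulness
    print(np.abs(errors).mean())           # 9000.0 -- MAE notices the problem
    print((errors ** 2).mean())            # 81000000.0 -- squared error notices it even more

    # squaring punishes one error q harder than two errors of q/2
    q = 4.0
    print(q ** 2, 2 * (q / 2) ** 2)        # 16.0 vs 8.0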

    • Formulas often look like (Gibberish – y)^2 and at first glance seem identical. You need to look first at the context, at what point the Gibberish was calculated and why. You'll have to differentiate.

    I'll also take the squared error (I could have averaged it, since the derivative is the same up to a constant, but the least squares method does not imply averaging) and minimize its sum. Two features plus an intercept C.

      \text{OLS} = \sum_{i=1}^{n} (y_i - C - k_1 x_{1i} - k_2 x_{2i})^2 \to \min

    I have the X's in the initial data, but not the k's. I'll differentiate with respect to the k's, and for now treat X as just a coefficient:

    \begin{align*} \frac{\partial \text{OLS}}{\partial C} &= -2 \sum_{i=1}^{n} (y_i - C - k_1 x_{1i} - k_2 x_{2i}) \\ \frac{\partial \text{OLS}}{\partial k_1} &= -2 \sum_{i=1}^{n} x_{1i} (y_i - C - k_1 x_{1i} - k_2 x_{2i}) \\ \frac{\partial \text{OLS}}{\partial k_2} &= -2 \sum_{i=1}^{n} x_{2i} (y_i - C - k_1 x_{1i} - k_2 x_{2i}) \end{align*}

    See how to do it in the online calculator
    (you need to specify foo in the function and k_1 in the argument)
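
    If you'd rather check locally, here is a minimal sympy sketch of the same derivative (sympy and the symbol names are my own choice, not something from the original post):

    import sympy as sp

    y_i, x1, x2, k1, k2, C = sp.symbols('y_i x1 x2 k1 k2 C')
    ols_term = (y_i - C - k1 * x1 - k2 * x2) ** 2    # one term of the OLS sum

    print(sp.diff(ols_term, k1))  # -2*x1*(y_i - C - k1*x1 - k2*x2), up to how sympy orders the terms
    print(sp.diff(ols_term, C))   # -2*(y_i - C - k1*x1 - k2*x2), same caveat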

    OLS (ordinary least squares)

    This method comes from statistics. It starts here, after I have the derivatives, when I choose exactly how to turn them into the k's.
    First I look for an extremum, a stationary point (not to be confused with an inflection point): as was said in the basics, at the desired point the derivative with respect to k equals zero.

    \begin{align*} \frac{\partial \text{OLS}}{\partial C} &= 0, & \frac{\partial \text{OLS}}{\partial k_1} &= 0, & \frac{\partial \text{OLS}}{\partial k_2} &= 0 \end{align*}

    Together this gives a new SLAE (system of linear algebraic equations). It is linear because x_{1i}, whether to the first or to the hundredth power, is just a number. I'll tidy the SLAE a little: "break apart" the sums and throw out the -2 as unnecessary. There is nothing complicated here, I just opened the brackets and applied the sum to each term separately. There won't be anything more complicated in the analytical part.

    \sum_{i=1}^{n} x_{1i}y_{i} = C \sum_{i=1}^{n} x_{1i} + k_{1} \sum_{i=1}^{n} x_{1i}^{2} + k_{2} \sum_{i=1}^{n} x_{1i}x_{2i}

    Similarly for k_2 and C:
    \sum_{i=1}^{n} x_{2i}y_{i} = C \sum_{i=1}^{n} x_{2i} + k_{1} \sum_{i=1}^{n} x_{1i}x_{2i} + k_{2} \sum_{i=1}^{n} x_{2i}^{2}
    \sum_{i=1}^{n} y_{i} = nC + k_{1} \sum_{i=1}^{n} x_{1i} + k_{2} \sum_{i=1}^{n} x_{2i}

    In the new SLAE everything is a number except the coefficients. I treat the k's (and C) as unknowns and just solve. More precisely, everything will be numbers once I have some initial data, say:

    X1 | X2 | y
    ---|----|----
     1 |  2 | 1.5
     2 |  3 | 1.8
     3 |  1 | 3.2
     4 |  5 | 3.6
     5 |  4 | 5.1

    After substituting, multiplying and adding, I got:
    15.2 = 5*C + 15*k1 + 15*k2
    54.6 = 15*C + 55*k1 + 51*k2
    50.0 = 15*C + 51*k1 + 55*k2

    View the solution in the online SLAE calculator

    MSE = 0.0971

    View the hand-rolled version in Python
    import numpy as np
    from scipy.linalg import solve


    y = np.array([[1.5, 1.8, 3.2, 3.6, 5.1]]) # Everything lies on its side, I find it convenient this way
    X = np.array([[1,2,3,4,5], [2,3,1,5,4]])  # If you want the usual layout, transpose X and y

    SUM_x1 = X[0].sum()
    SUM_x2 = X[1].sum()
    SUM_y = y.sum()

    SUM_x1_x1 = (X[0]**2).sum()
    SUM_x2_x2 = (X[1]**2).sum()
    SUM_x1_x2 = (X[0] * X[1]).sum()

    SUM_x1_y = (X[0] * y).sum()
    SUM_x2_y = (X[1] * y).sum()
    # Next comes the SLAE
    """
    SUM_y    = len(y[0])*C + k1*SUM_x1    + k2*SUM_x2
    SUM_x1_y = C*SUM_x1    + k1*SUM_x1_x1 + k2*SUM_x1_x2
    SUM_x2_y = C*SUM_x2    + k1*SUM_x1_x2 + k2*SUM_x2_x2
    """

    X_ = np.array([[len(y[0]), SUM_x1, SUM_x2],  # Here each row is one equation of the system
                   [SUM_x1, SUM_x1_x1, SUM_x1_x2],
                   [SUM_x2, SUM_x1_x2, SUM_x2_x2]])
    b = np.array([SUM_y, SUM_x1_y, SUM_x2_y])
    k_all = solve(X_, b)
    print(f'k_all = {k_all}\n')  # [ 0.5275   0.99375 -0.15625]
    View the matrix version of the hand-rolled solution (this is also a bit tricky, but you can skip it)

    I found this formula:

    \text{OLS} = y^{\top} y - 2 b^{\top} X^{\top} y + b^{\top} X^{\top} X b

    OLS is a single number, so every term must be a number too; I'll check at least that (and again in code right after this list). Here X has m rows (data points) and n columns (coefficients, including the column of ones), y is a column of length m, and b is a column of length n:

    • y^T can obviously be multiplied by y: y is originally a column of length m, so the result is a number.

    • b^T dot X^T is a row of length n multiplied by a table with n rows and m columns. A row times a table gives a row, here of length m.

    • b^T dot X^T dot y is a row of length m times a column of length m, so a number.

    • b^T dot X^T was parsed two lines above: a row of length m.

    • b^T dot X^T dot X is a row of length m times a table of height m: a row of length n.

    • b^T dot X^T dot X dot b is a row of length n times a column of length n: a number.
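
    A quick shape check of each term on the toy data (only the shapes matter here; the values in b are arbitrary):

    import numpy as np

    X = np.array([[1, 1, 1, 1, 1], [1, 2, 3, 4, 5], [2, 3, 1, 5, 4]]).T   # (5, 3): m=5, n=3
    y = np.array([[1.5, 1.8, 3.2, 3.6, 5.1]]).T                           # (5, 1)
    b = np.ones((3, 1))                                                   # (3, 1)

    print(y.T.dot(y).shape)                  # (1, 1) -- a number
    print(b.T.dot(X.T).dot(y).shape)         # (1, 1) -- a number
    print(b.T.dot(X.T).dot(X).dot(b).shape)  # (1, 1) -- a number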

    Matrix differentiation here is similar to the ordinary kind. The power of b comes down (adding a factor of two in front of the term where needed), and terms without b are thrown out.

    -2 X^{\top} y + 2 X^{\top} X b = 0 \quad\Rightarrow\quad X^{\top} y = X^{\top} X b

    It remains to express b = X^{\top} y / (X^{\top} X), but you can't divide matrices like that. Instead, the inverse of the "divisor" is matrix-multiplied by the "dividend": b = (X^{\top} X)^{-1} X^{\top} y.
    X_T_X_inv = np.linalg.inv(X.T.dot(X))
    matrix_b = X_T_X_inv.dot(X.T.dot(y))
    If a matrix contains rows obtained from other rows by scaling, its inverse cannot be found; for that case there are pseudo-inverse matrices and the pinv method.

    Why is that?

    A matrix is just a tabular way of writing down a SLAE (almost always). A SLAE with 3 unknowns and 3 equations, where one equation is obtained by scaling another, has no unique solution either; and A.dot(inv(A)) = identity matrix is essentially an equation too, so by looking for the inverse matrix I am solving it. Rows and columns are equivalent in this respect.
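
    A tiny illustration with a deliberately singular matrix (the 2x2 values are made up for the example):

    import numpy as np

    A = np.array([[1.0, 2.0],
                  [2.0, 4.0]])   # the second row is the first row scaled by 2
    # np.linalg.inv(A) would raise LinAlgError: "Singular matrix"
    print(np.linalg.pinv(A))     # the pseudo-inverse exists anyway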

    Now I'll check the matrix formula via Python:

    X = np.array([[1,1,1,1,1], [1,2,3,4,5], [2,3,1,5,4]]).T
    y = np.array([[1.5, 1.8, 3.2, 3.6, 5.1]]).T
    X_T_X_inv = np.linalg.inv(X.T.dot(X))
    matrix_b = X_T_X_inv.dot(X.T.dot(y))
    print(f'b = {matrix_b}')  # [[ 0.5275 ], [ 0.99375], [-0.15625]]
    View the off-the-shelf solution in Python
    import numpy as np
    from scipy.linalg import lstsq
    X = np.array([[1,1,1,1,1], [1,2,3,4,5], [2,3,1,5,4]]).T
    # Here the data already has to be laid out as columns,
    # with a column of ones added. I put it first, as a sort of k0, but the position doesn't matter
    y = np.array([[1.5, 1.8, 3.2, 3.6, 5.1]]).T
    b, squared_error_sum, matrix_rank, SVD_ = lstsq(X, y)
    print(b)  # in linear regression people usually write b or w; the mathematical k is out of favor
    # [[ 0.5275 ], [ 0.99375], [-0.15625]]

    I can't help but add:

    • solve and lstsq exist not only in scipy.linalg but also in numpy.linalg

    • This is logical, since OLS and LR in general appeared long before ML. Statistics and econometrics textbooks can be useful.

    • Gradient descent with learning_rate = 0.005 reached the value from the least squares method in only 13k iterations, and with learning_rate = 0.01 it blew up completely (a rough sketch of the comparison is below).
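
    A rough sketch of that comparison on the same toy data (the objective is the same sum of squared errors; the learning rate and iteration count come from the bullet above, the rest is my assumption):

    import numpy as np

    X = np.array([[1, 1, 1, 1, 1], [1, 2, 3, 4, 5], [2, 3, 1, 5, 4]], dtype=float).T
    y = np.array([1.5, 1.8, 3.2, 3.6, 5.1])

    b = np.zeros(3)                          # [C, k1, k2], start from zeros
    learning_rate = 0.005
    for _ in range(13_000):
        grad = -2 * X.T.dot(y - X.dot(b))    # gradient of the sum of squared errors
        b = b - learning_rate * grad

    print(b)  # should land close to [ 0.5275   0.99375 -0.15625]
    # with learning_rate = 0.01 the steps overshoot and the values blow up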

    And that's all I need to know about the least squares method. I hope.
