Logistic Regression: A Detailed Overview


Figure 1. Logistic regression model.

Logistic regression has been used in biological research since the early twentieth century, and later came into wide use across the social sciences. Logistic regression is applicable when the dependent variable (the target value) is categorical.

For example, we need to predict:

  • whether an email is spam (1) or not (0);
  • whether a tumor is malignant (1) or benign (0).

Let’s try to answer the second question.

Translator's note: Sayshruti appears to have introduced an inconsistency when reworking her text. It first says "Consider a scenario where we need to classify whether an email is spam or not" and then "Say if the actual class is malignant … the data point will be classified as not malignant which can lead to a serious consequence in real time". To keep the text consistent and logical, the first sentence had to be changed; the second could not be, since the importance of the two tasks and the consequences of a classification error in them are not comparable.

If you use linear regression for this task, you have to set a classification threshold afterwards. Suppose the tumor is actually malignant, the predicted continuous value is 0.4, and the threshold is 0.5: the data point will then be classified as a benign tumor. Such a conclusion can lead to disaster.

This example shows that linear regression is not always suitable for classification problems. Linear regression output is unbounded, which is why logistic regression, whose output is strictly confined to the range from 0 to 1, is often used instead.

Simple logistic regression

Full source code

Model

  • The output of the model is 0 or 1.
  • Hypothesis => Z = WX + B.
  • hΘ(x) = sigmoid(Z).
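Restating the two bullets above in LaTeX (the notation is a reconstruction, since the original formulas are plain text):

\[ z = Wx + B, \qquad h_\Theta(x) = \sigma(z) \]

where \(\sigma\) is the sigmoid function described next.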

Sigmoid function



Figure 2. Sigmoid activation function

The predicted value of Y tends to 1 as Z approaches positive infinity, and to 0 as Z approaches negative infinity.
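For reference, the sigmoid itself, consistent with the code later in the article:

\[ \sigma(z) = \frac{1}{1 + e^{-z}}, \qquad \lim_{z \to +\infty} \sigma(z) = 1, \qquad \lim_{z \to -\infty} \sigma(z) = 0 \]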

Hypothesis analysis

With this hypothesis, the output is a probability estimate: it expresses the confidence that the predicted value matches the actual class. The inputs are the values x0 and x1. Suppose that, based on the value of x1, we estimate the probability as 0.8. Then the probability that the tumor is malignant is 80%.

You can write it like this:



Figure 3. Mathematical representation
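The formula in the figure is not reproduced here; in the standard notation of Andrew Ng's course (an assumption about what the figure shows), the representation reads:

\[ h_\theta(x) = P(y = 1 \mid x;\, \theta) \]

that is, the hypothesis outputs the probability that y = 1 given the input x, parameterized by \(\theta\).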

This is where the name "logistic regression" comes from: the data are fitted with a linear regression model, whose output is then passed through a logistic function that models the categorical target variable.

Types of logistic regression

1. Binary logistic regression.

Classification into only two possible categories, for example spam / not spam.

2. Multinomial logistic regression.

Three or more categories without any ordering, for example determining the preferred type of diet (vegetarian, non-vegetarian, vegan).

3. Ordinal logistic regression.

Ranking into three or more ordered categories, for example ranking films on a scale from 1 to 5.

Decision Boundary

To determine which class a data point belongs to, a threshold is set; the resulting probability estimate is then assigned to a class based on this threshold.

For example, if predicted_value ≥ 0.5, then the email is classified as spam, otherwise it is classified as non-spam.

The decision boundary can be linear or non-linear. The order of the polynomial can be increased to get a complex decision boundary.
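A minimal sketch of both ideas, thresholding and polynomial features (the data and the degree-2 expansion here are illustrative assumptions, not from the article):

import numpy as np

# Hypothetical probability estimates produced by a fitted model
probs = np.array([0.12, 0.55, 0.93, 0.49])

# Threshold at 0.5: values >= 0.5 become class 1 (e.g. spam), the rest class 0
labels = (probs >= 0.5).astype(int)
print(labels)  # [0 1 1 0]

# For a non-linear decision boundary, expand the inputs with polynomial
# terms before computing the linear part z = w . x + b
x1 = np.array([0.5, -1.0, 2.0])
x2 = np.array([1.5, 0.3, -0.7])
features = np.column_stack([x1, x2, x1**2, x1 * x2, x2**2])  # degree-2 terms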

Loss function (cost function)



Figure 4. Logistic regression loss function

Why is the loss function used for linear regression not applicable to logistic regression? Linear regression uses mean squared error as its cost function. If it is used for logistic regression, the cost becomes a non-convex function of the parameters (theta), and gradient descent is guaranteed to converge to the global minimum only for a convex function.
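For reference, the squared-error cost in question (a reconstruction in the article's notation); with the sigmoid inside \(h_\theta\), this expression is non-convex in \(\theta\):

\[ J(\theta) = \frac{1}{2m} \sum_{i=1}^{m} \left( h_\theta(x^{(i)}) - y^{(i)} \right)^2 \]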



Figure 5. Convex and non-convex loss functions

Explanation of the loss function



Figure 6. Loss function: part 1



Figure 7. Loss function: part 2
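Since the figures are not reproduced, here is the standard piecewise form they illustrate:

\[ \mathrm{Cost}(h_\theta(x), y) = \begin{cases} -\log(h_\theta(x)) & \text{if } y = 1 \\ -\log(1 - h_\theta(x)) & \text{if } y = 0 \end{cases} \]

The cost goes to infinity when the model confidently predicts the wrong class, and to zero when it confidently predicts the right one.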

Simplified loss function



Figure 8. Simplified loss function
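Combined into a single expression (this is exactly the cost computed in the code below):

\[ J(\theta) = -\frac{1}{m} \sum_{i=1}^{m} \left[ y^{(i)} \log h_\theta(x^{(i)}) + \left(1 - y^{(i)}\right) \log\left(1 - h_\theta(x^{(i)})\right) \right] \]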

Why does the loss function look like this?



Figure 9. Explanation of the maximum likelihood method: part 1



Figure 10. Explanation of the maximum likelihood method: part 2

The function carries a negative sign because during training we need to maximize the probability, and we do so by minimizing the loss function: reducing the cost increases the likelihood, assuming the samples are drawn independently from the same distribution (i.i.d.).
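Spelled out (a reconstruction of the maximum-likelihood argument the figures illustrate): for a Bernoulli target, the likelihood and log-likelihood are

\[ L(\theta) = \prod_{i=1}^{m} h_\theta(x^{(i)})^{\,y^{(i)}} \left(1 - h_\theta(x^{(i)})\right)^{1 - y^{(i)}}, \qquad \ell(\theta) = \sum_{i=1}^{m} \left[ y^{(i)} \log h_\theta(x^{(i)}) + (1 - y^{(i)}) \log\left(1 - h_\theta(x^{(i)})\right) \right] \]

so \( J(\theta) = -\ell(\theta)/m \): minimizing the loss maximizes the log-likelihood.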

Derivation of the formula for the gradient descent algorithm



Figure 11. Gradient Descent Algorithm: Part 1



Figure 12. Gradient Descent Algorithm: Part 2
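The end result of the derivation, in the form used by the implementation below:

\[ \frac{\partial J}{\partial w} = \frac{1}{m} X^{\top} \left( h_\theta(X) - y \right), \qquad \frac{\partial J}{\partial b} = \frac{1}{m} \sum_{i=1}^{m} \left( h_\theta(x^{(i)}) - y^{(i)} \right) \]

with the updates \( w := w - \alpha \, \partial J / \partial w \) and \( b := b - \alpha \, \partial J / \partial b \), where \(\alpha\) is the learning rate.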

Implementation in Python:

import numpy as np

def weightInitialization(n_features):
    # Start from zero weights and zero bias
    w = np.zeros((1, n_features))
    b = 0
    return w, b

def sigmoid_activation(result):
    # Sigmoid maps any real value into the interval (0, 1)
    final_result = 1 / (1 + np.exp(-result))
    return final_result

def model_optimize(w, b, X, Y):
    # X has shape (m, n_features); Y has shape (m, 1)
    m = X.shape[0]

    # Prediction: h(x) = sigmoid(w . x + b), shape (1, m)
    final_result = sigmoid_activation(np.dot(w, X.T) + b)
    Y_T = Y.T
    # Cross-entropy cost (the simplified loss function above)
    cost = (-1/m) * (np.sum((Y_T * np.log(final_result)) + ((1 - Y_T) * (np.log(1 - final_result)))))

    # Gradients of the cost with respect to w and b
    dw = (1/m) * (np.dot(X.T, (final_result - Y.T).T))
    db = (1/m) * (np.sum(final_result - Y.T))

    grads = {"dw": dw, "db": db}

    return grads, cost

def model_predict(w, b, X, Y, learning_rate, no_iterations):
    # Gradient descent training loop (the function name is kept from the original)
    costs = []
    for i in range(no_iterations):
        grads, cost = model_optimize(w, b, X, Y)
        dw = grads["dw"]
        db = grads["db"]
        # Weight update
        w = w - (learning_rate * (dw.T))
        b = b - (learning_rate * db)

        # Record the cost every 100 iterations
        if i % 100 == 0:
            costs.append(cost)
            # print("Cost after %i iteration is %f" % (i, cost))

    # Final parameters
    coeff = {"w": w, "b": b}
    gradient = {"dw": dw, "db": db}

    return coeff, gradient, costs

def predict(final_pred, m):
    # Threshold the predicted probabilities at 0.5 to get hard 0/1 labels
    y_pred = np.zeros((1, m))
    for i in range(final_pred.shape[1]):
        if final_pred[0][i] > 0.5:
            y_pred[0][i] = 1
    return y_pred
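A minimal usage sketch, continuing from the definitions above (the synthetic dataset is an illustrative assumption, not from the article):

# Tiny synthetic dataset: m = 4 samples, 2 features; Y has shape (m, 1)
X = np.array([[0.1, 1.2], [1.0, 0.2], [2.3, 2.1], [0.3, 0.4]])
Y = np.array([[0], [0], [1], [0]])

w, b = weightInitialization(n_features=X.shape[1])
coeff, gradient, costs = model_predict(w, b, X, Y, learning_rate=0.1, no_iterations=1000)

# Predicted probabilities, then hard 0/1 labels via the 0.5 threshold
final_pred = sigmoid_activation(np.dot(coeff["w"], X.T) + coeff["b"])
y_pred = predict(final_pred, m=X.shape[0])
print(y_pred)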

Loss versus number of iterations



Figure 13. Cost reduction

Training and test accuracy both reach 100% here. This implementation covers binary logistic regression; when there are more than two classes, softmax regression must be used.
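For reference, the softmax generalization (not part of this implementation) replaces the sigmoid with a per-class probability:

\[ P(y = k \mid x) = \frac{e^{z_k}}{\sum_{j=1}^{K} e^{z_j}}, \qquad z_k = w_k \cdot x + b_k \]

where K is the number of classes; for K = 2 this reduces to the sigmoid.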

This tutorial is based on a deep learning course by Professor Andrew Ng.

The full code is available here.
