Logistic Regression: A Detailed Overview
Figure 1. Logistic regression model.
Logistic regression has been used in biological research since the early twentieth century, and was later adopted in many social sciences. It is applicable when the dependent variable (the target value) is categorical. This article covers the details with illustrations; hands-on practice is available in our Data Science course.
For example, we need to predict:
- whether the email is spam (1) or not (0);
- whether the tumor is malignant (1) or benign (0).
Let’s try to answer the second question.
Translator's note: Sayshruti seems to have made a mistake here when reworking her text. It first says "Consider a scenario where we need to classify whether an email is spam or not" and then "Say if the actual class is malignant…the data point will be classified as not malignant which can lead to a serious consequence in real time". To keep the text consistent, the first sentence had to be changed. The second could not be changed, since the importance of the two tasks and the consequences of a misclassification are not comparable.
If you use linear regression for this, you first need to set a threshold on which the classification is based. Suppose the tumor is actually malignant, the predicted continuous value is 0.4, and the threshold is 0.5: the data point will then be classified as a benign tumor. Such a mistake can have serious consequences.
This example shows that linear regression is not [always] suitable for classification problems. The output of linear regression is unbounded, so logistic regression, whose output lies strictly between 0 and 1, is often the better choice.
Simple logistic regression
Model
- The output of the model is 0 or 1.
- Hypothesis => Z = WX + B.
- hΘ(x) = sigmoid(Z).
Sigmoid function
Figure 2. Sigmoid activation function
The predicted value Y(predicted) tends to 1 as Z goes to positive infinity, and to 0 as Z goes to negative infinity.
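The limiting behavior described above can be checked numerically. The following is a minimal sketch; the function name `sigmoid` and the sample values of Z are chosen here for illustration only.

```python
import numpy as np

def sigmoid(z):
    """Logistic function: maps any real z into the open interval (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

# A few illustrative values of Z, from strongly negative to strongly positive
z = np.array([-10.0, -1.0, 0.0, 1.0, 10.0])
probs = sigmoid(z)

# Outputs rise monotonically from near 0 to near 1 and pass through 0.5 at Z = 0
print(probs)
```

Note that the output approaches 0 and 1 but never reaches them for finite Z.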
Hypothesis analysis
With this hypothesis, the output is an estimated probability. It expresses how confident we can be that the predicted value matches the actual value for a given input X. Let the inputs be x0 and x1. Suppose that, based on the value of x1, the estimated probability is 0.8. Then the probability that the tumor is malignant is 80%.
You can write it like this:
Figure 3. Mathematical representation
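Since the figure is not reproduced here, the statement it most likely shows can be sketched as follows, using the hΘ(x) notation from the model section. The exact layout of the original figure is an assumption.

```latex
h_\Theta(x) = P(Y = 1 \mid X;\, \Theta)
\qquad
P(Y = 1 \mid X;\, \Theta) + P(Y = 0 \mid X;\, \Theta) = 1
```

In the example above, hΘ(x) = 0.8 therefore implies a probability of 0.2 that the tumor is benign.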
This is where the name logistic regression comes from. The data is first fit with a linear regression model, and a logistic function is then applied to its output. The result is used to predict the categorical dependent variable.
Types of logistic regression
1. Binary logistic regression.
Classification into just 2 possible categories, for example spam or not spam.
2. Multinomial logistic regression.
Three or more categories without ordering, for example predicting which diet is preferred (vegetarian, non-vegetarian, vegan).
3. Ordinal logistic regression.
Three or more ordered categories, for example film ratings from 1 to 5.
Decision Boundary
To determine which class a data point belongs to, a threshold value is set. Based on this threshold, the estimated probability is mapped to a class.
For example, if predicted_value ≥ 0.5, then the email is classified as spam, otherwise it is classified as non-spam.
The decision boundary can be linear or non-linear. The order of the polynomial can be increased to get a complex decision boundary.
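As a rough sketch of the linear case: thresholding the probability at 0.5 is the same as checking the sign of Z, so the boundary is the set of points where Z = 0. The weights, bias, and sample points below are hypothetical values picked for illustration.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Hypothetical parameters: the boundary is the line x1 + x2 = 3
w = np.array([1.0, 1.0])
b = -3.0

X = np.array([[0.5, 1.0],   # below the line
              [2.0, 2.0],   # above the line
              [1.5, 1.5]])  # exactly on the line

z = X @ w + b                          # linear score for each point
probs = sigmoid(z)                     # estimated probabilities
labels = (probs >= 0.5).astype(int)    # threshold at 0.5

print(labels)
```

A non-linear boundary follows the same pattern with polynomial features: for instance, feeding x1² and x2² into the model with weights of 1 and a bias of -1 would place the boundary on the circle x1² + x2² = 1.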
Loss function (cost function)
Figure 4. Logistic regression loss function
Why is the loss function used for linear regression not applicable to logistic regression? Linear regression uses the mean squared error as its cost function. If you use it for logistic regression, the cost becomes a non-convex function of the parameters (theta), and gradient descent is guaranteed to converge to the global minimum only when the function is convex.
Figure 5. Convex and non-convex loss functions
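The difference can be seen numerically on a single training example. The sketch below assumes one input x = 1 with true label y = 0 (an illustrative choice, not taken from the article) and inspects the discrete second differences of each cost as a function of a single weight: a convex curve never has a negative one.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# One training example with x = 1 and y = 0, cost viewed as a function of w
w = np.linspace(-6.0, 6.0, 241)
h = sigmoid(w * 1.0)

squared_error = (h - 0.0) ** 2     # squared-error cost for this example
log_loss = -np.log(1.0 - h)        # log loss for y = 0

# Second differences approximate curvature along the grid
d2_squared = np.diff(squared_error, n=2)
d2_log = np.diff(log_loss, n=2)

print("squared error curves downward somewhere:", bool((d2_squared < 0).any()))
print("log loss curves downward somewhere:", bool((d2_log < 0).any()))
```

On this example the squared error combined with the sigmoid bends downward for larger w, while the log loss stays convex over the whole range.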
Explanation of the loss function
Figure 6. Loss function: part 1
Figure 7. Loss function: part 2
Simplified loss function
Figure 8. Simplified loss function
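The figures are not reproduced here. Assuming they show the standard log loss, which is what the Python implementation in this article computes, the two cases and their combined form can be written as:

```latex
\mathrm{Cost}\big(h_\Theta(x), y\big) =
\begin{cases}
  -\log\big(h_\Theta(x)\big)     & \text{if } y = 1 \\
  -\log\big(1 - h_\Theta(x)\big) & \text{if } y = 0
\end{cases}

J(\Theta) = -\frac{1}{m} \sum_{i=1}^{m}
  \Big[ y^{(i)} \log h_\Theta\big(x^{(i)}\big)
      + \big(1 - y^{(i)}\big) \log\big(1 - h_\Theta\big(x^{(i)}\big)\big) \Big]
```

Because y is either 0 or 1, exactly one of the two terms in the sum is active for each example, so the single expression reproduces both cases.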
Why does the loss function look like this?
Figure 9. Explanation of the maximum likelihood method: part 1
Figure 10. Explanation of the maximum likelihood method: part 2
The function carries a negative sign because training aims to maximize the likelihood, and this is done by minimizing the loss function. Decreasing the loss increases the likelihood, assuming that the samples are independent and identically distributed.
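A sketch of the argument, assuming m independent and identically distributed samples and the hΘ(x) notation used earlier (the original figures may present the steps differently):

```latex
L(\Theta) = \prod_{i=1}^{m}
  h_\Theta\big(x^{(i)}\big)^{\,y^{(i)}}
  \big(1 - h_\Theta\big(x^{(i)}\big)\big)^{\,1 - y^{(i)}}

\log L(\Theta) = \sum_{i=1}^{m}
  \Big[ y^{(i)} \log h_\Theta\big(x^{(i)}\big)
      + \big(1 - y^{(i)}\big) \log\big(1 - h_\Theta\big(x^{(i)}\big)\big) \Big]

J(\Theta) = -\frac{1}{m} \log L(\Theta)
```

Taking the logarithm turns the product into a sum, and the factor -1/m converts maximizing the log-likelihood into minimizing an average loss.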
Derivation of the formula for the gradient descent algorithm
Figure 11. Gradient Descent Algorithm: Part 1
Figure 12. Gradient Descent Explained: Part 2
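The figures are not reproduced here. A condensed version of the derivation, assuming the log loss J, the model Z = WX + B with hΘ(x) = sigmoid(Z), and a learning rate written as α (a symbol introduced here), is:

```latex
\sigma'(z) = \sigma(z)\big(1 - \sigma(z)\big)
\quad\Longrightarrow\quad
\frac{\partial J}{\partial z^{(i)}} = \frac{1}{m}\Big(h_\Theta\big(x^{(i)}\big) - y^{(i)}\Big)

\frac{\partial J}{\partial W} = \frac{1}{m} \sum_{i=1}^{m}
  \Big(h_\Theta\big(x^{(i)}\big) - y^{(i)}\Big)\, x^{(i)}
\qquad
\frac{\partial J}{\partial B} = \frac{1}{m} \sum_{i=1}^{m}
  \Big(h_\Theta\big(x^{(i)}\big) - y^{(i)}\Big)

W := W - \alpha\, \frac{\partial J}{\partial W}
\qquad
B := B - \alpha\, \frac{\partial J}{\partial B}
```

These two partial derivatives correspond to the quantities named `dw` and `db` in the Python implementation.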
Implementation in Python:
import numpy as np

def weightInitialization(n_features):
    # Start from zero weights (shape 1 x n_features) and a zero bias
    w = np.zeros((1, n_features))
    b = 0
    return w, b

def sigmoid_activation(result):
    # Map any real value into the interval (0, 1)
    final_result = 1 / (1 + np.exp(-result))
    return final_result

def model_optimize(w, b, X, Y):
    # X is expected to have shape (m, n_features) and Y shape (m, 1)
    m = X.shape[0]
    # Prediction
    final_result = sigmoid_activation(np.dot(w, X.T) + b)
    Y_T = Y.T
    cost = (-1 / m) * (np.sum((Y_T * np.log(final_result)) + ((1 - Y_T) * (np.log(1 - final_result)))))
    # Gradient calculation
    dw = (1 / m) * (np.dot(X.T, (final_result - Y.T).T))
    db = (1 / m) * (np.sum(final_result - Y.T))
    grads = {"dw": dw, "db": db}
    return grads, cost

def model_predict(w, b, X, Y, learning_rate, no_iterations):
    # Runs gradient descent and records the cost every 100 iterations
    costs = []
    for i in range(no_iterations):
        grads, cost = model_optimize(w, b, X, Y)
        dw = grads["dw"]
        db = grads["db"]
        # Weight update
        w = w - (learning_rate * (dw.T))
        b = b - (learning_rate * db)
        if (i % 100 == 0):
            costs.append(cost)
            # print("Cost after %i iteration is %f" % (i, cost))
    # Final parameters
    coeff = {"w": w, "b": b}
    gradient = {"dw": dw, "db": db}
    return coeff, gradient, costs

def predict(final_pred, m):
    # Convert probabilities of shape (1, m) into 0/1 labels using a 0.5 threshold
    y_pred = np.zeros((1, m))
    for i in range(final_pred.shape[1]):
        if final_pred[0][i] > 0.5:
            y_pred[0][i] = 1
    return y_pred
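To show how the pieces fit together, here is a small end-to-end run on synthetic data. This is a sketch under stated assumptions: the `train` helper is a condensed stand-in for the initialization and gradient descent steps so that the block runs by itself, and the data, learning rate, and iteration count are made up for illustration rather than taken from the original experiment.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def train(X, Y, learning_rate=0.1, no_iterations=2000):
    """Condensed gradient descent loop for binary logistic regression."""
    m, n_features = X.shape
    w = np.zeros((1, n_features))
    b = 0.0
    costs = []
    for i in range(no_iterations):
        h = sigmoid(w @ X.T + b)                      # predictions, shape (1, m)
        cost = -np.mean(Y.T * np.log(h) + (1 - Y.T) * np.log(1 - h))
        dw = (X.T @ (h - Y.T).T) / m                  # shape (n_features, 1)
        db = np.sum(h - Y.T) / m
        w = w - learning_rate * dw.T
        b = b - learning_rate * db
        if i % 100 == 0:
            costs.append(cost)
    return w, b, costs

# Synthetic, well-separated two-class data (invented for this sketch)
rng = np.random.default_rng(0)
X0 = rng.normal(loc=-2.0, scale=0.5, size=(50, 2))
X1 = rng.normal(loc=2.0, scale=0.5, size=(50, 2))
X = np.vstack([X0, X1])
Y = np.vstack([np.zeros((50, 1)), np.ones((50, 1))])

w, b, costs = train(X, Y)
y_pred = (sigmoid(w @ X.T + b) > 0.5).astype(int)    # labels, shape (1, m)
accuracy = np.mean(y_pred == Y.T)

print("first and last recorded cost:", costs[0], costs[-1])
print("training accuracy:", accuracy)
```

On data this cleanly separated the recorded cost should fall steadily. Real datasets typically need a held-out test set and some tuning of the learning rate.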
Losses and number of iterations
Figure 13. Cost reduction
The accuracy of the system on both the training and test sets is 100%. This implementation covers binary logistic regression. For more than two classes, softmax regression should be used.
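For reference, the softmax function that replaces the sigmoid in the multi-class case can be sketched as follows. The scores below are hypothetical, and this shows only the activation, not a full softmax regression model.

```python
import numpy as np

def softmax(z):
    """Row-wise softmax; subtracting the row maximum avoids overflow in exp."""
    shifted = z - np.max(z, axis=1, keepdims=True)
    exp_z = np.exp(shifted)
    return exp_z / np.sum(exp_z, axis=1, keepdims=True)

# Hypothetical scores for 2 samples over 3 classes
scores = np.array([[2.0, 1.0, 0.1],
                   [0.5, 0.5, 3.0]])
probs = softmax(scores)                      # each row sums to 1
predicted_class = np.argmax(probs, axis=1)   # most probable class per sample

print(probs)
print(predicted_class)
```

With only two classes and scores (z, 0), the first softmax output equals sigmoid(z), so the binary model is a special case.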
This tutorial is based on a deep learning course by Professor Andrew Ng.
The full code is available here.