# Logistic Regression: A Detailed Overview

*Figure 1. Logistic regression model*

Logistic regression has been used in biological research since the early twentieth century and later spread to many of the social sciences. It is applicable when the dependent variable (the target value) is categorical.

For example, we need to predict:

- whether an email is spam (1) or not (0);
- whether a tumor is malignant (1) or not (0).

Let’s try to answer the second question.

*Model*

- The output of the model is 0 or 1.
- Hypothesis: Z = WX + B.
- hΘ(x) = sigmoid(Z).

**Sigmoid function**

*Figure 2. Sigmoid activation function*

The predicted value of Y(predicted) tends to 1 for positively infinite Z, and to 0 for negatively infinite Z.
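This limiting behaviour is easy to check numerically; here is a minimal sketch of the sigmoid using NumPy:

```python
import numpy as np

def sigmoid(z):
    # Maps any real z into the open interval (0, 1)
    return 1.0 / (1.0 + np.exp(-z))

print(sigmoid(0))    # 0.5, the midpoint
print(sigmoid(10))   # close to 1 for large positive z
print(sigmoid(-10))  # close to 0 for large negative z
```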

*Hypothesis analysis*

With this hypothesis, the output is a probability estimate: it expresses the confidence that the predicted value matches the actual one. The inputs are the values X₀ and X₁. Based on the value X₁, we can estimate the probability as 0.8. Thus, the probability that the tumor is malignant is 80%.

You can write it like this:

*Figure 3. Mathematical representation*

This is the reason for the name – logistic regression. The data fit into a linear regression model, which is then acted upon by a logistic function. This function describes the target categorical dependent variable.
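In the standard notation used in Andrew Ng’s course, Figure 3 amounts to composing the linear model with the sigmoid:

```latex
h_\theta(x) = \sigma\!\left(\theta^{T} x\right) = \frac{1}{1 + e^{-\theta^{T} x}},
\qquad
h_\theta(x) = P\!\left(y = 1 \mid x;\, \theta\right)
```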

*Types of logistic regression*

1. Binary logistic regression.

Classification into just two possible categories, for example spam (1) versus not spam (0).

2. Multinomial logistic regression.

Three or more categories without any ordering, for example, predicting a person’s diet (vegetarian, non-vegetarian, vegan).

3. Ordinal logistic regression.

Three or more categories with an ordering, for example, rating films on a scale from 1 to 5.

*Decision Boundary*

To determine which class the data belongs to, a threshold value is set. Based on this threshold, the resulting probability estimate is classified into classes.

For example, if predicted_value ≥ 0.5, then the email is classified as spam, otherwise it is classified as non-spam.

The decision boundary can be linear or non-linear. The order of the polynomial can be increased to get a complex decision boundary.
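The thresholding step described above can be sketched as follows (the 0.5 threshold is the conventional default, not a requirement):

```python
import numpy as np

def classify(probabilities, threshold=0.5):
    # Map probability estimates to class labels 0/1 using the threshold
    return (np.asarray(probabilities) >= threshold).astype(int)

probs = [0.92, 0.31, 0.50, 0.07]
print(classify(probs))       # [1 0 1 0]
print(classify(probs, 0.6))  # [1 0 0 0]
```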

*Loss function (cost function)*

*Figure 4. Logistic regression loss function*

Why is the loss function used for linear regression not applicable to logistic regression? Linear regression uses the mean squared error as its cost function. If it were used for logistic regression, the cost would be a non-convex function of the parameters (theta), and gradient descent is guaranteed to converge to the global minimum only for a convex function.
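Concretely, carrying the linear-regression cost over would mean using the mean squared error:

```latex
J(\theta) = \frac{1}{m} \sum_{i=1}^{m} \frac{1}{2}
\left( h_\theta\!\left(x^{(i)}\right) - y^{(i)} \right)^{2}
```

With the sigmoid inside h_θ, this J(θ) is non-convex in θ, so gradient descent can get stuck in a local minimum.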

*Figure 5. Convex and non-convex loss functions*

*Explanation of the loss function*

*Figure 6. Loss function: part 1*

*Figure 7. Loss function: part 2*

*Simplified loss function*

*Figure 8. Simplified loss function*
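In standard notation, the simplified loss from Figure 8 is the cross-entropy averaged over the m training examples; this is the same expression the Python implementation computes:

```latex
J(\theta) = -\frac{1}{m} \sum_{i=1}^{m}
\left[ y^{(i)} \log h_\theta\!\left(x^{(i)}\right)
+ \left(1 - y^{(i)}\right) \log\!\left(1 - h_\theta\!\left(x^{(i)}\right)\right) \right]
```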

*Why does the loss function look like this?*

*Figure 9. Explanation of the maximum likelihood method: part 1*

*Figure 10. Explanation of the maximum likelihood method: part 2*

The loss is the negative log-likelihood: during training, we maximize the likelihood by minimizing the loss function. Reducing the loss increases the likelihood, under the assumption that the samples are drawn independently from an identical distribution (i.i.d.).

*Derivation of the formula for the gradient descent algorithm*

*Figure 11. Gradient Descent Algorithm: Part 1*

*Figure 12. Gradient descent algorithm: part 2*
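The derivation in Figures 11 and 12 yields the gradient of the loss and the familiar update rule (α is the learning rate):

```latex
\frac{\partial J(\theta)}{\partial \theta_j}
= \frac{1}{m} \sum_{i=1}^{m}
\left( h_\theta\!\left(x^{(i)}\right) - y^{(i)} \right) x_j^{(i)},
\qquad
\theta_j := \theta_j - \alpha \, \frac{\partial J(\theta)}{\partial \theta_j}
```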

**Implementation in Python**:

```python
import numpy as np

def weightInitialization(n_features):
    # Start with zero weights and zero bias
    w = np.zeros((1, n_features))
    b = 0
    return w, b

def sigmoid_activation(result):
    final_result = 1 / (1 + np.exp(-result))
    return final_result

def model_optimize(w, b, X, Y):
    m = X.shape[0]
    # Prediction
    final_result = sigmoid_activation(np.dot(w, X.T) + b)
    Y_T = Y.T
    cost = (-1/m) * (np.sum((Y_T * np.log(final_result)) + ((1 - Y_T) * (np.log(1 - final_result)))))
    # Gradient calculation
    dw = (1/m) * (np.dot(X.T, (final_result - Y.T).T))
    db = (1/m) * (np.sum(final_result - Y.T))
    grads = {"dw": dw, "db": db}
    return grads, cost

def model_predict(w, b, X, Y, learning_rate, no_iterations):
    # Despite the name, this function trains the parameters by gradient descent
    costs = []
    for i in range(no_iterations):
        grads, cost = model_optimize(w, b, X, Y)
        dw = grads["dw"]
        db = grads["db"]
        # Weight update
        w = w - (learning_rate * (dw.T))
        b = b - (learning_rate * db)
        if i % 100 == 0:
            costs.append(cost)
            # print("Cost after %i iteration is %f" % (i, cost))
    # Final parameters
    coeff = {"w": w, "b": b}
    gradient = {"dw": dw, "db": db}
    return coeff, gradient, costs

def predict(final_pred, m):
    # Apply the 0.5 decision threshold to the probability estimates
    y_pred = np.zeros((1, m))
    for i in range(final_pred.shape[1]):
        if final_pred[0][i] > 0.5:
            y_pred[0][i] = 1
    return y_pred
```
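To see the training loop in action end to end, here is a compact, self-contained sketch of the same gradient-descent procedure on a hypothetical toy dataset (the data, learning rate, and iteration count are illustrative assumptions, not part of the original tutorial):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Hypothetical toy dataset: label 1 when the two features sum to a positive value
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 2))                          # shape (m, n_features)
Y = (X.sum(axis=1) > 0).astype(float).reshape(1, -1)   # shape (1, m)

m = X.shape[0]
w = np.zeros((1, 2))
b = 0.0
learning_rate = 0.1

costs = []
for i in range(300):
    p = sigmoid(np.dot(w, X.T) + b)          # probability estimates, shape (1, m)
    p_safe = np.clip(p, 1e-12, 1 - 1e-12)    # guard the logarithms numerically
    cost = (-1 / m) * np.sum(Y * np.log(p_safe) + (1 - Y) * np.log(1 - p_safe))
    dw = (1 / m) * np.dot(p - Y, X)          # gradient w.r.t. the weights
    db = (1 / m) * np.sum(p - Y)
    w -= learning_rate * dw
    b -= learning_rate * db
    if i % 100 == 0:
        costs.append(cost)

accuracy = np.mean((p >= 0.5) == (Y == 1))
print("first recorded cost:", costs[0])
print("last recorded cost:", costs[-1])
print("training accuracy:", accuracy)
```

The recorded cost decreases over the iterations, which is the behaviour Figure 13 illustrates.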

*Loss versus the number of iterations*

*Figure 13. Cost reduction*

On this data, both training and test accuracy reach 100%. This implementation performs binary logistic regression; for more than two classes, softmax regression must be used.

This tutorial is based on a deep learning course by Professor Andrew Ng.

All the code is available here.
