LOGISTIC REGRESSION INTERPRETATION

Dec 16, 2023

Introduction

In this blog, we will discuss the basic concepts of Logistic Regression and the types of problems that can be solved using logistic regression.

Logistic regression is a classification algorithm that assigns observations to a discrete set of classes. Some examples of classification problems are emails (spam or not spam), online transactions (fraud or not fraud), and tumors (malignant or benign). Logistic regression transforms its output using the logistic sigmoid function to return a probability value.

What are the types of logistic regression?

Binary (e.g. tumor malignant or benign)
Multiclass (e.g. cats, dogs, or sheep)

Logistic Regression

Logistic regression is a machine learning algorithm used for classification problems; it is a predictive analysis algorithm based on the concept of probability.

We can think of logistic regression as a linear regression model whose output is passed through a more complex function: instead of a linear function, logistic regression uses the ‘sigmoid function’, also called the ‘logistic function’.

The hypothesis of logistic regression limits its output to values between 0 and 1. A linear function therefore cannot represent it, because a linear function can produce values greater than 1 or less than 0, which is not possible under the hypothesis of logistic regression.
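To see the problem concretely, here is a minimal Python sketch that fits an ordinary least-squares line to hypothetical 0/1 labels. The data and the helper name fit_line are illustrative assumptions, not from any library.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = b0 + b1 * x (plain linear regression)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope = covariance(x, y) / variance(x); intercept passes through the means.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    b1 = sxy / sxx
    b0 = mean_y - b1 * mean_x
    return b0, b1

# Hypothetical toy data: one feature, binary labels (0 or 1).
xs = [1, 2, 3, 4, 5, 6]
ys = [0, 0, 0, 1, 1, 1]

b0, b1 = fit_line(xs, ys)
for x in [0, 1, 6, 8]:
    print(f"x={x}: linear prediction = {b0 + b1 * x:.3f}")
```

With this toy data the fitted line predicts values below 0 at x = 0 and x = 1 and above 1 at x = 6 and x = 8, which cannot be read as probabilities.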

Logistic regression hypothesis expectation

What is the Sigmoid Function?

To map predicted values to probabilities, we use the sigmoid function, σ(z) = 1 / (1 + e^(−z)). It maps any real value to a value between 0 and 1, which is why in machine learning we use it to turn predictions into probabilities.

The formula of a sigmoid function | Image: Analytics India Magazine
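The sigmoid takes only a few lines to write. This is a minimal Python sketch using only the standard library; the two-branch form is one common way (an implementation choice, not part of the definition) to keep math.exp from overflowing for large negative inputs.

```python
import math

def sigmoid(z):
    """Logistic sigmoid: maps any real z into (0, 1).

    In floating point, very large |z| rounds to exactly 0.0 or 1.0.
    """
    # Branch on the sign of z so math.exp never receives a large positive
    # argument, which would raise OverflowError.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

for z in [-4, -2, 0, 2, 4]:
    print(z, round(sigmoid(z), 4))
```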

Hypothesis Representation

When using linear regression we used a formula for the hypothesis i.e.

hΘ(x) = β₀ + β₁X

For logistic regression, we are going to modify it a little bit i.e.

σ(Z) = σ(β₀ + β₁X)

We expect our hypothesis to give values between 0 and 1:

Z = β₀ + β₁X
hΘ(x) = sigmoid(Z)
i.e. hΘ(x) = 1 / (1 + e^−(β₀ + β₁X))
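Putting the two steps together, the hypothesis for one feature can be sketched in Python as below. The coefficient values are hypothetical and chosen only so that h(x) crosses 0.5 at x = 2.

```python
import math

def hypothesis(x, beta0, beta1):
    """h(x) = sigmoid(beta0 + beta1 * x) for a single feature x."""
    z = beta0 + beta1 * x              # linear part, as in linear regression
    # Squash into (0, 1). math.exp(-z) can overflow for very negative z,
    # which is not a concern for the small inputs used here.
    return 1.0 / (1.0 + math.exp(-z))

# Hypothetical coefficients, chosen only for illustration.
beta0, beta1 = -3.0, 1.5
for x in [0.0, 2.0, 4.0]:
    print(f"x={x}: h(x) = {hypothesis(x, beta0, beta1):.3f}")
```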

Decision Boundary

We expect our classifier to assign each input to a class based on probability: we pass the inputs through a prediction function, which returns a probability score between 0 and 1.

For example, suppose we have two classes, dogs and cats (1 = dog, 0 = cat). We choose a threshold value: if the predicted probability is above the threshold, we classify the observation as Class 1 (dog), and if it is below, as Class 0 (cat).

As shown in the graph above, we have chosen a threshold of 0.5. If the prediction function returns a value of 0.7, we classify the observation as Class 1 (dog). If it returns a value of 0.2, we classify the observation as Class 0 (cat).
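The thresholding rule can be sketched as a small Python function, using the labels 1 for dog and 0 for cat. Assigning a probability exactly equal to the threshold to class 1 is an assumption; the rule as stated only covers values above and below it.

```python
def classify(probability, threshold=0.5):
    """Return 1 (dog) if probability >= threshold, else 0 (cat).

    Ties at exactly the threshold go to class 1 (an assumed convention).
    """
    return 1 if probability >= threshold else 0

for p in [0.7, 0.2]:
    label = classify(p)
    print(p, "->", label, "(dog)" if label == 1 else "(cat)")
```

Raising or lowering the threshold trades one kind of mistake for the other, so 0.5 is a default rather than a requirement.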

Cost Function

We learned about the cost function J(θ) in linear regression. The cost function represents the optimization objective: we define a cost function and minimize it so that we can develop an accurate model with minimum error.

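The cost function of linear regression is the mean squared error, J(θ) = (1/2m) Σᵢ (hθ(x⁽ⁱ⁾) − y⁽ⁱ⁾)², where m is the number of training examples.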

If we used the linear regression cost function in logistic regression, it would be of little use: combined with the sigmoid hypothesis, it becomes a non-convex function with many local minima, which makes it very difficult to minimize the cost and find the global minimum.

For logistic regression, the cost for a single example is defined as:

−log(hθ(x)) if y = 1
−log(1−hθ(x)) if y = 0
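The two cases translate directly into code. In this minimal Python sketch the name example_cost is hypothetical, and h is assumed to lie strictly between 0 and 1 so that both logarithms are defined.

```python
import math

def example_cost(h, y):
    """Per-example logistic cost: -log(h) if y == 1, -log(1 - h) if y == 0.

    h is the predicted probability h_theta(x), assumed strictly in (0, 1).
    """
    if y == 1:
        return -math.log(h)
    return -math.log(1.0 - h)

# The cost is small for confident correct predictions
# and grows without bound for confident wrong ones.
for h in [0.99, 0.5, 0.01]:
    print(f"h={h}: cost if y=1 -> {example_cost(h, 1):.3f}, "
          f"cost if y=0 -> {example_cost(h, 0):.3f}")
```

A confident wrong prediction (h = 0.01 when y = 1) costs about 4.6, while a confident correct one (h = 0.99 when y = 1) costs about 0.01.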

The above two functions can be compressed into a single cost function, averaged over the m training examples:

J(θ) = −(1/m) Σᵢ [y⁽ⁱ⁾ log(hθ(x⁽ⁱ⁾)) + (1 − y⁽ⁱ⁾) log(1 − hθ(x⁽ⁱ⁾))]
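A sketch of the combined cost in Python: each example contributes y·log(hθ(x)) + (1 − y)·log(1 − hθ(x)), and the sum is negated and averaged over the m examples. The function name cost_J and the predictions and labels below are hypothetical.

```python
import math

def cost_J(hs, ys):
    """Combined logistic cost (log loss) averaged over m examples.

    hs: predicted probabilities h_theta(x^(i)), each strictly in (0, 1)
    ys: true labels y^(i), each 0 or 1
    """
    m = len(ys)
    total = 0.0
    for h, y in zip(hs, ys):
        # Because y is 0 or 1, only one of the two terms contributes.
        total += y * math.log(h) + (1 - y) * math.log(1.0 - h)
    return -total / m

# Hypothetical predictions and labels for three examples.
hs = [0.9, 0.2, 0.6]
ys = [1, 0, 0]
print(round(cost_J(hs, ys), 4))
```

Because only one log term is active for each example, this single expression matches the piecewise definition case by case.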

Gradient Descent

Now the question arises: how do we reduce the cost value? This can be done using gradient descent, whose goal is to minimize the cost value, i.e. min J(θ).

To minimize the cost function, we run gradient descent on each parameter θⱼ, updating all parameters simultaneously and repeating until convergence:

θⱼ := θⱼ − α ∂J(θ)/∂θⱼ = θⱼ − (α/m) Σᵢ (hθ(x⁽ⁱ⁾) − y⁽ⁱ⁾) xⱼ⁽ⁱ⁾

where α is the learning rate.
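A minimal batch gradient descent for the single-feature model can be sketched in plain Python. It assumes the standard log-loss gradient, (1/m) Σ (hθ(x⁽ⁱ⁾) − y⁽ⁱ⁾) xⱼ⁽ⁱ⁾ with x₀ = 1; the function name, learning rate, iteration count, and toy data are illustrative choices, not tuned values.

```python
import math

def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def gradient_descent(xs, ys, alpha=0.1, iterations=10000):
    """Fit h(x) = sigmoid(theta0 + theta1 * x) by batch gradient descent.

    Each iteration applies, simultaneously for both parameters,
    theta_j := theta_j - (alpha / m) * sum((h(x_i) - y_i) * x_ij), with x_i0 = 1.
    """
    m = len(xs)
    theta0, theta1 = 0.0, 0.0
    for _ in range(iterations):
        errors = [sigmoid(theta0 + theta1 * x) - y for x, y in zip(xs, ys)]
        grad0 = sum(errors) / m                              # dJ/dtheta0
        grad1 = sum(e * x for e, x in zip(errors, xs)) / m   # dJ/dtheta1
        theta0 -= alpha * grad0
        theta1 -= alpha * grad1
    return theta0, theta1

# Hypothetical toy data: small x mostly class 0, large x mostly class 1.
xs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
ys = [0, 0, 1, 0, 1, 1]
theta0, theta1 = gradient_descent(xs, ys)
print(theta0, theta1)
print([round(sigmoid(theta0 + theta1 * x), 3) for x in xs])
```

The toy labels overlap (x = 3 is labeled 1, x = 4 is labeled 0) on purpose: on perfectly separable data the log-loss minimum is not attained at finite parameters, so gradient descent would keep growing θ₁ instead of settling.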

Gradient Descent Simplified | Image: Andrew Ng Course

A common analogy for gradient descent: imagine yourself stranded and blindfolded at the top of a mountain valley, with the objective of reaching the bottom. What anyone would do is feel the slope of the terrain around them and step downhill. Feeling the slope is analogous to calculating the gradient, and taking a step is analogous to one iteration of the parameter update.

Conclusion

In this blog, I have presented the basic concepts of logistic regression. I hope it was helpful and has motivated you to explore the topic further.