In this article, we will explore Logistic Regression as a first step towards learning about Neural Networks and Machine Learning.
Logistic Regression is mainly used for supervised learning problems that have binary outputs, either 0 or 1. For example, whether it will rain tomorrow (with 0=not going to rain, and 1=going to rain), or whether your crush will fall in love with you on your first date together (with 0=not going to fall in love and 1=going to fall in love). For the latter example, the output will be tied to a training set of example inputs.
Let’s first see how to construct the logistic regression equation mathematically.
Starting point: Linear Regression
We could start from the linear regression equation:
ŷ = (w.T)x + b    (1)
ŷ is the value predicted by the model
w.T is the transpose of the parameter (weight) vector w
x is the vector of input features, or colloquially, the things that explain y, the dependent variable (for the dating example, whether you are going to buy a gift for the girl, how nicely dressed up you will be, etc.)
b is the bias (intercept) term
Equation (1) is the archetypal equation for linear regression, but we cannot use it directly for logistic regression. As discussed above, the output of a logistic regression model cannot be less than 0 or more than 1, whereas ŷ in (1) is unbounded and can fall in the regions <0 or >1, which doesn’t make sense (akin to saying that your crush will fall in love with you 1000x).
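To make the problem with equation (1) concrete, here is a minimal NumPy sketch. The weights w, the bias b and the two feature vectors are made-up values chosen only for illustration; they do not come from any real dataset.

```python
import numpy as np

# Hypothetical parameters, chosen only for illustration.
w = np.array([2.0, -1.0])  # weight vector
b = 0.5                    # bias

def linear_predict(x):
    """Equation (1): y_hat = (w.T)x + b."""
    return np.dot(w, x) + b

# Two made-up feature vectors.
y_hat_high = linear_predict(np.array([3.0, 1.0]))   # 2*3 - 1*1 + 0.5 = 5.5
y_hat_low = linear_predict(np.array([-2.0, 1.0]))   # 2*(-2) - 1*1 + 0.5 = -4.5

print(y_hat_high, y_hat_low)
```

Neither 5.5 nor -4.5 can be read as a 0-or-1 outcome, which is why the linear output needs to be squashed into a bounded range.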
We would like an equation that can provide us with values not less than 0 and not more than 1.
Introducing the Sigmoid Function. Let me first show its equation:
σ(z) = 1 / (1 + e^(-z))
and now, equation (1) simply becomes:
ŷ = σ(z)    (2)
where z = (w.T)x + b
σ is the sigmoid function applied to z
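Equation (2) can be sketched in a few lines of NumPy. This is only an illustrative forward pass, not a trained model: the function names sigmoid and logistic_predict, and the values of w, b and x, are assumptions made for this example.

```python
import numpy as np

def sigmoid(z):
    """Sigmoid function: 1 / (1 + e^(-z))."""
    return 1.0 / (1.0 + np.exp(-z))

def logistic_predict(w, x, b):
    """Equation (2): y_hat = sigmoid(z), where z = (w.T)x + b."""
    z = np.dot(w, x) + b
    return sigmoid(z)

# Hypothetical parameters and input, chosen only for illustration.
w = np.array([2.0, -1.0])
b = 0.5
x = np.array([3.0, 1.0])

y_hat = logistic_predict(w, x, b)  # here z = 5.5, so y_hat lies between 0.5 and 1
print(y_hat)
```

The only difference from linear regression is the final call to sigmoid, which maps the unbounded z to a value between 0 and 1.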
To better illustrate the function:
What are some advantages of using a Sigmoid Function?
Let’s check whether the Sigmoid Function makes sense for logistic regression. If z = (w.T)x + b is very large (say 100,000), then ŷ in the linear regression equation (1) would be very large too. In the logistic regression equation (2), however, e^(-z) would be very small and close to 0, so 1/(1 + e^(-z)) ≈ 1.
So when z is large, ŷ will be close to 1 but never more than 1.
Conversely, if z = (w.T)x + b is very negative, e^(-z) would be very large, so 1/(1 + e^(-z)) ≈ 0.
So when z is very negative, ŷ will be close to 0 but never less than 0.
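The two limiting cases above can be checked numerically. The sketch below evaluates the sigmoid on a handful of z values picked arbitrarily for illustration; the names z_values and s_values are not from the original article.

```python
import numpy as np

def sigmoid(z):
    """Sigmoid function: 1 / (1 + e^(-z))."""
    return 1.0 / (1.0 + np.exp(-z))

# Arbitrary sample points, from very negative to very positive.
z_values = np.array([-10.0, -1.0, 0.0, 1.0, 10.0])
s_values = sigmoid(z_values)

for z, s in zip(z_values, s_values):
    print(f"z = {z:6.1f}  ->  sigmoid(z) = {s:.6f}")
```

The outputs increase with z and stay between 0 and 1, approaching 0 on the left and 1 on the right. For much larger |z|, the floating-point result rounds to exactly 0.0 or 1.0, but it still never leaves that range.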
As you can see, the bounded output of the Sigmoid function is exactly the property we need for logistic regression.
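In NumPy, you can easily compute the sigmoid function using:
import numpy as np
z = np.array([0, 2])  # example inputs, inferred from the output shown below
s = 1/(1+np.exp(-z))  # sigmoid applied element-wise
print(s)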
output: [0.5, 0.88079708]
As you can see, the output values will never be more than 1 or less than 0, so each one can be read as how likely your first date is to go well (close to 1) or badly (close to 0).
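To turn such an output into a hard 0 or 1 prediction, one common convention is to apply a cut-off. The sketch below assumes a threshold of 0.5, which the article itself does not specify; the helper predict_label and the extra value 0.12 are hypothetical and added only for illustration.

```python
import numpy as np

def predict_label(probabilities, threshold=0.5):
    """Map sigmoid outputs to 0/1 labels (the 0.5 threshold is an assumed convention)."""
    return (np.asarray(probabilities) >= threshold).astype(int)

# The first two values are the sigmoid outputs shown above; 0.12 is a made-up example.
probs = np.array([0.5, 0.88079708, 0.12])
labels = predict_label(probs)

print(labels)
```

With this choice, any output of at least 0.5 is labelled 1 (the date goes well) and anything lower is labelled 0.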