[Def]
$$J(\theta) = \frac{1}{m}\sum_{i=1}^{m} \mathrm{Cost}\big(h_\theta(x^{(i)}),\, y^{(i)}\big)$$

$$\mathrm{Cost}(h_\theta(x), y) = -\log(h_\theta(x)) \quad \text{if } y = 1$$

$$\mathrm{Cost}(h_\theta(x), y) = -\log(1 - h_\theta(x)) \quad \text{if } y = 0$$
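As a quick sketch of this definition (assuming NumPy; the names `sigmoid` and `cost_single` are just illustrative, not from the course), the per-example cost can be written directly:

```python
import numpy as np

def sigmoid(z):
    """Hypothesis h_theta(x) = sigmoid(theta^T x) for logistic regression."""
    return 1.0 / (1.0 + np.exp(-z))

def cost_single(h, y):
    """Per-example cost: -log(h) when y == 1, -log(1 - h) when y == 0."""
    return -np.log(h) if y == 1 else -np.log(1.0 - h)
```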
Let's look at the plots to see this more clearly; the first plot is for y = 1.
[Plot 1]
When y = 1, the cost is 0 if hθ(x) = 1, but the cost → ∞ as hθ(x) → 0. Put another way, if hθ(x) = 0, i.e. P(y = 1 | x; θ) = 0, while the actual label is y = 1, we penalize the learning algorithm by a very large cost.
When y = 0, the plot is as below
[Plot 2]
Similarly, −log(1−hθ(x)) penalizes the learning algorithm heavily when the prediction is the opposite of the actual label.
Using the log here is also useful for optimization; take a look at this plot.
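For a quick numeric check of both cases (assuming NumPy), the cost is tiny when the prediction matches the label and large when it contradicts it:

```python
import numpy as np

# Cost when the prediction agrees vs. disagrees with the label.
print(-np.log(0.999))      # y = 1, h ~ 1  -> cost ~ 0.001 (correct, tiny cost)
print(-np.log(0.001))      # y = 1, h ~ 0  -> cost ~ 6.9   (wrong, heavy penalty)
print(-np.log(1 - 0.001))  # y = 0, h ~ 0  -> cost ~ 0.001 (correct, tiny cost)
print(-np.log(1 - 0.999))  # y = 0, h ~ 1  -> cost ~ 6.9   (wrong, heavy penalty)
```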
[Plot 3]
If we used the squared-error cost from linear regression, the cost function would be non-convex (because of the sigmoid) and gradient descent could get stuck in a local minimum; with the log-based cost of logistic regression, the function is convex, so we are guaranteed to find the global minimum.
To combine the two cases into one formula, we can write:
[Def]
$$\mathrm{Cost}(h_\theta(x), y) = -y\,\log(h_\theta(x)) - (1 - y)\,\log(1 - h_\theta(x))$$
and the whole cost function is
$$J(\theta) = -\frac{1}{m}\sum_{i=1}^{m}\Big[\, y^{(i)}\log\big(h_\theta(x^{(i)})\big) + \big(1 - y^{(i)}\big)\log\big(1 - h_\theta(x^{(i)})\big) \,\Big]$$
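A minimal vectorized sketch of this J(θ) with NumPy (the names `compute_cost`, `X`, `y`, and `theta` are assumptions for illustration, not from the original post):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def compute_cost(theta, X, y):
    """J(theta) = -(1/m) * sum( y*log(h) + (1-y)*log(1-h) ).

    X: (m, n) design matrix, y: (m,) labels in {0, 1}, theta: (n,) parameters.
    """
    m = len(y)
    h = sigmoid(X @ theta)
    return -(1.0 / m) * np.sum(y * np.log(h) + (1 - y) * np.log(1 - h))
```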
The next step is to find the minimum of the cost function J with gradient descent, which we introduced before in gradient-descent-for-multiple-variables; note that the parameters need to be updated simultaneously.
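A sketch of the descent loop under the same assumptions: each iteration computes the full gradient (1/m)·Xᵀ(hθ(X) − y) from the current θ and only then overwrites every component, which is what "updating simultaneously" means.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def gradient_descent(theta, X, y, alpha=0.1, iters=1000):
    """Batch gradient descent on the logistic-regression cost J(theta)."""
    m = len(y)
    for _ in range(iters):
        h = sigmoid(X @ theta)          # predictions for all m examples
        grad = (X.T @ (h - y)) / m      # gradient of J w.r.t. every theta_j
        theta = theta - alpha * grad    # all components updated simultaneously
    return theta
```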