3/12/17

Logistic regression model

[Cost function in logistic regression]

[Def]
\begin{align*}& J(\theta) = \dfrac{1}{m} \sum_{i=1}^m \mathrm{Cost}(h_\theta(x^{(i)}),y^{(i)}) \newline & \mathrm{Cost}(h_\theta(x),y) = -\log(h_\theta(x)) \; & \text{if y = 1} \newline & \mathrm{Cost}(h_\theta(x),y) = -\log(1-h_\theta(x)) \; & \text{if y = 0}\end{align*}
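To make the definition concrete, here is a minimal Python sketch of the per-example cost; `sigmoid` and `cost_single` are names I'm using for illustration, not from the original post:

```python
import numpy as np

def sigmoid(z):
    # Hypothesis for logistic regression: h_theta(x) = sigmoid(theta^T x)
    return 1.0 / (1.0 + np.exp(-z))

def cost_single(h, y):
    # Piecewise cost: -log(h) if y = 1, -log(1 - h) if y = 0
    return -np.log(h) if y == 1 else -np.log(1.0 - h)
```

For example, `cost_single(0.01, 1)` is large (about 4.6) because the model assigns almost no probability to the correct label, while `cost_single(0.99, 1)` is close to 0.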

Let's look at the plots to see this more clearly; the first plot is for y = 1.
[Plot 1]
When y = 1, if $h_{\theta}(x) = 1$ the cost is 0, but as $h_{\theta}(x) \rightarrow 0$ the cost $\rightarrow \infty$. In other words, if $h_{\theta}(x) = 0$, i.e. $P(y = 1 \mid x; \theta) = 0$, yet actually y = 1, we penalize the learning algorithm with a very large cost.

When y = 0, the plot is shown below.
[Plot 2]
Similarly, $-\log(1 - h_\theta(x))$ penalizes the learning algorithm heavily when the prediction is the opposite of the actual label.

Using the log here is also useful for optimization; take a look at this plot.
[Plot 3]
If we used the squared-error cost from linear regression with the sigmoid hypothesis, the cost function would be non-convex and gradient descent could get stuck in a local minimum; with the log-based cost in logistic regression, the function is convex, so gradient descent is guaranteed to find the global minimum.

To combine these two cases into one formula, we can write:
[Def]
$$\mathrm{Cost}(h_\theta(x),y) = - y \; \log(h_\theta(x)) - (1 - y) \log(1 - h_\theta(x))$$
and the whole cost function is
$$J(\theta) = - \frac{1}{m} \displaystyle \sum_{i=1}^m [y^{(i)}\log (h_\theta (x^{(i)})) + (1 - y^{(i)})\log (1 - h_\theta(x^{(i)}))]$$
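As a rough sketch of the vectorized cost, assuming a design matrix `X` of shape (m, n) with a leading column of ones, a label vector `y` with entries in {0, 1}, and the `sigmoid` helper above:

```python
def compute_cost(theta, X, y):
    # J(theta) = -(1/m) * sum( y*log(h) + (1-y)*log(1-h) )
    m = len(y)
    h = sigmoid(X @ theta)                      # h_theta(x^(i)) for every example
    return -(y @ np.log(h) + (1 - y) @ np.log(1 - h)) / m
```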

The next step is to find the minimum of the cost function J with gradient descent, which we introduced before in gradient-descent-for-multiple-variables; note that all parameters need to be updated simultaneously.
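A minimal gradient-descent sketch under the same assumptions as above; note that computing the whole gradient vector first and then updating `theta` in one step is exactly the simultaneous update:

```python
def gradient_descent(X, y, theta, alpha=0.1, num_iters=1000):
    # Repeat: theta_j := theta_j - alpha * (1/m) * sum_i (h_theta(x^(i)) - y^(i)) * x_j^(i)
    m = len(y)
    for _ in range(num_iters):
        h = sigmoid(X @ theta)
        grad = X.T @ (h - y) / m        # gradient of J(theta)
        theta = theta - alpha * grad    # update every theta_j simultaneously
    return theta
```

The learning rate `alpha` and iteration count here are placeholder values, not ones from the post.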
