3/9/17

Classification - Binary

[Foreword]

[Ex]
Email: spam / not spam
Online transaction: fraudulent / not fraudulent
Tumor: malignant / benign

y $\in$ {0, 1}

0: negative class, e.g. not spam, benign tumor
1: positive class, e.g. spam, malignant tumor

[Linear regression is not good]
Suppose we use the linear hypothesis $h_{\theta}(x) = \theta^T x$
and set 0.5 as the threshold. That means:

$h_{\theta}(x) \geq 0.5$, predict y = 1
$h_{\theta}(x) < 0.5$, predict y = 0

Here is the schematic diagram
[Plot 1]


When our data contains only the red points, this threshold works well.
But when we add a purple point far to the right, the fitted line becomes flatter than before, and the prediction degrades.
Another odd thing is that when we use linear regression for a classification problem, the hypothesis may output values much larger than 1 or much smaller than 0, even though the training labels y are only 0 and 1.
This is why we use another method, called logistic regression.
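The flattening effect above can be seen numerically. This is a minimal made-up 1-D sketch (tumor sizes and labels are invented for illustration): fit a least-squares line, find where it crosses 0.5, then add one far-away positive example and watch the crossing point drift.

```python
# Hypothetical 1-D illustration of why least squares + a 0.5 threshold is fragile.
def fit_line(xs, ys):
    """Closed-form simple linear regression: returns (intercept, slope)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) \
            / sum((x - mx) ** 2 for x in xs)
    return my - slope * mx, slope

# Small made-up data set: tumor size vs. label (0 = benign, 1 = malignant).
xs = [1, 2, 3, 6, 7, 8]
ys = [0, 0, 0, 1, 1, 1]
b0, b1 = fit_line(xs, ys)
cutoff1 = (0.5 - b0) / b1          # size where the fitted line crosses 0.5
print(cutoff1)                     # 4.5: neatly separates the two groups

# Add one malignant tumor far to the right: the line flattens and the
# 0.5 crossing point shifts, even though nothing changed near the boundary.
b0, b1 = fit_line(xs + [20], ys + [1])
cutoff2 = (0.5 - b0) / b1
print(cutoff2 > cutoff1)           # True: the threshold drifted rightward
```

The new point is already classified correctly, yet it still drags the threshold toward itself.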

[Logistic regression]
Because we want $0 \leq h_{\theta}(x) \leq 1$, we pass the original $\theta^T x$ through a function g:

$$\mathbf{h_{\theta}(x) = g(\theta^T x) = \frac{1}{1 + e^{-\theta^T x}}}$$

This g is called the "sigmoid function" or "logistic function".
We can use 0.5 as the threshold again through the sigmoid function. Let's take a look at this plot:
[Plot 2]


When $h_{\theta}(x) \geq 0.5$, which means $z = \theta^T x \geq 0$, we predict y = 1; in contrast, $h_{\theta}(x) < 0.5$ means $z < 0$, so we predict y = 0.
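The sigmoid and the threshold rule above can be sketched in a few lines (the function names `sigmoid` and `predict` are just for this note):

```python
import math

def sigmoid(z):
    """Logistic function g(z) = 1 / (1 + e^(-z)); output is always in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def predict(theta, x):
    """h_theta(x) = g(theta^T x); predict 1 when z >= 0, i.e. h >= 0.5."""
    z = sum(t * xi for t, xi in zip(theta, x))
    return 1 if sigmoid(z) >= 0.5 else 0

print(sigmoid(0))                 # 0.5 exactly: the decision threshold
print(predict([-1, 2], [1, 3]))   # z = -1 + 6 = 5 >= 0, so it predicts 1
```

Note that thresholding h at 0.5 is the same as thresholding z at 0, which is why the decision rule never needs to evaluate the exponential at prediction time.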


[Ex 1]
$h_{\theta}(x) = p(y = 1 \mid x; \theta)$ is the estimated probability that y = 1 on input x. So, if we feed patient A's tumor size into the sigmoid function and get
$h_{\theta}(x) = 0.7$,
then we can tell A there is a 70% chance that this tumor is malignant.

[Ex 2]
Assume our data looks like the plot below and our hypothesis is $h_{\theta}(x) = g(\theta_0 + \theta_1 x_1 + \theta_2 x_2 + \theta_3 x_1^2 + \theta_4 x_2^2)$:
[Plot 3]

we get a circular divide when logistic regression fits the parameters as in the plot below:
$\theta = \lbrack -1, 0, 0, 1, 1 \rbrack ^T$
This means the hypothesis predicts y = 1 when $-1 + x_1^2 + x_2^2 \geq 0$, i.e. when $x_1^2 + x_2^2 \geq 1$.
[Plot 4]

This green circle is called the decision boundary; the blue x's mark y = 1 and the red o's mark y = 0.
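We can sanity-check this circular boundary numerically, reusing the example's $\theta = [-1, 0, 0, 1, 1]^T$ with the feature vector $[1, x_1, x_2, x_1^2, x_2^2]$ (a quick sketch, not a fitted model):

```python
import math

theta = [-1, 0, 0, 1, 1]  # [theta_0, ..., theta_4] from the example above

def h(x1, x2):
    """h_theta(x) with features [1, x1, x2, x1^2, x2^2]."""
    z = sum(t * f for t, f in zip(theta, [1, x1, x2, x1 ** 2, x2 ** 2]))
    return 1.0 / (1.0 + math.exp(-z))

# Points inside the unit circle get h < 0.5 (y = 0); outside, h >= 0.5 (y = 1).
print(h(0, 0) < 0.5)    # origin: x1^2 + x2^2 = 0 < 1, inside the circle
print(h(2, 0) >= 0.5)   # x1^2 + x2^2 = 4 >= 1, outside the circle
print(h(1, 0))          # exactly on the boundary: z = 0, so h = 0.5
```

Points exactly on the circle $x_1^2 + x_2^2 = 1$ give z = 0 and hence h = 0.5, matching the threshold.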

