3/5/17

Gradient descent in practice - learning rate

[foreword]
Suppose we have scaled our variables with mean normalization and are running gradient descent.
How do we confirm that it is working correctly?

Let's recall the main job of gradient descent: it tries to minimize the cost function J.
So if we plot the value of J as gradient descent runs, we should expect the curve to decrease steadily when the learning rate is chosen correctly.

[Plot 1]

By around 300 iterations the curve has flattened out.
So we can use this plot to judge whether gradient descent has converged or not.

For some applications, gradient descent needs a large number of iterations to converge, so another way to check for convergence is an automatic convergence test. This method declares convergence when the decrease in J in one iteration falls below some threshold $\epsilon$, maybe $10^{-3}$. The disadvantage of this method is that it is usually pretty difficult to choose the threshold $\epsilon$.
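The automatic convergence test can be sketched as follows: stop as soon as the per-iteration decrease in J drops below $\epsilon$. The data, the learning rate, and the iteration cap are assumptions for the sake of the example.

```python
import numpy as np

def gradient_descent_auto(X, y, alpha, epsilon=1e-3, max_iters=10000):
    """Batch gradient descent with an automatic convergence test:
    stop once the decrease in J in one iteration falls below epsilon."""
    m = len(y)
    theta = np.zeros(X.shape[1])
    prev = float("inf")
    for i in range(1, max_iters + 1):
        theta -= alpha * (X.T @ (X @ theta - y)) / m
        r = X @ theta - y
        J = (r @ r) / (2 * m)
        if prev - J < epsilon:      # J barely moved: declare convergence
            return theta, i
        prev = J
    return theta, max_iters        # hit the cap without passing the test

# Hypothetical one-feature problem with an intercept column.
X = np.hstack([np.ones((50, 1)), np.linspace(0, 1, 50).reshape(-1, 1)])
y = 2 + 3 * X[:, 1]
theta, iters = gradient_descent_auto(X, y, alpha=0.5)
```

Note how the result depends on `epsilon`: too large and we stop before J has really settled, too small and we may never trigger the test, which is exactly the difficulty mentioned above.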

If your curve is increasing, the most common reason is that the learning rate is too big. Take a look at [Plot 2] in the preceding article, Batch gradient descent, or you'll find a plot like
[Plot 2]

This is usually easy to fix by using a smaller learning rate: if the learning rate is small enough, the cost function should decrease on every iteration. In contrast, if the learning rate is too small, gradient descent will take too long to converge, and that isn't what we want either.
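Both behaviors are easy to demonstrate on a one-dimensional cost $J(w) = w^2$ (a toy example, not from the original text): a small learning rate shrinks J every step, while a too-large one overshoots the minimum and makes J grow.

```python
def descend(alpha, w0=1.0, iters=10):
    """Gradient descent on the toy cost J(w) = w^2 (gradient is 2w),
    returning the sequence of J values."""
    w = w0
    trace = [w * w]
    for _ in range(iters):
        w -= alpha * 2 * w      # gradient step
        trace.append(w * w)
    return trace

small = descend(alpha=0.1)   # w -> 0.8*w each step: J shrinks every iteration
big = descend(alpha=1.1)     # w -> -1.2*w each step: overshoots, J grows
```

With `alpha=1.1` each update jumps past the minimum to a point farther away on the other side, which is exactly the increasing curve in [Plot 2].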

To wrap up, when using gradient descent we can try several learning rate candidates, then choose the one that is small enough to make J decrease on every iteration while converging fastest.
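This sweep can be sketched as below. The candidate values (steps of roughly 3x apart) and the fixed iteration budget are assumptions; the idea is just to run each candidate for the same number of iterations and compare the final cost.

```python
import numpy as np

# Hypothetical linear-regression data for the sweep.
rng = np.random.default_rng(1)
X = np.hstack([np.ones((80, 1)), rng.normal(size=(80, 2))])
y = X @ np.array([0.5, 1.5, -2.0])

def final_cost(alpha, iters=200):
    """Run batch gradient descent for a fixed budget; return the final J."""
    m = len(y)
    theta = np.zeros(X.shape[1])
    for _ in range(iters):
        theta -= alpha * (X.T @ (X @ theta - y)) / m
    r = X @ theta - y
    return (r @ r) / (2 * m)

candidates = [0.001, 0.003, 0.01, 0.03, 0.1, 0.3]
results = {a: final_cost(a) for a in candidates}
best = min(results, key=results.get)   # lowest J within the same budget
```

In practice you would also plot the J-versus-iteration curve for each candidate and discard any rate whose curve increases, as discussed above.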
