
3/16/17

Regularization

Continuing the preceding example

Our high-order polynomial hypothesis is the same as the quadratic one except for the $\theta_3$ and $\theta_4$ terms. Our goal is to minimize the cost function $\frac{1}{2m}\sum_{i=1}^{m}\left(h_\theta(x^{(i)})-y^{(i)}\right)^2$, so if we want to decrease the influence of $\theta_3$ and $\theta_4$ in the cost function, we can simply add penalty terms that multiply these parameters by a large number:
$$\frac{1}{2m}\sum_{i=1}^{m}\left(h_\theta(x^{(i)})-y^{(i)}\right)^2 + 10000\,\theta_3^2 + 10000\,\theta_4^2$$
Then these parameters are driven very close to 0, so the high-order polynomial behaves almost like the well-fitted quadratic function and becomes less prone to overfitting.
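As a quick numeric illustration (a minimal sketch, not from the original post; the data, hypothesis, and parameter values below are made-up assumptions), comparing the penalized cost for a $\theta$ that keeps large $\theta_3$, $\theta_4$ against one that shrinks them shows how the penalty dominates:

```python
import numpy as np

def penalized_cost(theta, X, y, penalty=10000.0):
    """Squared-error cost plus the ad-hoc 10000 * theta_3^2 + 10000 * theta_4^2 penalty."""
    m = len(y)
    predictions = X @ theta                          # polynomial features already built into X
    sq_error = np.sum((predictions - y) ** 2) / (2 * m)
    return sq_error + penalty * theta[3] ** 2 + penalty * theta[4] ** 2

# Toy data: 5 examples with features [1, x, x^2, x^3, x^4] (assumed, not from the post)
x = np.linspace(0, 1, 5)
X = np.vstack([np.ones_like(x), x, x**2, x**3, x**4]).T
y = 1 + 2 * x - 3 * x**2                             # data generated by a quadratic

theta_big  = np.array([1.0, 2.0, -3.0, 5.0, 5.0])        # keeps large theta_3, theta_4
theta_tiny = np.array([1.0, 2.0, -3.0, 0.001, 0.001])    # shrinks them toward 0

print(penalized_cost(theta_big, X, y))   # penalty term dominates -> huge cost
print(penalized_cost(theta_tiny, X, y))  # cost stays close to the plain squared error
```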


[In linear regression]

[Def]
$$\min_\theta\; \frac{1}{2m}\left[\sum_{i=1}^{m}\left(h_\theta(x^{(i)})-y^{(i)}\right)^2 + \lambda\sum_{j=1}^{n}\theta_j^2\right]$$

$\lambda$ is the regularization parameter; it decides how strongly the parameters are penalized and shrunk.
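A minimal NumPy sketch of this regularized cost (the function and variable names are my own; it assumes X already contains a column of ones for the intercept):

```python
import numpy as np

def regularized_cost(theta, X, y, lam):
    """Regularized linear regression cost: squared error plus lambda * sum(theta_j^2), j >= 1."""
    m = len(y)
    error = X @ theta - y
    reg = lam * np.sum(theta[1:] ** 2)           # theta_0 (the intercept) is not regularized
    return (np.sum(error ** 2) + reg) / (2 * m)
```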

[Gradient descent with regularization]
Repeat {
    $\theta_0 := \theta_0 - \alpha\,\frac{1}{m}\sum_{i=1}^{m}\left(h_\theta(x^{(i)})-y^{(i)}\right)x_0^{(i)}$
    $\theta_j := \theta_j - \alpha\left[\left(\frac{1}{m}\sum_{i=1}^{m}\left(h_\theta(x^{(i)})-y^{(i)}\right)x_j^{(i)}\right) + \frac{\lambda}{m}\theta_j\right] \qquad j \in \{1,2,\dots,n\}$
}
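A minimal NumPy sketch of one such update step (names are my own; X[:, 0] is assumed to be the all-ones intercept column):

```python
import numpy as np

def gradient_descent_step(theta, X, y, alpha, lam):
    """One regularized gradient descent step for linear regression."""
    m = len(y)
    error = X @ theta - y                         # h_theta(x) - y for every example
    grad = (X.T @ error) / m                      # unregularized gradient for all j
    grad[1:] += (lam / m) * theta[1:]             # add (lambda/m) * theta_j for j >= 1 only
    return theta - alpha * grad
```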

We can rearrange this formula to get more intuition about the update.
$$\theta_j := \theta_j\left(1 - \alpha\frac{\lambda}{m}\right) - \alpha\,\frac{1}{m}\sum_{i=1}^{m}\left(h_\theta(x^{(i)})-y^{(i)}\right)x_j^{(i)}$$

This update separates into two parts. In the first part, $\alpha\frac{\lambda}{m}$ is greater than 0, so the factor $\left(1 - \alpha\frac{\lambda}{m}\right)$ is slightly less than 1 and shrinks $\theta_j$ a little bit on every iteration; the second part is the same as the update without regularization.

[Normal equation with regularization]
$$\theta = \left(X^T X + \lambda L\right)^{-1} X^T y \qquad\text{where}\qquad L = \begin{bmatrix} 0 & & & \\ & 1 & & \\ & & \ddots & \\ & & & 1 \end{bmatrix}$$

Note that the upper-left entry of this matrix is zero, since the intercept term does not need regularization. Another advantage appears in the m < n case: with regularization, the matrix $X^T X + \lambda L$ in the normal equation is invertible. As mentioned before in the normal-equation post, $X^T X$ is non-invertible (singular) in the m < n case when the regularization term is not added.
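A minimal NumPy sketch of this regularized normal equation (names are my own; X is assumed to already include the intercept column of ones):

```python
import numpy as np

def normal_equation_regularized(X, y, lam):
    """Solve theta = (X^T X + lambda * L)^{-1} X^T y, with L = identity except L[0, 0] = 0."""
    n_plus_1 = X.shape[1]
    L = np.eye(n_plus_1)
    L[0, 0] = 0.0                                  # the intercept theta_0 is not regularized
    return np.linalg.solve(X.T @ X + lam * L, X.T @ y)
```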


[In logistic regression]

[Def]
$$J(\theta) = -\frac{1}{m}\sum_{i=1}^{m}\left[y^{(i)}\log\left(h_\theta(x^{(i)})\right) + \left(1-y^{(i)}\right)\log\left(1-h_\theta(x^{(i)})\right)\right] + \frac{\lambda}{2m}\sum_{j=1}^{n}\theta_j^2$$
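A minimal NumPy sketch of this regularized logistic regression cost (names are my own; X is assumed to include the intercept column):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def logistic_cost_regularized(theta, X, y, lam):
    """Cross-entropy cost plus (lambda / 2m) * sum(theta_j^2) over j >= 1."""
    m = len(y)
    h = sigmoid(X @ theta)                          # hypothesis h_theta(x) for every example
    cross_entropy = -np.mean(y * np.log(h) + (1 - y) * np.log(1 - h))
    reg = (lam / (2 * m)) * np.sum(theta[1:] ** 2)  # theta_0 is not regularized
    return cross_entropy + reg
```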

[Gradient descent with regularization]
The gradient descent update has the same form as in linear regression, but the hypothesis is different: here $h_\theta(x)$ is the sigmoid function $\frac{1}{1+e^{-\theta^T x}}$.
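A sketch of the corresponding update step (my own names again); it is identical to the linear regression step above except that the hypothesis is the sigmoid:

```python
import numpy as np

def logistic_gradient_step(theta, X, y, alpha, lam):
    """One regularized gradient descent step for logistic regression."""
    m = len(y)
    h = 1.0 / (1.0 + np.exp(-(X @ theta)))        # sigmoid hypothesis is the only change
    grad = (X.T @ (h - y)) / m
    grad[1:] += (lam / m) * theta[1:]             # theta_0 is left unregularized
    return theta - alpha * grad
```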
