Our high-order polynomial function is the same as the quadratic function except for the $\theta_3$ and $\theta_4$ terms. Our goal is to minimize the cost function $\frac{1}{2m}\sum_{i=1}^{m}\left(h_\theta(x^{(i)})-y^{(i)}\right)^2$, so to reduce the influence of $\theta_3$ and $\theta_4$ we can add large penalty terms on those parameters to the cost function,
$$\frac{1}{2m}\sum_{i=1}^{m}\left(h_\theta(x^{(i)})-y^{(i)}\right)^2+10000\,\theta_3^2+10000\,\theta_4^2$$
then minimizing the cost drives these parameters very close to 0, so the high-order polynomial function approaches the well-fitted quadratic function and becomes less prone to overfitting.
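As a rough numerical sketch of this idea (the function name, the array shapes, and the assumption that X holds the polynomial features $1, x, x^2, x^3, x^4$ column by column are mine, not from the post), the penalized cost blows up whenever $\theta_3$ or $\theta_4$ moves away from zero, so any minimizer is forced to keep those two coefficients tiny:

```python
import numpy as np

def penalized_cost(theta, X, y, penalty=10000.0):
    """Squared-error cost plus a large penalty on theta_3 and theta_4 only."""
    m = len(y)
    residual = X @ theta - y                    # h_theta(x^(i)) - y^(i) for every example
    base = (residual @ residual) / (2 * m)      # ordinary (1/2m) * sum of squared errors
    return base + penalty * theta[3] ** 2 + penalty * theta[4] ** 2
```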
[In linear regression]
[Def]
$$\min_\theta\;\frac{1}{2m}\left[\sum_{i=1}^{m}\left(h_\theta(x^{(i)})-y^{(i)}\right)^2+\lambda\sum_{j=1}^{n}\theta_j^2\right]$$
$\lambda$ is the regularization parameter; it decides how strongly the parameters are penalized (shrunk toward zero).
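A minimal NumPy sketch of this cost (the function name and the assumption that X already includes a leading column of ones for the intercept are mine, not from the post):

```python
import numpy as np

def regularized_cost(theta, X, y, lam):
    """Regularized linear-regression cost; the intercept theta[0] is not penalized."""
    m = len(y)
    residual = X @ theta - y
    cost = (residual @ residual) / (2 * m)            # (1/2m) * sum of squared errors
    cost += lam / (2 * m) * np.sum(theta[1:] ** 2)    # (lambda/2m) * sum_{j>=1} theta_j^2
    return cost
```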
[Gradient descent with regularization]
Repeat {
$$\theta_0 := \theta_0-\alpha\,\frac{1}{m}\sum_{i=1}^{m}\left(h_\theta(x^{(i)})-y^{(i)}\right)x_0^{(i)}$$
$$\theta_j := \theta_j-\alpha\left[\frac{1}{m}\sum_{i=1}^{m}\left(h_\theta(x^{(i)})-y^{(i)}\right)x_j^{(i)}+\frac{\lambda}{m}\theta_j\right]\quad j\in\{1,2,\dots,n\}$$
}
We can rewrite the update for $\theta_j$ to get more intuition about this equation.
$$\theta_j := \theta_j\left(1-\alpha\frac{\lambda}{m}\right)-\alpha\,\frac{1}{m}\sum_{i=1}^{m}\left(h_\theta(x^{(i)})-y^{(i)}\right)x_j^{(i)}$$
This equation separates into two parts. In the first part, $\alpha\frac{\lambda}{m}$ is greater than 0, so the factor $\left(1-\alpha\frac{\lambda}{m}\right)$ is slightly less than 1 and shrinks $\theta_j$ a little on every update; the second part is the same as the update without regularization.
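As a sketch of one such update in NumPy (again assuming a leading column of ones in X; the helper name is hypothetical), note that only $\theta_1,\dots,\theta_n$ receive the extra $\frac{\lambda}{m}\theta_j$ term:

```python
import numpy as np

def gradient_descent_step(theta, X, y, alpha, lam):
    """One regularized gradient-descent update; theta_0 is updated without the lambda term."""
    m = len(y)
    error = X @ theta - y                 # h_theta(x^(i)) - y^(i) for all i
    grad = (X.T @ error) / m              # unregularized gradient for every theta_j
    grad[1:] += (lam / m) * theta[1:]     # add (lambda/m) * theta_j for j = 1..n only
    return theta - alpha * grad
```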
[Normal equation with regularization]
$$\theta=\left(X^TX+\lambda\cdot L\right)^{-1}X^Ty\quad\text{where}\quad L=\begin{bmatrix}0 & & & \\ & 1 & & \\ & & \ddots & \\ & & & 1\end{bmatrix}$$
Note that the upper-left entry of $L$ is zero, since the intercept $\theta_0$ doesn't need regularization. The other advantage is that in the $m < n$ case, $X^TX+\lambda\cdot L$ is invertible; as mentioned in the normal-equation post, $X^TX$ is non-invertible (singular) in the $m < n$ case when the regularization term is not added.
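A small sketch of this closed-form solution (the function name is illustrative, and np.linalg.solve is used instead of forming the inverse explicitly):

```python
import numpy as np

def normal_equation_regularized(X, y, lam):
    """theta = (X^T X + lambda * L)^-1 X^T y, where L is the identity with L[0, 0] = 0."""
    L = np.eye(X.shape[1])      # (n+1) x (n+1), matching the number of parameters
    L[0, 0] = 0.0               # do not regularize the intercept theta_0
    return np.linalg.solve(X.T @ X + lam * L, X.T @ y)
```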
[In logistic regression]
[Def]
$$J(\theta)=-\frac{1}{m}\sum_{i=1}^{m}\left[y^{(i)}\log\left(h_\theta(x^{(i)})\right)+\left(1-y^{(i)}\right)\log\left(1-h_\theta(x^{(i)})\right)\right]+\frac{\lambda}{2m}\sum_{j=1}^{n}\theta_j^2$$
[Gradient descent with regularization]
The gradient descent update has the same form as in linear regression, but the hypothesis is different: $h_\theta(x)=\frac{1}{1+e^{-\theta^Tx}}$ instead of $\theta^Tx$.
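A sketch of the regularized logistic-regression cost and gradient together (the helper names are assumed, labels y are 0/1, and X again has a leading column of ones):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def logistic_cost_and_grad(theta, X, y, lam):
    """Regularized logistic-regression cost and its gradient; theta_0 is unpenalized."""
    m = len(y)
    h = sigmoid(X @ theta)                                    # the hypothesis is now the sigmoid
    cost = -np.mean(y * np.log(h) + (1 - y) * np.log(1 - h))
    cost += lam / (2 * m) * np.sum(theta[1:] ** 2)
    grad = (X.T @ (h - y)) / m
    grad[1:] += (lam / m) * theta[1:]                         # same form as the linear case
    return cost, grad
```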