Lecture 7: Regularization
Overfitting: if we have too many features, the learned hypothesis may fit the training set very well but fail to generalize to new examples (e.g. predicting prices on houses it has never seen).
Addressing overfitting :
- Reduce number of features
- Manually select which features to keep
- Model selection algorithm
- Regularization
- Keep all the features,but reduce magnitude/values of parameters \(\theta_j\)
- Works well when we have a lot of features,each of which contributes a bit to predicting y
Cost function
\[J(\theta) = \frac{1}{2m}\left[\sum^m_{i=1}(h_\theta(x^{(i)})-y^{(i)})^2 + \lambda \sum^n_{j=1}\theta_j^2\right] \]
Note that the regularization sum starts at \(j=1\): by convention \(\theta_0\) is not penalized.

If \(\lambda\) is set to an extremely large value:
- All of \(\theta_1,\dots,\theta_n\) are penalized so heavily that they are driven close to zero, leaving \(h_\theta(x)\approx\theta_0\) (essentially a flat line).
- The algorithm therefore underfits: it fails to fit even the training data well.
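The regularized cost function can be sketched in a few lines of NumPy; the toy data, `theta`, and `lam` values below are illustrative choices, not from the lecture:

```python
import numpy as np

def regularized_cost(theta, X, y, lam):
    """Regularized linear-regression cost J(theta).

    X is m x (n+1) with a leading column of ones; theta[0] is not penalized.
    """
    m = len(y)
    residuals = X @ theta - y
    fit_term = np.sum(residuals ** 2)
    penalty = lam * np.sum(theta[1:] ** 2)  # regularization sum starts at j = 1
    return (fit_term + penalty) / (2 * m)

# Toy data: y = x exactly, so theta = [0, 1] has zero training error
X = np.array([[1.0, 1.0], [1.0, 2.0], [1.0, 3.0]])
y = np.array([1.0, 2.0, 3.0])
theta = np.array([0.0, 1.0])
print(regularized_cost(theta, X, y, lam=0.0))  # 0.0: perfect fit, no penalty
print(regularized_cost(theta, X, y, lam=6.0))  # 1.0: penalty 6*1^2 / (2*3)
```

Even with zero training error, a nonzero \(\lambda\) makes \(J(\theta) > 0\) whenever any \(\theta_j\) (for \(j \ge 1\)) is nonzero.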
Regularized linear regression
Gradient descent
Repeat{
\[\theta_0:=\theta_0 - \alpha\frac{1}{m}\sum^m_{i=1}(h_\theta(x^{(i)})-y^{(i)})x_0^{(i)} \\ \theta_j:=\theta_j - \alpha\left[\frac{1}{m}\sum^m_{i=1}(h_\theta(x^{(i)})-y^{(i)})x_j^{(i)} + \frac{\lambda}{m}\theta_j\right] \qquad (j=1,\dots,n) \]
}

The \(\theta_j\) update is equivalent to \(\theta_j:=\theta_j(1-\alpha\frac{\lambda}{m}) - \alpha\frac{1}{m}\sum^m_{i=1}(h_\theta(x^{(i)})-y^{(i)})x_j^{(i)}\); since \(0 < 1-\alpha\frac{\lambda}{m} < 1\), each iteration first shrinks \(\theta_j\) slightly, then takes the usual gradient step.

Normal equation
\[\theta=(X^TX+\lambda\underbrace{\left[ \begin{matrix} 0 & 0 & 0 & \cdots & 0 \\ 0 & 1 & 0 & \cdots & 0 \\ 0 & 0 & 1 & \cdots & 0 \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & 0 & \cdots & 1 \\ \end{matrix} \right]}_{(n+1)\times(n+1)})^{-1}X^Ty \]
If \(\lambda > 0\), the matrix being inverted is guaranteed to be invertible, even when \(X^TX\) itself is singular (e.g. when \(m \le n\)).

Regularized logistic regression
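Before moving on, the two linear-regression approaches above — gradient descent and the regularized normal equation — can be cross-checked against each other. This is a minimal sketch; the toy data, `lam`, `alpha`, and iteration count are illustrative choices:

```python
import numpy as np

def fit_normal_equation(X, y, lam):
    """Closed form: theta = (X^T X + lam * D)^-1 X^T y, with D[0,0] = 0."""
    n1 = X.shape[1]
    D = np.eye(n1)
    D[0, 0] = 0.0  # theta_0 is not regularized
    return np.linalg.solve(X.T @ X + lam * D, X.T @ y)

def fit_gradient_descent(X, y, lam, alpha=0.1, iters=5000):
    """Repeat the regularized update; theta_0 gets no lambda term."""
    m = X.shape[0]
    theta = np.zeros(X.shape[1])
    for _ in range(iters):
        grad = (X.T @ (X @ theta - y)) / m
        grad[1:] += (lam / m) * theta[1:]  # + (lambda/m) * theta_j for j >= 1
        theta -= alpha * grad
    return theta

rng = np.random.default_rng(0)
X = np.column_stack([np.ones(20), rng.normal(size=20)])
y = 2.0 + 3.0 * X[:, 1] + 0.1 * rng.normal(size=20)
t_ne = fit_normal_equation(X, y, lam=1.0)
t_gd = fit_gradient_descent(X, y, lam=1.0)
print(np.allclose(t_ne, t_gd, atol=1e-6))  # both methods agree
```

Since the regularized cost is convex, gradient descent (with a small enough \(\alpha\)) and the closed form converge to the same \(\theta\).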
Gradient descent
Repeat{
\[\theta_0:=\theta_0 - \alpha\frac{1}{m}\sum^m_{i=1}(h_\theta(x^{(i)})-y^{(i)})x_0^{(i)} \\ \theta_j:=\theta_j - \alpha\left[\frac{1}{m}\sum^m_{i=1}(h_\theta(x^{(i)})-y^{(i)})x_j^{(i)} + \frac{\lambda}{m}\theta_j\right] \qquad (j=1,\dots,n) \]
}

Although the update looks identical to regularized linear regression, here \(h_\theta(x) = \frac{1}{1+e^{-\theta^Tx}}\), so it is a different algorithm.
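The regularized logistic-regression update can be sketched the same way; only the hypothesis changes to the sigmoid. The toy data and hyperparameters below are illustrative choices:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def logistic_gd(X, y, lam, alpha=0.5, iters=2000):
    """Regularized logistic regression via gradient descent.

    Same update as the linear case, but h_theta(x) = sigmoid(theta^T x).
    """
    m = X.shape[0]
    theta = np.zeros(X.shape[1])
    for _ in range(iters):
        grad = (X.T @ (sigmoid(X @ theta) - y)) / m
        grad[1:] += (lam / m) * theta[1:]  # theta_0 not regularized
        theta -= alpha * grad
    return theta

# Toy 1-D linearly separable data (illustrative)
X = np.column_stack([np.ones(6), np.array([-3.0, -2.0, -1.0, 1.0, 2.0, 3.0])])
y = np.array([0.0, 0.0, 0.0, 1.0, 1.0, 1.0])
theta = logistic_gd(X, y, lam=1.0)
preds = (sigmoid(X @ theta) >= 0.5).astype(float)
print(preds)  # classifies the separable toy set correctly
```

On separable data like this, the regularization term also keeps \(\theta\) from growing without bound, which unregularized logistic regression would otherwise do.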