Lecture 7: Regularization
Overfitting: if we have too many features, the learned hypothesis may fit the training set very well but fail to generalize to new examples (e.g. predicting prices on houses it has never seen).
Addressing overfitting :
- Reduce number of features
- Manually select which features to keep
- Model selection algorithm
- Regularization
- Keep all the features,but reduce magnitude/values of parameters \(\theta_j\)
- Works well when we have a lot of features,each of which contributes a bit to predicting y
Cost function
\[J(\theta) = \frac{1}{2m}\left[\sum^m_{i=1}(h_\theta(x^{(i)})-y^{(i)})^2 + \lambda \sum^n_{j=1}\theta_j^2\right] \]
Note that the regularization sum starts at \(j=1\): by convention \(\theta_0\) is not penalized.

If \(\lambda\) is set to an extremely large value:
- All of \(\theta_1,\dots,\theta_n\) are penalized so heavily that they are driven close to zero, leaving \(h_\theta(x)\approx\theta_0\) (essentially a flat line).
- The algorithm therefore underfits: it fails to fit even the training data well.
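The regularized cost function can be sketched in a few lines of NumPy; the toy data, `theta`, and `lam` values below are illustrative choices, not from the lecture:

```python
import numpy as np

def regularized_cost(theta, X, y, lam):
    """Regularized linear-regression cost J(theta).

    X is m x (n+1) with a leading column of ones; theta[0] is not penalized.
    """
    m = len(y)
    residuals = X @ theta - y
    fit_term = np.sum(residuals ** 2)
    penalty = lam * np.sum(theta[1:] ** 2)  # regularization sum starts at j = 1
    return (fit_term + penalty) / (2 * m)

# Toy data: y = x exactly, so theta = [0, 1] has zero training error
X = np.array([[1.0, 1.0], [1.0, 2.0], [1.0, 3.0]])
y = np.array([1.0, 2.0, 3.0])
theta = np.array([0.0, 1.0])
print(regularized_cost(theta, X, y, lam=0.0))  # 0.0: perfect fit, no penalty
print(regularized_cost(theta, X, y, lam=6.0))  # 1.0: penalty 6*1^2 / (2*3)
```

Even with zero training error, a nonzero \(\lambda\) makes \(J(\theta) > 0\) whenever any \(\theta_j\) (for \(j \ge 1\)) is nonzero.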
Regularized linear regression
Gradient descent
Repeat{
\[\theta_0:=\theta_0 - \alpha\frac{1}{m}\sum^m_{i=1}(h_\theta(x^{(i)})-y^{(i)})x_0^{(i)} \\ \theta_j:=\theta_j - \alpha\left[\frac{1}{m}\sum^m_{i=1}(h_\theta(x^{(i)})-y^{(i)})x_j^{(i)} + \frac{\lambda}{m}\theta_j\right] \qquad (j=1,\dots,n) \]
}

The \(\theta_j\) update is equivalent to \(\theta_j:=\theta_j(1-\alpha\frac{\lambda}{m}) - \alpha\frac{1}{m}\sum^m_{i=1}(h_\theta(x^{(i)})-y^{(i)})x_j^{(i)}\); since \(0 < 1-\alpha\frac{\lambda}{m} < 1\), each iteration first shrinks \(\theta_j\) slightly, then takes the usual gradient step.

Normal equation
\[\theta=(X^TX+\lambda\underbrace{\left[ \begin{matrix} 0 & 0 & 0 & \cdots & 0 \\ 0 & 1 & 0 & \cdots & 0 \\ 0 & 0 & 1 & \cdots & 0 \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & 0 & \cdots & 1 \\ \end{matrix} \right]}_{(n+1)\times(n+1)})^{-1}X^Ty \]
If \(\lambda > 0\), the matrix being inverted is guaranteed to be invertible, even when \(X^TX\) itself is singular (e.g. when \(m \le n\)).

Regularized logistic regression
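Before moving on, the two linear-regression approaches above — gradient descent and the regularized normal equation — can be cross-checked against each other. This is a minimal sketch; the toy data, `lam`, `alpha`, and iteration count are illustrative choices:

```python
import numpy as np

def fit_normal_equation(X, y, lam):
    """Closed form: theta = (X^T X + lam * D)^-1 X^T y, with D[0,0] = 0."""
    n1 = X.shape[1]
    D = np.eye(n1)
    D[0, 0] = 0.0  # theta_0 is not regularized
    return np.linalg.solve(X.T @ X + lam * D, X.T @ y)

def fit_gradient_descent(X, y, lam, alpha=0.1, iters=5000):
    """Repeat the regularized update; theta_0 gets no lambda term."""
    m = X.shape[0]
    theta = np.zeros(X.shape[1])
    for _ in range(iters):
        grad = (X.T @ (X @ theta - y)) / m
        grad[1:] += (lam / m) * theta[1:]  # + (lambda/m) * theta_j for j >= 1
        theta -= alpha * grad
    return theta

rng = np.random.default_rng(0)
X = np.column_stack([np.ones(20), rng.normal(size=20)])
y = 2.0 + 3.0 * X[:, 1] + 0.1 * rng.normal(size=20)
t_ne = fit_normal_equation(X, y, lam=1.0)
t_gd = fit_gradient_descent(X, y, lam=1.0)
print(np.allclose(t_ne, t_gd, atol=1e-6))  # both methods agree
```

Since the regularized cost is convex, gradient descent (with a small enough \(\alpha\)) and the closed form converge to the same \(\theta\).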
Gradient descent
Repeat{
\[\theta_0:=\theta_0 - \alpha\frac{1}{m}\sum^m_{i=1}(h_\theta(x^{(i)})-y^{(i)})x_0^{(i)} \\ \theta_j:=\theta_j - \alpha\left[\frac{1}{m}\sum^m_{i=1}(h_\theta(x^{(i)})-y^{(i)})x_j^{(i)} + \frac{\lambda}{m}\theta_j\right] \qquad (j=1,\dots,n) \]
}

Although the update looks identical to regularized linear regression, here \(h_\theta(x) = \frac{1}{1+e^{-\theta^Tx}}\), so it is a different algorithm.
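The regularized logistic-regression update can be sketched the same way; only the hypothesis changes to the sigmoid. The toy data and hyperparameters below are illustrative choices:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def logistic_gd(X, y, lam, alpha=0.5, iters=2000):
    """Regularized logistic regression via gradient descent.

    Same update as the linear case, but h_theta(x) = sigmoid(theta^T x).
    """
    m = X.shape[0]
    theta = np.zeros(X.shape[1])
    for _ in range(iters):
        grad = (X.T @ (sigmoid(X @ theta) - y)) / m
        grad[1:] += (lam / m) * theta[1:]  # theta_0 not regularized
        theta -= alpha * grad
    return theta

# Toy 1-D linearly separable data (illustrative)
X = np.column_stack([np.ones(6), np.array([-3.0, -2.0, -1.0, 1.0, 2.0, 3.0])])
y = np.array([0.0, 0.0, 0.0, 1.0, 1.0, 1.0])
theta = logistic_gd(X, y, lam=1.0)
preds = (sigmoid(X @ theta) >= 0.5).astype(float)
print(preds)  # classifies the separable toy set correctly
```

On separable data like this, the regularization term also keeps \(\theta\) from growing without bound, which unregularized logistic regression would otherwise do.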