[ Machine Learning - Andrew Ng ] Linear regression with one variable | 2-5 Gradient descent intuition
repeat until convergence {
\(\theta_j := \theta_j - \alpha\frac{\partial}{\partial \theta_j}J(\theta_0,\theta_1)\)  (for \(j = 0\) and \(j = 1\))
}
\(\alpha\): learning rate
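Below is a minimal sketch of this update rule for one-variable linear regression with the squared-error cost, using the standard partial derivatives of \(J(\theta_0,\theta_1)\). The function name, the sample data interface (`xs`, `ys`), and the default values of `alpha` and `num_iters` are illustrative assumptions, not from the lecture.

```python
def gradient_descent(xs, ys, alpha=0.01, num_iters=1000):
    """Sketch of batch gradient descent for h(x) = theta0 + theta1 * x."""
    m = len(xs)
    theta0, theta1 = 0.0, 0.0
    for _ in range(num_iters):
        # Partial derivatives of the squared-error cost J(theta0, theta1)
        errors = [(theta0 + theta1 * x) - y for x, y in zip(xs, ys)]
        grad0 = sum(errors) / m
        grad1 = sum(e * x for e, x in zip(errors, xs)) / m
        # Simultaneous update: both gradients are computed before either
        # parameter changes, matching the rule above for j = 0 and j = 1
        theta0 -= alpha * grad0
        theta1 -= alpha * grad1
    return theta0, theta1
```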
If \(\alpha\) is too small, gradient descent can be slow.
If \(\alpha\) is too large, gradient descent can overshoot the minimum. It may fail to converge, or even diverge.
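A hypothetical 1-D example makes both failure modes visible. Here the cost is \(J(\theta)=\theta^2\) (minimum at \(\theta = 0\)); the three values of `alpha` are chosen purely for illustration.

```python
def descend(alpha, theta=10.0, steps=20):
    """Run a few gradient steps on J(theta) = theta^2."""
    for _ in range(steps):
        grad = 2 * theta          # dJ/dtheta for J(theta) = theta^2
        theta = theta - alpha * grad
    return theta

print(descend(alpha=0.001))  # too small: still far from 0 after 20 steps (slow)
print(descend(alpha=0.1))    # reasonable: close to 0
print(descend(alpha=1.1))    # too large: overshoots each time and diverges
```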
Gradient descent can converge to a local minimum, even with the learning rate \(\alpha\) fixed.
This is because:
As we approach a local minimum, the derivative term gets smaller, so gradient descent automatically takes smaller steps. Hence there is no need to decrease \(\alpha\) over time. A short numeric sketch of this follows below.
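The following sketch (again on the illustrative cost \(J(\theta)=\theta^2\), with an assumed starting point and `alpha`) prints the step size \(\alpha\frac{d}{d\theta}J(\theta)\) at each iteration; it shrinks on its own even though \(\alpha\) never changes.

```python
theta, alpha = 10.0, 0.1
for i in range(5):
    grad = 2 * theta          # derivative shrinks as theta approaches 0
    step = alpha * grad       # step size = alpha * derivative
    theta -= step
    print(f"iter {i}: step size = {step:.4f}, theta = {theta:.4f}")
# Step sizes: 2.0000, 1.6000, 1.2800, 1.0240, 0.8192 -- decreasing with fixed alpha
```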