Learning Rate | Notion

Learning rate helps control the size of each step of Gradient Descent .

When choosing $\alpha$, try $..., 0.001, 0.003, 0.01, 0.03, 0.1, 0.3, 1, 3, ...$ (3x increases)

$\alpha$ too small = very slow convergence, non-optimal
$\alpha$ too large = may not converge (may also have slow convergence due to bouncing)