The learning rate $\alpha$ controls the size of each step of Gradient Descent by scaling the gradient in every parameter update.
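A minimal sketch of how $\alpha$ scales each step, assuming a hypothetical one-dimensional cost $J(\theta) = \theta^2$ with gradient $2\theta$. The helper names `gradient_descent` and `grad_quadratic` are illustrative, not from these notes.

```python
def gradient_descent(grad, theta0, alpha, num_iters):
    """Plain gradient descent: repeatedly apply theta <- theta - alpha * grad(theta)."""
    theta = theta0
    for _ in range(num_iters):
        theta = theta - alpha * grad(theta)
    return theta


def grad_quadratic(theta):
    """Gradient of the hypothetical cost J(theta) = theta**2 (minimum at theta = 0)."""
    return 2.0 * theta


# Same starting point and iteration count; only the learning rate changes.
for alpha in (0.01, 0.1, 1.1):
    final = gradient_descent(grad_quadratic, theta0=1.0, alpha=alpha, num_iters=50)
    print(f"alpha={alpha}: theta after 50 steps = {final:.4g}")
```

For this cost each step multiplies $\theta$ by $(1 - 2\alpha)$, so a small $\alpha$ creeps toward the minimum, a moderate one reaches it quickly, and any $\alpha > 1$ overshoots with ever-growing magnitude.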

When choosing $\alpha$, try values such as $\ldots, 0.001, 0.003, 0.01, 0.03, 0.1, 0.3, 1, 3, \ldots$, increasing by roughly 3x each time.
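A sketch of sweeping that grid, assuming a hypothetical toy linear-regression problem ($y = 2x$) and a simple selection rule that is not from these notes: run a fixed number of iterations per candidate and keep the $\alpha$ with the lowest final cost, treating diverged runs as infinitely bad. All function names here are illustrative.

```python
import math


def candidate_alphas(min_exp=-3, max_exp=0):
    """Roughly 3x-spaced grid: 1x and 3x each power of ten (0.001, 0.003, ..., 1, 3)."""
    return [m * 10.0 ** e for e in range(min_exp, max_exp + 1) for m in (1, 3)]


def cost(w, b, xs, ys):
    """Squared-error cost J(w, b) = (1 / 2m) * sum((w*x + b - y)^2)."""
    m = len(xs)
    return sum((w * x + b - y) * (w * x + b - y) for x, y in zip(xs, ys)) / (2 * m)


def run_gd(alpha, xs, ys, num_iters=100):
    """Batch gradient descent on J(w, b); returns the final cost, or inf if it diverged."""
    w, b = 0.0, 0.0
    m = len(xs)
    for _ in range(num_iters):
        errors = [w * x + b - y for x, y in zip(xs, ys)]
        grad_w = sum(e * x for e, x in zip(errors, xs)) / m
        grad_b = sum(errors) / m
        # Simultaneous update of both parameters.
        w, b = w - alpha * grad_w, b - alpha * grad_b
        if not (math.isfinite(w) and math.isfinite(b)):
            return math.inf  # step too large: the iterates blew up
    final = cost(w, b, xs, ys)
    return final if math.isfinite(final) else math.inf


def pick_alpha(xs, ys, num_iters=100):
    """Try every candidate and keep the one with the lowest final cost (assumed rule)."""
    costs = {alpha: run_gd(alpha, xs, ys, num_iters) for alpha in candidate_alphas()}
    return min(costs, key=costs.get), costs


# Hypothetical toy data lying exactly on y = 2x.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.0, 4.0, 6.0, 8.0]
best, costs = pick_alpha(xs, ys)
for alpha, c in costs.items():
    print(f"alpha={alpha:g}: final cost {c:.3g}")
print(f"chosen alpha: {best:g}")
```

Ranking by final cost after a fixed budget tends to favor the largest $\alpha$ that still converges; a common alternative is to plot $J$ against the iteration count for each candidate and inspect the curves.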