Step Size Selection in Gradient Descent via Line Search
If a function is strongly convex with known parameters, the optimal fixed step size is \frac{2}{\mu + L}, where \mu and L are the strong convexity and smoothness constants (the lower and upper bounds on the Hessian eigenvalues; for a quadratic, its smallest and largest eigenvalues).
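A minimal sketch of this rule on a quadratic, where the Hessian is the matrix A and \mu, L are its extreme eigenvalues. The function name, problem setup, and iteration count are illustrative assumptions, not from the text.

```python
import numpy as np

def gd_optimal_fixed_step(A, b, x0, n_iters=1000):
    """Gradient descent on f(x) = 0.5*x^T A x - b^T x with alpha = 2/(mu + L)."""
    eigvals = np.linalg.eigvalsh(A)        # Hessian eigenvalues, ascending
    mu, L = eigvals[0], eigvals[-1]        # strong convexity / smoothness constants
    alpha = 2.0 / (mu + L)                 # optimal fixed step for this class
    x = x0.copy()
    for _ in range(n_iters):
        grad = A @ x - b                   # gradient of 0.5*x^T A x - b^T x
        x = x - alpha * grad
    return x

# Usage on a small synthetic problem (illustrative).
rng = np.random.default_rng(0)
M = rng.standard_normal((5, 5))
A = M @ M.T + 0.1 * np.eye(5)              # symmetric positive definite Hessian
b = rng.standard_normal(5)
x_star = gd_optimal_fixed_step(A, b, np.zeros(5))
print(np.linalg.norm(A @ x_star - b))      # gradient norm, close to zero
```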
In practice, these constants are rarely known, and the objective may be non-convex. Instead, we tune the learning rate manually, use line search, or rely on adaptive methods such as AdamW and NAG-GS.
- Golden-section search works well when the function is unimodal along the search direction (e.g., convex) but offers no guarantees for non-unimodal ones; a sketch follows this list.
- The Wolfe conditions ensure sufficient decrease of the function, while the curvature condition rules out steps that are too short; see the second sketch below.
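First, a minimal sketch of golden-section search applied to the 1-D step-size problem \phi(\alpha) = f(x - \alpha \nabla f(x)). It assumes \phi is unimodal on the bracket; all names and the example objective are illustrative assumptions.

```python
import math

def golden_section(phi, a, b, tol=1e-6):
    """Minimize a unimodal function phi on [a, b] to within tol."""
    inv_phi = (math.sqrt(5) - 1) / 2          # 1/golden ratio, about 0.618
    c = b - inv_phi * (b - a)
    d = a + inv_phi * (b - a)
    while (b - a) > tol:
        if phi(c) < phi(d):                   # minimizer lies in [a, d]
            b, d = d, c
            c = b - inv_phi * (b - a)
        else:                                 # minimizer lies in [c, b]
            a, c = c, d
            d = a + inv_phi * (b - a)
    return (a + b) / 2

# Example: choose the step for one gradient step on f(x) = (x - 3)^2 from x = 0.
f = lambda x: (x - 3.0) ** 2
x, grad = 0.0, 2 * (0.0 - 3.0)                # current point and its gradient
alpha = golden_section(lambda a: f(x - a * grad), 0.0, 1.0)
print(alpha, x - alpha * grad)                # step ~0.5, new point ~3.0
```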
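Second, a minimal sketch of a line search that enforces the weak Wolfe conditions by bisection: the sufficient-decrease (Armijo) test shrinks steps that are too long, and the curvature test grows steps that are too short. The constants c1, c2 and all helper names are illustrative assumptions, not from the text.

```python
import numpy as np

def wolfe_line_search(f, grad_f, x, p, c1=1e-4, c2=0.9, max_iter=50):
    """Return a step alpha along descent direction p satisfying the weak Wolfe conditions."""
    fx, gx = f(x), grad_f(x)
    slope = gx @ p                            # directional derivative, must be negative
    alpha, lo, hi = 1.0, 0.0, np.inf
    for _ in range(max_iter):
        if f(x + alpha * p) > fx + c1 * alpha * slope:
            hi = alpha                        # sufficient decrease fails: step too long
            alpha = 0.5 * (lo + hi)
        elif grad_f(x + alpha * p) @ p < c2 * slope:
            lo = alpha                        # curvature fails: step too short
            alpha = 2 * lo if np.isinf(hi) else 0.5 * (lo + hi)
        else:
            return alpha                      # both Wolfe conditions hold
    return alpha                              # fallback after max_iter

# Example on a simple quadratic, stepping along the negative gradient.
f = lambda x: 0.5 * x @ x
grad_f = lambda x: x
x0 = np.array([2.0, -1.0])
alpha = wolfe_line_search(f, grad_f, x0, -grad_f(x0))
print(alpha, f(x0 + alpha * (-grad_f(x0))))   # alpha = 1.0 lands at the minimizer here
```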
Line search often reaches a lower function value in fewer iterations than a fixed step size, despite the higher cost per iteration.