## Summary

A classical problem of function minimization is considered.

• The bottleneck (for almost all gradient methods) is choosing the step size, which can lead to dramatic differences in the method's behaviour.
• One theoretical suggestion is to choose the step size inversely proportional to the gradient Lipschitz constant: $\eta_k = \dfrac{1}{L}$.
• In huge-scale applications the cost of an iteration is usually dominated by the cost of the gradient calculation (at least $\mathcal{O}(p)$).
• If the function has a Lipschitz-continuous gradient, the method can be rewritten as the minimization of a local quadratic upper bound:

$$
x_{k+1} = x_k - \dfrac{1}{L}\nabla f(x_k) = \arg\min\limits_{x}\left[ f(x_k) + \langle \nabla f(x_k), x - x_k \rangle + \dfrac{L}{2}\Vert x - x_k \Vert^2 \right]
$$
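As a minimal numerical sketch of the step-size choice above, the following runs gradient descent with the fixed step $\eta = 1/L$ on a small quadratic $f(x) = \tfrac{1}{2}x^\top A x - b^\top x$ (the matrix $A$, vector $b$, and helper names are illustrative assumptions, not from the text); for such $f$, $L$ is the largest eigenvalue of $A$:

```python
import numpy as np

# Illustrative problem data (assumed, not from the text):
# f(x) = 0.5 x^T A x - b^T x, with A positive definite.
A = np.array([[3.0, 0.5],
              [0.5, 1.0]])
b = np.array([1.0, -1.0])

f = lambda x: 0.5 * x @ A @ x - b @ x
grad = lambda x: A @ x - b

L = np.linalg.eigvalsh(A).max()  # gradient Lipschitz constant for a quadratic
eta = 1.0 / L                    # the theoretical step size eta_k = 1/L

x = np.zeros(2)
for k in range(200):
    x = x - eta * grad(x)        # gradient descent step

x_star = np.linalg.solve(A, b)   # exact minimizer, for comparison
print(np.linalg.norm(x - x_star))
```

After 200 iterations the distance to the exact minimizer is numerically zero, consistent with the (locally) linear convergence discussed below for strongly convex problems.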

## Bounds

| Conditions | $\Vert f(x_k) - f(x^*)\Vert \leq$ | Type of convergence | $\Vert x_k - x^* \Vert \leq$ |
|---|---|---|---|
| Convex, Lipschitz-continuous function ($G$) | $\mathcal{O}\left(\dfrac{1}{k}\right) \; \dfrac{GR}{k}$ | Sublinear | |
| Convex, Lipschitz-continuous gradient ($L$) | $\mathcal{O}\left(\dfrac{1}{k}\right) \; \dfrac{LR^2}{k}$ | Sublinear | |
| $\mu$-Strongly convex, Lipschitz-continuous hessian ($M$), $R < \overline{R}$ | | Locally linear | $\dfrac{\overline{R}R}{\overline{R} - R} \left( 1 - \dfrac{2\mu}{L+3\mu}\right)^k$ |
• $R = \| x_0 - x^*\|$ is the initial distance to the solution
• $\overline{R} = \dfrac{2\mu}{M}$ is the radius of the region of local convergence