Summary

A classical problem of function minimization is considered.

  • The bottleneck (for almost all gradient methods) is the choice of step size, which can lead to dramatic differences in the method’s behaviour.
  • One theoretical suggestion is to choose the step size inversely proportional to the Lipschitz constant of the gradient, i.e. $\alpha_k = \dfrac{1}{L}$.
  • In huge-scale applications the cost of an iteration is usually dominated by the cost of the gradient calculation (at least ).
  • If the function has a Lipschitz-continuous gradient, then the method can be rewritten as follows (see the sketch after this list):

$$
x_{k+1} = x_k - \dfrac{1}{L}\nabla f(x_k) = \arg\min_{x \in \mathbb{R}^n}\left\{ f(x_k) + \langle \nabla f(x_k), x - x_k \rangle + \dfrac{L}{2}\Vert x - x_k \Vert_2^2 \right\}
$$

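As referenced in the list above, here is a minimal sketch of gradient descent with the fixed step size $\frac{1}{L}$. The quadratic test function, matrix, and names below are illustrative assumptions, not part of the original text.

```python
import numpy as np

def gradient_descent(grad, x0, L, n_iter=100):
    """Run x_{k+1} = x_k - (1/L) * grad f(x_k) for n_iter steps."""
    x = np.asarray(x0, dtype=float)
    for _ in range(n_iter):
        x = x - grad(x) / L
    return x

# Toy quadratic f(x) = 0.5 * x^T A x - b^T x, so grad f(x) = A x - b
# and the gradient Lipschitz constant is L = lambda_max(A).
A = np.array([[3.0, 1.0], [1.0, 2.0]])
b = np.array([1.0, -1.0])
L = np.linalg.eigvalsh(A).max()
x_star = np.linalg.solve(A, b)

x_hat = gradient_descent(lambda x: A @ x - b, x0=np.zeros(2), L=L, n_iter=300)
print(np.linalg.norm(x_hat - x_star))  # should be close to 0
```
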
Bounds

| Conditions | $\Vert f(x_k) - f(x^*)\Vert \leq$ | Type of convergence | $\Vert x_k - x^* \Vert \leq$ |
|---|---|---|---|
| Convex, Lipschitz-continuous function ($G$) | $\mathcal{O}\left(\dfrac{1}{k} \right) \; \dfrac{GR}{k}$ | Sublinear | |
| Convex, Lipschitz-continuous gradient ($L$) | $\mathcal{O}\left(\dfrac{1}{k} \right) \; \dfrac{LR^2}{k}$ | Sublinear | |
| $\mu$-Strongly convex, Lipschitz-continuous Hessian ($M$) | | Locally linear, $R < \overline{R}$ | $\dfrac{\overline{R}R}{\overline{R} - R} \left( 1 - \dfrac{2\mu}{L+3\mu}\right)^k$ |
  • $R = \Vert x_0 - x^* \Vert$ - initial distance
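
As a hedged illustration of the linear (geometric) convergence rate in the last row, the snippet below runs gradient descent on a toy strongly convex quadratic (chosen here for illustration, not taken from the source) and checks that $\Vert x_k - x^* \Vert$ shrinks by a roughly constant factor per iteration.

```python
import numpy as np

# Toy strongly convex quadratic: f(x) = 0.5 * x^T A x - b^T x with mu = 1, L = 10.
A = np.diag([1.0, 10.0])
b = np.array([1.0, 1.0])
x_star = np.linalg.solve(A, b)
L = np.linalg.eigvalsh(A).max()

x = np.array([5.0, -5.0])
dists = []
for _ in range(50):
    x = x - (A @ x - b) / L          # gradient step with step size 1/L
    dists.append(np.linalg.norm(x - x_star))

ratios = [dists[k + 1] / dists[k] for k in range(len(dists) - 1)]
print(ratios[:5])  # roughly constant ratio < 1, i.e. linear (geometric) convergence
```
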

Materials