Inexact Line Search

The strategy of inexact line search is practical and has a clear geometric interpretation:

1 Sufficient Decrease

Consider a scalar function \phi(\alpha) at a point x_k:

\phi(\alpha) = f(x_k - \alpha\nabla f(x_k)), \alpha \geq 0

The first-order approximation of \phi(\alpha) near \alpha = 0 is:

\phi(\alpha) \approx f(x_k) - \alpha\nabla f(x_k)^\top \nabla f(x_k)
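
This approximation is simply the tangent line of \phi at \alpha = 0: by the chain rule,

\nabla_\alpha \phi(\alpha) = -\nabla f(x_k - \alpha\nabla f(x_k))^\top \nabla f(x_k), \qquad \nabla_\alpha \phi(0) = -\nabla f(x_k)^\top \nabla f(x_k),

so \phi(0) + \alpha \nabla_\alpha \phi(0) = f(x_k) - \alpha\nabla f(x_k)^\top \nabla f(x_k).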

Illustration of the first-order Taylor approximation of \phi(\alpha)

The inexact line search condition, known as the Armijo condition, states that \alpha should provide sufficient decrease in the function f, satisfying:

f(x_k - \alpha \nabla f (x_k)) \leq f(x_k) - c_1 \cdot \alpha\nabla f(x_k)^\top \nabla f(x_k)

Illustration of sufficient decrease condition with coefficient c_1

for some constant c_1 \in (0,1). Note that setting c_1 = 1 corresponds to the first-order Taylor approximation of \phi(\alpha). However, this condition alone can accept very small values of \alpha, potentially slowing down the solution process. Typically, c_1 \approx 10^{-4} is used in practice.
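
As a schematic illustration, the backtracking procedure below shrinks a trial step until the sufficient decrease condition holds. This is only a minimal sketch: the function name backtracking_armijo, the initial step alpha0, and the shrink factor beta are illustrative choices, not prescribed by the condition itself.

```python
import numpy as np

def backtracking_armijo(f, grad_f, x_k, alpha0=1.0, c1=1e-4, beta=0.5, max_iter=50):
    """Shrink alpha until the Armijo (sufficient decrease) condition
    f(x_k - alpha * g) <= f(x_k) - c1 * alpha * ||g||^2 holds."""
    g = grad_f(x_k)
    f_k = f(x_k)
    gg = g @ g                       # ||grad f(x_k)||^2
    alpha = alpha0
    for _ in range(max_iter):
        if f(x_k - alpha * g) <= f_k - c1 * alpha * gg:
            return alpha             # sufficient decrease achieved
        alpha *= beta                # step too long: shrink and retry
    return alpha                     # fallback: last (very small) trial step

# Usage on a simple quadratic with minimum at the origin
f = lambda x: 0.5 * float(x @ x)
grad_f = lambda x: x
x_k = np.array([3.0, -4.0])
alpha = backtracking_armijo(f, grad_f, x_k)
x_next = x_k - alpha * grad_f(x_k)
```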

Example

If f(x) represents a training loss in a machine learning problem, the choice of c_1 matters: a c_1 that is too close to 1 rejects most candidate steps and forces very short ones, slowing convergence, while a very small c_1 accepts almost any decrease, including steps that make little progress per iteration.

Question

How does the choice of c_1 affect the convergence speed in optimization problems?

2 Goldstein Conditions

Consider two linear scalar functions \phi_1(\alpha) and \phi_2(\alpha):

\phi_1(\alpha) = f(x_k) - c_1 \alpha \|\nabla f(x_k)\|^2

\phi_2(\alpha) = f(x_k) - c_2 \alpha \|\nabla f(x_k)\|^2

The Goldstein-Armijo conditions locate the function \phi(\alpha) between \phi_1(\alpha) and \phi_2(\alpha). Typically, c_1 = \rho and c_2 = 1 - \rho, with \rho \in (0.5, 1).

Illustration of Goldstein conditions
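
A minimal sketch of how one might test these conditions numerically is given below; the helper name satisfies_goldstein and the default \rho = 0.75 are illustrative assumptions, not values fixed by the conditions.

```python
import numpy as np

def satisfies_goldstein(f, grad_f, x_k, alpha, rho=0.75):
    """Check whether phi(alpha) = f(x_k - alpha * g) lies between
    phi_1(alpha) (c_1 = rho) and phi_2(alpha) (c_2 = 1 - rho), rho in (0.5, 1)."""
    g = grad_f(x_k)
    gg = g @ g
    phi = f(x_k - alpha * g)
    phi_1 = f(x_k) - rho * alpha * gg          # steeper line (lower bound)
    phi_2 = f(x_k) - (1.0 - rho) * alpha * gg  # shallower line (upper bound)
    return phi_1 <= phi <= phi_2
```

In practice such a predicate is combined with a bracketing or bisection procedure that adjusts \alpha until both inequalities hold.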

3 Curvature Condition

To avoid excessively short steps, we introduce a second criterion:

-\nabla f (x_k - \alpha \nabla f(x_k))^\top \nabla f(x_k) \geq c_2 \nabla f(x_k)^\top(- \nabla f(x_k))

Illustration of curvature condition

for some c_2 \in (c_1,1), where c_1 is the constant from the Armijo condition. The left-hand side is exactly the derivative \nabla_\alpha \phi(\alpha), so the condition ensures that the slope of \phi(\alpha) at the accepted point is at least c_2 times the initial slope \nabla_\alpha \phi(0). Commonly, c_2 \approx 0.9 is used for Newton and quasi-Newton methods. Together, the sufficient decrease and curvature conditions form the Wolfe conditions.

Illustration of Wolfe condition
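
To make the pair of conditions concrete, here is a small predicate that checks both Wolfe conditions for a steepest-descent step; the name satisfies_wolfe is an illustrative choice, and the defaults c_1 = 10^{-4}, c_2 = 0.9 are the typical values mentioned above.

```python
import numpy as np

def satisfies_wolfe(f, grad_f, x_k, alpha, c1=1e-4, c2=0.9):
    """Check sufficient decrease and curvature for the step x_k - alpha * g."""
    g = grad_f(x_k)
    gg = g @ g
    x_new = x_k - alpha * g
    armijo = f(x_new) <= f(x_k) - c1 * alpha * gg   # sufficient decrease
    curvature = -grad_f(x_new) @ g >= -c2 * gg      # curvature condition
    return armijo and curvature
```

Note that plain backtracking only guarantees the sufficient decrease part; satisfying the curvature condition as well typically requires a bracketing (zoom-type) search that can also enlarge the step.
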
Example

In gradient descent algorithms, the curvature condition prevents the method from accepting steps that are too small, improving the efficiency of the search for a minimum.

Question

Why is it important to have a balance between the sufficient decrease and curvature conditions in optimization algorithms?
