Intuition

For the classic problem of unconstrained optimization $f(x) \to \min\limits_{x \in \mathbb{R}^n}$, the general scheme of an iterative method is written as:

$$
x_{k+1} = x_k + \alpha_k s_k
$$

In the Newton method, the direction $s_k$ (Newton’s direction) is set by the solution of a linear system at each step:

$$
s_k = -B_k \nabla f(x_k), \qquad B_k = \left(\nabla^2 f(x_k)\right)^{-1},
$$

i.e. at each iteration it is necessary to compute the Hessian and the gradient and to solve a linear system.

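For illustration, a single Newton step can be sketched as follows (a minimal sketch; the helper names `f_grad` and `f_hess` are assumptions, standing for routines that return the gradient and the Hessian of $f$ at a point):

```python
import numpy as np

def newton_step(x, f_grad, f_hess, alpha=1.0):
    # Solve the linear system  nabla^2 f(x_k) s_k = -nabla f(x_k)
    # instead of explicitly inverting the Hessian.
    s = np.linalg.solve(f_hess(x), -f_grad(x))
    return x + alpha * s
```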
Note here that if we take the identity matrix $B_k = I_n$ at each step, we get exactly the gradient descent method.

The general scheme of quasi-Newton methods is based on the selection of the matrix $B_k$ so that it tends in some sense, as $k \to \infty$, to the true value of the inverse Hessian at the local optimum, $\left(\nabla^2 f(x_*)\right)^{-1}$. Let’s consider several schemes that update the $B_k$ matrix iteratively in the following way:

$$
B_{k+1} = B_k + \Delta B_k
$$

Then, if we use the first-order Taylor approximation of the gradient, we get:

$$
\nabla f(x_{k+1}) - \nabla f(x_k) \approx \nabla^2 f(x_{k+1})\,(x_{k+1} - x_k)
$$

Motivated by this, let’s require the new approximation $B_{k+1}$ to satisfy the analogous exact relation (the quasi-Newton, or secant, condition):

$$
\Delta x_k = B_{k+1} \Delta y_k, \quad \text{where } \Delta x_k = x_{k+1} - x_k, \;\; \Delta y_k = \nabla f(x_{k+1}) - \nabla f(x_k),
$$

which sets up the task of finding the update $\Delta B_k$:

$$
\Delta B_k \Delta y_k = \Delta x_k - B_k \Delta y_k
$$

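As a quick sanity check of this condition (a hypothetical example, not from the original text): for a quadratic $f(x) = \frac{1}{2} x^\top A x$ we have $\nabla f(x) = Ax$, so $\Delta y_k = A \Delta x_k$, and the exact inverse Hessian $B_{k+1} = A^{-1}$ satisfies the condition identically:

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.normal(size=(3, 3))
A = A @ A.T + 3 * np.eye(3)      # a symmetric positive definite Hessian

x_k = rng.normal(size=3)
x_k1 = rng.normal(size=3)
dx = x_k1 - x_k                  # Delta x_k
dy = A @ x_k1 - A @ x_k          # Delta y_k = grad f(x_{k+1}) - grad f(x_k)

# B_{k+1} = A^{-1} maps Delta y_k back to Delta x_k exactly
assert np.allclose(np.linalg.inv(A) @ dy, dx)
```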
Broyden method

The simplest option is when the correction $\Delta B_k$ has rank one. Then one can look for the correction in the form

$$
\Delta B_k = \mu_k q_k q_k^\top,
$$

where $\mu_k$ is a scalar and $q_k$ is a non-zero vector. Denote the right-hand side of the equation for $\Delta B_k$ by $\Delta z_k$:

$$
\Delta z_k := \Delta x_k - B_k \Delta y_k
$$

Substituting the rank-one form into $\Delta B_k \Delta y_k = \Delta z_k$, we get:

$$
\mu_k q_k q_k^\top \Delta y_k = \Delta z_k \quad \Longleftrightarrow \quad \left(\mu_k \cdot q_k^\top \Delta y_k\right) q_k = \Delta z_k
$$

A possible solution is: $q_k = \Delta z_k$, $\mu_k = \left(q_k^\top \Delta y_k\right)^{-1}$.

Then the iterative correction to the inverse Hessian estimate at each iteration is:

$$
\Delta B_k = \frac{(\Delta x_k - B_k \Delta y_k)(\Delta x_k - B_k \Delta y_k)^\top}{\langle \Delta x_k - B_k \Delta y_k, \Delta y_k \rangle}
$$

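A minimal NumPy sketch of this rank-one update (the function name, signature, and the tolerance guard are illustrative assumptions):

```python
import numpy as np

def broyden_update(B, dx, dy):
    """Rank-one (Broyden) update of the inverse Hessian estimate B."""
    z = dx - B @ dy              # Delta z_k = Delta x_k - B_k Delta y_k
    denom = z @ dy               # <Delta z_k, Delta y_k>
    if abs(denom) < 1e-12:       # skip the update when it is ill-defined
        return B
    return B + np.outer(z, z) / denom
```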
Davidon–Fletcher–Powell method

In the DFP method, the correction $\Delta B_k$ is sought as a rank-two matrix built from the vectors $\Delta x_k$ and $B_k \Delta y_k$; substituting it into the quasi-Newton condition yields:

$$
\Delta B_k = \frac{\Delta x_k \Delta x_k^\top}{\langle \Delta x_k, \Delta y_k \rangle} - \frac{(B_k \Delta y_k)(B_k \Delta y_k)^\top}{\langle B_k \Delta y_k, \Delta y_k \rangle}
$$

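A corresponding NumPy sketch of the DFP update (illustrative signature; the curvature products in the denominators are assumed to be non-zero):

```python
import numpy as np

def dfp_update(B, dx, dy):
    """Rank-two DFP update of the inverse Hessian estimate B."""
    Bdy = B @ dy
    return B + np.outer(dx, dx) / (dx @ dy) - np.outer(Bdy, Bdy) / (Bdy @ dy)
```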
Broyden–Fletcher–Goldfarb–Shanno method

In the BFGS method, the correction is also of rank two, and the resulting update of the inverse Hessian estimate can be written as:

$$
B_{k+1} = \left(I - \frac{\Delta x_k \Delta y_k^\top}{\langle \Delta y_k, \Delta x_k \rangle}\right) B_k \left(I - \frac{\Delta y_k \Delta x_k^\top}{\langle \Delta y_k, \Delta x_k \rangle}\right) + \frac{\Delta x_k \Delta x_k^\top}{\langle \Delta y_k, \Delta x_k \rangle}
$$

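And a NumPy sketch of the BFGS update (illustrative signature, assuming the curvature condition $\langle \Delta y_k, \Delta x_k \rangle > 0$ holds):

```python
import numpy as np

def bfgs_update(B, dx, dy):
    """Rank-two BFGS update of the inverse Hessian estimate B."""
    rho = 1.0 / (dy @ dx)
    I = np.eye(len(dx))
    V = I - rho * np.outer(dx, dy)
    return V @ B @ V.T + rho * np.outer(dx, dx)
```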
Comparison of quasi-Newton methods

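One way to compare the three updates is to run the same quasi-Newton loop with each update rule on a test problem. The sketch below reuses the `*_update` functions defined above; the Rosenbrock test function, the Armijo backtracking line search, and the curvature safeguard are all assumptions made here for illustration, not prescribed by the text:

```python
import numpy as np

def rosenbrock(x):
    return (1 - x[0])**2 + 100 * (x[1] - x[0]**2)**2

def rosenbrock_grad(x):
    return np.array([
        -2 * (1 - x[0]) - 400 * x[0] * (x[1] - x[0]**2),
        200 * (x[1] - x[0]**2),
    ])

def backtracking(x, g, s, alpha=1.0, beta=0.5, c=1e-4):
    # Armijo backtracking line search along the direction s.
    while rosenbrock(x + alpha * s) > rosenbrock(x) + c * alpha * (g @ s):
        alpha *= beta
        if alpha < 1e-12:
            break
    return alpha

def quasi_newton(update, x0, max_iter=1000, tol=1e-8):
    x = np.asarray(x0, dtype=float)
    B = np.eye(len(x))
    g = rosenbrock_grad(x)
    for k in range(max_iter):
        if np.linalg.norm(g) < tol:
            break
        s = -B @ g
        if g @ s >= 0:           # fall back to steepest descent if s is not a descent direction
            s = -g
        alpha = backtracking(x, g, s)
        x_new = x + alpha * s
        g_new = rosenbrock_grad(x_new)
        dx, dy = x_new - x, g_new - g
        if dx @ dy > 1e-12:      # curvature safeguard: skip the update otherwise
            B = update(B, dx, dy)
        x, g = x_new, g_new
    return x, k

for name, update in [("Broyden", broyden_update),
                     ("DFP", dfp_update),
                     ("BFGS", bfgs_update)]:
    x_star, iters = quasi_newton(update, [-1.2, 1.0])
    print(f"{name}: x* = {np.round(x_star, 6)}, iterations = {iters}")
```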