Introduction
Originally, the conjugate gradients method was created to solve a system of linear equations.
Without special efforts the problem can be presented in the form of minimization of the quadratic function, and then generalized on a case of non quadratic function. We will start with the parabolic case and try to construct a conjugate gradients method for it. Let us consider the classical problem of minimization of the quadratic function:
Here , .
Method of conjugate gradients for the quadratic function
We will consider symmetric matrices (otherwise, replacing leads to the same optimization problem). Then:
Then having an initial guess , vector is the direction of the fastest decrease. The procedure of the steepest descent in this direction is provided by the procedure of line search:
Assuming that the point of the zero derivative in this parabola is the minimum (for positive matrices it is guaranteed, otherwise it is not a fact), and also, rewriting this problem for the arbitrary () direction of the method, we have:
Then let’s start our method, as the method of the steepest descent:
Note, however, that if the next step is built in the same way (the fastest descent), we will “lose” some of the work that was done in the first step and we will get a classic situation for the fastest descent:
In order to avoid this, we introduce the concept of  conjugated vectors: let’s say that two vectors ,  are conjugated relative to each other if they are executed:
This concept becomes particularly interesting when matrix is positive defined, then vectors will be orthogonal if the scalar product is defined by the matrix . Therefore, this property is also called  orthogonality.
Then we will build the method in such a way that the next direction is  orthogonal with the previous one:
where is selected in a way that :
It’s interesting that all received directions are  orthogonal to each other. (proved by induction)
Thus, we formulate an algorithm:

Let and , count .

By the procedure of line search we find the optimal length of step : Calculate minimizing by the formula
 We’re doing an algorithm step:
 update the direction: , where is calculated by the formula:
 Repeat steps 24 until directions are built, where is the dimension of space (dimension of ).
Method of conjugate gradients for nonquadratic function:
In case we do not have an analytic expression for a function or its gradient, we will most likely not be able to solve the onedimensional minimization problem analytically. Therefore, step 2 of the algorithm is replaced by the usual line search procedure. But there is the following mathematical trick for the fourth point:
For two iterations, it is fair:
where is some kind of constant. Then for the quadratic case, we have:
Expressing from this equation the work , we get rid of the “knowledge” of the function in step definition , then point 4 will be rewritten as:
This method is called the Polack  Ribier method.
Examples
Example 1
Prove that if a set of vectors   are conjugated (all vectors are conjugated in pairs of ), these vectors are linearly independent. .
Solution:
We’ll show, that if , than all coefficients should be equal to zero:
Thus, , for all other indices one have perform the same process
References
 An Introduction to the Conjugate Gradient Method Without the Agonizing Pain
 The Concept of Conjugate Gradient Descent in Python by Ilya Kuzovkin
 Picture of best\worst initial guess in SD