
Conjugate Gradient Method

The conjugate gradient method is based on the idea that convergence to the solution can be accelerated if we minimize Q over the hyperplane containing all previous search directions, instead of minimizing Q over just the line that points down the gradient. To determine $x^{i+1}$ we minimize Q over

\begin{displaymath}
x^{0} + {\rm span}(p^{0},p^{1},p^{2},\ldots,p^{i}) \end{displaymath}

where the $p^{k}$ represent previous search directions. An added advantage to this approach is that, if we can select the $p^{k}$ to be linearly independent, then the dimension of the hyperplane

\begin{displaymath}
x^{0} + {\rm span}(p^{0},p^{1},p^{2},\ldots,p^{i}) \end{displaymath}

will grow by one with each iteration of the conjugate gradient method. This would imply that (assuming infinite precision arithmetic) the solution of the linear system Ax=b would be obtained in no more than N steps, where N is the number of unknowns in the system.
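The functional Q itself is not restated in this section; from the residual relation $r = -\nabla Q = b - Ax$ used below, it is presumably the quadratic form from the steepest descent discussion:

\begin{displaymath}
Q(x) = {1 \over 2}\, x \cdot Ax - b \cdot x , \qquad \nabla Q(x) = Ax - b = -r \end{displaymath}

For A symmetric and positive definite, the unique minimizer of Q is the solution of Ax=b.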

Let $x^{0}$ be an initial estimate of the solution of Ax=b. For our first search direction we proceed down the Q-gradient and choose

\begin{displaymath}
x^{1} = x^{0} + \alpha_{0} p^{0} \end{displaymath}

where

\begin{displaymath}
p^{0} = r^{0} = - \nabla Q(x^{0}) = b - Ax^{0}\end{displaymath}

From the discussion of the method of steepest descent, we have

\begin{displaymath}
\alpha_{0} = {{r^{0} \cdot r^{0}} \over {r^{0} \cdot Ar^{0}}} = {{r^{0} \cdot p^{0} } \over {p^{0} \cdot Ap^{0}}} \end{displaymath}

To understand what follows in the description of the conjugate gradient method, it is important to note that

\begin{displaymath}
r^{1} \cdot r^{0} = r^{1} \cdot p^{0} = 0\end{displaymath}

Rather than trying to establish the above orthogonality relationships with a calculation, we use the following calculus argument. By definition, $r^{1}$ is the negative gradient of Q at $x^{1}$, where $x^{1}$ is the conjugate gradient estimate that follows the initial guess $x^{0}$. Also, $r^{0} = p^{0}$ is the search direction along the line $x^{0} + \alpha_{0} p^{0}$. Calculus tells us that the gradient of Q at $x^{1}$ must be orthogonal to the search direction.

Not as a proof of the last statement about calculus and orthogonality, but to motivate it, consider the layers of an onion to be surfaces of Q = constant. Imagine piercing the onion with a skewer. In general, the skewer will pass through several outer layers of the onion, tangentially touch one of the inner layers, again pass through the other layers and then exit. The innermost layer that the skewer touches tangentially is the level surface $Q = Q(x^{1})$ passing through $x^{1}$, and the direction of the skewer is $r^{0} = p^{0}$.

What does your mental image of this skewered onion tell you about the orientation of $r^{1}$ and $r^{0} = p^{0}$?
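The picture can also be checked numerically. The sketch below takes one steepest descent step on a small symmetric positive definite system (the matrix A and vector b are invented for the demonstration) and verifies that $r^{1} \cdot p^{0} = 0$:

```python
import numpy as np

# Illustrative 2x2 SPD system (values made up for this demo).
A = np.array([[4.0, 1.0],
              [1.0, 3.0]])
b = np.array([1.0, 2.0])

x0 = np.zeros(2)
r0 = b - A @ x0          # r0 = -grad Q(x0)
p0 = r0                  # first search direction
alpha0 = (r0 @ p0) / (p0 @ (A @ p0))
x1 = x0 + alpha0 * p0
r1 = b - A @ x1

print(r1 @ p0)           # ~0: new residual is orthogonal to the search direction
```

The skewered-onion picture predicts exactly this: the gradient at the tangent point is perpendicular to the skewer.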

The conjugate gradient method calls for defining successive approximations by

\begin{displaymath}
x^{i+1} = x^{i} + \alpha_{i} p^{i} \end{displaymath}

\begin{displaymath}
p^{i+1} = r^{i+1} + \beta_{i} p^{i} \end{displaymath}

with the scheme for choosing $\alpha_{i}$ and $\beta_{i}$ to be discussed next. The things to keep in mind when choosing $\alpha_{i}$ and $\beta_{i}$ are:

1. We want the span of the search directions to fill the space we are searching as the number of iterations increases;


2. Searching down Q-gradients was basically a good idea. But, to guarantee linearly independent successive search directions, we generally need to choose conjugate gradient search directions to be perturbations of steepest descent search directions.

We have already seen how to define $p^{0}$ and $\alpha_{0}$, so $x^{1}$ and $r^{1} = b - Ax^{1}$ can be considered known. To take the next step using the conjugate gradient method, we must determine values for $\alpha_{1}$ and $\beta_{0}$ so that we can calculate $p^{1}$ and $x^{2}$. Then we will see a pattern emerge.

In taking this next step in the conjugate gradient method we are seeking to minimize Q over the plane

\begin{displaymath}
x^{0} + {\rm span}(p^{0},p^{1}) \end{displaymath}

This means that the residual $r^{2}$ will have to be orthogonal to both $p^{0}$ and $p^{1}$. (To support this last claim, imagine slicing through a lobe of an onion with a knife, forming a flat area spanned by the vectors $p^{0}$ and $p^{1}$.)

The orthogonality condition $ p^{0} \cdot r^{2} = 0$ will help us set the search direction p1.

\begin{displaymath}
p^{0} \cdot r^{2} = p^{0} \cdot \left[ b - A \left( x^{1} + \alpha_{1} p^{1} \right) \right]
= p^{0} \cdot r^{1} - \alpha_{1} p^{0} \cdot Ap^{1}\end{displaymath}

Since $p^{0} \cdot r^{1} = 0$, this is zero provided

\begin{displaymath}
p^{0} \cdot Ap^{1} = 0\end{displaymath}

Definition: Two vectors u and v are said to be A-conjugate if $u \cdot Av = 0$.

From the requirement that $ p^{0} \cdot r^{2} = 0$, it follows that the search direction $p^{1}$ must be A-conjugate to the search direction $p^{0}$. We can now set $\beta_{0}$ as follows:

\begin{displaymath}
p^{1} = r^{1} + \beta_{0} p^{0} \end{displaymath}

implies

\begin{displaymath}
Ap^{1} = Ar^{1} + \beta_{0} Ap^{0}\end{displaymath}

Now

\begin{displaymath}
0 = p^{0} \cdot Ap^{1} = p^{0} \cdot Ar^{1} + \beta_{0} p^{0} \cdot Ap^{0}\end{displaymath}

implies

\begin{displaymath}
\beta_{0} = - {p^{0} \cdot Ar^{1} \over p^{0} \cdot Ap^{0}} = - {r^{1} \cdot Ap^{0} \over p^{0} \cdot Ap^{0}}\end{displaymath}

Having decided to proceed from $x^{1}$ to $x^{2}$ along the search direction defined by $p^{1} = r^{1} + \beta_{0} p^{0}$, the same calculus argument used to determine $\alpha_{0}$ gives

\begin{displaymath}
\alpha_{1} = {{r^{1} \cdot p^{1}} \over {p^{1} \cdot Ap^{1}}}\end{displaymath}

so a step of the conjugate gradient method is complete. Having developed the CG method for one step, it is easy to see that successive iterates are defined as follows.
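Putting the pieces together, one full conjugate gradient step beyond steepest descent can be sketched as follows (the 2x2 SPD system is invented for illustration; with N = 2 unknowns, the second step already reaches the exact solution):

```python
import numpy as np

# Illustrative 2x2 SPD system (values made up for this demo).
A = np.array([[4.0, 1.0],
              [1.0, 3.0]])
b = np.array([1.0, 2.0])
x0 = np.zeros(2)

# Steepest descent step, as derived above.
r0 = b - A @ x0
p0 = r0
alpha0 = (r0 @ p0) / (p0 @ (A @ p0))
x1 = x0 + alpha0 * p0
r1 = b - A @ x1

# Conjugate gradient step: beta0 makes p1 A-conjugate to p0.
beta0 = -(p0 @ (A @ r1)) / (p0 @ (A @ p0))
p1 = r1 + beta0 * p0
alpha1 = (r1 @ p1) / (p1 @ (A @ p1))
x2 = x1 + alpha1 * p1
r2 = b - A @ x2

print(p0 @ (A @ p1))      # ~0: p1 is A-conjugate to p0
print(r2 @ p0, r2 @ p1)   # ~0: r2 is orthogonal to both search directions
```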

Conjugate Gradient Algorithm (for A symmetric and positive definite)

Step 1: (Initialize)
Choose
$x^{0}$

Set
i = 0 and imax = maximum number of iterations to be performed
$r^{0} = b - Ax^{0}$
$p^{0} = r^{0}$


Step 2: (Begin CG iteration)
If i < imax, perform Steps 3 - 5
If i = imax, exit and print the message "i = imax"


Step 3: (Perform CG update)
Set
$\alpha_{i} = {{r^{i} \cdot p^{i}} \over {p^{i} \cdot Ap^{i}}}$
$x^{i+1} = x^{i} + \alpha_{i} p^{i} $
$r^{i+1} = r^{i} - \alpha_{i} Ap^{i} $


Step 4: (Check for convergence)
If $ \Vert r^{i+1} \Vert < \mbox{tolerance} $, go to Step 6.


Step 5: (Prepare for next CG update)
Set
$x^{i} = x^{i+1}$
$r^{i} = r^{i+1}$
$\beta_i = - {{r^{i} \cdot Ap^{i}} \over {p^{i} \cdot Ap^{i}}}$
$p^{i} = r^{i} + \beta_{i} p^{i}$
i = i + 1
Go to Step 3


Step 6:
print solution and residual norm


Dave Zachmann
2/5/1999