Starting in the spring 2013, I videotaped the lectures for my MATH 676: Finite element methods in scientific computing course at the KAMU TV studio at Texas A&M. These are lectures on many aspects of scientific computing, software, and the practical aspects of the finite element method, as well as their implementation in the deal.II software library. Support for creating these videos was also provided by the National Science Foundation and the Computational Infrastructure in Geodynamics.

The videos are part of a broader effort to develop a modern way of teaching Computational Science and Engineering (CS&E) courses. If you are interested in adapting our approach, you may be interested in this paper I wrote with a number of education researchers about the structure of such courses and how they work.

Note 1: In some of the videos, I demonstrate code or user interfaces. If you can't read the text, change the video quality by clicking on the "gear" symbol at the bottom right of the YouTube player.

Note 2: deal.II is an actively developed library, and in the course of this development we occasionally deprecate and remove functionality. In some cases, this implies that we also change tutorial programs, but the nature of videos is that this is not reflected in something that may have been recorded years ago. If in doubt, consult the current version of the tutorial.

Lecture 34: What solver to use

Ultimately, every finite element code ends up with one or more linear systems that need to be solved. There are two basic strategies for this: direct solvers and iterative solvers. This class discusses the tradeoffs between the two and when to use which kind, as well as which iterative solver.

In practice, iterative solvers are only competitive for moderate and large problems (say, more than 100,000 unknowns) and only if an appropriate preconditioner is available. The question of preconditioners will be discussed in the next lecture.

Note 1: There was a typo in the memory complexity of direct solvers. In the lecture, it says O(N4/3) when the general formula above clearly (and correctly) would lead to a complexity of O(N5/3). This has been corrected in the slides posted below.

However, it is worth saying more about this. The complexities shown on this slide are based on band matrix solvers. To understand what this means, it is not very difficult to show that an algorithm like Gauss elimination (the basis of what direct solvers do) will only produce non-zero entries in the LU decomposition for entries that lie between the left-most and the right-most entries of the original matrix A. As a consequence, a strategy to minimize the amount of work done by direct solvers is to re-order (permute) the rows and column of a matrix in such a way that it minimizes the bandwidth b of A. (The bandwidth is defined as b=maxi maxj, Aij≠0 |i-j|, i.e. the maximal distance of any nonzero element of A from the diagonal.) It can then be shown that a direct solver may produce at most O(bN) nonzero entries in the LU decomposition, and will need at most O(b2N) operations.

What one now needs to reason on is how the (minimal) bandwidth relates to a finite element mesh. Imagine a 2d, square mesh consisting of N=m*m mesh points. Then, if you had a 5-point stencil, every point would couple with its left and right, top and bottom neighbors. If you enumerate degrees of freedom by row, then the maximal index distance between degrees of freedom is m: the index difference between a degree of freedom and its top or bottom neighbor. So b=m=N1/2. In 3d, where N=m*m*m, one finds b=m2=N2/3. These considerations lead to the complexities outlined on the slides of the lecture.

Note 2: After explaining all this, it is also worth pointing out that it is possible to refine these estimates. Furthermore, one can choose orderings that do not minimize the bandwidth of the matrix, but that minimize the fill-in of nonzero entries in the LU factors. If you do this, you get the following complexities:

  • In 2d: O(N3/2) work and O(N logN) memory
  • In 3d: O(N2) work and O(N4/3) memory
A good reference to get an overview of this is Chapter 2 of Jack Poulson's PhD thesis.


Slides: click here