Starting in spring 2013, I videotaped the lectures for my MATH 676: Finite element methods in scientific computing course at the KAMU TV studio at Texas A&M. These are lectures on many aspects of scientific computing, software, and the practical aspects of the finite element method, as well as their implementation in the deal.II software library. Support for creating these videos was also provided by the National Science Foundation and the Computational Infrastructure in Geodynamics.
Note 1: In some of the videos, I demonstrate code or user interfaces. If you can't read the text, change the video quality by clicking on the "gear" symbol at the bottom right of the YouTube player.
Note 2: deal.II is an actively developed library, and in the course of this development we occasionally deprecate and remove functionality. In some cases, this implies that we also change tutorial programs, but a video recorded years ago cannot reflect such changes. If in doubt, consult the current version of the tutorial.
Lecture 41.25: Parallelization on a cluster of distributed memory machines — Part 2: Debugging with MPI
When writing parallel programs with MPI, finding bugs is much more difficult than for programs on a single processor because bugs may depend on what other processors do or, in fact, on what other processors may have done in the past. Fixing such bugs is often awkward, time-consuming, and frustrating. It does not help that few open source tools are available for this purpose.
This lecture provides an overview of how one approaches debugging MPI programs. It shows some of the most common reasons for things to go wrong in parallel programs and how one would go about finding out what is happening. In particular, I discuss deadlocks and how difficult it can sometimes be to accurately time how long a particular operation takes.
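To illustrate why timing is hard, here is a minimal sketch that is not taken from the lecture. It assumes that standard-library threads can stand in for MPI ranks so that it runs without an MPI installation; the names `Barrier` and `time_collective_on_rank0` are hypothetical. One "rank" arrives late at a blocking collective operation. If rank 0 starts its stopwatch without synchronizing first, what it measures is mostly the time spent waiting for the slow rank, not the cost of the operation itself.

```cpp
#include <cassert>
#include <chrono>
#include <condition_variable>
#include <mutex>
#include <thread>
#include <vector>

// Reusable barrier: a stand-in for a blocking collective such as MPI_Barrier.
class Barrier {
public:
  explicit Barrier(int n) : n_(n), waiting_(0), generation_(0) {}

  void wait() {
    std::unique_lock<std::mutex> lock(m_);
    const unsigned long gen = generation_;
    if (++waiting_ == n_) {
      // Last arrival releases everyone and resets the barrier for reuse.
      waiting_ = 0;
      ++generation_;
      cv_.notify_all();
    } else {
      cv_.wait(lock, [&] { return gen != generation_; });
    }
  }

private:
  std::mutex m_;
  std::condition_variable cv_;
  int n_, waiting_;
  unsigned long generation_;
};

// Returns the number of seconds "rank 0" spends inside the timed collective.
// The last rank is artificially delayed to model load imbalance.
double time_collective_on_rank0(bool synchronize_first,
                                int n_ranks = 4,
                                int delay_ms = 100) {
  assert(n_ranks > 0);
  Barrier barrier(n_ranks);
  double elapsed_rank0 = 0.0;

  auto rank_body = [&](int rank) {
    if (rank == n_ranks - 1)
      std::this_thread::sleep_for(std::chrono::milliseconds(delay_ms));

    // Optional synchronization so that all ranks start the clock together.
    if (synchronize_first)
      barrier.wait();

    const auto start = std::chrono::steady_clock::now();
    barrier.wait(); // the "collective operation" being timed
    const auto stop = std::chrono::steady_clock::now();

    if (rank == 0) // only rank 0 writes the result
      elapsed_rank0 = std::chrono::duration<double>(stop - start).count();
  };

  std::vector<std::thread> ranks;
  for (int r = 0; r < n_ranks; ++r)
    ranks.emplace_back(rank_body, r);
  for (auto &t : ranks)
    t.join();
  return elapsed_rank0;
}
```

Calling `time_collective_on_rank0(false)` should report roughly the artificial delay, whereas `time_collective_on_rank0(true)` should report a much smaller number for the very same operation. In an actual MPI program, the analogous pattern would be a call to `MPI_Barrier` before reading `MPI_Wtime`; even then, the result only tells you how long the operation took once every process was ready for it.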
Slides: click here