Starting in spring 2013, I videotaped the lectures for my MATH 676: Finite element methods in scientific computing course at the KAMU TV studio at Texas A&M. These are lectures on many aspects of scientific computing, software, and the practical aspects of the finite element method, as well as their implementation in the deal.II software library. Support for creating these videos was also provided by the National Science Foundation and the Computational Infrastructure in Geodynamics.

The videos are part of a broader effort to develop a modern way of teaching Computational Science and Engineering (CS&E) courses. If you would like to adapt our approach, you may be interested in this paper I wrote with a number of education researchers about the structure of such courses and how they work.

Note 1: In some of the videos, I demonstrate code or user interfaces. If you can't read the text, change the video quality by clicking on the "gear" symbol at the bottom right of the YouTube player.

Note 2: deal.II is an actively developed library, and in the course of this development we occasionally deprecate and remove functionality. In some cases, this means that we also change tutorial programs, but a video recorded years ago cannot reflect such changes. If in doubt, consult the current version of the tutorial.

Lecture 40: Parallelization on a single, shared memory machine

At least on paper, parallelizing a program is simplest when all threads doing work can see the same data structures in one common memory space. This is what happens whenever we use multiple threads forming a single process on a single machine, and we refer to this as "shared memory parallelization" (because the threads share a memory space). This lecture looks at a couple of examples and outlines two parallelization techniques (threads and tasks). It also points out the difficulties one typically encounters (in particular, matching the number of threads to the number of available processor cores, and the need to synchronize computations on different threads).
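
To make the difference between threads and tasks concrete, here is a minimal sketch that uses only the C++ standard library. It is not code from the lecture or from deal.II, and the function names `sum_with_threads` and `sum_with_tasks` are made up for illustration. Both functions add up the entries of a vector that all workers can see because they share one memory space. The thread version has to pick a thread count itself and protect the shared result with a mutex. The task version only describes pieces of work and leaves it to the runtime to decide how they are executed.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <future>
#include <mutex>
#include <numeric>
#include <thread>
#include <vector>

// Thread-based version: we choose the number of threads ourselves and
// must synchronize every write to the shared variable `total`.
double sum_with_threads(const std::vector<double> &values,
                        const unsigned int         n_threads)
{
  assert(n_threads > 0);

  double     total = 0; // visible to all threads (shared memory)
  std::mutex total_mutex;

  // Size of each thread's slice, rounded up so no entry is left out.
  const std::size_t chunk = (values.size() + n_threads - 1) / n_threads;

  std::vector<std::thread> threads;
  for (unsigned int t = 0; t < n_threads; ++t)
    {
      const std::size_t begin = std::min(values.size(), t * chunk);
      const std::size_t end   = std::min(values.size(), begin + chunk);

      threads.emplace_back([&values, &total, &total_mutex, begin, end]() {
        // Work on thread-local data first, without any locking ...
        const double partial = std::accumulate(values.data() + begin,
                                               values.data() + end,
                                               0.0);
        // ... and lock only for the short update of the shared result.
        std::lock_guard<std::mutex> lock(total_mutex);
        total += partial;
      });
    }

  // Wait for all threads before reading the shared result.
  for (auto &thread : threads)
    thread.join();

  return total;
}

// Task-based version: each piece of work returns its result through a
// future, so no shared variable and no mutex are needed.
double sum_with_tasks(const std::vector<double> &values,
                      const std::size_t          n_tasks)
{
  assert(n_tasks > 0);

  const std::size_t chunk = (values.size() + n_tasks - 1) / n_tasks;

  std::vector<std::future<double>> partial_sums;
  for (std::size_t t = 0; t < n_tasks; ++t)
    {
      const std::size_t begin = std::min(values.size(), t * chunk);
      const std::size_t end   = std::min(values.size(), begin + chunk);

      partial_sums.push_back(std::async([&values, begin, end]() {
        return std::accumulate(values.data() + begin,
                               values.data() + end,
                               0.0);
      }));
    }

  // get() waits for the corresponding task to finish.
  double total = 0;
  for (auto &partial_sum : partial_sums)
    total += partial_sum.get();

  return total;
}
```

A natural thread count to try is `std::thread::hardware_concurrency()`, but that function may return 0 when the number of cores cannot be determined, so a caller would need a fallback such as `std::max(1u, std::thread::hardware_concurrency())`. With GCC or Clang on Linux, compiling typically requires the `-pthread` flag.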


Slides: click here