Note: 8 bits per color channel allows for 256 × 256 × 256 = 2^24 ≈ 16.7 million colors. Since the human eye is only capable of distinguishing about 10 million colors, this is called "true color". The 24-bits-per-pixel, 8-bits-per-channel layout is common, but not universal. Most image formats allow for fewer bits per pixel, but since modern monitors are capable of displaying millions of colors and hard drives are so much bigger than even a few years ago, using fewer bits per pixel is falling out of favor. Also, several image formats allow more bits per pixel, which can be important in applications like medical imaging.
Another important modification to the traditional 24-bits-per-pixel setup is that several image formats add a fourth channel, called the alpha channel, to the traditional red, green and blue color channels; it measures the transparency of a given pixel. This typically bumps each pixel up to 32 bits, or 4 bytes, which can be easier for the computer to work with anyway. Many modern operating systems, perhaps most notably Mac OS X and the forthcoming Longhorn version of Windows, make extensive use of transparency in dropdown menus, background windows, etc. User beware, though! Sometimes what appears to be transparency is faked: Audion, an excellent audio program for the Mac (alas, completely overwhelmed by iTunes in recent years), faked the transparency of its control windows by keeping track of what was supposed to be behind the window and then, like a chameleon, altering the appearance of the window to make it look transparent. For more, check out The True Story of Audion.
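To make the alpha channel a little more concrete, here is a minimal sketch of how a partly transparent pixel could be combined with whatever is behind it. The function name blend_over and the convention that alpha runs from 0 (fully transparent) to 255 (fully opaque) are assumptions for illustration; individual formats and operating systems may do this differently.

```python
def blend_over(foreground, background, alpha):
    """Composite an RGB foreground pixel over an RGB background pixel.

    alpha is assumed to run from 0 (fully transparent) to 255 (fully opaque).
    """
    a = alpha / 255
    # Each channel is a weighted average of the foreground and background values.
    return tuple(round(a * f + (1 - a) * b)
                 for f, b in zip(foreground, background))


red = (255, 0, 0)
blue = (0, 0, 255)
opaque = blend_over(red, blue, 255)      # the foreground wins completely
invisible = blend_over(red, blue, 0)     # the background shows through untouched
half = blend_over(red, blue, 128)        # roughly an even mix of the two
```

The same weighted average is all a program needs to fake transparency by hand, provided it knows what is supposed to be behind the window.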
In an uncompressed state, images are quite large. In the example below, the image is 232 × 309 pixels. This means the uncompressed image on the left uses 232 × 309 = 71,688 pixels at 24 bits per pixel, or 71,688 × 24 = 1,720,512 bits. That is 215,064 bytes, or about 210 kB (counting 1,024 bytes per kB)! This is a lot of hard drive space for such a small image. The image on the right is compressed using the JPEG algorithm: its size is 44 kB.
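The arithmetic above is easy to redo for any image. This is just a sketch: the helper name uncompressed_size_bits is made up for illustration, and a kB is taken to be 1,024 bytes, which is what the 210 kB figure implies.

```python
def uncompressed_size_bits(width, height, bits_per_pixel=24):
    """Bits needed to store every pixel with no compression at all."""
    return width * height * bits_per_pixel


colors = 2 ** (3 * 8)                       # 8 bits for each of 3 channels
bits = uncompressed_size_bits(232, 309)     # the example image
size_bytes = bits // 8
size_kb = size_bytes / 1024                 # assuming 1 kB = 1,024 bytes
ratio = size_kb / 44                        # versus the 44 kB JPEG

print(f"{colors:,} colors")
print(f"{bits:,} bits = {size_bytes:,} bytes = about {size_kb:.0f} kB")
print(f"the JPEG is roughly {ratio:.1f} times smaller")
```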
By the way, I apologize to Red Sox fans for the image. It turns out this is the only bitmap image on my hard drive of any size and quality. And, for obvious reasons, bitmaps are hard to find on the web.
A computer stores an image as a bunch of numbers. The matrix on the right records the brightness of each pixel in the image. For a color image, there would be three such matrices, one each for red, blue and green.
Technically speaking, there’s an additional step in JPEG compression before we move to the next slide. For JPEG, we convert from RGB to a different color space called YCbCr, where Y records brightness and Cb and Cr together represent chrominance. Since the human eye is much better at discerning small differences in brightness than in color, the chrominance matrices are often (but not always) “downsampled” by replacing each 2 × 2 square of pixels with a single value, the average of the four pixels in that square. This immediately compresses the image by a factor of 2 without much visual impact: three full-size matrices become one full-size matrix plus two quarter-size ones. The following steps are applied to each of the Y, Cb and Cr matrices, but remember that the Cb and Cr matrices are smaller because of the downsampling.
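Here is a rough sketch of that extra step. The conversion constants are the ones commonly used with JPEG files (the JFIF convention); treat them, the function names, and the requirement that the matrix have even dimensions as assumptions of this example rather than a description of any particular encoder.

```python
def rgb_to_ycbcr(r, g, b):
    """Convert one 8-bit RGB pixel to (Y, Cb, Cr) using JFIF-style constants."""
    y = 0.299 * r + 0.587 * g + 0.114 * b
    cb = 128 - 0.168736 * r - 0.331264 * g + 0.5 * b
    cr = 128 + 0.5 * r - 0.418688 * g - 0.081312 * b
    return y, cb, cr


def downsample_2x2(channel):
    """Replace each 2 x 2 square with its average.

    channel is a list of rows; both dimensions are assumed to be even.
    """
    out = []
    for i in range(0, len(channel), 2):
        row = []
        for j in range(0, len(channel[0]), 2):
            total = (channel[i][j] + channel[i][j + 1]
                     + channel[i + 1][j] + channel[i + 1][j + 1])
            row.append(total / 4)
        out.append(row)
    return out


white = rgb_to_ycbcr(255, 255, 255)   # full brightness, neutral chrominance

cb = [[100, 102, 110, 114],
      [104, 106, 118, 122],
      [90, 90, 80, 80],
      [90, 90, 80, 80]]
cb_small = downsample_2x2(cb)         # 4 x 4 becomes 2 x 2

# Values stored for a 4 x 4 patch: before, 3 full matrices;
# after, one full Y matrix plus two quarter-size chrominance matrices.
before = 3 * 16
after = 16 + 4 + 4
```

Counting stored values this way gives 48 before and 24 after, which is where the factor of 2 comes from.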

[Figure panels: “Abstract view” and “In pixels”]
The Discrete Cosine Transformation (DCT) measures how much of each “standard” landscape makes up a given 8 × 8 pixel block. Applying the DCT to our image (i.e., the matrix) gives a new matrix, where each entry measures how much one of the “standard” landscapes contributes to the image:
Notice that most of the numbers in the bottom right are quite small. These correspond to the bumpy landscapes.
For those who’ve heard of such things, the Discrete Cosine Transformation is essentially the real part of a discrete Fourier transform. What I’m calling the “standard” landscapes are nothing more than a useful orthogonal basis for the space of 64-dimensional vectors (any 8 × 8 matrix is just an ordered set of 64 numbers and so can be thought of as a 64-dimensional vector) and the Discrete Cosine Transformation is just telling me how the 64-vector given by the matrix can be written as a linear combination of these basis vectors.
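For the curious, the basis-vector description can be written out directly. This sketch assumes the orthonormal form of the two-dimensional DCT on 8 × 8 blocks; the names basis, inner and dct2 are made up for illustration, and it is written for clarity rather than speed.

```python
import math

N = 8


def basis(u, v):
    """The 8 x 8 "standard" landscape for frequency indices (u, v)."""
    cu = math.sqrt((1 if u == 0 else 2) / N)
    cv = math.sqrt((1 if v == 0 else 2) / N)
    return [[cu * cv
             * math.cos((2 * x + 1) * u * math.pi / (2 * N))
             * math.cos((2 * y + 1) * v * math.pi / (2 * N))
             for y in range(N)] for x in range(N)]


def inner(a, b):
    """Dot product of two 8 x 8 matrices viewed as 64-dimensional vectors."""
    return sum(a[x][y] * b[x][y] for x in range(N) for y in range(N))


def dct2(block):
    """Entry (u, v) says how much of landscape (u, v) is in the block."""
    return [[inner(block, basis(u, v)) for v in range(N)] for u in range(N)]


# A perfectly flat block contains only the flat landscape (u = v = 0).
flat = [[10] * N for _ in range(N)]
coefficients = dct2(flat)
```

Because the landscapes are orthonormal in this form, inner(basis(u, v), basis(u, v)) is 1 and the inner product of two different landscapes is 0, up to floating-point error.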
[Figure: DCT matrix “÷” quantization matrix = quantized result]
The matrix on the left, remember, is what we got after applying the DCT; the matrix on the top right is our quantization matrix. Notice how the whole bottom-right of the result is all zeroes.
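The “÷” is in quotes because it is not matrix division: each entry is divided by the matching entry of the quantization matrix and then rounded. Here is a sketch under some loud assumptions: Q_LUMA is the example luminance table from the JPEG standard, used as a stand-in for the quantization matrix on the slide, and the DCT coefficients are invented numbers that simply shrink toward the bottom right.

```python
# Example luminance quantization table from the JPEG standard (a stand-in).
Q_LUMA = [
    [16, 11, 10, 16, 24, 40, 51, 61],
    [12, 12, 14, 19, 26, 58, 60, 55],
    [14, 13, 16, 24, 40, 57, 69, 56],
    [14, 17, 22, 29, 51, 87, 80, 62],
    [18, 22, 37, 56, 68, 109, 103, 77],
    [24, 35, 55, 64, 81, 104, 113, 92],
    [49, 64, 78, 87, 103, 121, 120, 101],
    [72, 92, 95, 98, 112, 100, 103, 99],
]


def quantize(coefficients, table):
    """Divide entry by entry, then round to the nearest integer.

    Python's round() sends exact ties to the even neighbor.
    """
    return [[round(c / q) for c, q in zip(c_row, q_row)]
            for c_row, q_row in zip(coefficients, table)]


# Hypothetical DCT output: large in the top left, small in the bottom right.
coefficients = [[620 / (1 + u + v) ** 2 for v in range(8)] for u in range(8)]
quantized = quantize(coefficients, Q_LUMA)

for row in quantized:
    print(row)
```

The table has small entries in the top left and large ones in the bottom right, so the already small “bumpy landscape” coefficients are the ones that round to zero.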
This is already much shorter, but we can compress further using Huffman encoding. All these steps, along with some others I haven’t mentioned, allow us to compress an image to one tenth its original size with little to no visible difference.
Note: The first point above is an important one. When we talk about JPEG encoding or many other types of algorithms, we may use matrices, arrays and various other objects to metaphorically describe what’s going on, but in the computer everything is just a big string of 1’s and 0’s. Often this is an annoyance, so we create variable types like array or more complicated objects to help us deal with the string of numbers in a comprehensible way, but in this case it’s actually an advantage that, fundamentally, the computer sees our matrix as just a long line of numbers.
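As a hedged illustration of the “long line of numbers” view: JPEG encoders conventionally read the quantized block in a zigzag order, starting at the top left, so that the zeroes bunch up at the end of the line and can be replaced by a single marker. The function names and the "EOB" (end of block) marker below are made up for this sketch; the real format encodes things more compactly.

```python
def zigzag_order(n=8):
    """(row, column) pairs in zigzag order, starting at the top left."""
    cells = [(r, c) for r in range(n) for c in range(n)]
    # Walk the anti-diagonals in turn, alternating direction on each one.
    return sorted(cells, key=lambda rc: (rc[0] + rc[1],
                                         rc[0] if (rc[0] + rc[1]) % 2 else -rc[0]))


def flatten_and_trim(block):
    """Read the block in zigzag order and drop the run of trailing zeroes."""
    line = [block[r][c] for r, c in zigzag_order(len(block))]
    while line and line[-1] == 0:
        line.pop()
    return line + ["EOB"]


# A hypothetical quantized block: only four entries in the top left survive.
block = [[0] * 8 for _ in range(8)]
block[0][0], block[0][1], block[1][0], block[2][0] = 39, 14, -3, 1

short = flatten_and_trim(block)
```

Here 64 numbers shrink to four numbers and a marker before any Huffman encoding is applied.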
To get an image back from a JPEG file, you essentially just run all the steps in reverse. JPEG is a lossy compression format, so what you get back isn’t exactly the same as what you started with, but it should be close. In the example we’ve been using, here’s the original image and the decompressed image. They’re similar, but not identical:
The difference between the original and decompressed images above is especially noticeable in the bottom left corner. In the original, the bottom left pixel is lighter than the pixel to its immediate right, and the same holds for the two pairs directly above. In the decompressed image, the right pixel is lighter in the bottom two pairs, and in the third pair both pixels are the same.
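To see the “close, but not identical” effect in miniature, here is a sketch that pushes one 8 × 8 block through a forward DCT, a quantization step, and then the same steps in reverse. It makes simplifying assumptions: a single made-up quantization step STEP is used in place of a full quantization matrix, the pixel values are invented, and Huffman encoding is skipped because it is lossless.

```python
import math

N = 8
STEP = 16  # hypothetical uniform quantization step, not a real JPEG table


def basis(u, v):
    """The 8 x 8 "standard" landscape for frequency indices (u, v)."""
    cu = math.sqrt((1 if u == 0 else 2) / N)
    cv = math.sqrt((1 if v == 0 else 2) / N)
    return [[cu * cv
             * math.cos((2 * x + 1) * u * math.pi / (2 * N))
             * math.cos((2 * y + 1) * v * math.pi / (2 * N))
             for y in range(N)] for x in range(N)]


BASIS = {(u, v): basis(u, v) for u in range(N) for v in range(N)}


def compress(block):
    """Forward DCT, then divide by STEP and round (the lossy part)."""
    return [[round(sum(block[x][y] * BASIS[u, v][x][y]
                       for x in range(N) for y in range(N)) / STEP)
             for v in range(N)] for u in range(N)]


def decompress(quantized):
    """Multiply back by STEP, then add up the weighted landscapes."""
    pixels = []
    for x in range(N):
        row = []
        for y in range(N):
            value = sum(quantized[u][v] * STEP * BASIS[u, v][x][y]
                        for u in range(N) for v in range(N))
            row.append(min(255, max(0, round(value))))  # back to 8-bit range
        pixels.append(row)
    return pixels


# Invented pixel values: a gentle gradient with a little texture on top.
original = [[100 + 10 * x + 3 * y + (x * y) % 5 for y in range(N)]
            for x in range(N)]
restored = decompress(compress(original))
max_error = max(abs(original[x][y] - restored[x][y])
                for x in range(N) for y in range(N))
```

The rounding in compress is the only place information is thrown away; a larger STEP throws away more and makes the restored block drift further from the original.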
If we try to compress an image too much, we introduce visible artifacts, as you can see in the versions of a picture of a donkey below:
Image sizes from left to right: 1.7 kB, 5.7 kB, 36 kB (10, 50 and 100% quality)
Note: There are also lossless compression algorithms, the most common being PNG, which stands for Portable Network Graphics (though, in typically geeky Linux fashion, originally it stood for PNG’s not GIF; PNG was created as a substitute for the patent-encumbered GIF format). The good thing about a lossless format like PNG is that compression doesn’t introduce artifacts. This is especially important for images with sharp contrast and color transitions, like line drawings or blueprints. It’s also very important for images that will be manipulated or edited. In general, it’s a good idea to edit images stored in a lossless format like PNG and only convert to a lossy format like JPEG when editing is complete. Otherwise, irreversible artifacts may corrupt the image in the process.
The downside to lossless compression is that there’s generally much less flexibility in terms of specifying how much you want to compress, which can be important (e.g., for many applications, the middle image above might be more than adequate, especially at 1/6 the size of the right image). Also, for most photographs, PNG and high-quality JPEG versions are visually indistinguishable, yet JPEG will almost always have a significantly smaller file size. A non-compression-related advantage of PNG files is that the PNG format allows for many gradations of transparency, so that, for example, the same image can be seamlessly placed over many different background colors without modification.