From Image to JPEG file

How JPEG compression works

Clay Shonkwiler

University of Pennsylvania

Digital image basics

Images are stored as collections of pixels

Note: 8 bits per color channel allows for 256 × 256 × 256 = 2^24 ≈ 16.7 million colors. Since the human eye is only capable of distinguishing about 10 million colors, this is called "true color". The arrangement of 24 bits per pixel, 8 bits per color channel, is common but not universal. Most image formats allow for fewer bits per pixel, but since modern monitors are capable of displaying millions of colors and hard drives are so much bigger than even a few years ago, using fewer bits per pixel is falling out of favor. Also, several image formats allow more bits per pixel, which can be important in applications like medical imaging.

Another important modification to the traditional 24-bits-per-pixel setup is that several image formats add a fourth channel, called the alpha channel, to the traditional red, green and blue color channels. The alpha channel measures the transparency of a given pixel. This bumps each pixel up to, typically, 32 bits or 4 bytes per pixel, which can be easier for the computer to work with anyway. Many modern operating systems, perhaps most notably Mac OS X and the forthcoming Longhorn version of Windows, make extensive use of transparency in dropdown menus, background windows, etc. User beware, though! Sometimes what appears to be transparency is faked: an excellent (but, alas, completely overwhelmed by iTunes in recent years) audio program for the Mac called Audion completely faked the transparency of its control windows by keeping track of what was supposed to be behind the window and then, like a chameleon, altering the appearance of the window to make it look transparent. For more, check out The True Story of Audion.

Why do we need compression?

In an uncompressed state, images are quite large. In the example below, the image is 232 × 309 pixels. This means the uncompressed image on the left uses 232 × 309 = 71,688 pixels at 24 bits per pixel, or 71,688 × 24 = 1,720,512 bits. That is 215,064 bytes, or about 210 kB! This is a lot of hard drive space for such a small image. The image on the right is compressed using the JPEG algorithm: its size is 44 kB.

uncompressed image compressed image
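The size arithmetic above can be checked with a few lines of Python. This is only a sketch: the function name is made up for illustration, and I am assuming that 1 kB means 1024 bytes, which is what makes the 210 kB figure come out.

```python
def uncompressed_size_bits(width, height, bits_per_pixel=24):
    """Raw size in bits of a width x height image (hypothetical helper)."""
    return width * height * bits_per_pixel

bits = uncompressed_size_bits(232, 309)
size_bytes = bits // 8          # 8 bits per byte
size_kb = size_bytes / 1024     # assumption: 1 kB = 1024 bytes

# Ratio between the raw size and the 44 kB JPEG quoted in the text.
compression_ratio = size_kb / 44

print(bits, size_bytes, round(size_kb), round(compression_ratio, 1))
```

So the JPEG version is a bit under a fifth of the size of the raw pixels.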

By the way, I apologize to Red Sox fans for the image. It turns out this is the only bitmap image on my hard drive of any size and quality. And, for obvious reasons, bitmaps are hard to find on the web.

Numerical representation

Example image Matrix of example image

A computer stores an image as a bunch of numbers. The matrix on the right records the brightness of each pixel in the image. For a color image, there would be three such matrices, one each for red, blue and green.

Technically speaking, there’s an additional step in JPEG compression before we move to the next slide. For JPEG, we convert from RGB to a different color space called YCbCr, where Y records brightness and Cb and Cr together represent chrominance. Since the human eye is much better at discerning small differences in brightness than in color, the chrominance matrices are often (but not always) “downsampled” by replacing each 2 × 2 square of pixels with the average of its four values. This immediately compresses the image by a factor of 2 without much visual impact: instead of three full-size matrices we keep one full-size matrix and two quarter-size ones, which is 1.5 matrices’ worth of data. The following steps are applied to each of the Y, Cb and Cr matrices, but remember that the Cb and Cr matrices are smaller because of the downsampling.
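Here is a minimal sketch of the color conversion and the downsampling. The conversion coefficients are not given in these notes; I am assuming the ones from the JFIF file format, which is what most JPEG files use. The function names and the sample values are made up for illustration.

```python
def rgb_to_ycbcr(r, g, b):
    """Convert one 8-bit RGB pixel to YCbCr (JFIF coefficients, an assumption)."""
    y = 0.299 * r + 0.587 * g + 0.114 * b
    cb = 128 - 0.168736 * r - 0.331264 * g + 0.5 * b
    cr = 128 + 0.5 * r - 0.418688 * g - 0.081312 * b
    return y, cb, cr

def downsample_2x2(channel):
    """Replace each 2x2 square by its average; assumes even width and height."""
    rows, cols = len(channel), len(channel[0])
    return [
        [
            (channel[i][j] + channel[i][j + 1]
             + channel[i + 1][j] + channel[i + 1][j + 1]) / 4
            for j in range(0, cols, 2)
        ]
        for i in range(0, rows, 2)
    ]

# A white pixel has full brightness and neutral chrominance.
white = rgb_to_ycbcr(255, 255, 255)

# A made-up 4x4 chrominance matrix shrinks to 2x2.
chroma = [
    [1, 3, 10, 10],
    [5, 7, 10, 10],
    [0, 0, 20, 40],
    [0, 0, 60, 80],
]
small = downsample_2x2(chroma)

# Sample counts before and after: Y stays full size, Cb and Cr shrink.
samples_before = 3 * 16
samples_after = 16 + 2 * (len(small) * len(small[0]))
print(white, small, samples_before / samples_after)
```

Real encoders offer several downsampling schemes; averaging 2 × 2 squares is just the one described above.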

Break it up into chunks

New matrix

Now what?

A few of the “standard” landscapes

The standard landscapes: abstract view (left) and in pixels (right)

The Discrete Cosine Transformation

The Discrete Cosine Transformation (DCT) measures how much of each “standard” landscape makes up a given 8 × 8 pixel block. Applying the DCT to our image (i.e., the matrix) gives a new matrix, where each entry measures how much the corresponding “standard” landscape contributes to the image:

Matrix after the DCT

Notice that most of the numbers in the bottom right are quite small. These correspond to the bumpy landscapes.
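For the curious, here is a slow but direct sketch of the transform on a single 8 × 8 block. I am assuming the usual JPEG form of the DCT (the type-II DCT with a factor of 1/4 and a 1/√2 scaling on the zero-frequency terms); the notes above don't spell out the formula, and the sample blocks are made up.

```python
import math

N = 8  # JPEG works on 8x8 blocks

def _scale(k):
    """Extra scaling for the zero-frequency term."""
    return 1 / math.sqrt(2) if k == 0 else 1.0

def dct_2d(block):
    """Return the 8x8 matrix of DCT coefficients of an 8x8 block."""
    out = [[0.0] * N for _ in range(N)]
    for u in range(N):
        for v in range(N):
            total = 0.0
            for x in range(N):
                for y in range(N):
                    total += (block[x][y]
                              * math.cos((2 * x + 1) * u * math.pi / (2 * N))
                              * math.cos((2 * y + 1) * v * math.pi / (2 * N)))
            out[u][v] = 0.25 * _scale(u) * _scale(v) * total
    return out

# A completely flat block uses only the flattest "landscape":
# the top-left coefficient is about 800 and the rest are essentially zero.
flat = [[100] * N for _ in range(N)]
coeffs = dct_2d(flat)

# A block that only changes from left to right has nonzero entries
# only in the top row of the result.
ramp = [[10 * y for y in range(N)] for _ in range(N)]
ramp_coeffs = dct_2d(ramp)
```

Real encoders use much faster algorithms than these four nested loops, but the result is the same up to rounding.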

For those who’ve heard of such things, the Discrete Cosine Transformation is essentially the real part of a discrete Fourier transform. What I’m calling the “standard” landscapes are nothing more than a useful orthogonal basis for the space of 64-dimensional vectors (any 8 × 8 matrix is just an ordered set of 64 numbers and so can be thought of as a 64-dimensional vector) and the Discrete Cosine Transformation is just telling me how the 64-vector given by the matrix can be written as a linear combination of these basis vectors.
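The linear algebra claim can be checked numerically. The sketch below builds the 64 “standard” landscapes as 64-vectors (again assuming the usual JPEG normalization, under which they are not just orthogonal but orthonormal), verifies the dot products, and then rebuilds a made-up block as a linear combination of them.

```python
import math

def basis(u, v):
    """Flattened 64-vector for the landscape with frequencies (u, v)."""
    cu = 1 / math.sqrt(2) if u == 0 else 1.0
    cv = 1 / math.sqrt(2) if v == 0 else 1.0
    return [
        0.25 * cu * cv
        * math.cos((2 * x + 1) * u * math.pi / 16)
        * math.cos((2 * y + 1) * v * math.pi / 16)
        for x in range(8) for y in range(8)
    ]

def dot(a, b):
    return sum(p * q for p, q in zip(a, b))

vectors = [basis(u, v) for u in range(8) for v in range(8)]

# Orthonormality: each vector has length 1 and distinct vectors are perpendicular.
max_error = max(
    abs(dot(a, b) - (1.0 if i == j else 0.0))
    for i, a in enumerate(vectors)
    for j, b in enumerate(vectors)
)

# The DCT coefficients are just dot products with the basis vectors,
# and the block is the corresponding linear combination.
pixels = [float((7 * k) % 256) for k in range(64)]  # made-up block, flattened
coefficients = [dot(pixels, b) for b in vectors]
rebuilt = [sum(c * b[k] for c, b in zip(coefficients, vectors)) for k in range(64)]
rebuild_error = max(abs(p - r) for p, r in zip(pixels, rebuilt))

print(max_error, rebuild_error)
```

Both numbers printed should be tiny, i.e. zero up to floating-point rounding.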

Quantization

Quantization matrix

I don’t know this for a fact, but I imagine the JPEG people arrived at these numbers by empirical testing. That is, they probably tried a whole bunch of different quantization matrices and found that this one (and others that are commonly used) did a pretty good job of maintaining the visual integrity of the compressed image.

What does quantization do?

Matrix after DCT “÷” Quantization matrix

= Quantized matrix

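The matrix on the left, remember, is what we got after applying the DCT; the matrix on the top right is our quantization matrix. The “÷” means that each entry of the DCT matrix is divided by the corresponding entry of the quantization matrix and the result is rounded to the nearest integer. Notice how the whole bottom right of the result is all zeroes.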
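In code, quantization is one line of arithmetic per entry. In the sketch below the table is the example luminance table widely quoted from Annex K of the JPEG standard; the matrix on the slide isn't reproduced in this text, so treat it as an assumption that the two agree. The DCT coefficients are made up, chosen only so that they shrink toward the bottom right.

```python
# Example luminance quantization table (JPEG standard, Annex K).
# Assumption: the slide's own matrix may differ.
QUANT_TABLE = [
    [16, 11, 10, 16, 24, 40, 51, 61],
    [12, 12, 14, 19, 26, 58, 60, 55],
    [14, 13, 16, 24, 40, 57, 69, 56],
    [14, 17, 22, 29, 51, 87, 80, 62],
    [18, 22, 37, 56, 68, 109, 103, 77],
    [24, 35, 55, 64, 81, 104, 113, 92],
    [49, 64, 78, 87, 103, 121, 120, 101],
    [72, 92, 95, 98, 112, 100, 103, 99],
]

def quantize(coeffs, table=QUANT_TABLE):
    """Divide entry by entry and round to the nearest integer.

    Python's round() sends exact halves to the nearest even integer;
    a real encoder may round ties differently.
    """
    return [[round(c / q) for c, q in zip(c_row, q_row)]
            for c_row, q_row in zip(coeffs, table)]

# Made-up DCT output whose entries shrink toward the bottom right.
coeffs = [[400 / (1 + u + v) ** 2 for v in range(8)] for u in range(8)]
quantized = quantize(coeffs)

for row in quantized:
    print(row)
```

Because the table entries grow toward the bottom right while the coefficients shrink, that corner of the output rounds to zero.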

A matrix is just a string of numbers...

Zig-zag encoding of a matrix
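A sketch of the zig-zag traversal, assuming the usual JPEG order that starts in the top-left corner and first steps to the right. The point of reading the matrix this way is that the zeroes in the bottom right end up bunched together at the end of the string. The function names and sample matrices are made up for illustration.

```python
def zigzag_order(n=8):
    """Return the (row, column) positions of an n x n matrix in zig-zag order."""
    def key(position):
        row, col = position
        diagonal = row + col
        # Odd diagonals run downward (row increasing),
        # even diagonals run upward (column increasing).
        return (diagonal, row if diagonal % 2 else col)
    return sorted(((r, c) for r in range(n) for c in range(n)), key=key)

def zigzag(matrix):
    """Flatten a square matrix into a list following the zig-zag order."""
    return [matrix[r][c] for r, c in zigzag_order(len(matrix))]

# A small 3x3 example makes the path easy to follow.
small = [
    [1, 2, 3],
    [4, 5, 6],
    [7, 8, 9],
]
flattened = zigzag(small)

# A made-up quantized block: nonzero entries only near the top left.
block = [[0] * 8 for _ in range(8)]
block[0][0], block[0][1], block[1][0], block[1][1] = 25, -3, 2, 1
sequence = zigzag(block)
print(flattened, sequence[:6], sequence[6:] == [0] * 58)
```

Once the string ends in a long run of zeroes, it can be stored very compactly.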

Decompressing

To get an image back from a JPEG file, you essentially just run all the steps in reverse. JPEG is a lossy compression format, so what you get back isn’t exactly the same as what you started with, but it should be close. In the example we’ve been using, here’s the original image and the decompressed image. They’re similar, but not identical:

Example image Decompressed example image

The difference between the original and the decompressed image above is especially noticeable in the bottom left corner. In the original, the bottom left pixel is lighter than the pixel to its immediate right, and the same holds for the two pairs directly above. In the decompressed image, the right pixel is lighter in the bottom two pairs, and in the third pair both pixels are the same.
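To see where the loss comes from, here is a sketch of a round trip on one block: transform, quantize, then multiply back and invert the transform. To keep it short I use a single made-up quantization step of 16 for every entry rather than a real quantization matrix, and the block itself is invented. The transform is the usual JPEG form of the DCT, which is an assumption since the formula isn't given in these notes.

```python
import math

N = 8
STEP = 16  # hypothetical uniform quantization step, not a real JPEG table

def _scale(k):
    return 1 / math.sqrt(2) if k == 0 else 1.0

def _cos(i, k):
    return math.cos((2 * i + 1) * k * math.pi / (2 * N))

def dct_2d(block):
    """Forward transform: pixels to coefficients."""
    return [[0.25 * _scale(u) * _scale(v)
             * sum(block[x][y] * _cos(x, u) * _cos(y, v)
                   for x in range(N) for y in range(N))
             for v in range(N)] for u in range(N)]

def idct_2d(coeffs):
    """Inverse transform: coefficients back to pixels."""
    return [[0.25 * sum(_scale(u) * _scale(v) * coeffs[u][v]
                        * _cos(x, u) * _cos(y, v)
                        for u in range(N) for v in range(N))
             for y in range(N)] for x in range(N)]

def max_abs_diff(a, b):
    return max(abs(p - q) for ra, rb in zip(a, b) for p, q in zip(ra, rb))

block = [[100 + 10 * x + 5 * y for y in range(N)] for x in range(N)]  # made up

# Compress: transform, then divide and round. The rounding loses information.
quantized = [[round(c / STEP) for c in row] for row in dct_2d(block)]

# Decompress: multiply back, then invert the transform.
restored = idct_2d([[q * STEP for q in row] for row in quantized])

lossless_error = max_abs_diff(block, idct_2d(dct_2d(block)))
lossy_error = max_abs_diff(block, restored)
print(lossless_error, lossy_error)
```

Without the rounding step the block comes back exactly (up to floating-point error); with it, the restored pixels are close to the originals but not equal to them.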

Artifacts

If we try to compress an image too much, we introduce visible artifacts, as you can see in the versions of a picture of a donkey below:

Low quality donkey Medium quality donkey High quality donkey

Image sizes from left to right: 1.7 kB, 5.7 kB, 36 kB (10, 50 and 100% quality)

Note: There are also lossless compression formats, the most common being PNG, which stands for Portable Network Graphics (though, in typically geeky Linux fashion, originally it stood for PNG’s not GIF; PNG was created as a substitute for the patent-encumbered GIF format). The good thing about a lossless format like PNG is that compression doesn’t introduce artifacts. This is especially important for images with sharp contrast and color transitions, like line drawings or blueprints. It’s also very important for images that will be manipulated or edited. In general, it’s a good idea to edit images stored in a lossless format like PNG and only convert to a lossy format like JPEG when editing is complete. Otherwise, irreversible artifacts may corrupt the image in the process.

The downside to lossless compression is that there’s generally much less flexibility in terms of specifying how much you want to compress, which can be important (e.g., for many applications, the middle image above might be more than adequate, especially at 1/6 the size of the right image). Also, for most photographs, PNG and high-quality JPEG versions are visually indistinguishable, yet the JPEG will almost always have a significantly smaller file size. A non-compression-related advantage of PNG files is that the PNG format allows for many gradations of transparency, so that, for example, the same image can be seamlessly placed over many different background colors without modification.
