From Image to JPEG file

How JPEG compression works

Clay Shonkwiler

University of Pennsylvania

Digital image basics

A digital camera records an image using a digital sensor that breaks the image up into little squares, called pixels.

Images are stored as collections of pixels

For each pixel, the sensor records how much red, green and blue (RGB) light is present on a scale from 0 to 255.
Numbers from 0 to 255 can be encoded into 8 bits, so each pixel takes 24 bits.
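As a small illustration of the arithmetic above, here is a Python sketch that packs one pixel's three 8-bit channels into a single 24-bit integer. The names `pack_rgb` and `unpack_rgb` are hypothetical, and putting red in the high bits is an assumption made for illustration; actual file formats differ in the order they store the channels.

```python
def pack_rgb(r: int, g: int, b: int) -> int:
    """Pack three 8-bit channel values (0 to 255) into one 24-bit integer.

    Hypothetical helper; the red-green-blue bit order is an assumption.
    """
    for value in (r, g, b):
        if not 0 <= value <= 255:
            raise ValueError("each channel must be between 0 and 255")
    return (r << 16) | (g << 8) | b


def unpack_rgb(pixel: int) -> tuple[int, int, int]:
    """Recover the red, green and blue values from a packed 24-bit pixel."""
    return (pixel >> 16) & 0xFF, (pixel >> 8) & 0xFF, pixel & 0xFF


# Three 8-bit channels give 2**24 possible colors.
NUM_COLORS = 256 ** 3

print(hex(pack_rgb(255, 128, 0)))  # 0xff8000
print(NUM_COLORS)                  # 16777216
```

Each channel occupies its own 8-bit slot, so shifting and masking recovers the three values exactly.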

Note: 8 bits per color channel allows for 256 × 256 × 256 = 2²⁴ ≈ 16.7 million colors. Since the human eye can distinguish only about 10 million colors, this is called “true color”. The 24-bits-per-pixel, 8-bits-per-channel layout is common, but not universal. Most image formats allow fewer bits per pixel, but since modern monitors can display millions of colors and hard drives are so much bigger than even a few years ago, using fewer bits is falling out of favor. Several image formats also allow more bits per pixel, which can be important in applications like medical imaging.

Another important modification to the traditional 24-bits-per-pixel setup is that several image formats add a fourth channel, called the alpha channel, to the red, green and blue color channels; it measures the transparency of a given pixel. This bumps each pixel up to, typically, 32 bits (4 bytes), which can be easier for the computer to work with anyway. Many modern operating systems, perhaps most notably Mac OS X and the forthcoming Longhorn version of Windows, make extensive use of transparency in dropdown menus, background windows, etc. User beware, though! Sometimes what appears to be transparency is faked: Audion, an excellent audio program for the Mac (alas, completely overwhelmed by iTunes in recent years), faked the transparency of its control windows by keeping track of what was supposed to be behind the window and then, like a chameleon, altering the window's appearance to make it look transparent. For more, check out The True Story of Audion.

Why do we need compression?

In an uncompressed state, images are quite large. In the example below, the image is 232 × 309 pixels. This means the uncompressed image on the left uses 232 × 309 = 71,688 pixels at 24 bits per pixel, or 71,688 × 24 = 1,720,512 bits. That is 215,064 bytes, or about 210 kB! This is a lot of hard drive space for such a small image. The image on the right is compressed using the JPEG algorithm: its size is 44 kB.

[Figure: the uncompressed image (left) and the JPEG-compressed image (right)]
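The size calculation above can be checked with a few lines of Python. This is a minimal sketch: `uncompressed_size_bytes` is a hypothetical helper that counts raw pixel data only (no file header), and "kB" is taken to mean 1024 bytes, which is the convention that matches the 210 kB figure.

```python
def uncompressed_size_bytes(width: int, height: int, bits_per_pixel: int = 24) -> int:
    """Size of the raw pixel data alone (no file header), in bytes."""
    total_bits = width * height * bits_per_pixel
    return total_bits // 8


size = uncompressed_size_bytes(232, 309)
print(size, "bytes, about", round(size / 1024), "kB")  # 215064 bytes, about 210 kB
print("ratio vs. 44 kB JPEG:", round(size / (44 * 1024), 1))  # 4.8
```

Passing `bits_per_pixel=32` gives the size with an alpha channel added.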

By the way, I apologize to Red Sox fans for the image. It turns out this is the only bitmap image on my hard drive of any size and quality. And, for obvious reasons, bitmaps are hard to find on the web.

Numerical representation

[Figure: the example image (left) and its matrix of brightness values (right)]

A computer stores an image as a bunch of numbers. The matrix on the right records the brightness of each pixel in the image. For a color image, there would be three such matrices, one each for red, green and blue.

Technically speaking, there’s an additional step in JPEG compression before we move to the next slide. For JPEG, we convert from RGB to a different color space called YCbCr, where Y records brightness (luminance) and Cb and Cr together represent color (chrominance). Since the human eye is much better at discerning small differences in brightness than in color, the chrominance matrices are often (but not always) “downsampled” by replacing each 2 × 2 square of pixels with the average of its four values. This shrinks each chrominance matrix to a quarter of its original size, so three full-size matrices become 1 + 1/4 + 1/4 = 1.5 matrices' worth of data: the image is immediately compressed by a factor of 2 without much visual impact. The following steps are applied to each of the Y, Cb and Cr matrices, but remember that the Cb and Cr matrices are smaller because of the downsampling.
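Here is a minimal Python sketch of both operations. The conversion coefficients are the full-range ones used by JFIF, the usual JPEG file format; the text doesn't say which variant it has in mind, so treat that choice as an assumption. The function names are hypothetical, and `downsample_2x2` assumes the channel has even dimensions.

```python
def rgb_to_ycbcr(r: float, g: float, b: float) -> tuple[float, float, float]:
    """Convert one RGB pixel (0-255 per channel) to YCbCr with JFIF coefficients."""
    y = 0.299 * r + 0.587 * g + 0.114 * b
    cb = 128 - 0.168736 * r - 0.331264 * g + 0.5 * b
    cr = 128 + 0.5 * r - 0.418688 * g - 0.081312 * b
    return y, cb, cr


def downsample_2x2(channel: list[list[float]]) -> list[list[float]]:
    """Replace each 2 x 2 square of samples with the average of its four values.

    Assumes the channel has an even number of rows and columns.
    """
    rows, cols = len(channel), len(channel[0])
    return [
        [
            (channel[i][j] + channel[i][j + 1]
             + channel[i + 1][j] + channel[i + 1][j + 1]) / 4
            for j in range(0, cols, 2)
        ]
        for i in range(0, rows, 2)
    ]


# A gray pixel carries no color information: Cb and Cr sit at the neutral value 128.
print([round(v, 3) for v in rgb_to_ycbcr(100, 100, 100)])  # [100.0, 128.0, 128.0]
print(downsample_2x2([[0, 4, 8, 8], [4, 8, 8, 8]]))         # [[4.0, 8.0]]
```

Only the Cb and Cr matrices would be passed through `downsample_2x2`; the Y matrix keeps full resolution.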

Break it up into chunks

For JPEG compression, we break the image up into 8 × 8 pixel pieces; our example is already 8 × 8, so we won’t worry about this step.
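Next, we subtract 128 from each number in the matrix. This “level shift” moves the values from the range 0 to 255 to the range -128 to 127, centered on zero, which is what JPEG feeds into the next step. The result is: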

[Figure: the example matrix after subtracting 128]
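The chunking and level shift can be sketched in Python as follows. The names are hypothetical, and this sketch simply rejects images whose dimensions aren't multiples of 8; real encoders pad such images out to full blocks instead.

```python
def split_into_blocks(channel: list[list[int]], size: int = 8) -> list[list[list[int]]]:
    """Cut a channel into size x size blocks, left to right, top to bottom.

    This sketch assumes both dimensions are multiples of the block size.
    """
    rows, cols = len(channel), len(channel[0])
    if rows % size or cols % size:
        raise ValueError("dimensions must be multiples of the block size")
    return [
        [row[left:left + size] for row in channel[top:top + size]]
        for top in range(0, rows, size)
        for left in range(0, cols, size)
    ]


def level_shift(block: list[list[int]]) -> list[list[int]]:
    """Subtract 128 so values in 0..255 become values in -128..127."""
    return [[value - 128 for value in row] for row in block]


image = [[(7 * r + 3 * c) % 256 for c in range(16)] for r in range(16)]
blocks = split_into_blocks(image)
print(len(blocks))                   # 4
print(level_shift([[0, 128, 255]]))  # [[-128, 0, 127]]
```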

Now what?

Think of this matrix as describing a landscape where the numbers represent elevation.
Using this idea, we can describe any landscape in the world in terms of a few “standard” landscapes (next slide), which vary from totally flat to very bumpy.
It turns out that most landscapes that occur in actual pictures aren’t very bumpy. Also, while the human eye is fairly good at seeing small differences in brightness over a relatively large area, it’s not so good at distinguishing the exact strength of a high-frequency brightness variation.
These two facts together mean that, in general, the bumpy standard landscapes don’t contribute much to how we perceive an average picture, so we can essentially throw them out without losing much.

A few of the “standard” landscapes

[Figure: a few of the standard landscapes, in an abstract view (left) and in pixels (right)]
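The “standard” landscapes are the cosine patterns behind the DCT described next. Here is a Python sketch that generates them, assuming the orthonormal scaling that JPEG's DCT uses; `landscape` and `show` are hypothetical names. Landscape (0, 0) is perfectly flat, and larger u (down the rows) and v (across the columns) give bumpier landscapes.

```python
import math

N = 8


def c(k: int) -> float:
    """Scale factor that gives every landscape unit 'size' (orthonormality)."""
    return math.sqrt(1 / N) if k == 0 else math.sqrt(2 / N)


def landscape(u: int, v: int) -> list[list[float]]:
    """Standard landscape (u, v): u bumps down the rows, v bumps across the columns."""
    return [
        [c(u) * c(v)
         * math.cos((2 * x + 1) * u * math.pi / (2 * N))
         * math.cos((2 * y + 1) * v * math.pi / (2 * N))
         for y in range(N)]
        for x in range(N)
    ]


def show(grid: list[list[float]]) -> None:
    """Crude text rendering: '#' for high ground, '.' for low ground."""
    for row in grid:
        print("".join("#" if h > 0 else "." for h in row))


show(landscape(0, 1))  # every row is ####....: high on the left, low on the right
```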

The Discrete Cosine Transformation

The Discrete Cosine Transformation (DCT) measures how much of each “standard” landscape makes up a given 8 × 8 pixel block. Applying the DCT to our image (i.e., the matrix) gives a new matrix, where each entry measures how much each of the “standard” landscapes contributes to the image:

[Figure: the example matrix after the DCT]

Notice that most of the numbers in the bottom right are quite small. These correspond to the bumpy landscapes.

For those who’ve heard of such things, the Discrete Cosine Transformation is essentially the real part of a discrete Fourier transform. What I’m calling the “standard” landscapes are nothing more than a useful orthogonal basis for the space of 64-dimensional vectors (any 8 × 8 matrix is just an ordered set of 64 numbers and so can be thought of as a 64-dimensional vector) and the Discrete Cosine Transformation is just telling me how the 64-vector given by the matrix can be written as a linear combination of these basis vectors.
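In that linear-algebra language, each DCT entry is just the inner product of the block with one basis landscape. The Python sketch below computes the 2-D DCT directly from that definition, assuming the orthonormal scaling (which matches the normalization in the JPEG standard). It is deliberately naive; production encoders use much faster algorithms. The function name `dct_2d` is hypothetical.

```python
import math

N = 8


def c(k: int) -> float:
    # Normalization that makes the 64 landscapes orthonormal.
    return math.sqrt(1 / N) if k == 0 else math.sqrt(2 / N)


def dct_2d(block: list[list[float]]) -> list[list[float]]:
    """2-D DCT of an 8 x 8 block: out[u][v] is how much of landscape (u, v)
    the block contains, i.e. the block's inner product with that landscape."""
    out = [[0.0] * N for _ in range(N)]
    for u in range(N):
        for v in range(N):
            total = 0.0
            for x in range(N):
                for y in range(N):
                    total += (block[x][y]
                              * math.cos((2 * x + 1) * u * math.pi / (2 * N))
                              * math.cos((2 * y + 1) * v * math.pi / (2 * N)))
            out[u][v] = c(u) * c(v) * total
    return out


flat = [[10.0] * N for _ in range(N)]
coeffs = dct_2d(flat)
print(round(coeffs[0][0], 6))  # 80.0: a flat block is pure landscape (0, 0)
others = [abs(coeffs[u][v]) for u in range(N) for v in range(N) if (u, v) != (0, 0)]
print(max(others) < 1e-9)      # True: no bumpy landscapes are needed
```

Because the basis is orthonormal, the sum of the squares of the 64 coefficients equals the sum of the squares of the 64 pixel values; the transform only redistributes the “energy” of the block among the landscapes.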

Quantization

Remember we said we could kill off the high-frequency (bumpy) landscapes without too much adverse impact on image quality. This reduction is accomplished by dividing each element in the matrix by some number and then rounding off. For example, the following matrix, called a quantization matrix, gives the number that each entry in our matrix will be divided by:

[Figure: the quantization matrix]

I don’t know this for a fact, but I imagine the JPEG people arrived at these numbers by empirical testing. That is, they probably tried a whole bunch of different quantization matrices and found that this one (and others that are commonly used) did a pretty good job of maintaining the visual integrity of the compressed image.

What does quantization do?

[Figure: the matrix after the DCT, “divided” entry by entry by the quantization matrix, equals the quantized matrix]

The matrix on the left, remember, is what we got after applying the DCT; the matrix on the top right is our quantization matrix. Notice how the whole bottom-right of the result is all zeroes.
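Here is a minimal Python sketch of quantization. The presentation's figure isn't recoverable here, so the sketch assumes the standard example luminance table from Annex K of the JPEG standard (ITU-T T.81). It also assumes halves are rounded away from zero; Python's built-in `round` rounds halves to even, so a small hypothetical helper is used instead.

```python
import math

# Example luminance quantization table from Annex K of the JPEG standard.
# Assumption: the presentation's quantization matrix is this standard table.
LUMINANCE_Q = [
    [16, 11, 10, 16, 24, 40, 51, 61],
    [12, 12, 14, 19, 26, 58, 60, 55],
    [14, 13, 16, 24, 40, 57, 69, 56],
    [14, 17, 22, 29, 51, 87, 80, 62],
    [18, 22, 37, 56, 68, 109, 103, 77],
    [24, 35, 55, 64, 81, 104, 113, 92],
    [49, 64, 78, 87, 103, 121, 120, 101],
    [72, 92, 95, 98, 112, 100, 103, 99],
]


def round_half_away(x: float) -> int:
    """Round to the nearest integer, with halves rounded away from zero."""
    return int(math.copysign(math.floor(abs(x) + 0.5), x))


def quantize(coeffs: list[list[float]],
             table: list[list[int]] = LUMINANCE_Q) -> list[list[int]]:
    """Divide each DCT coefficient by the matching table entry and round."""
    return [[round_half_away(c / q) for c, q in zip(c_row, q_row)]
            for c_row, q_row in zip(coeffs, table)]


coeffs = [[0.0] * 8 for _ in range(8)]
coeffs[0][0] = -415.0  # a large, flat component survives: -415 / 16 -> -26
coeffs[7][7] = 40.0    # a bumpy one is wiped out: 40 / 99 -> 0
q = quantize(coeffs)
print(q[0][0], q[7][7])  # -26 0
```

The table's entries grow toward the bottom right, so the bumpy landscapes are divided by larger numbers and are the most likely to round to zero.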

A matrix is just a string of numbers...

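In computer memory, the matrix is just stored as a long string of numbers; we record them in a zig-zag fashion, starting at the top-left corner and sweeping back and forth along the diagonals, so that the low-frequency entries come first and the high-frequency entries (mostly zeroes after quantization) end up together at the end.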
For our matrix, this results in the following sequence of numbers:
-26, -3, 0, -3, -2, -6, 2, -4, 1, -4, 1, 1, 5, 1, 2, -1, 1, -1, 2, 0, 0, 0, 0, 0, -1, -1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0
JPEG has a special symbol, EOB (“end of block”), with its own Huffman code, which tells a program reading this string of bits that the rest of the matrix is all zeroes. Using this technique, the above string becomes:
-26, -3, 0, -3, -2, -6, 2, -4, 1, -4, 1, 1, 5, 1, 2, -1, 1, -1, 2, 0, 0, 0, 0, 0, -1, -1, EOB

This is already much shorter, but we can compress further using Huffman encoding, which gives the most common values the shortest bit strings. All these steps, along with some others I haven’t mentioned, allow us to compress an image to about one tenth of its original size with little to no visible difference.
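The zig-zag read-out and the EOB trick can be sketched in Python as follows. The names are hypothetical, and the sketch mirrors the simplified description above: in real baseline JPEG, the first (DC) value is coded as a difference from the previous block's, and the remaining values are coded as (run of zeros, size) pairs, with EOB being one of those symbols.

```python
N = 8


def zigzag_order(n: int = N) -> list[tuple[int, int]]:
    """(row, col) positions of an n x n matrix in JPEG zig-zag order."""
    order = []
    for s in range(2 * n - 1):  # s = row + col numbers the anti-diagonals
        lo, hi = max(0, s - n + 1), min(s, n - 1)
        # Even diagonals run up and to the right, odd ones down and to the left.
        rows = range(hi, lo - 1, -1) if s % 2 == 0 else range(lo, hi + 1)
        order.extend((r, s - r) for r in rows)
    return order


def zigzag(matrix: list[list[int]]) -> list[int]:
    """Read a square matrix out as one long sequence in zig-zag order."""
    return [matrix[r][c] for r, c in zigzag_order(len(matrix))]


def trim_with_eob(sequence: list[int]) -> list:
    """Drop the trailing run of zeros and mark the cut with 'EOB'."""
    end = len(sequence)
    while end > 0 and sequence[end - 1] == 0:
        end -= 1
    return list(sequence[:end]) + ["EOB"]


m = [[0] * N for _ in range(N)]
m[0][0], m[0][1], m[1][0], m[2][0] = -26, -3, 0, -3
print(trim_with_eob(zigzag(m)))  # [-26, -3, 0, -3, 'EOB']
```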

Note: The point that the matrix is stored as a long string of numbers is an important one. When we talk about JPEG encoding or many other types of algorithms, we may use matrices, arrays and various other objects to metaphorically describe what’s going on, but in the computer everything is just a big string of 1’s and 0’s. Often this is an annoyance, so we create data types like arrays or more complicated objects to help us deal with the string of numbers in a comprehensible way, but in this case it’s actually an advantage that, fundamentally, the computer sees our matrix as just a long line of numbers.
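The Huffman encoding step mentioned above can be illustrated with a generic Huffman code builder in Python. This is only an illustration of the idea: JPEG itself Huffman-codes (run, size) symbols using code tables stored in the file rather than coding the raw values directly, and `huffman_code` is a hypothetical name.

```python
import heapq
from collections import Counter
from itertools import count


def huffman_code(symbols) -> dict:
    """Build a prefix code in which frequent symbols get shorter bit strings."""
    freq = Counter(symbols)
    if len(freq) == 1:  # a lone symbol still needs one bit
        return {next(iter(freq)): "0"}
    tiebreak = count()  # unique counter so the heap never compares dicts
    heap = [(n, next(tiebreak), {sym: ""}) for sym, n in freq.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        # Merge the two rarest groups; their codes get a leading 0 or 1.
        n1, _, left = heapq.heappop(heap)
        n2, _, right = heapq.heappop(heap)
        merged = {s: "0" + code for s, code in left.items()}
        merged.update({s: "1" + code for s, code in right.items()})
        heapq.heappush(heap, (n1 + n2, next(tiebreak), merged))
    return heap[0][2]


sequence = [-26, -3, 0, -3, -2, -6, 2, -4, 1, -4, 1, 1, 5, 1, 2,
            -1, 1, -1, 2, 0, 0, 0, 0, 0, -1, -1, "EOB"]
codes = huffman_code(sequence)
bits = "".join(codes[s] for s in sequence)
print(len(sequence), "symbols ->", len(bits), "bits")
```

Because no code word is a prefix of another, a decoder can read the bit string from left to right without any separators.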

Decompressing

To get an image back from a JPEG file, you essentially just run all the steps in reverse. JPEG is a lossy compression format, so what you get back isn’t exactly the same as what you started with, but it should be close. In the example we’ve been using, here’s the original image and the decompressed image. They’re similar, but not identical:

[Figure: the original example image (left) and the decompressed image (right)]

The difference between the original and decompressed images is especially noticeable in the bottom left corner. In the original, the bottom left pixel is lighter than the pixel to its immediate right, and the same holds for the two pairs directly above. In the decompressed image, the right pixel in each of the bottom two pairs is the lighter one, and in the third pair both pixels are the same.
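Running the steps in reverse for one block can be sketched in Python as follows, assuming the same orthonormal DCT scaling as before; the function names and the uniform table in the usage lines are hypothetical. Multiplying by the quantization table cannot undo the rounding, which is exactly where the loss comes from.

```python
import math

N = 8


def c(k: int) -> float:
    # Same orthonormal scaling as the forward DCT.
    return math.sqrt(1 / N) if k == 0 else math.sqrt(2 / N)


def idct_2d(coeffs: list[list[float]]) -> list[list[float]]:
    """Rebuild an 8 x 8 block by adding up the landscapes, each weighted by its coefficient."""
    out = [[0.0] * N for _ in range(N)]
    for x in range(N):
        for y in range(N):
            total = 0.0
            for u in range(N):
                for v in range(N):
                    total += (c(u) * c(v) * coeffs[u][v]
                              * math.cos((2 * x + 1) * u * math.pi / (2 * N))
                              * math.cos((2 * y + 1) * v * math.pi / (2 * N)))
            out[x][y] = total
    return out


def decode_block(quantized: list[list[int]], table: list[list[int]]) -> list[list[int]]:
    """Reverse the steps: multiply by the quantization table (the rounding is lost
    for good), invert the DCT, add back 128 and clamp to the 0-255 range."""
    coeffs = [[q * t for q, t in zip(q_row, t_row)]
              for q_row, t_row in zip(quantized, table)]
    return [[min(255, max(0, round(value + 128))) for value in row]
            for row in idct_2d(coeffs)]


table = [[16] * N for _ in range(N)]  # hypothetical uniform table, for illustration
quantized = [[0] * N for _ in range(N)]
quantized[0][0] = 5                   # only the flat landscape survives
print(decode_block(quantized, table)[0])  # [138, 138, 138, 138, 138, 138, 138, 138]
```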

Artifacts

If we try to compress an image too much, we introduce visible artifacts, as you can see in the versions of a picture of a donkey below:

[Figure: the donkey picture at low, medium and high quality, left to right]

Image sizes from left to right: 1.7 kB, 5.7 kB, 36 kB (10, 50 and 100% quality)
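Quality settings like 10, 50 and 100% are commonly implemented by scaling a base quantization table: lower quality means larger divisors, more zeros after rounding, and a smaller file. The presentation doesn't say which encoder produced the donkey images, so this Python sketch assumes the convention of the widely used IJG libjpeg library, with the Annex K luminance table as the base; `scale_quant_table` is a hypothetical name. Note that even at quality 100 every divisor is 1, so coefficients are still rounded and the result is not lossless.

```python
# Example luminance table from Annex K of the JPEG standard, used as the quality-50 base.
BASE_LUMINANCE_Q = [
    [16, 11, 10, 16, 24, 40, 51, 61],
    [12, 12, 14, 19, 26, 58, 60, 55],
    [14, 13, 16, 24, 40, 57, 69, 56],
    [14, 17, 22, 29, 51, 87, 80, 62],
    [18, 22, 37, 56, 68, 109, 103, 77],
    [24, 35, 55, 64, 81, 104, 113, 92],
    [49, 64, 78, 87, 103, 121, 120, 101],
    [72, 92, 95, 98, 112, 100, 103, 99],
]


def scale_quant_table(base: list[list[int]], quality: int) -> list[list[int]]:
    """Scale a quantization table for a 1-100 quality setting, following the
    IJG libjpeg convention. Bigger entries mean coarser rounding and smaller files."""
    quality = min(100, max(1, quality))
    scale = 5000 // quality if quality < 50 else 200 - 2 * quality
    # Keep each entry between 1 and 255 so it fits a baseline (8-bit) table.
    return [[min(255, max(1, (q * scale + 50) // 100)) for q in row] for row in base]


for quality in (10, 50, 100):
    print(quality, scale_quant_table(BASE_LUMINANCE_Q, quality)[0][:4])
# 10 [80, 55, 50, 80]
# 50 [16, 11, 10, 16]
# 100 [1, 1, 1, 1]
```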

Note: There are also lossless compression algorithms, the most common being PNG, which stands for Portable Network Graphics (though, in typically geeky Linux fashion, originally it stood for PNG’s not GIF; PNG was created as a substitute for the patent-encumbered GIF format). The good thing about a lossless format like PNG is that compression doesn’t introduce artifacts. This is especially important for images with sharp contrast and color transitions, like line drawings or blueprints. It’s also very important for images that will be manipulated or edited. In general, it’s a good idea to edit images stored in a lossless format like PNG and only convert to a lossy format like JPEG when editing is complete. Otherwise, irreversible artifacts may corrupt the image in the process.

The downside to lossless compression is that there’s generally much less flexibility in terms of specifying how much you want to compress, which can be important (e.g., for many applications, the middle image above might be more than adequate, especially at 1/6 the size of the right image). Also, for most photographs, PNG and high-quality JPEG versions are visually indistinguishable, yet JPEG will almost always have a significantly smaller file size. A non-compression-related advantage of PNG files is that the PNG format allows for many gradations of transparency, so that, for example, the same image can be seamlessly placed over many different background colors without modification.

Credits

Images:

Hoffman, Gernot. “JPEG compression.”
“JPEG.” Wikipedia, The Free Encyclopedia.

Presentation Format:

February 22, 2006