Cloudinary Blog

Progressive JPEGs and green Martians

3 Ways to Do Progressive JPEG Encoding

There are two different kinds of JPEG images: progressive JPEGs and non-progressive JPEGs. These categories have nothing to do with the JPEGs’ political beliefs. They’re all about the order in which the image data is encoded.

Non-progressive JPEGs are encoded (and thus also decoded) in a very simple order: from top to bottom (and left to right). This means that when a non-progressive JPEG is loading on a slow connection, you first get to see the top part of the image. More and more of the image is revealed as loading progresses.

Progressive JPEGs are encoded in a different way. When you see a progressive JPEG loading, you’ll see a blurry version of the full image, which gradually gets sharper as the bytes arrive.

Here is the same image, encoded as a non-progressive JPEG (on the left) and as a progressive JPEG (on the right), decoded in slow motion:

So progressive and non-progressive JPEGs are two ways, built into the JPEG format, to optimize images and make them display faster for users on slow connections.

What’s the magic behind progressive JPEGs?

JPEG first converts RGB pixels to YCbCr pixels. Instead of having Red, Green and Blue channels, JPEG uses a luma (Y) channel and two chroma channels (Cb and Cr). Those channels are treated separately, because the human eye is more sensitive to distortion in luma (brightness) than it is to distortion in chroma (color). The chroma channels are optionally downsampled to half the original resolution; this is called chroma subsampling.
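The color-space step above can be sketched in a few lines of Python. This is a simplified illustration using the standard JFIF/BT.601 full-range formulas; averaging each 2x2 block is one common way to downsample, though encoders may use other filters:

```python
def rgb_to_ycbcr(r, g, b):
    """One RGB pixel (0-255 per channel) to JPEG's YCbCr representation."""
    y  =       0.299    * r + 0.587    * g + 0.114    * b
    cb = 128 - 0.168736 * r - 0.331264 * g + 0.5      * b
    cr = 128 + 0.5      * r - 0.418688 * g - 0.081312 * b
    return y, cb, cr

def subsample_2x2(channel):
    """Halve a chroma channel's resolution by averaging each 2x2 block."""
    h, w = len(channel), len(channel[0])
    return [[(channel[y][x] + channel[y][x + 1] +
              channel[y + 1][x] + channel[y + 1][x + 1]) / 4
             for x in range(0, w, 2)]
            for y in range(0, h, 2)]

# A pure grey pixel has neutral chroma: Y carries all the information.
print(tuple(round(v) for v in rgb_to_ycbcr(255, 255, 255)))  # (255, 128, 128)
```

Note how for greyscale pixels both chroma channels sit at the neutral value 128, which is why the luma channel does most of the work for natural images.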

Then, JPEG does a bit of mathematical magic with the pixels, called the Discrete Cosine Transform (DCT): every block of 8x8 pixels (64 pixel values) is converted into 64 coefficients that represent the block’s information in a different way. The first coefficient is called the DC coefficient, and it is essentially the average pixel value of the block (up to a scale factor). The other 63 coefficients (the so-called AC coefficients) represent horizontal and vertical details within the block; they are ordered from low frequency (overall gradients) to high frequency (sharp details).
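As an illustration, here is a naive pure-Python version of the 8x8 DCT. Real encoders use fast integer approximations, but the resulting coefficients are the same; with the usual normalization, the DC coefficient comes out as 8 times the block’s average value:

```python
import math

def dct_8x8(block):
    """Naive 2-D DCT-II of an 8x8 block of pixel values, as in JPEG.

    Returns an 8x8 grid of coefficients; coeffs[0][0] is the DC
    coefficient, the other 63 are the AC coefficients."""
    def c(u):
        return 1 / math.sqrt(2) if u == 0 else 1.0

    coeffs = [[0.0] * 8 for _ in range(8)]
    for v in range(8):          # vertical frequency
        for u in range(8):      # horizontal frequency
            s = sum(block[y][x]
                    * math.cos((2 * x + 1) * u * math.pi / 16)
                    * math.cos((2 * y + 1) * v * math.pi / 16)
                    for y in range(8) for x in range(8))
            coeffs[v][u] = 0.25 * c(u) * c(v) * s
    return coeffs

# For a completely flat block, all information ends up in the DC coefficient:
flat = [[100] * 8 for _ in range(8)]
coeffs = dct_8x8(flat)  # coeffs[0][0] is 800.0; every AC coefficient is ~0
```

A block with detail in it would instead spread energy into the AC coefficients, with sharp edges producing large high-frequency values.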

The goal of these transformations is to do lossy image compression. For our perception, luma and low-frequency signals are more important than chroma and high-frequency signals. JPEG cleverly encodes with less precision what we can’t see well anyway, resulting in smaller files. But as a side effect, a kind of extra ‘bonus’, this transformation also makes it possible to encode and decode JPEGs progressively.

Instead of going through the image block by block, encoding all of the coefficients of each block (which is what a non-progressive JPEG does), you can encode all of the DC coefficients first, some low-frequency AC coefficients after that, and the high-frequency AC coefficients at the very end. This is called spectral selection. Additionally, you can do successive approximation, storing the most significant bits of the coefficients first, and the least significant bits later in the bitstream.
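Both ideas can be sketched as simple list manipulations. This is only a sketch: the real bitstream interleaves these scans with entropy coding and handles signs and refinement bits more carefully. The band boundaries below are just an example, and Ss/Se are the JPEG standard’s names for the spectral selection start and end indices:

```python
def spectral_selection(zigzag, bands=((0, 0), (1, 9), (10, 63))):
    """Split a block's 64 zigzag-ordered coefficients into frequency bands,
    one band per scan (start and end indices are inclusive, as in Ss/Se)."""
    return [zigzag[lo:hi + 1] for lo, hi in bands]

def successive_approximation(coeff, low_bits=2):
    """Split a coefficient into a coarse value (sent first) and its
    least significant bits (sent in later refinement scans)."""
    sign = -1 if coeff < 0 else 1
    magnitude = abs(coeff)
    return sign * (magnitude >> low_bits), magnitude & ((1 << low_bits) - 1)

# Three scans: the DC coefficient, AC coefficients 1-9, and AC 10-63.
scans = spectral_selection(list(range(64)))

# Coefficient 45 is sent as a coarse 11 first, refined with the bits of 1 later.
coarse, refinement = successive_approximation(45, low_bits=2)
```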

For both spectral selection and successive approximation, the encoder and decoder have to traverse the image multiple times. Each iteration is called a scan. Typically, a progressive JPEG has about 10 scans, so when decoding, the image goes from a very blurry first scan to a nice and sharp final scan in about 10 steps of refinement.

Advantages and disadvantages

One obvious advantage of progressive JPEG encoding is that you get full-image previews while downloading the image on a slow connection. You can see what’s in the picture even when only a fraction of the file has been transferred, and decide whether you want to wait for it to fully load or not. On the other hand, some people consider progressive loading behavior to be something of a disadvantage, since it becomes hard to tell when an image has actually finished loading. You might even get a bad impression from a website because “the photos look blurry” (while in fact the site was still loading and you only saw a progressive preview of the photos). We will come back to this point later.

A less obvious advantage of progressive JPEGs is that they tend to be smaller (in terms of filesize) than non-progressive JPEGs, even though the (final) image is exactly the same. Because similar DCT coefficients across multiple blocks end up being encoded together, they tend to compress somewhat better than non-progressive JPEGs, whose blocks are encoded one at a time. The extra compression is not huge – a few percentage points, typically – but it still saves some bandwidth and storage, without any effect on image quality.

Progressive JPEG encoding also has some downsides, though. First of all, progressive JPEGs are not always smaller. For very small images (thumbnails, say), they are often a bit larger than non-progressive JPEGs. However, for such small files, progressive rendering is not really useful anyway.

Another disadvantage of progressive JPEGs is that it takes more CPU time and memory to encode and decode them. It takes more time because the algorithm has to go over the image data multiple times, instead of doing everything in one single scan. It takes more memory because all of the DCT coefficients have to be stored in memory during decoding; in non-progressive decoding, you only need to store one block of coefficients at a time.
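A back-of-the-envelope calculation shows why this matters. Assuming 16-bit coefficient storage and 4:2:0 chroma subsampling (both typical, though the exact allocation is implementation-dependent), a 12-megapixel photo needs roughly 36 MB of coefficient memory during progressive decoding:

```python
def progressive_buffer_bytes(width, height):
    """Rough coefficient-buffer size for progressive JPEG decoding,
    assuming 16-bit coefficients and 4:2:0 chroma subsampling."""
    luma = width * height                      # one coefficient per pixel
    chroma = 2 * (width // 2) * (height // 2)  # two half-resolution channels
    return (luma + chroma) * 2                 # 2 bytes per coefficient

# A 12-megapixel photo (4000 x 3000) needs ~36 MB just for coefficients:
print(progressive_buffer_bytes(4000, 3000) / 1e6, "MB")  # 36.0 MB
```

A non-progressive decoder, by contrast, only ever holds a handful of blocks’ worth of coefficients at any moment.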

Decoding a typical progressive JPEG image takes about 2.5 times as much time as decoding a non-progressive JPEG. So while it does give you a preview faster, the overall CPU time is significantly longer. This does not really matter on desktop or laptop computers – JPEG decoding is pretty fast, progressive or not, and memory and processing power are usually abundant. But on low-power devices like smartphones, it does have a slight impact on battery life and image loading time.

Encoding a progressive JPEG also takes more time. It’s about 6 to 8 times slower, and harder to do in hardware. For this reason, cameras (even high-end ones) typically produce non-progressive JPEGs.

How to get the best of both worlds

Until now, I have talked about “progressive JPEGs” as if there is only one way to do progressive JPEG encoding. As if it’s a binary choice: either progressive, or non-progressive. But that’s not the case. It’s actually possible to do something “in between”.

Progressive JPEGs use a so-called scan script that defines which part of the image data is encoded in each of the scans. Most JPEG encoders use a default scan script which defines 10 scans. Advanced JPEG encoders like MozJPEG try different scan scripts and pick the one which results in best compression; they might end up with fewer or more scans, depending on the image.

The advantages of progressive and non-progressive encoding can be combined by using a customized scan script. After some experimentation (and inspired by a talk by Tobias Baldauf), Cloudinary came up with the following scan script, which we use to encode progressive JPEGs:

0 1 2: 0 0 0 0;    # scan 1: DC coefficients of Y, Cb and Cr
0: 1 9 0 0;        # scan 2: first 9 AC coefficients of Y (luma)
2: 1 63 0 0;       # scan 3: all AC coefficients of Cr
1: 1 63 0 0;       # scan 4: all AC coefficients of Cb
0: 10 63 0 0;      # scan 5: remaining AC coefficients of Y

Every line in the script corresponds to one scan. This is a relatively simple script, with only five scans. The first scan contains the DC coefficients of all three channels (0=Y, 1=Cb and 2=Cr). The second scan encodes the first 9 AC coefficients of the luma channel. The third and fourth scans contain all of the AC coefficients of the chroma channels (Cr is encoded first because it tends to be visually somewhat more important). And the final, fifth scan contains the remaining 54 AC coefficients of the luma channel.

This scan script only uses spectral selection; it does not use successive approximation. There’s a reason for this: successive approximation has a larger negative impact on decode speed (because the same coefficient is revisited multiple times) than spectral selection. It has only five scans for the same reason: decode time depends mostly on the number of scans.

Using this scan script, we effectively do something “in between” non-progressive and the default progressive encoding. Let’s call it “semi-progressive”. We hit a quite good trade-off:

  • Decode time: almost as fast as non-progressive. On my laptop, a non-progressive JPEG decodes at about 215 megapixels per second; a default progressive JPEG decodes at about 110 MP/s, and a semi-progressive JPEG decodes at about 185 MP/s.
  • Compression: usually between non-progressive and default progressive. Testing on a few corpuses of images, default progressive was on average 4.5% smaller than non-progressive, while semi-progressive was 3.2% smaller. But it depends on the image: sometimes semi-progressive is even smaller than default progressive, sometimes it’s closer to the non-progressive filesize.
  • Progressive rendering: almost the same as default progressive. The only difference is that there are fewer steps of refinement, but that’s not necessarily a bad thing, as we will discuss in a minute.

Here is a comparison of an image encoded with MozJPEG as a non-progressive JPEG (on the left), a default progressive JPEG (in the center), and a semi-progressive JPEG (on the right):

In this example, the non-progressive JPEG is 283 KB, the default progressive JPEG is 271 KB, and the semi-progressive JPEG is 280 KB. You can see that the default progressive JPEG has more steps of refinement, so it gets to a high-quality preview faster than the semi-progressive JPEG. However, the gain in compression comes at a price.

Firstly, the default progressive JPEG takes a bit longer to decode. On my laptop, the non-progressive JPEG decodes in 8.2ms, the semi-progressive JPEG in 11.9ms, and the default progressive in 18.4ms. This is obviously not going to be the main thing slowing down your website loading, but it does contribute a little. Default progressive also takes longer to encode, but that’s not typically an issue (unless you’re generating images on demand and you want to avoid latency).

A potentially bigger problem is the following. The first few progressive scans look quite weird in the default progressive JPEG (at least on this image, with MozJPEG):

Yikes! Why do we get a green Martian first, which then turns out to be a human?

It turns out that in this case, MozJPEG decides it’s a good idea (for compression) to split the DC coefficients of the three channels into three separate scans. The ‘Martian’ is what you get if only one of the two chroma channels is available.

From a psycho-visual point of view, it’s probably just as unsettling to have images with a “Flash Of Strange Colors” as it is to have a Flash Of Unstyled Text. So in this respect, the simpler semi-progressive scan script might actually be better.

Another scan script

Both the default progressive and the semi-progressive scan script still have the “problem” that it can be hard to tell when exactly the image is actually completely loaded. Whether or not this is a problem is debatable – after all, it means the progressive mechanism is doing its job, which is giving you a high-quality preview fast. At Cloudinary we like to give our users options, so let’s assume that it is indeed a problem.

Some websites improve the user experience by first loading very small, very low-quality placeholder images, which then get replaced by the actual image. In this way, by using two images, they essentially implement an explicit two-step progressive rendering approach. One of the advantages of this method is that it is very easy to tell when the image is actually loaded, since the gap between the placeholder image and the actual image is usually quite large.

But wait. Can’t we implement a similar two-step progressive rendering using just one file, by crafting a suitable progressive scan script?

Unfortunately, within the limitations of the JPEG standard it is not possible to make a progressive scan script that consists of just two scans, at least not for color images. But it is possible to do something that has the same flavor: a quick low-quality preview, followed by a steep transition to the full-quality image. Let’s call it “steep-progressive”. We use the following scan script for that:

0 1 2: 0 0 0 2;    # DC of all channels, except the 2 least significant bits
0 1 2: 0 0 2 1;    # DC refinement: the second-to-last bit
0 1 2: 0 0 1 0;    # DC refinement: the last bit
1: 1 63 0 0;       # all AC coefficients of Cb
2: 1 63 0 0;       # all AC coefficients of Cr
0: 1 63 0 0;       # all AC coefficients of Y (the bulk of the data)

The first scan does the DC coefficients of all three channels, except for the two least significant bits. This gives you a rough preview very quickly. The next two scans encode those two missing bits, which doesn’t result in much visual improvement. Then there are two scans that encode all of the remaining chroma information. Again, this will not result in much visual improvement, because we’re still stuck with the blocky luma. And then the final scan, which is usually the bulk of the data, encodes all remaining luma information. When this scan starts loading, the rough preview gets replaced by the final image, from top to bottom, much like a non-progressive image.

In the video below you can see a slow-motion comparison of the different scan scripts. All of these JPEGs encode exactly the same image, just in a different order. From left to right in the video: non-progressive, steep-progressive, semi-progressive, default progressive.

Want to give it a try?

OK, so how do you make these semi-progressive or steep-progressive JPEGs? Most image programs don’t offer it as a choice, but if you’re not afraid of the command line, you can simply copy/paste either of the above scan scripts into a text file, and then use the libjpeg or mozjpeg encoder with the following parameter:

cjpeg -scans scanscript.txt < input.ppm > output.jpg

If you’re already using Cloudinary, then perhaps you’re already serving semi-progressive JPEGs without even realizing it. If you use q_auto, then you’ll automatically get semi-progressive JPEGs (unless the image is very small; in that case, non-progressive is a better choice). This is just one of the things q_auto does; it does much more than that, like detecting if chroma subsampling should be enabled or not, figuring out which image format to use (if combined with f_auto), and of course adjusting the quality parameters to find the perfect balance between avoiding artifacts and reducing the filesize.

By default, Cloudinary encodes non-progressive JPEGs (except if you use q_auto, as said above). If you want to get a (default) progressive JPEG instead, you can use the flag fl_progressive; if you want to use the semi-progressive scan script, you can use fl_progressive:semi, and if you want to use the steep-progressive scan script, there is fl_progressive:steep. Finally, if you want to force q_auto to produce non-progressive JPEGs, you can use fl_progressive:none.
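For example, assuming the standard Cloudinary delivery URL layout (the `<your-cloud>` cloud name and `sample.jpg` image below are placeholders for your own cloud name and asset), the flags slot into the transformation part of the URL:

```
https://res.cloudinary.com/<your-cloud>/image/upload/q_auto/sample.jpg                      (semi-progressive)
https://res.cloudinary.com/<your-cloud>/image/upload/fl_progressive/sample.jpg              (default progressive)
https://res.cloudinary.com/<your-cloud>/image/upload/fl_progressive:steep/sample.jpg        (steep-progressive)
https://res.cloudinary.com/<your-cloud>/image/upload/q_auto,fl_progressive:none/sample.jpg  (non-progressive)
```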

Here is an overview table that summarizes the pros and cons of the different progressive scan scripts we have discussed in this blogpost:

 
 
 

|                                 | Non-progressive               | Steep-progressive    | Semi-progressive                     | Default progressive |
| ------------------------------- | ----------------------------- | -------------------- | ------------------------------------ | ------------------- |
| Cloudinary flag                 | fl_progressive:none (default) | fl_progressive:steep | fl_progressive:semi (q_auto default) | fl_progressive      |
| Progressive rendering           | –                             | ★★                   | ★★★                                  | ★★★                 |
| Easy to tell when done loading  | ★★★                           | ★★★                  | ★★                                   | ★★                  |
| Smaller files (on average)      | –                             | ★★                   | ★★                                   | ★★★                 |
| Decode speed (and encode speed) | ★★★                           | ★★                   | ★★                                   | ★                   |

Conclusion

The technicalities of image formats can be tricky to master, and there’s still much left to be discovered even in ‘old’ formats like JPEG. While there’s only one way to decode an image, there are always many, many ways to encode it. Custom progressive scan scripts are just one of the many ways to tweak JPEG encoding and fine-tune image loading behavior.

At Cloudinary we realize that image optimization is key to online user experience, as the web grows more visual and most of the downloaded content on a page is images and video. We continuously tweak our image encoding (and processing) algorithms, in an ongoing effort to get the most out of every image format and give end-users the best possible experience. At the same time, we try to make life as easy as possible for developers. All you have to do is add q_auto,f_auto to your image URLs, and you’ll automatically benefit from best practices and new image formats, now and in the future.
