GeoTiff Compression for Dummies

“What’s the best image format for map serving?” they ask me, shortly after I tell them not to serve their images from inside a database.

“Is it MrSid? Or ECW? those are nice and small.” Which indeed they are. Unfortunately, outside of proprietary image server software I’ve never seen them be fast and nice and small at the same time. Generally the decode step is incredibly CPU intensive, presumably because of the fancy wavelet math that makes them so small in the first place.

“So, what’s the best image format for map serving?”.

In my experience, the best format for image serving, using open source rendering engines (MapServer, GeoServer, Mapnik) is: GeoTIFF, with JPEG compression, internally tiled, in the YCBCR color space, with internal overviews. Unfortunately, GeoTiffs are almost never delivered this way, as I was reminded today while downloading a sample image from the City of Kamloops (But nonetheless, thanks for the great free imagery, Kamloops!)

5255C.zip [593M]

It came in a 593Mb ZIP file. “Hm, that’s pretty big, I thought.” I unzipped it.

5255C.tif [515M]

Unzipped it was a 515Mb TIF file. That’s right, it was smaller “uncompressed”. Why? Because internally it was already compressed, and applying the ZIP compression algorithm to already compressed data generally fluffs it up a little. Whoops.

The default TIFF compression is, unfortunately, “deflate”, the same as that used for ZIP. This is a lossless encoding, but not very good for imagery. We can make the image a whole lot smaller just by using a more appropriate compression, like JPEG. We’ll also tile it internally while we’re at it. Internal tiling allows renderers to quickly pick out and decompress just a small portion of the image, which is important once you’ve applied a more serious compression algorithm like JPEG.

gdal_translate \
  -co COMPRESS=JPEG \
  -co TILED=YES \
  5255C.tif 5255C_JPEG.tif

This is much better, now we have a vastly smaller file.

5255C_JPEG.tif [67M]

But we can still do better! For reasons that well pass my understanding, the JPEG algorithm is more effective against images that are stored in the YCBCR color space. Mine is not to reason why, though.

gdal_translate \
  -co COMPRESS=JPEG \
  -co PHOTOMETRIC=YCBCR \
  -co TILED=YES \
  5255C.tif 5255C_JPEG_YCBCR.tif

Wow, now we’re down to 1/20 the size of the original.

5255C_JPEG_YCBCR.tif [24M]

But, we’ve applied a “lossy” algorithm, JPEG, maybe we’ve ruined the data! Let’s have a look.

OriginalAfter JPEG/YCBCR

Can you see the difference? Me neither. Using a JPEG “quality” level of 75%, there are no visible artefacts. In general, JPEG is very good at compressing things so humans “can’t see” the lost information. I’d never use it for compressing a DEM or a data raster, but for a visual image, I use JPEG with impunity, and with much lower quality settings too (for more space saved).

Finally, for high speed serving at more zoomed out scales, we need to add overviews to the image. We’ll make sure the overviews use the same, high compression options as the base data.

gdaladdo \
  --config COMPRESS_OVERVIEW JPEG \
  --config PHOTOMETRIC_OVERVIEW YCBCR \
  --config INTERLEAVE_OVERVIEW PIXEL \
  -r average \
  5255C_JPEG_YCBCR.tif \
  2 4 8 16

For reasons passing understanding, gdaladdo uses a different set of command-line switches to pass the configuration info to the compressor than gdal_translate does, but as before, mine is not to reason why.

The final size, now with overviews as well as the original data, is still less that 1/10 the size of the original.

5255C_JPEG_YCBCR.tif [37M]

So, to sum up, your best format for image serving is:

Go forth and compress!