Magical PostGIS

I did a new PostGIS talk for FOSS4G North America 2015, an exploration of some of the tidbits I’ve learned over the past six months about using PostgreSQL and PostGIS together to make “magic” (any sufficiently advanced technology being indistinguishable from magic, after all).

Making Lines from Points

Somehow I’ve gotten through 10 years of SQL without ever learning this construction, which I found while proof-reading a colleague’s blog post and looked so unlikely that I had to test it before I believed it actually worked. Just goes to show, there’s always something new to learn.

Suppose you have a GPS location table:

  • gps_id: integer
  • geom: geometry
  • gps_time: timestamp
  • gps_track_id: integer
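
For concreteness, here is a minimal sketch of that table; the geometry type and SRID (Point, 4326) are my assumptions, not part of the original description:

CREATE TABLE gps_points (
  gps_id       integer PRIMARY KEY,
  geom         geometry(Point, 4326), -- assuming lon/lat point geometries
  gps_time     timestamp,
  gps_track_id integer
);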

You can get a correct set of lines from this collection of points with just this SQL:

SELECT
  gps_track_id,
  ST_MakeLine(geom ORDER BY gps_time ASC) AS geom
FROM gps_points
GROUP BY gps_track_id;

Those of you who already knew about placing ORDER BY within an aggregate function are going “duh”, and the rest of you are, like me, going “whaaaaaa?”

Prior to this, I would solve this problem by ordering all the groups in a CTE or sub-query first, and only then pass them to the aggregate make-line function. This, is, so, much, nicer.
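
For comparison, the old approach looked something like this sketch (same hypothetical gps_points table), with the added wrinkle that the ordering from the sub-query is not actually guaranteed to survive into the aggregate:

-- Old approach: sort the points in a sub-query first, then hope the
-- already-ordered rows arrive at the aggregate in that order.
WITH ordered AS (
  SELECT gps_track_id, geom
  FROM gps_points
  ORDER BY gps_track_id, gps_time
)
SELECT
  gps_track_id,
  ST_MakeLine(geom) AS geom
FROM ordered
GROUP BY gps_track_id;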

Deloitte's Second Act

Hot off their success transforming the BC social services sector with “integrated case management”, Deloitte is now heavily staffing the upcoming transformation of the IT systems that underpin our natural resource management ministries.

Interlude: I should briefly note here that Deloitte’s work in social services involved building a $180,000,000 case management system that the people who use it generally do not like, using software that nobody else uses for social services, that went offline for several consecutive days last year, and that is built on a platform that basically entered end-of-life almost five years ago. I’m sure that’s not Deloitte’s fault; they are only the international experts hired to advise on the best ways to build the system and then actually build it.


*So many shiny arrows!
Smells like management consultants...*

Anyhow…

The brain trust has now decided that the thing we need on the land base is “integrated decision making”, presumably because everything tastes better “integrated”. A UVic MPA student has done a complete write-up of the scheme — and I challenge you to find the hard centre inside this chewy mess of an idea — but here’s a representative sample:

The IDM initiative is an example of horizontal management because it is an initiative among non-hierarchical ministries focused on gaining efficiencies by harmonizing regulations, IT systems and business processes for the betterment of the NRS as a whole. Horizontal management is premised on joint or consensual decision making rather than a more traditional vertical hierarchy. Horizontal collaborations create links and share information, goodwill, resources, and power or capabilities by organizations in two or more sectors to achieve jointly what they cannot achieve individually.

Sounds great, right!?! Just the sort of thing I’d choose to manage billions of dollars in natural resources! (I jest.)

Of course, the brain trust really isn’t all that interested in “horizontal management”; what has them hot and bothered about “integrated decision making” is that it’s an opportunity to spend money on “IT systems and business processes”. Yay!

To that end, they carefully prepared a business case for Treasury Board, asking for well north of $100M to rewrite every land management system in government. Forests, lands, oil and gas, heritage, the whole kit and caboodle. The business case says:

IDM will improve the ability of the six ministries and many agencies in the NRS to work together to provide seamless, high-quality service to proponents and the public, to provide effective resource stewardship across the province, to effectively consult with First Nations in natural resource decisions, and to contribute to cross-government priorities.

Sounds ambitious! I wonder how they’re going to accomplish this feat of re-engineering? Well, I’m going to keep on wondering, because they redacted everything in the business case except the glowing hyperbole.

However, even though we don’t know how, or really why, they are embarking on this grand adventure, we can rest assured that they are now spending money at a rate of about $10M / year making it happen, much of it on our good friends Deloitte.


*Not that Secretariat...*

  • There are currently 80 consultants billing on what has been christened the “Natural Resource Sector Transformation Secretariat”.
  • Of those consultants 34 are (so far) from Deloitte.
  • Coincidentally, 34 is also the number of government staff working at the Secretariat.
  • So, 114 staff, of which 34 are government employees and the rest are contractors. How many government employees does it take to change a lightbulb? Let me take that to procurement and I’ll get back to you.

The FOI system charged me $120 (and only after I bargained down my request to a much less informative one) to find out the above, because they felt that the information did not meet the test of being “of public interest”. If you feel it actually is in the public interest to learn where our $100M on IT services for natural resources is being spent, and you live in BC, please leave me a comment on this post.

Interlude: The test for whether fees should be waived is double barrelled, but is (hilariously) decided by the public body itself (soooo unbiased). Here are the tests I think I pass (but they don’t):

  1. Do the records show how the public body is allocating financial or other resources?
  2. Is your primary purpose to disseminate information in a way that could reasonably be expected to benefit the public, or to serve a private interest?

I’m still digging for more information (like, how is it that Deloitte can bill out 34 staff on this project when there hasn’t been a major RFP for it yet?) so stay tuned and send me any hints if you have them.

GeoTiff Compression for Dummies

Updated: August 2022

“What’s the best image format for map serving?” they ask me, shortly after I tell them not to serve their images from inside a database.

“Is it MrSID? Or ECW? Those are nice and small.” Which indeed they are. Unfortunately, outside of proprietary image server software I’ve never seen them be fast and nice and small at the same time. Generally the decode step is incredibly CPU intensive, presumably because of the fancy wavelet math that makes them so small in the first place.

“So, what’s the best image format for map serving?”.

In my experience, the best format for image serving, using open source rendering engines (MapServer, GeoServer, Mapnik), is: GeoTIFF, with JPEG compression, internally tiled, in the YCBCR color space, with internal overviews. Unfortunately, GeoTIFFs are almost never delivered this way, as I was reminded today while downloading a sample image from the City of Kamloops open data site (but nonetheless, thanks for the great free imagery, Kamloops!).

Ortho_5255C.zip [632MB]

It came in a 632MB ZIP file. “Hm, that’s pretty big,” I thought. I unzipped it.

5255C.tif [687M]

Unzipped, it is a 687MB TIFF file. What can we learn about it?

This post uses the GDAL command line utilities and the GeoTIFF driver in particular to write out new files using different compression options.

gdalinfo 5255C.tif

From top to bottom the metadata tells us the image is:

  • 15000 by 12000 pixels
  • In the UTM zone 10N coordinate system
  • Pixel size of 0.1 (since UTM is in meters, that means 10cm pixels, nice data!)
  • Pixel interleaved
  • Uncompressed! (ordinarily the compression type is in the “Image Structure Metadata” but there’s no compression indicated there)
  • Positioned with a lower left corner at 120°19’52.40”W 50°40’6.59”N
  • Made up of four 8-bit (range 0-255) bands: Red, Green, Blue and Alpha

Can we make it smaller? Sure. A common TIFF compression is “deflate”, the same algorithm used for ZIP. This is a lossless encoding, but not very good at compressing imagery.

gdal_translate \
  -co COMPRESS=DEFLATE \
  5255C.tif 5255C_DEFLATE.tif

The result is about 30% smaller than the uncompressed file, but it is still (as we shall see) much larger than it needs to be.

5255C_DEFLATE.tif [488M]

We can make the image a whole lot smaller just by using a more appropriate image compression algorithm, like JPEG. Note that JPEG is a safe choice for visual imagery, but not for rasters like DEM data or measurements, where you want to retain the exact value of every pixel.

We’ll also tile the image internally while we’re at it. Internal tiling allows renderers to quickly pick out and decompress just a small portion of the image, which is important once you’ve applied a more computationally expensive compression algorithm like JPEG.

gdal_translate \
  -co COMPRESS=JPEG \
  -co TILED=YES \
  5255C.tif 5255C_JPEG.tif

This is much better: now we have a vastly smaller file.

5255C_JPEG.tif [67M]

But we can still do better! For reasons that pass well beyond my understanding, the JPEG algorithm is more effective on images that are stored in the YCBCR color space. Mine is not to reason why, though.

gdal_translate \
  -co COMPRESS=JPEG \
  -co PHOTOMETRIC=YCBCR \
  -co TILED=YES \
  5255C.tif 5255C_JPEG_YCBCR.tif

Did you get an error here? I did! My input file from Kamloops has four bands (RGBA) but the YCBCR photometric space can only handle three bands, so I need to read only the RGB bands from my file. They are usually (and the file metadata confirms this) in bands 1, 2 and 3. The conversion command becomes:

gdal_translate \
  -b 1 -b 2 -b 3 \
  -co COMPRESS=JPEG \
  -co PHOTOMETRIC=YCBCR \
  -co TILED=YES \
  5255C.tif 5255C_JPEG_YCBCR.tif

Wow, now we’re down to about 1/25 the size of the original.

5255C_JPEG_YCBCR.tif [28M]

But we’ve applied a “lossy” algorithm, JPEG. Maybe we’ve ruined the data! Let’s have a look.

*Original vs. after JPEG/YCBCR*

Can you see the difference? Me neither. Using the default JPEG “quality” level of 75%, there are no visible artefacts. In general, JPEG is very good at compressing things so humans “can’t see” the lost information. I’d never use it for compressing a DEM or a data raster, but for a visual image, I use JPEG with impunity, and with much lower quality settings too (for more space saved).

What about a quality level of 50%?

gdal_translate \
  -b 1 -b 2 -b 3 \
  -co COMPRESS=JPEG \
  -co JPEG_QUALITY=50 \
  -co PHOTOMETRIC=YCBCR \
  -co TILED=YES \
  5255C.tif 5255C_JPEG_YCBCR_50.tif

See the quality difference? Possibly in the gravel road at the top, but it’s hard to say. In general, on larger areas with very subtly changing colors, you might get some visible compression artefacts.

*Original vs. JPEG/YCBCR @ 50%*

And we dropped another 30% in file size! We are now less than 3% of the original file size.

5255C_JPEG_YCBCR_50.tif [18M]

Finally, for high speed serving at more zoomed out scales, we need to add overviews to the image. We’ll make sure the overviews use the same, high compression options as the base data.

gdaladdo \
  --config COMPRESS_OVERVIEW JPEG \
  --config JPEG_QUALITY_OVERVIEW 50 \
  --config PHOTOMETRIC_OVERVIEW YCBCR \
  --config INTERLEAVE_OVERVIEW PIXEL \
  -r average \
  5255C_JPEG_YCBCR_50.tif \
  2 4 8 16

For reasons passing understanding, gdaladdo uses a different set of command-line switches to pass the configuration info to the compressor than gdal_translate does, but as before, mine is not to reason why.

The final size, now with overviews as well as the original data, is still less than 1/25 the size of the original.

5255C_JPEG_YCBCR_50.tif [26M]

So, to sum up, your best format for image serving is:

  • GeoTiff, so you can avoid proprietary image formats and nonsense, with
  • JPEG compression, for visually fine results with much space savings, and
  • (optionally) a more aggressive JPEG quality setting, and
  • YCBCR color, for even smaller size, and
  • internal tiling, for fast access of random squares of data, and
  • overviews, for fast access of zoomed out views of the data.

Go forth and compress!

What about COG?

Wait, I haven’t mentioned the new kid in imagery serving, the “cloud optimized GeoTIFF”! Maybe that is the answer now and all this stuff is moot?

As always in software, the answer is “it depends”. If you are storing your images online, in a cloud “object store” like AWS S3, or Azure Blob Storage, or Google Cloud Storage, or Cloudflare R2, then using COG as your format makes a lot of sense. The number of native COG clients is growing, and COG clients can access the files directly in the cloud storage without downloading, which can be immensely useful.
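
As a small illustration, GDAL itself can read a remotely hosted COG through its /vsicurl/ virtual file system, fetching only the byte ranges it needs rather than the whole file (the URL below is just a placeholder):

gdalinfo /vsicurl/https://example.com/imagery/5255C_COG.tif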

To generate COG files with GDAL, use the new COG driver instead of the GeoTIFF driver. An equivalent GDAL COG-generating command to the final conversion in this post looks like this:

gdal_translate \
  -of COG \
  -co COMPRESS=JPEG \
  -co QUALITY=50 \
  5255C.tif 5255C_COG.tif

All the fancy stuff from above about tiling and overviews and YCBCR photometric spaces is automatically included in the GDAL COG output driver, so the command is a lot simpler.

5255C_COG.tif [25M]

And the output is still very efficiently compressed!

The COG file has all the same features as the basic GeoTIFF (efficient JPEG, tiling, overviews) and it has some extra metadata to make it easier and faster for native COG clients to read data from it, so there’s no reason not to always use the COG option, if your version of GDAL supports it.
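
If you are not sure whether your GDAL build is new enough to include the COG driver (it arrived in GDAL 3.1), a quick look at the format list will tell you:

# Lists all raster drivers; the COG line appears only if the driver is built in
gdal_translate --formats | grep -i "cog"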

Breaking a Linestring into Segments

Like doing a sudoku, solving a “simple yet tricky” problem in spatial SQL can grab one’s mind and hold it for a period. Someone on the PostGIS IRC channel was trying to “convert a linestring into a set of two-point segments”, using an external C++ program, and I thought: “hm, I’m sure that’s doable in SQL”.

And sure enough, it is, though the syntax for pulling the parts out of the dump objects makes it look a little ugly.
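
For reference, the kind of construction involved looks something like this sketch (assuming a hypothetical lines table with gid and geom columns):

-- Dump each line into its component points, then use lag() to pair
-- each point with its predecessor and ST_MakeLine to join the pairs.
WITH segments AS (
  SELECT gid,
    ST_MakeLine(
      lag((pt).geom, 1, NULL)
        OVER (PARTITION BY gid ORDER BY gid, (pt).path),
      (pt).geom
    ) AS geom
  FROM (SELECT gid, ST_DumpPoints(geom) AS pt FROM lines) AS dumps
)
SELECT gid, geom
FROM segments
WHERE geom IS NOT NULL;  -- the first point of each line has no predecessor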