Waiting for PostGIS 3: Separate Raster Extension

The raster functionality in PostGIS has been part of the main extension since it was introduced. When PostGIS 3 is released, if you want raster functionality you will need to install both the core postgis extension and the postgis_raster extension.

CREATE EXTENSION postgis;
CREATE EXTENSION postgis_raster;

Breaking out the raster functionality allows packagers to more easily build stripped down “just the basics” PostGIS without also building the raster dependencies, which include the somewhat heavy GDAL library.

The raster functionality remains intact, however, and you can still do nifty things with it.

For example, download and import a “digital elevation model” file from your local government. In my case, a file that covers the City of Vancouver.

Vancouver DEM

# Make a new working database and enable postgis + raster
createdb yvr_raster
psql -c 'CREATE EXTENSION postgis' yvr_raster
psql -c 'CREATE EXTENSION postgis_raster' yvr_raster

# BC open data from https://pub.data.gov.bc.ca/datasets/175624/
# Download the DEM file, unzip and load into the database
wget https://pub.data.gov.bc.ca/datasets/175624/92g/092g06_e.dem.zip
unzip 092g06_e.dem.zip

# Options: create index, add filename to table, SRID is 4269, use 56x56 chip size
raster2pgsql -I -F -s 4269 -t 56x56 092g06_e.dem dem092g06e | psql yvr_raster

After the data load, we have a table of 56-by-56 pixel elevation chips named dem092g06e. If you map the extents of the chips, they look like this:

Vancouver DEM Chips
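
To map the chip extents yourself, one option (a sketch, using the standard postgis_raster cast from raster to geometry, which returns each chip's outline) is a simple query on the loaded table. The rid column is the serial key that raster2pgsql adds by default.

SELECT rid, rast::geometry AS geom
FROM dem092g06e;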

Imagine a sea-level rise of 30 meters (in an extreme case, melting Greenland plus Antarctica would produce 68 meters). How much of Vancouver would be underwater? It's mostly a hilly place. Let's find out.

CREATE TABLE poly_30 AS 
  SELECT (
   ST_DumpAsPolygons(
    ST_SetBandNoDataValue(
     ST_Reclass(
      ST_Union(rast), 
      '0-30:1-1, 31-5000:0-0', '2BUI'),
     0))).*
FROM dem092g06e d;

There are a lot of nested functions, so reading from the innermost, we:

  • union all the chips together into one big raster
  • reclassify all values from 0-30 to 1, and all higher values to 0
  • set the "nodata" value to 0, since we don't care about cells above our threshold
  • create a vector polygon for each value in the raster (there is only one value: "1")

The result looks like this:

Vancouver 30m Flood Zone

We can grab building footprints for Vancouver and see how many buildings are going to be underwater.

# Vancouver open data 
# https://data.vancouver.ca/datacatalogue/buildingFootprints.htm
wget ftp://webftp.vancouver.ca/OpenData/shape/building_footprints_2009_shp.zip
unzip building_footprints_2009_shp.zip

# Options: create index, SRID is 26910, use dump format
shp2pgsql -I -s 26910 -D building_footprints_2009 buildings | psql yvr_raster

Before we can compare the buildings to the flood zone, we need to put them into the same projection as the flood zone (SRID 4269).

ALTER TABLE buildings
ALTER COLUMN geom 
TYPE geometry(MultiPolygonZ, 4269)
USING ST_Transform(geom, 4269);

Vancouver Buildings

(There are buildings on the north shore of Burrard inlet, but this data is from the City of Vancouver. Jurisdictional boundaries are the bane of geospatial analysis.)

Now we can find flooded buildings.

CREATE TABLE buildings_30_poly AS
  SELECT b.* 
    FROM buildings b
    JOIN poly_30 p
      ON ST_Intersects(p.geom, b.geom)
    WHERE ST_Intersects(p.geom, ST_Centroid(b.geom));

There’s another way to find buildings below 30 meters, without having to build a polygon: just query the raster value underneath each building, like this:

CREATE TABLE buildings_30_rast AS
  SELECT b.*
    FROM buildings b
    JOIN dem092g06e d
      ON ST_Intersects(b.geom, d.rast)
    WHERE ST_Value(d.rast, ST_Centroid(b.geom)) < 30;

Since building polygons from a raster can be expensive, joining the raster and vector layers directly is usually the way we want to carry out this analysis.

Vancouver Buildings in 30m Flood Zone
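
As a quick sanity check (a sketch, assuming both tables were built as above), the two approaches should flag nearly the same buildings. Small differences at the flood boundary are possible, since the polygon method tests against the reclassified 0-30 zone while the raster method tests ST_Value() < 30.

SELECT
  (SELECT count(*) FROM buildings_30_poly) AS via_polygons,
  (SELECT count(*) FROM buildings_30_rast) AS via_raster;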

For bringing continuous data (elevations, temperatures, model results) into GIS analysis, the postgis_raster extension can be quite useful.

Waiting for PostGIS 3: ST_AsMVT Performance

Vector tiles are the new hotness, allowing large amounts of dynamic data to be sent for rendering right on web clients and mobile devices, and making very beautiful and highly interactive maps possible.

Since the introduction of ST_AsMVT(), people have been generating their tiles directly in the database more and more, and as a result wanting tile generation to go faster and faster.

Every tile generation query has to carry out the following steps (a sketch of such a query follows the list):

  • Gather all the relevant rows for the tile
  • Simplify the data appropriately to match the resolution of the tile
  • Clip the data to the bounds of the tile
  • Encode the data into the MVT protobuf format
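
Here is what such a query can look like (a sketch; the roads table and name column are hypothetical, and the geometry is assumed to be stored in web mercator). ST_TileEnvelope() is also new in PostGIS 3.0, and generates tile bounds from zoom/x/y tile coordinates.

-- Generate one tile at zoom 15
WITH bounds AS (
  SELECT ST_TileEnvelope(15, 9143, 12338) AS geom
),
mvtgeom AS (
  -- gather the relevant rows, then simplify and clip them to the tile
  SELECT ST_AsMVTGeom(r.geom, b.geom) AS geom, r.name
  FROM roads r
  JOIN bounds b ON ST_Intersects(r.geom, b.geom)
)
-- encode the result into the MVT protobuf format
SELECT ST_AsMVT(mvtgeom.*)
FROM mvtgeom;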

For PostGIS 3.0, performance of tile generation has been vastly improved.

  • First, the clipping process has been sped up and made more reliable by integrating the wagyu clipping algorithm directly into PostGIS. This has sped up clipping of polygons in particular, and reduced instances of invalid output geometries.

  • Second, the simplification and precision reduction steps have been streamlined, to avoid unnecessary copying and work on simple cases like points and short lines. This has sped up handling of simple points and lines.

  • Finally, the ST_AsMVT() aggregate itself has been made parallelizable, so that all the work above can be parcelled out to multiple CPUs, dramatically speeding up generation of tiles with lots of input geometry (see the sketch below).
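
To see whether the aggregate is being parallelized (a sketch against the hypothetical roads table above; whether the planner actually picks a parallel plan depends on table size and standard PostgreSQL cost settings):

SET max_parallel_workers_per_gather = 4;

EXPLAIN (COSTS OFF)
SELECT ST_AsMVT(mvtgeom.*)
FROM (
  SELECT ST_AsMVTGeom(geom, ST_TileEnvelope(15, 9143, 12338)) AS geom, name
  FROM roads
) mvtgeom;

A parallel plan shows up as a Finalize Aggregate over a Gather over Partial Aggregate nodes.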

PostGIS vector tile support has gotten so good that even projects with massive tile generation requirements, like the OpenMapTiles project, have standardized their tiling on PostGIS.

Waiting for PostGIS 3: Hilbert Geometry Sorting

With the release of PostGIS 3.0, queries that ORDER BY geometry columns will return rows using a Hilbert curve ordering, and do so about twice as fast.

Whuuuut!?!

The history of “ordering by geometry” in PostGIS is mostly pretty bad. Up until version 2.4 (2017), if you did ORDER BY on a geometry column, your rows would be returned using the ordering of the minimum X coordinate value in the geometry.

One of the things users expect of “ordering” is that items that are “close” to each other in the ordered list should also be “close” to each other in the domain of the values. For example, in a set of sales orders ordered by price, rows next to each other have similar prices.

To visualize what geometry ordering looks like, I started with a field of 10,000 random points.

CREATE SEQUENCE points_seq;

CREATE TABLE points AS
SELECT 
  (ST_Dump(
    ST_GeneratePoints(
        ST_MakeEnvelope(-10000,-10000,10000,10000,3857),
        10000)
    )).geom AS geom,
  nextval('points_seq') AS pk;

Ten thousand random points

Now select from the table, and use ST_MakeLine() to join them up in the desired order. Here’s the original ordering, prior to version 2.4.

SELECT ST_MakeLine(geom ORDER BY ST_X(geom)) AS geom
FROM points;

Random Points in XMin Order

Each point in the ordering is close in the X coordinate, but since the Y coordinate can be anything, there’s not much spatial coherence to the ordered set. A better spatial ordering will keep points that are near in space also near in the set.

For 2.4 we got a little clever. Instead of sorting by the minimum X coordinate, we took the center of the geometry bounds, and created a Morton curve out of the X and Y coordinates. Morton keys just involve interleaving the bits of the values on the two axes, which is a relatively cheap operation.
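
To make the interleaving concrete, here is a toy helper (purely illustrative, not the PostGIS internals) that builds an 8-bit Morton key from the low 4 bits of each coordinate:

CREATE FUNCTION morton4(x int, y int) RETURNS int AS $$
DECLARE
  key int := 0;
BEGIN
  FOR i IN 0..3 LOOP
    key := key
         | (((x >> i) & 1) << (2 * i))       -- x bit i goes to an even position
         | (((y >> i) & 1) << (2 * i + 1));  -- y bit i goes to an odd position
  END LOOP;
  RETURN key;
END;
$$ LANGUAGE plpgsql IMMUTABLE;

SELECT morton4(3, 5);  -- 39, binary 100111

Points whose keys are close together tend to be close together in space, which is exactly what we want from the ordering.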

The result is way more spatially coherent.

Random Points in Morton Order

For 3.0, we have replaced the Morton curve with a Hilbert curve. The Hilbert curve is slightly more expensive to calculate, but we offset that cost by stripping wasted cycles out of other parts of the old implementation, so the new code is both faster and even more spatially coherent.

SELECT ST_MakeLine(geom ORDER BY geom) AS geom
FROM points;

Random Points in Hilbert Order

PostGIS 3.0 will be released some time this fall, hopefully before the final release of PostgreSQL 12.

Waiting for PostGIS 3: ST_AsGeoJSON(record)

With PostGIS 3.0, it is now possible to generate GeoJSON features directly without any intermediate code, using the new ST_AsGeoJSON(record) function.

GeoJSON is a common transport format between servers and web clients, and even between components of processing chains. Being able to create useful GeoJSON is important for integrating the different parts of a modern geoprocessing application.

PostGIS has had an ST_AsGeoJSON(geometry) function forever, but it does slightly less than most users really need: it takes in a PostGIS geometry, and outputs a GeoJSON "geometry object".

The GeoJSON geometry object is just the shape of the feature; it doesn't include any of the other information about the feature that might be in the table or query. As a result, developers have spent a lot of time writing boilerplate code to wrap the results of ST_AsGeoJSON(geometry) up with the columns of a result tuple to create GeoJSON "feature objects".

The ST_AsGeoJSON(record) function looks at the input tuple, and takes the first column of type geometry to convert into a GeoJSON geometry. The rest of the columns are added to the GeoJSON feature in the properties member.

SELECT ST_AsGeoJSON(subq.*) AS geojson
FROM (
  SELECT ST_Centroid(geom), type, admin
  FROM countries
  WHERE name = 'Canada'
) AS subq;

{"type": "Feature",
 "geometry": {
    "type":"Point",
    "coordinates":[-98.2939042718784,61.3764628013483]
  },
  "properties": {
    "type": "Sovereign country",
    "admin": "Canada"
  }
}

Using GeoJSON output, it's easy to stream features directly from the database into an OpenLayers or Leaflet web map, or to consume them with ogr2ogr for conversion to other geospatial formats.
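
A common next step (a sketch using standard PostgreSQL JSON functions) is to aggregate the features into a complete GeoJSON FeatureCollection, which web mapping libraries can consume directly:

SELECT json_build_object(
  'type', 'FeatureCollection',
  'features', json_agg(ST_AsGeoJSON(subq.*)::json)
) AS geojson
FROM (
  SELECT ST_Centroid(geom), type, admin
  FROM countries
) AS subq;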

BC IT Outsourcing 2018/19

The years keep on ticking by, but the numbers never get any smaller. The contracts are written for decades at a time, and change is hard, so…?

In the UK, the review of IT outsourcing contracts was called the “Ocean Liner report”, with reference to how long it takes to turn one around.

That said, the curve of BC outsourced IT spending continues to march upward.

To what do we owe the continued growth? This year, no one vendor can really claim the lion's share, and yet every vendor except Maximus notched modest gains. I would expect the $79M Maximus revenue line to shrink pretty quickly over the next couple of years, as the MSP billing system winds down.

Maximus revenue won’t go to zero though. Having built out the infrastructure for collecting MSP premiums, they took that expertise and won contracts to collect court ordered family support payments. That vendor relationship got so intertwined with the Ministry that the Auditor General looked askance at it. They also used their position in the data flow to ramp up billings by taking on more responsibilities and adding to their systems.

This is one of the reasons that billings tend to go up-and-up even though outsourcing contracts are usually justified in the name of cost control and predictability.

From a vendor's point of view, the beauty of an outsourcing agreement is that it lands you in the heart of a business, providing a privileged view of the opportunity space. There is nothing a client likes more (assuming they have the budget) than a vendor that can proactively provide a proposal to fix an obvious business problem. The fact that visibility into those problems is monopolized by the incumbent vendor is just an unfortunate side effect. Over time, these little proposals can grow into whole new lines of outsourced work. If there is no internal capacity to support the new programs, the vendor just increases its share of government revenues.

The UK is as big and complex as BC (more so, of course), and they have reviewed their major outsourcing contracts and exited a few. Iain Patterson, who wrote the "Ocean Liner" report, saw incumbent vendors, and their bias towards the (lucrative) status quo, as a barrier to modernization:

Unfortunately, the main barrier that prevents departments from investing in these solutions is the contract landscape. Many still have large, legacy contracts using system integrators which affect their ability to change their technology estate. They’re faced with costly change control requests and complicated workarounds to link up cloud-based commodity solutions with their existing technology.

Unfortunately, there is no one answer to how to get out of the bind, except “very slowly and carefully”. As Patterson said:

“Understand the capabilities you have,” he said. “Understand what you’ve got as far as your contractual commitments. When the contracts expire, work with the companies that hold those contracts to try and change that landscape during the lifetime of the contract.

“And if you can’t do that, then we will try to do things in a different way. But we’ll come out of those contracts understanding what it is that we have in them at that moment in time, what can we commoditise in those, and where they are best suited to be delivered from, whether that’s internal, external, or whether there are things we can buy now rather than build.”

Turn that ocean liner, a little bit at a time.