Simple SQL GIS

And, late on a Friday afternoon, the plaintive cry was heard!

And indeed, into the sea they do go!

And ‘lo, the SQL faeries were curious, and gave it a shot!

##### Commandline OSX/Linux #####

# Get the Shape files

# Exe? No prob, it's actually a self-extracting ZIP
unzip ED_Province

# Get a PostGIS database ready for the data
createdb ed_clip
psql -c "create extension postgis" -d ed_clip

# Load into PostGIS
# The .prj says it is "Canada Albers Equal Area", but they
# lie! It's actually BC Albers, EPSG:3005
shp2pgsql -s 3005 -i -I ED_Province ed | psql -d ed_clip

# We need some ocean! Use Natural Earth...

# Load the Ocean into PostGIS!
shp2pgsql -s 4326 -i -I ne_10m_ocean ocean | psql -d ed_clip

# OK, now we connect to PostGIS and start working in SQL
psql -e ed_clip
-- How big is the Ocean table?
SELECT Count(*) FROM ocean;

-- Oh, only 1 polygon. Well, that makes it easy... 
-- For each electoral district, we want to difference away the ocean.
-- The ocean is a one big polygon, this will take a while (if we
-- were being more subtle, we'd first clip the ocean down to 
-- a reasonable area around BC.)
CREATE TABLE ed_clipped AS
  WHEN ST_Intersects(o.geom, ST_Transform(e.geom,4326))
  THEN ST_Difference(ST_Transform(e.geom,4326), o.geom)
  ELSE ST_Transform(e.geom,4326)
  END AS geom,
FROM ed e, ocean o;

-- Check our geometry types...
SELECT DISTINCT ST_GeometryType(geom) FROM ed_clipped;

-- Oh, they are heterogeneous. Let's force them all multi
UPDATE ed_clipped SET geom = ST_Multi(geom);
# Dump the result out of the database back into shapes
pgsql2shp -f ed2009_ocean ed_clip ed_clipped
zip ed2009_ocean.*
mv ~/Dropbox/Public/

No more districts in oceans!

And the faeries were happy, and uploaded their polygons!

Update: And the lamentations ended, and the faeries also rejoiced.

PostGIS Overlays

One question that comes up often during our PostGIS training is “how do I do an overlay?” The terminology can vary: sometimes they call the operation a “union” sometimes an “intersect”. What they mean is, “can you turn a collection of overlapping polygons into a collection of non-overlapping polygons that retain information about the overlapping polygons that formed them?”

So an overlapping set of three circles becomes a non-overlapping set of 7 polygons.

Calculating the overlapping parts of a pair of shapes is easy, using the ST_Intersection() function in PostGIS, but that only works for pairs, and doesn’t capture the areas that have no overlaps at all.

How can we handle multiple overlaps and get out a polygon set that covers 100% of the area of the input sets? By taking the polygon geometry apart into lines, and then building new polygons back up.

Let’s construct a synthetic example: first, generate a collection of random points, using a Gaussian distribution, so there’s more overlap in the middle. The crazy math in the SQL below just converts the uniform random numbers from the random() function into normally distributed numbers.

  WITH rands AS (
  SELECT generate_series as id, 
         random() AS u1, 
         random() AS u2 
  FROM generate_series(1,100)
    50 * sqrt(-2 * ln(u1)) * cos(2*pi()*u2),
    50 * sqrt(-2 * ln(u1)) * sin(2*pi()*u2)), 4326) AS geom
FROM rands;

The result looks like this:

Now, we turn the points into circles, big enough to have overlaps.

  SELECT id, ST_Buffer(geom, 10) AS geom 
    FROM pts;

Which looks like this:

Now it’s time to take the polygons apart. In this case we’ll take the exterior ring of the circles, using ST_ExteriorRing(). If we were dealing with complex polygons with holes, we’d have to use ST_DumpRings(). Once we have the rings, we want to make sure that everywhere rings cross the lines are broken, so that no lines cross, they only touch at their end points. We do that with the ST_Union() function.

CREATE TABLE boundaries AS
  SELECT ST_Union(ST_ExteriorRing(geom)) AS geom
    FROM circles;

What comes out is just lines, but with end points at every crossing.

Now that we have noded lines, we can collect them into a multi-linestring and feed them to ST_Polygonize() to generate polygons. The polygons come out as one big multi-polygon, so we’ll use ST_Dump() to convert it into a table with one row per polygon.

  SELECT nextval('polyseq') AS id, 
         (ST_Dump(ST_Polygonize(geom))).geom AS geom
  FROM boundaries;

Now we have a set of polygons with no overlaps, only one polygon per area.

So, how do we figure out how many overlaps contributed to each incoming polygon? We can join the centroids of the new small polygons with the set of original circles, and calculate how many circles contain each centroid point.

A spatial join will allow us to calculate the number of overlaps.

UPDATE POLYS set count = p.count
  SELECT count(*) AS count, AS id  
  FROM polys p 
  JOIN circles c 
  ON ST_Contains(c.geom, ST_PointOnSurface(p.geom)) 
) AS p

That’s it! Now we have a single coverage of the area, where each polygon knows how much overlap contributed to it. Ironically, when visualized using the coverage count as a variable in the color ramp, it looks a lot like the original image, which was created with a simple transparency effect. However, the point here is that we’ve created new data, in the count attribute of the new polygon layer.

The same decompose-and-rebuild-and-join-centroids trick can be used to overlay all kinds of features, and to carry over attributes from the original input data, achieving the classic “GIS overlay” workflow. Happy geometry mashing!

How I Make Jeff Richer

“How can I get paid to work on this cool open source project?”

Once upon a time, it felt like most discussions about open source were predicated on answering this question. Developers fell in love with, or created, some project or other, but found themselves working on it in their spare time. If only there was some way to monetize their labour of love!


From that complaint, a score of (never quite satisfactory) models were spawned.

  • Pure consulting, which depends on a never-ending supply of enhancements and updates, usually to the detriment of core maintainance.
  • Professional support, which depends on a deep enough market of potential enterprises, and a sense of “deployment risk” large enough to open wallets.
  • Open core, which invests in the open source core to promote adoption and hopefully build a sub-population of enterprises that have embedded the project sufficiently to be in the market for add-ons to make things faster/simpler/more integrated.
  • Relicensing, which leverages adoption of the open source to squeeze conventional licensing revenue out of enterprises that don’t want to accept open source license terms of use.

What all the models above have in common is that they more-or-less require successful adopters to invest effort in the open source project at the center of the model. Adoption of the open source project is an “on ramp” to revenue opportunities, but the operating assumption is that customer will continue using the open source core, so the business has an incentive to invest in the “on ramp”.


In the same way, Oracle provides an “on ramp” to their enterprise product, in the free-as-in-beer Oracle Express. Oracle pays for the development of the on-ramp (it’s just Oracle, after all) and in return (maybe) reaps the reward of eventual migration of users to their paid product.

Which brings me to “AWS Aurora”, now generally available on Amazon’s cloud.

AWS deliberately does not say Aurora is “PostgreSQL”. They say it is “PostgreSQL compatible”. That’s probably for the best: Aurora is a soft fork of the core PostgreSQL code that replaces big chunks of PostgreSQL storage logic with clever, custom AWS code.

Like AWS PostgreSQL RDS, Aurora is a revenue generating fork of PostgreSQL that uses open source PostgreSQL adoption as an on-ramp for AWS revenue. This would superficially seem to be a similar situation to all the other open source business models, except for one thing: AWS doesn’t have any stake in the success of PostgreSQL per se.

AWS offers RDS versions of all the RDBMS systems, open source and otherwise, and they invest in the core projects accordingly, and fairly: hardly at all. After all, to do otherwise would be to declare a preference.

So the open source communities end up building the on-ramp to AWS paid services as a free gift to AWS.

That’s annoying enough, but it gets uglier, I think, as time goes on.


For now, Aurora tracks the PostgreSQL version fairly well. You can move your app onto Aurora, you can move it back off to RDS, you can move it to Google’s managed PostgresSQL, you can host it yourself on premise or in the cloud.

However, eventually the business expense of maintaining the Aurora code base against a PostgreSQL baseline that is in constant motion will wear on Amazon, and they will start to see places where adding “Aurora only” features will “improve the customer experience”.

At that point, the soft fork will turn into a hard fork, and migrations into Aurora will start to look like a one-way valve.

And the community will still be maintaining the on-ramp to AWS.

Let me pre-empt some of the commentary.

“Free riding is part of the deal!”

Sure it is. AWS isn’t doing anything illegal. They are just taking advantage where they can take advantage.

Similarly, the industries that dumped waste into the Cuyahoga River until it was so polluted it caught fire were no doubt working within the letter of the law.

Free riding is part of the deal.

I hope that all kinds of people and organizations free ride on my open source work, it’s part of the appeal of the work.

I also hope that enlightened self-interest at the very least will lead to one of two outcomes:

  • That the customers of AWS RDS and (particularly) Aurora recognize that the organization they are paying (AWS) is not adequately supporting the core of they software (PostgreSQL) their operations depend on, and that as a result they are implicitly taking on the associated technical risk of under-investment.
  • That AWS itself has the foresight to invest directly in the open source on-ramps to their paid cloud deployments, acknowledging that the core software does in fact provide just as much (and maybe even more) value as their impressive cloud infrastructure does.

I think it’s more likely that short term thinking will lead to “AWS only” featuritis and the creation of a one-way valve, even the deliberate downgrading of RDS and on-premise capabilities to drive customers into the arms of Aurora, because: why not?

The logic of enterprise sales is: land, expand, lock-in, and squeeze.

Just because AWS is currently in the land-and-expand phase doesn’t mean they won’t get to the succeeding phases eventually.

ArcIMS Retrieving...

The @spatialpunk asked if the old ArcIMS loading GIF was around, and I managed to find a live ArcIMS site to scavenge them from. For posterity…




…ah, the good old days.

(Who will be) America's Next Big Mapping Company?

When talking to government audiences, I like to point out that the largest national mapping agency in the world resides in a building decorated like a hypertrophic kindergarten, in Mountain View, California. This generally gets their attention.

The second most important year in my career in geospatial was 20051, the year that Google Maps debuted, and began the migration of “maps on computers” from a niche market dominated by government customers to the central pivot of multiple consumer-facing applications.

The echoes of 2005 lasted for several years.

  • Microsoft quickly realized it was dramatically behind in the space, “MapPoint” being its only spatial product, and went on an acquisition and R&D spree.
  • Esri started moving much more quickly into web technology, slavishly aping Google product direction (spinny globes?! yes! JSON on the wire!? yes! tiled basemaps?!? oh yes!) in what must have been an embarassing turn for the “industry leader” in GIS.
  • Navteq and Teleatlas transitioned quickly from industry darlings (selling data to Google!) to providers of last resort, as more nimble data gatherers took up the previously-unimagineable challenge of mapping the whole world from scratch.
  • Open source did its usual fast-follower thing, churning out a large number of Javascript-and-map-tiles web components, and pivoting existing rendering engines into tile generators.

A funny thing happened in the shake out, though. Nobody ever caught up to Google. Or even came very close. While Google has abandoned Maps Engine–their foray into “real GIS”–their commitment to core location data, knowing where everything is and what everything is, all the time and in real time, remains solid and far in advance of all competitors.

Probably even before they rolled out the iPhone in 2007 Apple knew they had a “maps problem”. Once they added a GPS chip (a development that was only awaiting a little more battery oomph) they would be beholden to Google for all the directions and maps that made the chip actually useful to phone users.

And so we eventually got the Apple maps debacle of 2012. It turns out, building a global basemap that is accurate and includes all the relevant details that users have come to expect is really really really hard.

In an excellent article on the differences between Google and Apple’s maps, Justin O’Beirne posited:

Google has gathered so much data, in so many areas, that it’s now crunching it together and creating features that Apple can’t make—surrounding Google Maps with a moat of time.

Even the most well-funded and motivated competitors cannot keep up. Microsoft came the closest early on with Bing Maps, but seems to have taken their foot off the gas, in acknowledgement of the fact that they cannot achieve parity, let alone get ahead.

And so what? So what if Google has established Google Maps as a source of location data so good and so unassailable that it’s not worth competing?

Well, it sets up an interesting future dynamic, because the competitive advantage those maps provide is too large to leave unchallenged.

  • Apple cannot afford to have their location based devices beholden to Google–a direct competitor now via Android–so they have continued to invest heavily in maps. They’ve been a lot quieter about it after the debacle, but they haven’t given up.
  • Amazon AWS is the unchallenged leader in cloud computing, but Google wants a bigger piece of the cloud computing pie, and they are using Google Maps as a wedge to pull customers into the Google Cloud ecosystem. I have heard of multiple customers who have been wooed via combined Maps + Cloud deals that offer discounted Cloud pricing on top of Maps contracts. Why not put it all in one data centre?
  • Salesforce has just purchased Tableau, pulling one of the largest BI companies into the largest enterprise cloud company. Analytics has a lower need for precise location data, but Salesforce customers will include folks who do logistics and other spatial problems requiring accurate real-time data.

Someone is going to take another run at Google, they have to. My prediction is that it will be AWS, either through acquisition (Esri? Mapbox?) or just building from scratch. There is no doubt Amazon already has some spatial smarts, since they have to solve huge logistical problems in moving goods around for the retail side, problems that require spatial quality data to solve. And there is no doubt that they do not want to let Google continue to leverage Maps against them in Cloud sales. They need a “good enough” response to help keep AWS customers on the reservation.

How will we know it’s happening? It might be hard to tell from outside the Silicon Valley bubble. Most of the prominent contributors to the New Mapping Hegemony live behind the NDA walls of organizations like Google and Apple, but they are the kinds of folks Amazon will look to poach, in addition to members of Microsoft’s Bing team (more conveniently already HQ’ed in Seattle).

I think we’re due for another mapping space race, and I’m looking forward to watching it take shape.

1. The most important event of my geospatial career was the release of the iPhone 3GS with a GPS chip as a standard component.