PostGIS Polygon Splitting

One of the joys of geospatial processing is the variety of tools in the tool box, and the ways that putting them together can yield surprising results. I have been in the guts of PostGIS for so long that I tend to think in terms of primitives: either there’s a function that does what you want or there isn’t. I’m too quick to ignore the power of combining the parts that we already have.

A community member on the users list asked (paraphrased): “is there a way to split a polygon into sub-polygons of more-or-less equal areas?”

I didn’t see the question, which is lucky, because I would have said: “No, you’re SOL, we don’t have a good way to solve that problem.” (An exact algorithm showed up in the Twitter thread about this solution, and maybe I should implement that.)

PostGIS developer Darafei Praliaskouski did answer, and provided a working solution that is absolutely brilliant in combining the parts of the PostGIS toolkit to solve a pretty tricky problem. He said:

The way I see it, for any kind of polygon:

  • Convert a polygon to a set of points proportional to the area by ST_GeneratePoints (the more points, the more beautiful it will be, guess 1000 is ok);
  • Decide how many parts you’d like to split into, (ST_Area(geom)/max_area), let it be K;
  • Take KMeans of the point cloud with K clusters;
  • For each cluster, take a ST_Centroid(ST_Collect(point));
  • Feed these centroids into ST_VoronoiPolygons, that will get you a mask for each part of polygon;
  • ST_Intersection of original polygon and each cell of Voronoi polygons will get you a good split of your polygon into K parts.

Let’s take it one step at a time to see how it works.

We’ll use Peru as the example polygon; it’s got a nice concavity to it, which makes it a little trickier than an average shape.

CREATE TABLE peru AS
  SELECT *
  FROM countries
  WHERE name = 'Peru';

Original Polygon (Peru)

Now create a point field that fills the polygon. On average, each randomly placed point ends up “occupying” an equal area within the polygon.

CREATE TABLE peru_pts AS
  SELECT (ST_Dump(ST_GeneratePoints(geom, 2000))).geom AS geom
  FROM peru;

2000 Random Points

Now, cluster the point field, setting the number of clusters to the number of pieces you want the polygon divided into. Visually, you can now see the divisions in the polygon! But, we still need to get actual lines to represent those divisions.

CREATE TABLE peru_pts_clustered AS
  -- ST_ClusterKMeans is a window function: OVER () runs it across
  -- all the points and labels each one with a cluster number (0-9)
  SELECT geom, ST_ClusterKMeans(geom, 10) OVER () AS cluster
  FROM peru_pts;

10 Clusters

Using a point field and K-means clustering to get the split areas was inspired enough. The steps to get actual polygons are equally inspired.

First, calculate the centroid of each point cluster: the center of mass of that cluster’s points.

CREATE TABLE peru_centers AS
  SELECT cluster, ST_Centroid(ST_Collect(geom)) AS geom
  FROM peru_pts_clustered
  GROUP BY cluster;

Centroids of Clusters

Now, use a Voronoi diagram to get actual dividing edges between the cluster centroids, which end up closely matching the places where the clusters divide!

CREATE TABLE peru_voronoi AS
  SELECT (ST_Dump(ST_VoronoiPolygons(ST_Collect(geom)))).geom AS geom
  FROM peru_centers;

Voronoi of Centroids

Finally, intersect the Voronoi areas with the original polygon to get final output polygons that incorporate both the outer edges of the polygon and the Voronoi dividing lines.

CREATE TABLE peru_divided AS
  SELECT ST_Intersection(a.geom, b.geom) AS geom
  FROM peru a
  CROSS JOIN peru_voronoi b;

Intersection with Original Polygon

Done!

Clustering a point field to get mostly equal areas, and then using the Voronoi diagram to extract actual dividing lines, are wonderful insights into spatial processing. The final picture of all the components of the calculation is also beautiful.

All the Components Together
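
For reference, here’s the whole pipeline rolled up into a single query using CTEs. This is just my consolidation of the steps above, not part of Darafei’s recipe, and it assumes the same countries table (the output table name is my own invention):

CREATE TABLE peru_divided_onepass AS
  WITH pts AS (
    -- Fill the polygon with 2000 random points
    SELECT (ST_Dump(ST_GeneratePoints(geom, 2000))).geom AS geom
    FROM countries
    WHERE name = 'Peru'
  ),
  clustered AS (
    -- Label each point with one of 10 K-means clusters
    SELECT geom, ST_ClusterKMeans(geom, 10) OVER () AS cluster
    FROM pts
  ),
  centers AS (
    -- Collapse each cluster to its center of mass
    SELECT cluster, ST_Centroid(ST_Collect(geom)) AS geom
    FROM clustered
    GROUP BY cluster
  ),
  voronoi AS (
    -- Build a Voronoi cell around each center
    SELECT (ST_Dump(ST_VoronoiPolygons(ST_Collect(geom)))).geom AS geom
    FROM centers
  )
  -- Clip the cells back to the original polygon
  SELECT ST_Intersection(c.geom, v.geom) AS geom
  FROM countries c
  CROSS JOIN voronoi v
  WHERE c.name = 'Peru';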

I’m not 100% sure, but it might be possible to use Darafei’s technique for even more interesting subdivisions, like “map of the USA subdivided into areas of equal GDP”, or “map of New York subdivided into areas of equal population” by generating the initial point field using an economic or demographic weighting.
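
As a hypothetical sketch of that idea: the only change is in how the point field gets generated. Instead of scattering points uniformly over one polygon, scatter them over many polygons in proportion to a weight column; everything downstream (clustering, centroids, Voronoi, intersection) stays the same. Here I’m assuming a counties table with a population column, neither of which appears above:

CREATE TABLE weighted_pts AS
  -- counties and population are hypothetical; one point per 1000
  -- people, so dense areas contribute more points and the K-means
  -- clusters converge on roughly equal populations
  SELECT (ST_Dump(ST_GeneratePoints(geom, ceil(population / 1000.0)::integer))).geom AS geom
  FROM counties;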

PostGIS Talks @ FOSS4G North America

I presented my “PostGIS for Managers” talk for the last time (at least in this form) today at FOSS4G North America. The reason it’s for the last time is that the central conceit it’s built around, that a core decision is between a proprietary and an open source database, isn’t really operative anymore. The real decisions are now being driven by other considerations, like cloud platforms, and the services available there. So, it’s not really PostgreSQL versus Oracle anymore.

I also presented my “SQL Festival” talk, for the first time! New material is always a little challenging: will it work, is it the right level for the audience? It seemed to be well received, a mix of generic SQL tidbits, and some specific interesting queries you can do with PostGIS.

Changing the Direction of Government IT

I love that the UK government review of IT outsourcing deals has been nicknamed the “Ocean Liner report”, in reference to the amount of time it takes to change the direction of a large vessel.

The UK is hardly alone in having dug a deep hole of contractual obligation and learned helplessness, but they were among the first to publicly acknowledge their problem and start work on fixing it. So the rest of us watch with great interest, and try to formulate our own plans.

This afternoon I was invited to talk about technology directions to an audience of technical architects in the BC government. It was a great group, and they seemed raring to go, by and large.

Changing the Direction of Government IT

I titled my talk “Let’s Get Small” in homage to my favourite comedian, Steve Martin.

Our collective IT ocean liners will be slow to change course, but step one is turning the wheel.

Open Source for/by Government

Update: Barcelona is going all-open. Sounds extreme, but sometimes you’ve got to…

“You’ve got to spend money to make money”, I once confidently told a business associate, on the occasion of paying him a thousand dollars to manually clean some terrible data for me. In the event, I was right: that cleaned data paid for itself 10 times over in the following years.

I’m still the only person with a GIS file of 1996 BC election results by voting area, and the jealousy is killing you.

Governments can play the game too, but it seems like they all end up tilling the same landscape. There’s no shortage of governments trying to create their own Silicon Valley clusters, usually through the mechanisms of subsidizing venture capital funding (via tax breaks or directly) and increased spending on R&D grants to academia. Spending money to “hopefully” make money.

There’s an under-recognized niche available, for a government willing to go after it.

Venture capitalists are (understandably) interested in having their investments create “intellectual property”, that can be patented and monopolized for outsized profits. By following the VC model of focussing on IP formation, governments are missing out on another investment avenue: the formation of “intellectual capital” in their jurisdictions.

VCs don’t like intellectual capital because it’s too mobile. It lives between the ears of employees, who can change employers too easily, and require expensive golden handcuffs to lock into place. They can monetize intellectual property in an acquisition or public offering, but they cannot monetize intellectual capital.

Governments, on the other hand, understand that by investing in universities and colleges, they are creating intellectual capital that will tend to stick around in their jurisdictions (for all the public wailing about “brain drain”, the fact is that people don’t move around all that much).

Open Source for/by Government

Investment in open source technology is a potential gold mine for creating intellectual capital, but governments have been steadfastly ignoring it for years. There is also a big first mover advantage waiting for the first governments to get into the game:

  • Instead of “buying off-the-shelf” for government information systems, build on existing OSS, or start OSS from scratch, using local talent (in-house or private sector).
  • Deliberately build with enough generality to allow use in other jurisdictions.
  • Become the first reference customer for the project. Send your local talent out to evangelize it. Encourage them to commercialize their support and services.
  • Wash, rinse, repeat.

Is this risky? Yes. Will it result in some failed projects? Yes. Will it be more expensive than the “safe” alternative? Sometimes yes, sometimes no. Will it result in increased revenues flowing into your jurisdiction? Almost certainly, if committed to and carried out across a number of projects.

When the first library in BC adopted the Evergreen open source library software, they probably weren’t envisioning a Canada-wide open source cooperative, bringing support and consulting dollars into the province, but that’s what they did, by accident. When the Georgia Public Library Service started the project, they probably weren’t thinking a local company would end up selling support and expertise on the software around the country.

There is no IP moat around open source projects, but there is a first mover advantage to having a critical mass of developers and professionals who have amassed intellectual and social capital around the project.

Intellectual capital isn’t just built in universities, and the private sector shouldn’t only be looked to for intellectual property. Let’s mix it up a little.

The BC government spends $9M/year on Oracle “maintenance”, basically the right to access bug fixes and updates from Oracle for the software we’re running. It’s not a lot of money, but it’s money being shipped straight over the border. Afilias, the “.org” top-level domain operator, built their infrastructure on PostgreSQL – they spend a couple hundred thousand a year having some PostgreSQL core developers on staff. Same effect, different path.

PostGIS Scaling

Earlier this month I got to speak at the Spatial Data Science Conference hosted by my employer Carto at our funky warehouse offices in Bushwick, Brooklyn. The topic: PostGIS scaling.

PostGIS Scaling

Now, “make it go faster” is a hard request to respond to in the generic: what is “it”, what are you doing with “it”, are you sure that your performance isn’t already excellent but you’re just too damned demanding?

So, the talk covers a number of routes to better performance: changing up query patterns, adding special PostgreSQL extensions, leaning on new features of PostgreSQL, and just plain old waiting for PostgreSQL to get better. Which it does, every release.
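
As one generic example of the “query patterns” point (my illustration, not necessarily a slide from the talk): rewriting a distance filter as ST_DWithin lets PostgreSQL answer the question with the spatial index instead of computing a distance for every row. The roads table here is hypothetical, assumed to be stored in EPSG:4326 with a GiST index on geom:

-- Slow: ST_Distance has to be evaluated against every row
SELECT id
FROM roads
WHERE ST_Distance(geom, ST_SetSRID(ST_MakePoint(-123.1, 49.2), 4326)) < 0.01;

-- Fast: ST_DWithin prunes candidates with the spatial index first
SELECT id
FROM roads
WHERE ST_DWithin(geom, ST_SetSRID(ST_MakePoint(-123.1, 49.2), 4326), 0.01);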