19 Aug 2019
With PostGIS 3.0, it is now possible to generate GeoJSON features directly without any intermediate code, using the new ST_AsGeoJSON(record) function.
The GeoJSON format is a common transport format between servers and web clients, and even between components of processing chains. Being able to create useful GeoJSON is important for integrating the different parts of a modern geoprocessing application.
PostGIS has had an ST_AsGeoJSON(geometry) function forever, but it does slightly less than most users really need: it takes in a PostGIS geometry and outputs a GeoJSON “geometry object”.
The GeoJSON geometry object is just the shape of the feature; it doesn’t include any of the other information about the feature that might be included in the table or query. As a result, developers have spent a lot of time writing boilerplate code to wrap the results of ST_AsGeoJSON(geometry) up with the columns of a result tuple to create GeoJSON “feature objects”.
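For illustration, here is a minimal sketch of that kind of hand-rolled boilerplate (not the new function), reusing the countries table and columns from the example below, with standard PostgreSQL json_build_object() doing the wrapping:

-- Hand-assemble a GeoJSON Feature: geometry from ST_AsGeoJSON(geometry),
-- properties built up column by column.
SELECT json_build_object(
  'type', 'Feature',
  'geometry', ST_AsGeoJSON(ST_Centroid(geom))::json,
  'properties', json_build_object(
    'type', type,
    'admin', admin)
) AS geojson
FROM countries
WHERE name = 'Canada';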
The new ST_AsGeoJSON(record) function looks at the input tuple and takes the first column of type geometry to convert into a GeoJSON geometry. The rest of the columns are added to the GeoJSON feature in its properties member.
SELECT ST_AsGeoJSON(subq.*) AS geojson
FROM (
  SELECT ST_Centroid(geom), type, admin
  FROM countries
  WHERE name = 'Canada'
) AS subq;
{"type": "Feature",
"geometry": {
"type":"Point",
"coordinates":[-98.2939042718784,61.3764628013483]
},
"properties": {
"type": "Sovereign country",
"admin": "Canada"
}
}
Using GeoJSON output it’s easy to stream features directly from the database into an OpenLayers or Leaflet web map, or to consume them with ogr2ogr for conversion to other geospatial formats.
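Those consumers usually want a complete FeatureCollection rather than loose features; here is a minimal sketch of that wrapping, using standard PostgreSQL JSON aggregation over the same query as above:

-- Aggregate the individual features into one FeatureCollection document
SELECT json_build_object(
  'type', 'FeatureCollection',
  'features', json_agg(ST_AsGeoJSON(subq.*)::json)
) AS geojson
FROM (
  SELECT ST_Centroid(geom), type, admin
  FROM countries
  WHERE name = 'Canada'
) AS subq;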
16 Aug 2019
The years keep on ticking by, but the numbers never get any smaller. The contracts are written for decades at a time, and change is hard so…?
In the UK, the review of IT outsourcing contracts was called the “Ocean Liner report”, with reference to how long it takes to turn one around.
That said, the curve of BC outsourced IT spending continues to march upward.
To what do we owe the continued growth? This year, no one vendor can really claim the lion’s share of growth, and yet every vendor except Maximus notched some modest growth. I would expect the $79M Maximus revenue line to shrink pretty quickly over the next couple years, as the MSP billing system winds down.
Maximus revenue won’t go to zero though. Having built out the infrastructure for collecting MSP premiums, they took that expertise and won contracts to collect court ordered family support payments. That vendor relationship got so intertwined with the Ministry that the Auditor General looked askance at it. They also used their position in the data flow to ramp up billings by taking on more responsibilities and adding to their systems.
This is one of the reasons that billings tend to go up-and-up even though outsourcing contracts are usually justified in the name of cost control and predictability.
From a vendor point of view, the beauty of an outsourcing agreement is that it lands you in the heart of a business, providing a privileged view of the opportunity space. There is nothing a client likes more (assuming they have the budget) than a vendor that can proactively provide a proposal to fix an obvious business problem. The fact that visibility into those problems is monopolized by the incumbent vendor is just an unfortunate side effect. Over time, these little proposals can grow into whole new lines of outsourced work. If there is no internal capacity to support the new programs, the vendor just increases its share of government revenues.
The UK is as big and complex as BC (more so, of course), and they have reviewed their major outsourcing contracts, and exited a few. Iain Patterson, who wrote the “Ocean Liner” report, saw incumbent vendors, and their bias towards the (lucrative) status quo, as a barrier to modernization:
Unfortunately, the main barrier that prevents departments from investing in these solutions is the contract landscape. Many still have large, legacy contracts using system integrators which affect their ability to change their technology estate. They’re faced with costly change control requests and complicated workarounds to link up cloud-based commodity solutions with their existing technology.
Unfortunately, there is no one answer to how to get out of the bind, except “very slowly and carefully”. As Patterson said:
“Understand the capabilities you have,” he said. “Understand what you’ve got as far as your contractual commitments. When the contracts expire, work with the companies that hold those contracts to try and change that landscape during the lifetime of the contract.
“And if you can’t do that, then we will try to do things in a different way. But we’ll come out of those contracts understanding what it is that we have in them at that moment in time, what can we commoditise in those, and where they are best suited to be delivered from, whether that’s internal, external, or whether there are things we can buy now rather than build.”
Turn that ocean liner, a little bit at a time.
31 Jul 2019
And, late on a Friday afternoon, the plaintive cry was heard!
And indeed, into the sea they do go!
And ‘lo, the SQL faeries were curious, and gave it a shot!
##### Commandline OSX/Linux #####
# Get the Shape files
# http://www.elections.bc.ca/index.php/voting/electoral-maps-profiles/
wget http://www.elections.bc.ca/docs/map/redis08/GIS/ED_Province.exe
# Exe? No prob, it's actually a self-extracting ZIP
unzip ED_Province
# Get a PostGIS database ready for the data
createdb ed_clip
psql -c "create extension postgis" -d ed_clip
# Load into PostGIS
# The .prj says it is "Canada Albers Equal Area", but they
# lie! It's actually BC Albers, EPSG:3005
shp2pgsql -s 3005 -i -I ED_Province ed | psql -d ed_clip
# We need some ocean! Use Natural Earth...
# http://www.naturalearthdata.com/downloads/
wget http://www.naturalearthdata.com/http//www.naturalearthdata.com/download/10m/physical/ne_10m_ocean.zip
unzip ne_10m_ocean.zip
# Load the Ocean into PostGIS!
shp2pgsql -s 4326 -i -I ne_10m_ocean ocean | psql -d ed_clip
# OK, now we connect to PostGIS and start working in SQL
psql -e ed_clip
-- How big is the Ocean table?
SELECT Count(*) FROM ocean;
-- Oh, only 1 polygon. Well, that makes it easy...
-- For each electoral district, we want to difference away the ocean.
-- The ocean is one big polygon, so this will take a while. (If we
-- were being more subtle, we'd first clip the ocean down to a
-- reasonable area around BC; see the sketch after this query.)
CREATE TABLE ed_clipped AS
SELECT
  CASE
    WHEN ST_Intersects(o.geom, ST_Transform(e.geom, 4326))
    THEN ST_Difference(ST_Transform(e.geom, 4326), o.geom)
    ELSE ST_Transform(e.geom, 4326)
  END AS geom,
  e.edabbr,
  e.edname
FROM ed e, ocean o;
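-- A minimal sketch of the "more subtle" approach mentioned above:
-- pre-clip the ocean to a rough box around BC (the lon/lat bounds here
-- are approximate guesses), so the difference has far less ocean
-- geometry to chew on; you would then use ocean_bc in place of ocean
-- in the query above.
CREATE TABLE ocean_bc AS
SELECT ST_Intersection(geom, ST_MakeEnvelope(-140, 48, -113, 61, 4326)) AS geom
FROM ocean;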
-- Check our geometry types...
SELECT DISTINCT ST_GeometryType(geom) FROM ed_clipped;
-- Oh, they are heterogeneous. Let's force them all multi
UPDATE ed_clipped SET geom = ST_Multi(geom);
-- All done with SQL, quit psql and head back to the shell
\q
# Dump the result out of the database back into shapes
pgsql2shp -f ed2009_ocean ed_clip ed_clipped
zip ed2009_ocean.zip ed2009_ocean.*
mv ed2009_ocean.zip ~/Dropbox/Public/
No more districts in oceans!
And the faeries were happy, and uploaded their polygons!
Update: And the lamentations ended, and the faeries also rejoiced.
30 Jul 2019
One question that comes up often during our PostGIS training is “how do I do an overlay?” The terminology can vary: sometimes they call the operation a “union”, sometimes an “intersect”. What they mean is, “can you turn a collection of overlapping polygons into a collection of non-overlapping polygons that retain information about the overlapping polygons that formed them?”
So an overlapping set of three circles becomes a non-overlapping set of 7 polygons.
Calculating the overlapping parts of a pair of shapes is easy, using the ST_Intersection() function in PostGIS, but that only works for pairs, and doesn’t capture the areas that have no overlaps at all.
How can we handle multiple overlaps and get out a polygon set that covers 100% of the area of the input sets? By taking the polygon geometry apart into lines, and then building new polygons back up.
Let’s construct a synthetic example: first, generate a collection of random points, using a Gaussian distribution, so there’s more overlap in the middle. The crazy math in the SQL below is just the Box-Muller transform, converting the uniform random numbers from the random() function into normally distributed numbers.
CREATE TABLE pts AS
WITH rands AS (
  SELECT generate_series AS id,
         random() AS u1,
         random() AS u2
  FROM generate_series(1, 100)
)
SELECT
  id,
  ST_SetSRID(ST_MakePoint(
    50 * sqrt(-2 * ln(u1)) * cos(2*pi()*u2),
    50 * sqrt(-2 * ln(u1)) * sin(2*pi()*u2)), 4326) AS geom
FROM rands;
The result looks like this:
Now, we turn the points into circles, big enough to have overlaps.
CREATE TABLE circles AS
SELECT id, ST_Buffer(geom, 10) AS geom
FROM pts;
Which looks like this:
Now it’s time to take the polygons apart. In this case we’ll take the exterior ring of the circles, using ST_ExteriorRing(). If we were dealing with complex polygons with holes, we’d have to use ST_DumpRings(). Once we have the rings, we want to make sure that everywhere rings cross, the lines are broken, so that no lines cross; they only touch at their end points. We do that with the ST_Union() function.
CREATE TABLE boundaries AS
SELECT ST_Union(ST_ExteriorRing(geom)) AS geom
FROM circles;
What comes out is just lines, but with end points at every crossing.
Now that we have noded lines, we can collect them into a multi-linestring and feed them to ST_Polygonize() to generate polygons. The polygons come out as one big multi-polygon, so we’ll use ST_Dump() to convert it into a table with one row per polygon.
CREATE SEQUENCE polyseq;
CREATE TABLE polys AS
SELECT nextval('polyseq') AS id,
       (ST_Dump(ST_Polygonize(geom))).geom AS geom
FROM boundaries;
Now we have a set of polygons with no overlaps, only one polygon per area.
So, how do we figure out how many overlaps contributed to each incoming polygon? We can join the centroids of the new small polygons with the set of original circles, and calculate how many circles contain each centroid point. (In the SQL below we actually use ST_PointOnSurface(), which is guaranteed to fall inside its polygon, unlike a true centroid.)
A spatial join will allow us to calculate the number of overlaps.
ALTER TABLE polys ADD COLUMN count INTEGER DEFAULT 0;

UPDATE polys SET count = p.count
FROM (
  SELECT count(*) AS count, p.id AS id
  FROM polys p
  JOIN circles c
    ON ST_Contains(c.geom, ST_PointOnSurface(p.geom))
  GROUP BY p.id
) AS p
WHERE p.id = polys.id;
That’s it! Now we have a single coverage of the area, where each polygon knows how much overlap contributed to it. Ironically, when visualized using the coverage count as a variable in the color ramp, it looks a lot like the original image, which was created with a simple transparency effect. However, the point here is that we’ve created new data, in the count attribute of the new polygon layer.
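As a rough sketch of carrying over richer attributes (rather than just a count), the same point-in-polygon join can aggregate the ids of the contributing circles into each output polygon:

-- Collect the ids of all circles that overlap each output polygon
ALTER TABLE polys ADD COLUMN circle_ids INTEGER[];

UPDATE polys SET circle_ids = j.ids
FROM (
  SELECT array_agg(c.id) AS ids, p.id AS id
  FROM polys p
  JOIN circles c
    ON ST_Contains(c.geom, ST_PointOnSurface(p.geom))
  GROUP BY p.id
) AS j
WHERE j.id = polys.id;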
The same decompose-and-rebuild-and-join-centroids trick can be used to overlay all kinds of features, and to carry over attributes from the original input data, achieving the classic “GIS overlay” workflow. Happy geometry mashing!
22 Jul 2019
“How can I get paid to work on this cool open source project?”
Once upon a time, it felt like most discussions about open source were predicated on answering this question. Developers fell in love with, or created, some project or other, but found themselves working on it in their spare time. If only there was some way to monetize their labour of love!
From that complaint, a score of (never quite satisfactory) models were spawned.
- Pure consulting, which depends on a never-ending supply of enhancements and updates, usually to the detriment of core maintenance.
- Professional support, which depends on a deep enough market of potential enterprises, and a sense of “deployment risk” large enough to open wallets.
- Open core, which invests in the open source core to promote adoption and hopefully build a sub-population of enterprises that have embedded the project sufficiently to be in the market for add-ons to make things faster/simpler/more integrated.
- Relicensing, which leverages adoption of the open source to squeeze conventional licensing revenue out of enterprises that don’t want to accept open source license terms of use.
What all the models above have in common is that they more-or-less require successful adopters to invest effort in the open source project at the center of the model. Adoption of the open source project is an “on ramp” to revenue opportunities, but the operating assumption is that customers will continue using the open source core, so the business has an incentive to invest in the “on ramp”.
In the same way, Oracle provides an “on ramp” to their enterprise product, in the free-as-in-beer Oracle Express. Oracle pays for the development of the on-ramp (it’s just Oracle, after all) and in return (maybe) reaps the reward of eventual migration of users to their paid product.
Which brings me to “AWS Aurora”, now generally available on Amazon’s cloud.
AWS deliberately does not say Aurora is “PostgreSQL”. They say it is “PostgreSQL compatible”. That’s probably for the best: Aurora is a soft fork of the core PostgreSQL code that replaces big chunks of PostgreSQL storage logic with clever, custom AWS code.
Like AWS PostgreSQL RDS, Aurora is a revenue generating fork of PostgreSQL that uses open source PostgreSQL adoption as an on-ramp for AWS revenue. This would superficially seem to be a similar situation to all the other open source business models, except for one thing: AWS doesn’t have any stake in the success of PostgreSQL per se.
AWS offers RDS versions of all the RDBMS systems, open source and otherwise, and they invest in the core projects accordingly, and fairly: hardly at all. After all, to do otherwise would be to declare a preference.
So the open source communities end up building the on-ramp to AWS paid services as a free gift to AWS.
That’s annoying enough, but it gets uglier, I think, as time goes on.
For now, Aurora tracks the PostgreSQL version fairly well. You can move your app onto Aurora, you can move it back off to RDS, you can move it to Google’s managed PostgreSQL, or you can host it yourself on premises or in the cloud.
However, eventually the business expense of maintaining the Aurora code base against a PostgreSQL baseline that is in constant motion will wear on Amazon, and they will start to see places where adding “Aurora only” features will “improve the customer experience”.
At that point, the soft fork will turn into a hard fork, and migrations into Aurora will start to look like a one-way valve.
And the community will still be maintaining the on-ramp to AWS.
Let me pre-empt some of the commentary.
“Free riding is part of the deal!”
Sure it is. AWS isn’t doing anything illegal. They are just taking advantage where they can take advantage.
Similarly, the industries that dumped waste into the Cuyahoga River until it was so polluted it caught fire were no doubt working within the letter of the law.
Free riding is part of the deal.
I hope that all kinds of people and organizations free ride on my open source work, it’s part of the appeal of the work.
I also hope that enlightened self-interest at the very least will lead to one of two outcomes:
- That the customers of AWS RDS and (particularly) Aurora recognize that the organization they are paying (AWS) is not adequately supporting the core of the software (PostgreSQL) their operations depend on, and that as a result they are implicitly taking on the associated technical risk of under-investment.
- That AWS itself has the foresight to invest directly in the open source on-ramps to their paid cloud deployments, acknowledging that the core software does in fact provide just as much (and maybe even more) value as their impressive cloud infrastructure does.
I think it’s more likely that short term thinking will lead to “AWS only” featuritis and the creation of a one-way valve, even the deliberate downgrading of RDS and on-premise capabilities to drive customers into the arms of Aurora, because: why not?
The logic of enterprise sales is: land, expand, lock-in, and squeeze.
Just because AWS is currently in the land-and-expand phase doesn’t mean they won’t get to the succeeding phases eventually.