Every once in a while, someone comes to me and says:
Sure, it’s handy to use ST_AsGeoJSON to convert a geometry into a JSON equivalent, but all the web clients out there like to receive full GeoJSON Features and I end up writing boilerplate to convert database rows into GeoJSON. Also, the only solution I can find on the web is scary and complex. Why don’t you have a row_to_geojson function?
And the answer (still) is that working with rows is fiddly and I don’t really feel like it.
However! It turns out that, with the tools for JSON manipulation already in PostgreSQL and a little scripting it’s possible to make a passable function to do the work.
Start with a simple table.
You can convert any row into a JSON structure using the to_jsonb() function.
That’s actually all the information we need to create a GeoJSON feature, it just needs to be re-arranged. So let’s make a little utility function to re-arrange it.
Voila! Now we can turn any relation into a proper GeoJSON “Feature” with just one(ish) function call.
You might be wondering why I made my function take in a jsonb input instead of a record, for a perfect row_to_geojson analogue to row_to_json. The answer is, the PL/PgSQL planner caches types, including the materialized types of the record parameter, on the first evaluation, which makes it impossible to use the same function for multiple tables. This is “too bad (tm)” but fortunately it is an easy workaround to just change the input to jsonb using to_json() before calling our function.
TL;DR: There are some changes in PostgresSQL 12 that FDW authors might be surprised by! Super technical, not suitable for ordinary humans.
OK, so I decided to update my two favourite extension projects (pgsql-http and pgsql-ogr-fdw) yesterday to support PostgreSQL 12 (which is the version currently under development likely to be released in the fall).
Fixing up pgsql-http was pretty easy, involving just one internal function signature change.
Fixing up pgsql-ogr-fdw involved some time in the debugger wondering what had changed.
Your Slot is Empty
When processing an FDW insert/update/delete, your code is expected to take a TupleTableSlot as input and use the data in that slot to apply the insert/update/delete operation to your backend data store, whatever that may be (OGR in my case). The data lived in the tts_values array, and the null flags in tts_isnull.
In PostgreSQL 12, the slot arrives at your ExecInsert/ExecUpdate/ExecDelete callback function empty! The tts_values array is populated with Datum values of 0, yet the tts_isnull array is full of true values. There’s no data to pass back to the FDW source.
The short-term fix is just to force the slot to populate by calling slot_getallattrs, and then go on with your usual work. That’s what I did. A more future-proof way would be to use slot_getattr and only retrieve the attributes you need (assuming you don’t just need them all).
Your VarLena might have a Short Header
Varlena types are the variable size types, like text, bytea, and varchar. Varlena types store their length and some extra information in a header. The header is potentially either 4 bytes or 1 byte long. Practically it is almost always a 4 byte header. If you call the standard VARSIZE and VARDATA macros on a varlena, the macros assume a 4 byte header.
The assumption has always held (for me), but not any more!
I found that as of PostgreSQL 12, I’m getting back varchar data with a 1-byte header! Surprise!
The fix is to stop assuming a 4-byte header. If you want the size of the varlena data area, less the header, use VARSIZE_ANY_EXHDR instead of VARSIZE() - VARHDRSZ. If you want a pointer into the data area, use VARDATA_ANY instead of VARDATA. The “any” macros first test the header type, and then apply the appropriate macro.
I have no idea what commit caused short varlena headers to make a comeback, but it was fun figuring out what the heck was going on.
Map projection is a core feature of any spatial database, taking coordinates from one coordinate system and converting them to another, and PostGIS has depended on the Proj library for coordinate reprojection support for many years.
For most of those years, the Proj library has been extremely slow moving. New projection systems might be added from time to time, and some bugs fixed, but in general it was easy to ignore. How slow was development? So slow that the version number migrated into the name, and everyone just called it “Proj4”.
No more.
Starting a couple years ago, new developers started migrating into the project, and the pace of development picked up. Proj 5 in 2018 dramatically improved the plumbing in the difficult area of geodetic transformation, and promised to begin changing the API. Only a year later, here is Proj 6, with yet more huge infrastructural improvements, and the new API.
Some of this new work was funded via the GDALBarn project, so thanks go out to those sponsors who invested in this incredibly foundational library and GDAL maintainer Even Roualt.
For PostGIS that means we have to accomodate ourselves to the new API. Doing so not only makes it easier to track future releases, but gains us access to the fancy new plumbing in Proj.
For example, Proj 6 provides:
Late-binding coordinate operation capabilities, that takes metadata such as
area of use and accuracy into account… This can avoid in a
number of situations the past requirement of using WGS84 as a pivot system,
which could cause unneeded accuracy loss.
Or, put another way: more accurate results for reprojections that involve datum shifts.
Here’s a simple example, converting from an old NAD27/NGVD29 3D coordinate with height in feet, to a new NAD83/NAVD88 coordinate with height in metres.
Note that the height in NGVD29 is 100 feet, if converted directly to meters, it would be 30.48 metres. The transformed point is:
POINT Z (-100.0004058 40.000005894 30.748549546)
Hey look! The elevation is slightly higher! That’s because in addition to being run through a horizontal NAD27/NAD83 grid shift, the point has also been run through a vertical shift grid as well. The result is a more correct interpretation of the old height measurement in the new vertical system.
Astute PostGIS users will have long noted that PostGIS contains three sources of truth for coordinate references systems (CRS).
Within the spatial_ref_sys table there are columns:
The authname, authsrid that can be used, if you have an authority database, to lookup an authsrid and get a CRS. Well, Proj 6 now ships with such a database. So there’s one source of truth.
The srtext, a string representation of a CRS, in a standard ISO format. That’s two sources.
The proj4text, the old Proj string for the CRS. Until Proj 6, this was the only form of definition that the Proj library could consume, and hence the only source of truth that mattered to PostGIS. Now, it’s a third source of truth.
Knowing this, when you ask PostGIS to transform to an SRID, what will it do?
If there are non-NULL values in authname and authsrid ask Proj to return a CRS based on those entries.
If Proj fails, and there is a non-NULL srtext ask Proj to build a CRS using that text.
If Proj still fails, and there is a non-NULL proj4text ask Proj to build a CRS using that text.
In general, the best transforms will come by having Proj look-up the CRS in its own database, because then it can apply all the power of “late binding” to ensure the best transformation for each geometry. Hence we bias in favour of Proj lookups, then the quite detailed WKT format, and finally the old Proj format.
Today’s an exciting day in the Victoria office of Crunchy Data – our local staff count goes from one to two, as Martin Davisjoins the company!
This is kind of a big deal, because this year Martin and I will be spending much or our time on the core computational geometry library that powers PostGIS, the GEOS library, and the JTS library from which it derives its structure.
Why is that a big deal? Because GEOS, JTS and other language ports provide the computational geometry algorithms underneath most of the open source geospatial ecosystem – so improvements in our core libraries ripple out to help a huge swathe of other software.
JTS came first, initially as a project of the British Columbia government. GEOS is a C++ port of JTS. There are also Javascript and .Net ports (JSTS and NTS).
Each of those libraries has developed a rich downline of other libraries and projects that depend on them. On the desktop, on the web, in the middleware, JTS and GEOS power all of it.
So we know that work on JTS and GEOS on our side is going to benefit far more than just PostGIS.
I’ve already spent a decent amount of time on bringing the GEOS library up to date with the changes in JTS over the past few months, and trying to fulfill the “maintainer” role, merging pull requests and closing some outstanding tickets.
As Martin starts adding to JTS, I now feel more confident in my ability to bring those changes into the C++ world of GEOS as they land.
Without pre-judging what will get first priority, topics of overlay robustness, predicate performance, and geometry cleaning are near the top of our list.
Our spatial customers at Crunchy process a lot of geometry, so ensuring that PostGIS (GEOS) operations are robust and high performance is a big win for PostgreSQL and for our customers as well.
How much winning is enough? Have you been winning so much that you’re tired of winning now?
I ask because last month I gave a talk (PDF Download) to the regional GIS association in Manitoba, about open source and open data. My talk included a few points that warned about the downsides of being beholden to a single software vendor, and it included some information about the software available in the open source geospatial ecosystem.
In a testament to the full-spectrum dominance of Esri, the audience was almost entirely made up of Esri customers. The event was sponsored by Esri. Esri had a table at the back of the room. Before giving my talk, I made a little joke that my talk would in fact include some digs at Esri (though I left the most pointed ones on the cutting room floor).
People seemed to like the talk. They laughed at my jokes. They nodded in the right places.
Among the points I made:
Single-vendor dominance in our field is narrowing the understanding of “what is possible” amongst practitioners in that ecosystem.
Maintaining a single-vendor policy dramatically reduces negotiating power with that vendor.
Maintaining a single-vendor policy progressively de-skills your staff, as they become dependant on a single set of tooling.
Practitioners have higher market value when they learn more than just the tools of one vendor, so self-interest dictates learning tools outside the single-vendor ecosystem.
Point’n’click GIS tools from Esri have widened access to GIS, which is a good thing, but driven down the market value of practitioners who limit themselves to those tools.
None of these points is unique to Esri – they are true of any situation where a single tool has driven competitors off the field, whether it be Adobe graphics tools or Autodesk CAD tools or Microsoft office automation tools.
Nor are any of these points indicative of any sort of ill will or malign intent on the part of Esri – they are just the systemic effects of market dominance. It is not contingent on Esri to change their behaviour or limit their success; it’s contingent on practitioners and managers to recognize the negative aspects of the situation and react accordingly.
And yet.
Despite the fact that almost all the people in the room were already their customers, that no new business would be endangered by my message, that all the students would still be taught their tools, that all the employers would still include them in job requirements, that people would continue to use the very words they choose to describe basic functions of our profession …
Despite all that, the Esri representative still went to the president of the association, complained to her about the content of my talk, and asked her to ensure that nothing I would say in my afternoon technical talk would be objectionable to himself. (In the event, I made some nasty jokes about Oracle, nobody complained.)
For some of these people, no amount of winning is enough, no position of dominance is safe, no amount of market leverage is sufficient.
It’s sad and it’s dangerous.
I was reminded of this last week, meeting an old friend in Australia and learning that he’d been blackballed out of a job for recommending software that wasn’t Esri software. Esri took away his livelihood for insufficient fealty.
This is the danger of dominance.
When the local Esri rep has a better relationship with your boss than you do, do you advocate for using alternative tools? You could be limiting or even potentially jeapardizing your career.
When Esri has locked up the local geospatial software market, do you bid an RFP with an alternative open tool set? You could lose your Esri partnership agreement and with it your ability to bid any other local contracts. Esri will make your situation clear to you
This is the danger of dominance.
A market with only one vendor is not a market. There’s a name for it, and there’s laws against it. And yet, our profession glories in it. We celebrate “GIS day”, a marketing creation of our dominant vendor. Our publicly funded colleges and universities teach whole curricula using only Esri tools.
And we, as a profession, do not protest. We smile and nod. We accept our “free” or “discounted” trainings from Esri (comes with our site license!) and our “free” or “discounted” tickets to the Esri user conference. If we are particularly oblivious, we wonder why those open source folks never come around to market their tools to us.