Nothing, Nada, Zip, Bupkus

There is nothing new under the sun, and I have been wrestling this week with writing out ISO-standard well-known binary from PostGIS.

The most obvious difference is that the type numbers for encoding the presence of Z- and M-dimensions are not the ones described in the old OGC extension document [OGC members only, cited by Martin Daly in 2004, and extended further for PostGIS by Sandro Santilli that year] for WKB. Instead of setting high-bits to indicate the presence of Z and M, as OGC did, the ISO spec simply adds 1000.

So, the ISO geometry number for a PolygonZ is 3 (Polygon) + 1000 = 1003.

The, old OGC geometry number for a PolygonZ is 3 (Polygon) | 0x80000000 = 2147483651.

OGC seems more complex until you note that the function WKB_HASZ(num) can be written (num & 0x80000000). While the ISO test is (num >= 1000 && num < 2000). Setting flags for binary values (has-z, has-m, has-a-piece-of-pie) is nice.

Anyhow, that change was well-known and expected. What I didn’t expect was the amount of ambiguity surrounding the definition of an empty geometry in WKB.

To review, the spatial SQL definition includes the concept of an “empty geometry”, which is an empty set of a particular geometry type. The empty geometry has more information than a simple database NULL, which is a typeless emptiness. A ‘POLYGON ZM EMPTY’ has an implied dimensionality. It makes some sense that ST_Intersection() of two disjoint polygons would return a ‘POLYGON EMPTY’.

The ISO SQL/MM well-known text specification has clear directions for writing empty geometries of all types. In fact, I’ve just written two of them above: the type name plus the ‘EMPTY’ keyword.

For well-known binary, ISO SQL/MM includes the following useless guidance:

i) Case:
i) If <point binary representation> immediately contains a <wkbpoint binary>, then <point binary representation> is the well-known binary representation for an ST_Point value that is produced by <wkbpoint binary>.
ii) Otherwise, <point binary representation> produces an empty set of type ST_Point

Representing an empty point in WKB is hard because there’s nowhere obvious to indicate the lack of ordinates. But the ISO specification makes no attempt to solve the problem, they instead provide explicit guidance that is impossible to implement. Basically, if you are reading a WKB POINT and there are doubles after the TYPE number, you have a POINT(x y). If not, you have a POINT EMPTY. All well and good, but how do you distinguish, in a collection of WKB geometries, between the presence of doubles in the byte stream and the presence of another geometry in the stream? You don’t.

The ISO guidance for empty Linestrings is even worse!

q) Case:
i) If <linestring binary representation> immediately contains <num>, then <linestring binary representation> is the well-known binary representation for an ST_LineString value. Let APA be an ST_Point ARRAY value with cardinality of <num> that contains the ST_Point values specified by the immediately contained <wkbpoint binary>s. <linestring binary representation> produces an ST_LineString value as the result of the value expression: NEW ST_LineString(APA).
ii) Otherwise, <linestring binary representation> produces an empty set of type ST_LineString.

As with the POINT case, the WKB reader is supposed to magically distinguish between an element of the current geometry (the <num>) in the byte-stream and an element of the next geometry in the byte-stream. And worse, the “clarifying” comment implicitly adds a whole new kind of empty geometry! What if the <num> is present, but the value is zero!?!

This is where the snake starts eating its tail. The way that implementations of OGC WKB have been encoding EMPTY geometries has been to provide the type number and an element count of zero. Back when PostGIS was first getting WKB support, Dave Blasby wrestled with the fact that the specification did not describe how to encode EMPTY. Mateusz Loskot recently published some information showing the WKB EMPTY implementation that Microsoft used for SQLServer. Their implementation is one of the options Dave described five years ago – there’s only so many ways to solve this problem.

If ISO didn’t like the use of a zero-valued <num> count as a way of indicating EMPTY, they had another option available, which was to follow the original OGC WKB standard and use bitmask flags on their type numbers. There could have been a bitmask for Z, a bitmask for M, and a bitmask for EMPTY. There could even have been a bitmask for SRID, fixing up a huge drawback in WKB, namely that WKB does not include a slot for the SRID, which is an important element in the geometry model.

Sidenote: As a result of WKB not having SRID support, it’s not possible to round-trip a geometry through WKB without losing the SRID value. Try this standard SQL and see what happens:

SELECT ST_SRID( 
 ST_GeomFromWKB(  
  ST_AsBinary( 
   ST_GeomFromText('POINT(0 0)', 4326) 
 )))

Then try the bastardized PostGIS EWKB format instead:

SELECT ST_SRID( 
 ST_GeomFromEWKB( 
  ST_AsEWKB( 
   ST_GeomFromText( 'POINT(0 0)', 4326)
 )))

As it stands now, the specification is out of synch with the implementations on the ground, which is bad news for the relevance of the specification. I will be implementing EMPTY using the same semantics as SQLServer, which will make the kinds of EMPTY PostGIS can represent slightly richer, but remain backwards compatible to the old schemes.

NYC Sprint: Day 4

End day was, as last year, a quiet one. Everyone worked on cleaning up their last bits of work before heading home around mid-afternoon. Unfortunately, most left before I had a chance to ask what they finished up!

(For my part, I finished some minimal regression tests on the WKT emitter, changed the emitter to use a stringbuffer for output, and upgraded the stringbuffer to be smarter about performance. Jeff Adams has been working on making our unit test program a little more flexible for developers (easier to add tests, and able to optionally run one test at a time)).

One more time, thanks to the sponsors, we got a great deal done and MapServer, Geoserver, PostGIS and the rest of the tribe are going to be stronger next year thanks to this event.

Special thanks also to Temim Fruchter of OpenPlans who was invaluable in helping me make arrangements in New York for hotels and food and all the things that made the event enjoyable and comfortable. Thanks to OpenPlans for hosting us in their lovely penthouse event room!

NYC Sprint: Day 3

Third day, best day. There are four MapServer developers now working hard on implementing the rendering plugin changes. Thomas Bonfort is doing the core work, Steve Lime is re-working the old GD renderer, Dan Morissette is creating support for hatched styles, and Assefa is doing KML output.

In Geoserver land, Andrea Aime added support for variable substitution in SLD, which means that URL parameters can now be passed into SLD styling rules, to create dynamic styling effects. Tim Schaub and Justin Deoliveira also demonstrated an application that warms the cockles of my heart: using their new GeoScript extension they made a web-based application that takes in SQL and spits out maps. So now I can type PostGIS queries into a web page and see the results overlaid on a map. Crunchy!

Geoserver / OpenLayers Crew

Howard Butler has contributed some work on the auto-projection support in MapServer and is now working on LibLAS Oracle support. He also tracked down an excellent pastrami sandwich. So I am told.

In PostGIS world, Jeff Adams finished his lat/lon formatter and logged his first commit: an impressive complete collection of unit tests, documentation and a working function (ST_AsLatLonText) that can turn POINT(-120.5 12.25) into 12°15’0”N 120°30’0”W. Oliver continues to fix up the text output functions. And I completed my first cut of the WKT output. Curve support really adds a lot of overhead to these things! There are lots of variants and curves have more and sillier formatting rules than linear features. David Zwarg has continued beavering through tickets in the WKTRaster subsystem.

Thanks again to our sponsors, tonight we are heading out to dinner at a Malaysian restaurant in Chinatown.

NYC Sprint: Day 2

Today the Geoserver, MapServer and OpenLayers teams got together and really showed off the “making things work together better” theme. Starting from Andrea Aime’s new “WMS rotation” feature, Daniel Morissette implemented the same feature with the same semantics in MapServer, while Andreas Hocevar implemented client support for the feature into OpenLayers. During testing, a small bug showed up in the Geoserver implementation, which Andrea fixed.

The final result is shown in this screenshot: layers from Geoserver and MapServer viewed and rotated together within OpenLayers.

screenshot008tm

The MapServer team is now really digging into a couple major goals: reading projection information automatically from data sources; and, making the rendering subsystem pluggable. Here they are deep in discussion on the rendering design:

MapServer Conclave

Automatic projection reading will make it easier for new users to work with data in mixed projections, because they will no longer have to manually populate PROJECTION blocks for their layers. They will be able to just set PROJECTION to AUTO.

The pluggable rendering upgrade is mostly of interest to programmers, but because it will clean up the plumbing in drawing maps, it will allow new features like KML, PDF and SVG output to be added much more easily. In fact, Assefa is currently working on KML output, using the new rendering design.

Keeping to himself quietly, Alan Boudreault is tying the XML Mapfile more tightly into MapServer. In the last revision, XML Mapfiles could be transformed to .map format with a utility. In the next revision, it will be possible to use them directly from the MapServer CGI program.

Over on the Geoserver side, Andrea Aime arrived from Italy last night, and today is adding the WMS GetStyles operation to Geoserver. Andreas Hocevar is working on tying together printing support in Geoserver and GeoExt. The scripting engine crew continues to beaver away on Javascript/Python/Scala scripting in Geoserver.

In PostGIS land, Olivier Courtin has blasted a pile of changes into PostGIS trunk, working on moving GML/KML/SVG support down into the core geometry library, where they will be easier to reuse. On a similar tack, I’ve been working on writing a new WKT emitter, that is easier to maintain and supports ISO WKT for extended types and dimensions.

Thanks again to our sponsors! In an hour we will be settling in to watch the Canada/USA hockey game on the big screen in the Open Plans penthouse. Go Canada!

Update: Canada disappoints! I blame Brodeur.

NYC Sprint: Day 1

On a bright day in NYC, we all convened in the sunny Open Plans event room for the first day. As in last years sprint, the morning was spent in planning and discussions, and the afternoon folks began digging in.

The MapServer team talked about release plans for 6.0, and came up with an ambitious release plan. They recognize that not every item on the plan will make the final cut, but hope that most will find either funded or community effort to bring about. Among the highlights (to me):

Pluggable renderer would allow a much cleaner rendering chain, and new renderers for new formats to be more easily added. filterObj to enhance the power of MapServer querying and support OGC Filter fully (and incidentally leverage the power of databases like PostGIS more fully). Named styles to allow re-use of style objects through a map file, instead of repeating the definitions over and over.

I also talked with Steve Lime and Jim Klassen about a bug in the one-pass rendering code that is making complex WFS queries fail. We think we have a solution and Assefa is doing the final tests.

The PostGIS discussions were about our 2.0 roadmap and what the implications of various changes are. Unfortunately, most of my proposed/desired changes are predicated are a large change to the underlying data serialization, so going forward requires a good deal of bravery – I have to burn down the village in order to save it. Olivier Courtin is working on more tractable new features: a polyhedral surface, suitable for storing 3D buildings and other objects that have grown increasingly common.

David Zwarg and Jeff Adams from Avencia joined the PostGIS group, and are working hard already: David on WKTRaster and Jeff on a geographic coordinates formatting routine. Don’t tell Jeff, but if he gets the output formatter working, I’m just going to ask him to try and write an ingester.

Justin Deoliveira and Tim Schaub began working on improving the scripting extensions to Geoserver, Tim working on the server-side JavaScript and Justin working on the Python (and David Winslow working on Scala!). Andreas Hocevar has begun a Google Maps V3 API layer for OpenLayers, which will allow Google layers without API keys in OpenLayers (yay!).

As usual, the team had to be driven into the night bodily at the end of the day – it is hard to pry nerds from their code.

Thanks to our 2010 sprint sponsors, for keeping us well supplied with food, drinks and coffee throughout this busy week!

Postscript: At the end of the day, Olivier and I settled on an order-of-operations to move towards the new database serialization in PostGIS. Step one, remove the current places in the code where the serialization pokes up into function code and get it completely isolated underneath a serialize/deserialize layer.