Where 2.0 Drinking Game

It’s that time of year again, and I’ll be sitting in the audience with my flask (I hope you will too!) playing the Where 2.0 drinking game. Here are some of my phrases; what are yours?

  • … find a Starbucks…
  • … we’re releasing an API …
  • … friends list …

Also, take a big slug if someone talks about working with geospatial data more complex than a lat/lon point!

Send us Jeremy and Keyur!

ESRI (pretty new web page, by the way) has put their open source position on-line and also produced a short podcast with Victoria Kouyoumjian on the same topic.

http://www.esri.com/news/podcasts/audio/speaker/staff_kouyoumjian.mp3

One thing that struck me in the podcast was when Victoria noted that ESRI has sponsored open source events in the past (most notably FOSS4G 2007 directly, but also 2008 and 2009 to a lesser extent through 50°North). She says,

These events allow us the opportunity to engage in conversations and dialogs with various technologists because we want to gain feedback about the needs of open source developers and users. The objective of course is to channel this information back to development so we can reflect this in future products and business decisions in order to best support our customers.

So far ESRI attendance has been at the managerial level, and while I love those guys (hugs to Satish and Victoria!), some real sparks could fly and serious interoperability improvements could be made if we started seeing the developers, the project leads and software designers, at the events. We can do better than “channeling” information back to development: let’s immerse development in it!

Update: We promise to send them back. Really.

Nothing, Nada, Zip, Bupkus

There is nothing new under the sun, and I have been wrestling this week with writing out ISO-standard well-known binary from PostGIS.

The most obvious difference is that the type numbers for encoding the presence of Z- and M-dimensions are not the ones described in the old OGC extension document [OGC members only, cited by Martin Daly in 2004, and extended further for PostGIS by Sandro Santilli that year] for WKB. Instead of setting high-bits to indicate the presence of Z and M, as OGC did, the ISO spec simply adds 1000.

So, the ISO geometry number for a PolygonZ is 3 (Polygon) + 1000 = 1003.

The old OGC geometry number for a PolygonZ is 3 (Polygon) | 0x80000000 = 2147483651.

OGC seems more complex until you note that the function WKB_HASZ(num) can be written (num & 0x80000000), while the ISO test is a range check: (num >= 1000 && num < 2000), plus a second range for the ZM types, which add 3000. Setting flags for binary values (has-z, has-m, has-a-piece-of-pie) is nice.
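To make the comparison concrete, here is a minimal Python sketch of the two has-Z tests. The function names are made up for illustration, and the ISO offsets for M (2000) and ZM (3000) are my reading of SQL/MM rather than anything quoted above, so treat them as assumptions.

```python
# Sketch only: function names are hypothetical, not from any library.
WKB_Z_FLAG = 0x80000000  # high bit used by the old OGC-style extension


def has_z_flag_style(type_num: int) -> bool:
    """Flag scheme: one bitwise AND answers the question."""
    return bool(type_num & WKB_Z_FLAG)


def has_z_iso_style(type_num: int) -> bool:
    """ISO scheme: Z lives in the thousands digit.

    Assumed offsets: 1000 = Z, 2000 = M, 3000 = ZM.
    """
    return (type_num // 1000) in (1, 3)


POLYGON = 3
print(has_z_flag_style(POLYGON | WKB_Z_FLAG))  # True
print(has_z_iso_style(POLYGON + 1000))         # True
print(has_z_iso_style(POLYGON + 2000))         # False (PolygonM has no Z)
```

The flag version stays a single operation no matter how many properties are added; the ISO version needs one more range for every combination.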

Anyhow, that change was well-known and expected. What I didn’t expect was the amount of ambiguity surrounding the definition of an empty geometry in WKB.

To review, the spatial SQL definition includes the concept of an “empty geometry”, which is an empty set of a particular geometry type. The empty geometry has more information than a simple database NULL, which is a typeless emptiness. A ‘POLYGON ZM EMPTY’ has an implied dimensionality. It makes some sense that ST_Intersection() of two disjoint polygons would return a ‘POLYGON EMPTY’.

The ISO SQL/MM well-known text specification has clear directions for writing empty geometries of all types. In fact, I’ve just written two of them above: the type name plus the ‘EMPTY’ keyword.

For well-known binary, ISO SQL/MM includes the following useless guidance:

i) Case:
i) If <point binary representation> immediately contains a <wkbpoint binary>, then <point binary representation> is the well-known binary representation for an ST_Point value that is produced by <wkbpoint binary>.
ii) Otherwise, <point binary representation> produces an empty set of type ST_Point

Representing an empty point in WKB is hard because there’s nowhere obvious to indicate the lack of ordinates. But the ISO specification makes no attempt to solve the problem; instead it provides explicit guidance that is impossible to implement. Basically, if you are reading a WKB POINT and there are doubles after the TYPE number, you have a POINT(x y). If not, you have a POINT EMPTY. All well and good, but how do you distinguish, in a collection of WKB geometries, between the presence of doubles in the byte stream and the presence of another geometry in the stream? You don’t.

The ISO guidance for empty Linestrings is even worse!

q) Case:
i) If <linestring binary representation> immediately contains <num>, then <linestring binary representation> is the well-known binary representation for an ST_LineString value. Let APA be an ST_Point ARRAY value with cardinality of <num> that contains the ST_Point values specified by the immediately contained <wkbpoint binary>s. <linestring binary representation> produces an ST_LineString value as the result of the value expression: NEW ST_LineString(APA).
ii) Otherwise, <linestring binary representation> produces an empty set of type ST_LineString.

As with the POINT case, the WKB reader is supposed to magically distinguish between an element of the current geometry (the <num>) in the byte-stream and an element of the next geometry in the byte-stream. And worse, the “clarifying” comment implicitly adds a whole new kind of empty geometry! What if the <num> is present, but the value is zero!?!

This is where the snake starts eating its tail. The way that implementations of OGC WKB have been encoding EMPTY geometries has been to provide the type number and an element count of zero. Back when PostGIS was first getting WKB support, Dave Blasby wrestled with the fact that the specification did not describe how to encode EMPTY. Mateusz Loskot recently published some information showing the WKB EMPTY implementation that Microsoft used for SQLServer. Their implementation is one of the options Dave described five years ago; there are only so many ways to solve this problem.
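For illustration, here is a rough Python sketch of the "type number plus zero count" convention applied to a LINESTRING. It assumes the usual WKB layout (one byte-order byte, a 32-bit type number, a 32-bit point count); the helper names are hypothetical, and it does not claim to match any particular implementation byte for byte.

```python
import struct

WKB_NDR = 1         # byte-order marker: little-endian
WKB_LINESTRING = 2  # type number for LineString


def empty_linestring_wkb() -> bytes:
    """Encode LINESTRING EMPTY as a header followed by a point count of zero."""
    return struct.pack("<BII", WKB_NDR, WKB_LINESTRING, 0)


def linestring_point_count(wkb: bytes) -> int:
    """Read the point count back; zero means EMPTY under this convention."""
    fmt = "<II" if wkb[0] == WKB_NDR else ">II"
    type_num, count = struct.unpack_from(fmt, wkb, 1)
    if type_num != WKB_LINESTRING:
        raise ValueError("not a LineString")
    return count


wkb = empty_linestring_wkb()
print(wkb.hex())                    # 010200000000000000
print(linestring_point_count(wkb))  # 0
```

Because the count is always present, a reader never has to guess whether the next bytes belong to this geometry or the following one. A POINT has no count field, which is why it remains the awkward case.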

If ISO didn’t like the use of a zero-valued <num> count as a way of indicating EMPTY, they had another option available, which was to follow the original OGC WKB standard and use bitmask flags on their type numbers. There could have been a bitmask for Z, a bitmask for M, and a bitmask for EMPTY. There could even have been a bitmask for SRID, fixing up a huge drawback in WKB, namely that WKB does not include a slot for the SRID, which is an important element in the geometry model.
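As a sketch of what such a flag scheme could look like, the Python below decodes a 32-bit type word into a base type and a set of booleans. The Z value is the one quoted earlier; the M and SRID values follow what I understand PostGIS EWKB to use; the EMPTY flag is purely hypothetical and exists in no specification I know of.

```python
# Sketch only: WKB_EMPTY_FLAG is a hypothetical value invented for illustration.
WKB_Z_FLAG = 0x80000000
WKB_M_FLAG = 0x40000000      # assumption: value as used by PostGIS EWKB
WKB_SRID_FLAG = 0x20000000   # assumption: value as used by PostGIS EWKB
WKB_EMPTY_FLAG = 0x10000000  # hypothetical, not part of any standard


def decode_type_word(word: int) -> dict:
    """Split a flagged type word into its base type and property flags."""
    return {
        "base_type": word & 0x0FFFFFFF,  # low 28 bits hold the geometry type
        "has_z": bool(word & WKB_Z_FLAG),
        "has_m": bool(word & WKB_M_FLAG),
        "has_srid": bool(word & WKB_SRID_FLAG),
        "is_empty": bool(word & WKB_EMPTY_FLAG),
    }


# A hypothetical empty PolygonZ: type 3 with the Z and EMPTY bits set.
print(decode_type_word(3 | WKB_Z_FLAG | WKB_EMPTY_FLAG))
```

Each property costs one bit and one AND, and a reader that sees the EMPTY bit knows not to expect ordinates or counts at all.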

Sidenote: As a result of WKB not having SRID support, it’s not possible to round-trip a geometry through WKB without losing the SRID value. Try this standard SQL and see what happens:

SELECT ST_SRID( 
 ST_GeomFromWKB(  
  ST_AsBinary( 
   ST_GeomFromText('POINT(0 0)', 4326) 
 )))

Then try the bastardized PostGIS EWKB format instead:

SELECT ST_SRID( 
 ST_GeomFromEWKB( 
  ST_AsEWKB( 
   ST_GeomFromText( 'POINT(0 0)', 4326)
 )))

As it stands now, the specification is out of sync with the implementations on the ground, which is bad news for the relevance of the specification. I will be implementing EMPTY using the same semantics as SQLServer, which will make the kinds of EMPTY PostGIS can represent slightly richer, while remaining backwards compatible with the old schemes.

NYC Sprint: Day 4

The last day was, as last year, a quiet one. Everyone worked on cleaning up their last bits of work before heading home around mid-afternoon. Unfortunately, most left before I had a chance to ask what they finished up!

(For my part, I finished some minimal regression tests on the WKT emitter, changed the emitter to use a stringbuffer for output, and upgraded the stringbuffer to be smarter about performance. Jeff Adams has been working on making our unit test program a little more flexible for developers: easier to add tests, and able to optionally run one test at a time.)

One more time, thanks to the sponsors: we got a great deal done, and MapServer, Geoserver, PostGIS and the rest of the tribe are going to be stronger next year thanks to this event.

Special thanks also to Temim Fruchter of OpenPlans who was invaluable in helping me make arrangements in New York for hotels and food and all the things that made the event enjoyable and comfortable. Thanks to OpenPlans for hosting us in their lovely penthouse event room!

NYC Sprint: Day 3

Third day, best day. There are four MapServer developers now working hard on implementing the rendering plugin changes. Thomas Bonfort is doing the core work, Steve Lime is re-working the old GD renderer, Dan Morissette is creating support for hatched styles, and Assefa is doing KML output.

In Geoserver land, Andrea Aime added support for variable substitution in SLD, which means that URL parameters can now be passed into SLD styling rules, to create dynamic styling effects. Tim Schaub and Justin Deoliveira also demonstrated an application that warms the cockles of my heart: using their new GeoScript extension they made a web-based application that takes in SQL and spits out maps. So now I can type PostGIS queries into a web page and see the results overlaid on a map. Crunchy!

[Photo: Geoserver / OpenLayers Crew]

Howard Butler has contributed some work on the auto-projection support in MapServer and is now working on LibLAS Oracle support. He also tracked down an excellent pastrami sandwich. So I am told.

In PostGIS world, Jeff Adams finished his lat/lon formatter and logged his first commit: an impressive complete collection of unit tests, documentation and a working function (ST_AsLatLonText) that can turn POINT(-120.5 12.25) into 12°15’0”N 120°30’0”W. Oliver continues to fix up the text output functions. And I completed my first cut of the WKT output. Curve support really adds a lot of overhead to these things! There are lots of variants and curves have more and sillier formatting rules than linear features. David Zwarg has continued beavering through tickets in the WKTRaster subsystem.
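The degrees-minutes-seconds conversion itself is simple arithmetic. Here is a small Python sketch of the idea; it is not the ST_AsLatLonText implementation, the function names are invented for illustration, it rounds to whole seconds, and it prints straight ASCII quotes rather than typographic ones.

```python
def to_dms(value: float, positive: str, negative: str) -> str:
    """Format one decimal-degree ordinate as degrees, minutes, whole seconds."""
    hemisphere = positive if value >= 0 else negative
    # Work in whole seconds so rounding can never produce "60 seconds".
    total_seconds = round(abs(value) * 3600)
    degrees, remainder = divmod(total_seconds, 3600)
    minutes, seconds = divmod(remainder, 60)
    return f"{degrees}°{minutes}'{seconds}\"{hemisphere}"


def point_to_dms(lon: float, lat: float) -> str:
    """Latitude first, then longitude, matching the output order shown above."""
    return f"{to_dms(lat, 'N', 'S')} {to_dms(lon, 'E', 'W')}"


print(point_to_dms(-120.5, 12.25))  # 12°15'0"N 120°30'0"W
```

Note the axis swap: WKT writes the point as (x y), that is (lon lat), while the formatted text conventionally leads with latitude.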

Thanks again to our sponsors. Tonight we are heading out to dinner at a Malaysian restaurant in Chinatown.