The most obvious difference is that the type numbers for encoding the presence of Z- and M-dimensions are not the ones described in the old OGC extension document [OGC members only, cited by Martin Daly in 2004, and extended further for PostGIS by Sandro Santilli that year] for WKB. Instead of setting high-bits to indicate the presence of Z and M, as OGC did, the ISO spec simply adds 1000. So, the ISO geometry number for a PolygonZ is 3 (Polygon) + 1000 = 1003.
The old OGC geometry number for a PolygonZ is 3 (Polygon) | 0x80000000 = 2147483651.
OGC seems more complex until you note that the function WKB_HASZ(num) can be written (num & 0x80000000), while the ISO test is (num >= 1000 && num < 2000). Setting flags for binary values (has-z, has-m, has-a-piece-of-pie) is nice.
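As a rough sketch of the two schemes, here they are side by side in Python. The constant and function names are mine, not from either spec. The range test in the text only covers Z-only types; the extra 3000-3999 range for ZM types is my assumption about how the ISO offsets continue, so treat it as such.

```python
WKB_POLYGON = 3
OGC_Z_FLAG = 0x80000000   # high bit set to signal Z in the old OGC extension
ISO_Z_OFFSET = 1000       # ISO adds 1000 to the base type number for Z


def ogc_has_z(num: int) -> bool:
    # One bitwise test, regardless of which other flags are set.
    return (num & OGC_Z_FLAG) != 0


def iso_has_z(num: int) -> bool:
    # Range test from the text (Z-only types are 1000..1999).
    # Assumption: ZM types live in 3000..3999 and also carry Z.
    return 1000 <= num < 2000 or 3000 <= num < 4000


ogc_polygon_z = WKB_POLYGON | OGC_Z_FLAG
iso_polygon_z = WKB_POLYGON + ISO_Z_OFFSET

print(ogc_polygon_z)  # 2147483651
print(iso_polygon_z)  # 1003
```

The flag version stays a single test however many dimensions get added; the range version grows a new clause each time.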
Anyhow, that change was well-known and expected. What I didn't expect was the amount of ambiguity surrounding the definition of an empty geometry in WKB.
To review, the spatial SQL definition includes the concept of an "empty geometry", which is an empty set of a particular geometry type. The empty geometry has more information than a simple database NULL, which is a typeless emptiness. A 'POLYGON ZM EMPTY' has an implied dimensionality. It makes some sense that ST_Intersection() of two disjoint polygons would return a 'POLYGON EMPTY'.
The ISO SQL/MM well-known text specification has clear directions for writing empty geometries of all types. In fact, I've just written two of them above: the type name plus the 'EMPTY' keyword.
For well-known binary, ISO SQL/MM includes the following useless guidance:
i) Case:
i) If <point binary representation> immediately contains a <wkbpoint binary>, then <point binary representation> is the well-known binary representation for an ST_Point value that is produced by <wkbpoint binary>.
ii) Otherwise, <point binary representation> produces an empty set of type ST_Point.
Representing an empty point in WKB is hard because there's nowhere obvious to indicate the lack of ordinates. But the ISO specification makes no attempt to solve the problem; instead it provides explicit guidance that is impossible to implement. Basically, if you are reading a WKB POINT and there are doubles after the TYPE number, you have a POINT(x y). If not, you have a POINT EMPTY. All well and good, but how do you distinguish, in a collection of WKB geometries, between the presence of doubles in the byte stream and the presence of another geometry in the stream? You don't.
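The ambiguity in the POINT case can be sketched in a few lines of Python. This assumes little-endian WKB only, and naive_read_point is a hypothetical reader of my own that applies the "doubles after the type number" rule literally; it is not taken from any real implementation.

```python
import struct

WKB_POINT = 1


def point_header() -> bytes:
    # Byte-order marker (1 = little-endian) followed by the uint32 type number.
    return struct.pack("<BI", 1, WKB_POINT)


def point(x: float, y: float) -> bytes:
    return point_header() + struct.pack("<dd", x, y)


def naive_read_point(buf: bytes, offset: int):
    """Hypothetical reader: if 16 bytes follow the header, call them x and y."""
    struct.unpack_from("<BI", buf, offset)  # consume the 5-byte header
    offset += 5
    if len(buf) - offset >= 16:
        x, y = struct.unpack_from("<dd", buf, offset)
        return (x, y), offset + 16
    return None, offset  # treated as POINT EMPTY


# Intended content: a header-only "POINT EMPTY", then POINT(1 2).
stream = point_header() + point(1.0, 2.0)

first, next_offset = naive_read_point(stream, 0)
# The reader cannot see that the "empty" point ended after 5 bytes: it
# swallows the next geometry's header and ordinates as its own doubles.
print(first is None)   # False
print(next_offset)     # 21
```

Nothing in the bytes themselves says whether the 16 bytes after the type number are ordinates or the start of the next geometry, so the reader has to guess.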
The ISO guidance for empty Linestrings is even worse!
q) Case:
i) If <linestring binary representation> immediately contains <num>, then <linestring binary representation> is the well-known binary representation for an ST_LineString value. Let APA be an ST_Point ARRAY value with cardinality of <num> that contains the ST_Point values specified by the immediately contained <wkbpoint binary>s. <linestring binary representation> produces an ST_LineString value as the result of the value expression: NEW ST_LineString(APA).
ii) Otherwise, <linestring binary representation> produces an empty set of type ST_LineString.
As with the POINT case, the WKB reader is supposed to magically distinguish between an element of the current geometry (the <num>) in the byte-stream and an element of the next geometry in the byte-stream. And worse, the "clarifying" comment implicitly adds a whole new kind of empty geometry! What if the <num> is present, but the value is zero!?!
This is where the snake starts eating its tail. The way that implementations of OGC WKB have been encoding EMPTY geometries has been to provide the type number and an element count of zero. Back when PostGIS was first getting WKB support, Dave Blasby wrestled with the fact that the specification did not describe how to encode EMPTY. Mateusz Loskot recently published some information showing the WKB EMPTY implementation that Microsoft used for SQLServer. Their implementation is one of the options Dave described five years ago – there's only so many ways to solve this problem.
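For illustration, here is roughly what the "type number plus a zero element count" convention looks like for a linestring, sketched in Python. It assumes little-endian, 2D-only WKB; the helper names are mine, and this is not a dump of any particular product's output.

```python
import struct

WKB_LINESTRING = 2


def linestring_wkb(points) -> bytes:
    # Byte order (1 = little-endian), uint32 type number, uint32 point count.
    out = struct.pack("<BII", 1, WKB_LINESTRING, len(points))
    for x, y in points:
        out += struct.pack("<dd", x, y)
    return out


def read_linestring(buf: bytes):
    _byte_order, _type_num, num = struct.unpack_from("<BII", buf, 0)
    # A count of zero simply yields no points: the EMPTY case.
    return [struct.unpack_from("<dd", buf, 9 + 16 * i) for i in range(num)]


empty = linestring_wkb([])
print(empty.hex())             # 010200000000000000
print(read_linestring(empty))  # []
```

Because the count is always present, the reader always knows exactly where this geometry ends and the next one begins.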
If ISO didn't like the use of a zero-valued <num> count as a way of indicating EMPTY, they had another option available, which was to follow the original OGC WKB standard and use bitmask flags on their type numbers. There could have been a bitmask for Z, a bitmask for M, and a bitmask for EMPTY. There could even have been a bitmask for SRID, fixing up a huge drawback in WKB, namely that WKB does not include a slot for the SRID, which is an important element in the geometry model.
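A sketch of what such a flag-based header could look like, in Python. The Z, M and SRID bit values are the ones PostGIS EWKB uses, as far as I know; FLAG_EMPTY is purely hypothetical and exists in no specification. Little-endian only, and the function names are made up for the example.

```python
import struct

FLAG_Z = 0x80000000
FLAG_M = 0x40000000
FLAG_SRID = 0x20000000
FLAG_EMPTY = 0x10000000  # hypothetical: not part of any spec or of EWKB
BASE_MASK = 0x0FFFFFFF


def pack_header(base_type, has_z=False, has_m=False, srid=None, empty=False):
    type_num = base_type
    if has_z:
        type_num |= FLAG_Z
    if has_m:
        type_num |= FLAG_M
    if srid is not None:
        type_num |= FLAG_SRID
    if empty:
        type_num |= FLAG_EMPTY
    out = struct.pack("<BI", 1, type_num)
    if srid is not None:
        out += struct.pack("<I", srid)  # SRID slot follows the type number
    return out


def unpack_header(buf):
    _byte_order, type_num = struct.unpack_from("<BI", buf, 0)
    srid = None
    if type_num & FLAG_SRID:
        (srid,) = struct.unpack_from("<I", buf, 5)
    return {
        "base_type": type_num & BASE_MASK,
        "has_z": bool(type_num & FLAG_Z),
        "has_m": bool(type_num & FLAG_M),
        "empty": bool(type_num & FLAG_EMPTY),
        "srid": srid,
    }


header = pack_header(3, has_z=True, srid=4326, empty=True)
print(unpack_header(header))
```

Each property is an independent bit, so a reader can test for Z, M, SRID or EMPTY without caring about the others.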
Sidenote: As a result of WKB not having SRID support, it's not possible to round-trip a geometry through WKB without losing the SRID value. Try this standard SQL and see what happens:
SELECT ST_SRID( ST_GeomFromWKB( ST_AsBinary( ST_GeomFromText('POINT(0 0)', 4326 ) ) ) )
Then try the bastardized PostGIS EWKB format instead:
SELECT ST_SRID( ST_GeomFromEWKB( ST_AsEWKB( ST_GeomFromText( 'POINT(0 0)', 4326 ) ) ) )
As it stands now, the specification is out of synch with the implementations on the ground, which is bad news for the relevance of the specification. I will be implementing EMPTY using the same semantics as SQLServer, which will make the kinds of EMPTY PostGIS can represent slightly richer, but remain backwards compatible to the old schemes.

2 comments:
Thanks for the great summary of these issues. Like other implementers, we’ve faced the challenge of representing empty geometries in WKB, and have made largely the same compromises: writing empty linestrings as lines with 0 points, empty points as multipoints with 0 points, etc. It would be great to have increased clarity in a future spec.
However this shakes out, we’re committed to reading and writing both the PostGIS flavored EWKB (with bit flags for Z, M, and SRID), as well as the standard OGC WKB. (FME uses PostGIS’ EWKB to read and write geometry in PostGIS.) Right now, we’re also working through the SQL/MM spec with a goal to support all those great new curved geometry types in PostGIS 1.4.
Paul Nalos
Safe Software
http://blog.safe.com/
That's why I wanted to keep all standard-following methods separated from core and being implemented as simple wrappers.
We can't be compatible to MULTIPLE standards at the same time.
You know what's the good thing about standards... :)