WKB EMPTY
03 Feb 2025I have been watching the codification of spatial data types into GeoParquet and now GeoIceberg with some interest, since the work is near and dear to my heart.
Writing a disk serialization for PostGIS is basically an act of format standardization – albeit a standard with only one consumer – and many of the same issues that the Parquet and Iceberg implementations are thinking about are ones I dealt with too.
Here is an easy one: if you are going to use well-known binary for your serialiation (as GeoPackage, and GeoParquet do) you have to wrestle with the fact that the ISO/OGC standard for WKB does not describe a standard way to represent empty geometries.
Empty geometries come up frequently in the OGC/ISO standards, and they are simple to generate in real operations – just subtract a big thing from a small thing.
SELECT ST_AsText(ST_Difference(
'POLYGON((0 0, 1 0, 1 1, 0 1, 0 0))',
'POLYGON((-1 -1, 3 -1, 3 3, -1 3, -1 -1))'
))
If you have a data set and are running operations on it, eventually you will generate some empties.
Which means your software needs to know how to store and transmit them.
Which means you need to know how to encode them in WKB.
And the standard is no help.
But I am!
WKB Commonalities
All WKB geometries start with 1-byte “byte order flag” followed by a 4-byte “geometry type”.
enum wkbByteOrder {
wkbXDR = 0, // Big Endian
wkbNDR = 1 // Little Endian
};
The byte order flag signals which “byte order” all the other numbers will be encoded with. Most modern hardware uses “least significant byte first” (aka “little endian”) ordering, so usually the value will be “1”, but readers must expect to occasionally get “big endian” encoded data.
enum wkbGeometryType {
wkbPoint = 1,
wkbLineString = 2,
wkbPolygon = 3,
wkbMultiPoint = 4,
wkbMultiLineString = 5,
wkbMultiPolygon = 6,
wkbGeometryCollection = 7
};
The type number is an integer from 1 to 7, in the indicated byte order.
Collections
Collections are easy! GeometryCollection, MultiPolygon, MultiLineString and MultiPoint all have a WKB structure like this:
wkbCollection {
byte byteOrder;
uint32 wkbType;
uint32 numWkbSubGeometries;
WKBGeometry wkbSubGeometries[numWkbSubGeometries];
}
The way to signal an empty collection is to set its numGeometries value to zero.
So for example, a MULTIPOLYGON EMPTY
would look like this (all examples in little endian, spaces added between elements for legibility, using hex encoding).
01 06000000 00000000
The elements are:
- The byte order flag
- The geometry type (6 == MultiPolygon)
- The number of sub-geometries (zero)
Polygons and LineStrings
The Polygon and LineString types are also very easy, because after their type number they both have a count of sub-objects (rings in the case of Polygon, points in the case of LineString) which can be set to zero to indicate an empty geometry.
For a LineString:
01 02000000 00000000
For a Polygon:
01 03000000 00000000
It is possible to create a Polygon made up of a non-zero number of empty linear rings. Is this construction empty? Probably. Should you make one of them? Probably not, since POLYGON EMPTY
describes the case much more simply.
Points
Saving the best for last!
One of the strange blind spots of the ISO/OGC standards is the WKB Point. There is an standard text representation for an empty point, POINT EMPTY
. But there nowhere in the standard a description of a WKB empty point, and the WKB structure of a point doesn’t really leave any place to hide one.
WKBPoint {
byte byteOrder;
uint32 wkbType; // 1
double x;
double y;
};
After the standard byte order flag and type number, the serialization goes directly into the coordinates. There’s no place to put in a zero.
In PostGIS we established our own add-on to the WKB standard, so we could successfully round-trip a POINT EMPTY
through WKB – empty points are to be represented as a point with all coordinates set to the IEEE NaN value.
Here is a little-endian empty point.
01 01000000 000000000000F87F 000000000000F87F
And a big-endian one.
00 00000001 7FF8000000000000 7FF8000000000000
Most open source implementations of WKB have converged on this standardization of POINT EMPTY
. The most common alternate behaviour is to convert POINT EMPTY
object, which are not representable, into MULTIPOINT EMPTY
objects, which are. This might be confusing (an empty point would round-trip back to something with a completely different type number).
In general, empty geometries create a lot of “angels dancing on the head of a pin” cases for functions that otherwise have very deterministic results.
- “What is the distance in meters between a point and an empty polygon?” Zero? Infinity? NULL? NaN?
- “What geometry type is the interesection of an empty polygon and empty line?” Do I care? I do if I am writing a database system and have to provide an answer.
Over time the PostGIS project collated our intuitions and implementations in this wiki page of empty geometry handling rules.
The trouble with empty handling is that there are simultaneously a million different combinations of possibilities, and extremely low numbers of people actually exercising that code line. So it’s a massive time suck. We have basically been handling them on an “as needed” basis, as people open tickets on them.
Other Databases
- SQL Server changes
POINT EMPTY
toMULTIPOINT EMPTY
when generating WKB.SELECT Geometry::STGeomFromText('POINT EMPTY',4326).STAsBinary() 0x010400000000000000
- MariaDB and SnowFlake return NULL for a
POINT EMPTY
WKB.SELECT ST_AsBinary(ST_GeomFromText('POINT EMPTY')) NULL