WKB EMPTY

I have been watching the codification of spatial data types into GeoParquet and now GeoIceberg with some interest, since the work is near and dear to my heart.

Writing a disk serialization for PostGIS is basically an act of format standardization – albeit a standard with only one consumer – and many of the same issues that the Parquet and Iceberg implementations are thinking about are ones I dealt with too.

Here is an easy one: if you are going to use well-known binary for your serialiation (as GeoPackage, and GeoParquet do) you have to wrestle with the fact that the ISO/OGC standard for WKB does not describe a standard way to represent empty geometries.

Empty

Empty geometries come up frequently in the OGC/ISO standards, and they are simple to generate in real operations – just subtract a big thing from a small thing.

SELECT ST_AsText(ST_Difference(
	'POLYGON((0 0, 1 0, 1 1, 0 1, 0 0))',
	'POLYGON((-1 -1, 3 -1, 3 3, -1 3, -1 -1))'
	))

If you have a data set and are running operations on it, eventually you will generate some empties.

Which means your software needs to know how to store and transmit them.

Which means you need to know how to encode them in WKB.

And the standard is no help.

But I am!

WKB Commonalities

All WKB geometries start with 1-byte “byte order flag” followed by a 4-byte “geometry type”.

enum wkbByteOrder  {
    wkbXDR = 0, // Big Endian
    wkbNDR = 1  // Little Endian
};

The byte order flag signals which “byte order” all the other numbers will be encoded with. Most modern hardware uses “least significant byte first” (aka “little endian”) ordering, so usually the value will be “1”, but readers must expect to occasionally get “big endian” encoded data.

enum wkbGeometryType {
    wkbPoint = 1,
    wkbLineString = 2,
    wkbPolygon = 3,
    wkbMultiPoint = 4,
    wkbMultiLineString = 5,
    wkbMultiPolygon = 6,
    wkbGeometryCollection = 7
};

The type number is an integer from 1 to 7, in the indicated byte order.

Collections

Collections are easy! GeometryCollection, MultiPolygon, MultiLineString and MultiPoint all have a WKB structure like this:

wkbCollection {
    byte    byteOrder;
    uint32  wkbType;
    uint32  numWkbSubGeometries;
    WKBGeometry wkbSubGeometries[numWkbSubGeometries];
}

The way to signal an empty collection is to set its numGeometries value to zero.

So for example, a MULTIPOLYGON EMPTY would look like this (all examples in little endian, spaces added between elements for legibility, using hex encoding).

01 06000000 00000000

The elements are:

Polygons and LineStrings

The Polygon and LineString types are also very easy, because after their type number they both have a count of sub-objects (rings in the case of Polygon, points in the case of LineString) which can be set to zero to indicate an empty geometry.

For a LineString:

01 02000000 00000000

For a Polygon:

01 03000000 00000000

It is possible to create a Polygon made up of a non-zero number of empty linear rings. Is this construction empty? Probably. Should you make one of them? Probably not, since POLYGON EMPTY describes the case much more simply.

Points

Saving the best for last!

One of the strange blind spots of the ISO/OGC standards is the WKB Point. There is an standard text representation for an empty point, POINT EMPTY. But there nowhere in the standard a description of a WKB empty point, and the WKB structure of a point doesn’t really leave any place to hide one.

WKBPoint {
    byte    byteOrder;
    uint32  wkbType; // 1
    double x;
    double y;
};

After the standard byte order flag and type number, the serialization goes directly into the coordinates. There’s no place to put in a zero.

In PostGIS we established our own add-on to the WKB standard, so we could successfully round-trip a POINT EMPTY through WKB – empty points are to be represented as a point with all coordinates set to the IEEE NaN value.

Here is a little-endian empty point.

01 01000000 000000000000F87F 000000000000F87F

And a big-endian one.

00 00000001 7FF8000000000000 7FF8000000000000

Most open source implementations of WKB have converged on this standardization of POINT EMPTY. The most common alternate behaviour is to convert POINT EMPTY object, which are not representable, into MULTIPOINT EMPTY objects, which are. This might be confusing (an empty point would round-trip back to something with a completely different type number).

In general, empty geometries create a lot of “angels dancing on the head of a pin” cases for functions that otherwise have very deterministic results.

Over time the PostGIS project collated our intuitions and implementations in this wiki page of empty geometry handling rules.

The trouble with empty handling is that there are simultaneously a million different combinations of possibilities, and extremely low numbers of people actually exercising that code line. So it’s a massive time suck. We have basically been handling them on an “as needed” basis, as people open tickets on them.

Other Databases