WKB EMPTY

03 Feb 2025

I have been watching the codification of spatial data types into GeoParquet and now GeoIceberg with some interest, since the work is near and dear to my heart.

Writing a disk serialization for PostGIS is basically an act of format standardization – albeit a standard with only one consumer – and many of the same issues that the Parquet and Iceberg implementations are thinking about are ones I dealt with too.

Here is an easy one: if you are going to use well-known binary for your serialiation (as GeoPackage, and GeoParquet do) you have to wrestle with the fact that the ISO/OGC standard for WKB does not describe a standard way to represent empty geometries.

Empty

Empty geometries come up frequently in the OGC/ISO standards, and they are simple to generate in real operations – just subtract a big thing from a small thing.

SELECT ST_AsText(ST_Difference(
	'POLYGON((0 0, 1 0, 1 1, 0 1, 0 0))',
	'POLYGON((-1 -1, 3 -1, 3 3, -1 3, -1 -1))'
	))

If you have a data set and are running operations on it, eventually you will generate some empties.

Which means your software needs to know how to store and transmit them.

Which means you need to know how to encode them in WKB.

And the standard is no help.

But I am!

WKB Commonalities

All WKB geometries start with 1-byte “byte order flag” followed by a 4-byte “geometry type”.

enum wkbByteOrder  {
    wkbXDR = 0, // Big Endian
    wkbNDR = 1  // Little Endian
};

The byte order flag signals which “byte order” all the other numbers will be encoded with. Most modern hardware uses “least significant byte first” (aka “little endian”) ordering, so usually the value will be “1”, but readers must expect to occasionally get “big endian” encoded data.

enum wkbGeometryType {
    wkbPoint = 1,
    wkbLineString = 2,
    wkbPolygon = 3,
    wkbMultiPoint = 4,
    wkbMultiLineString = 5,
    wkbMultiPolygon = 6,
    wkbGeometryCollection = 7
};

The type number is an integer from 1 to 7, in the indicated byte order.

Collections

Collections are easy! GeometryCollection, MultiPolygon, MultiLineString and MultiPoint all have a WKB structure like this:

wkbCollection {
    byte    byteOrder;
    uint32  wkbType;
    uint32  numWkbSubGeometries;
    WKBGeometry wkbSubGeometries[numWkbSubGeometries];
}

The way to signal an empty collection is to set its numGeometries value to zero.

So for example, a MULTIPOLYGON EMPTY would look like this (all examples in little endian, spaces added between elements for legibility, using hex encoding).

01 06000000 00000000

The elements are:

The byte order flag
The geometry type (6 == MultiPolygon)
The number of sub-geometries (zero)

Polygons and LineStrings

The Polygon and LineString types are also very easy, because after their type number they both have a count of sub-objects (rings in the case of Polygon, points in the case of LineString) which can be set to zero to indicate an empty geometry.

For a LineString:

01 02000000 00000000

For a Polygon:

01 03000000 00000000

It is possible to create a Polygon made up of a non-zero number of empty linear rings. Is this construction empty? Probably. Should you make one of them? Probably not, since POLYGON EMPTY describes the case much more simply.

Points

Saving the best for last!

One of the strange blind spots of the ISO/OGC standards is the WKB Point. There is a standard text representation for an empty point, POINT EMPTY. But nowhere in the standard is there a description of a binary empty point, and the WKB structure of a point doesn’t really leave any place to hide one.

WKBPoint {
    byte    byteOrder;
    uint32  wkbType; // 1
    double x;
    double y;
};

After the standard byte order flag and type number, the serialization goes directly into the coordinates. There’s no place to put in a zero.

In PostGIS we established our own add-on to the WKB standard, so we could successfully round-trip a POINT EMPTY through WKB – empty points are to be represented as a point with all coordinates set to the IEEE NaN value.

Here is a little-endian empty point.

01 01000000 000000000000F87F 000000000000F87F

And a big-endian one.

00 00000001 7FF8000000000000 7FF8000000000000

Most open source implementations of WKB have converged on this standardization of POINT EMPTY. The most common alternate behaviour is to convert POINT EMPTY object, which are not representable, into MULTIPOINT EMPTY objects, which are. This might be confusing (an empty point would round-trip back to something with a completely different type number).

In general, empty geometries create a lot of “angels dancing on the head of a pin” cases for functions that otherwise have very deterministic results.

“What is the distance in meters between a point and an empty polygon?” Zero? Infinity? NULL? NaN?
“What geometry type is the interesection of an empty polygon and empty line?” Do I care? I do if I am writing a database system and have to provide an answer.

Over time the PostGIS project collated our intuitions and implementations in this wiki page of empty geometry handling rules.

The trouble with empty handling is that there are simultaneously a million different combinations of possibilities, and extremely low numbers of people actually exercising that code line. So it’s a massive time suck. We have basically been handling them on an “as needed” basis, as people open tickets on them.

Other Databases

SQL Server changes POINT EMPTY to MULTIPOINT EMPTY when generating WKB.

SELECT Geometry::STGeomFromText('POINT EMPTY',4326).STAsBinary()

0x010400000000000000

MariaDB and SnowFlake return NULL for a POINT EMPTY WKB.

SELECT ST_AsBinary(ST_GeomFromText('POINT EMPTY'))

NULL

Paul Ramsey

WKB EMPTY

WKB Commonalities

Collections

Polygons and LineStrings

Points

Other Databases

Recent Posts

BC IT Outsourcing 2023/24 10 Mar 2025

Cancer 13 09 Mar 2025

Book Pairings 22 Feb 2025