PostGIS Performance: Prepared Geometry

Spatial joins are a common use case in spatial databases, putting together two tables based on the spatial relationships of their geometry fields. For example:

SELECT census.income, houses.value
FROM census JOIN houses
ON (ST_Contains(census.geom, houses.geom));

The way this gets evaluated by the database looks something like this:

for each census polygon
  for each house point near the census geom
    run the ST_Contains test on that pair of geometries

Because the outer loop is driven by the census geometry, the “contains” algorithm gets called repeatedly with the same polygon. Recognizing this repetition allows a shortcut: build a smart, indexed version of the polygon once, then reuse it every time that polygon shows up again in a function call.
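The shortcut can be sketched in a few lines of Python. This is only an illustration of the caching idea, not the JTS, GEOS, or PostGIS code: the names PreparedPolygon, OneEntryCache and spatial_join are made up for this example, and the "preparation" here (a bounding box plus a pre-filtered edge list) is a simple stand-in for whatever index the real PreparedGeometry builds.

```python
class PreparedPolygon:
    """Hypothetical stand-in for a prepared geometry: setup work done once per polygon."""

    build_count = 0  # how many times the (expensive) preparation step has run

    def __init__(self, ring):
        PreparedPolygon.build_count += 1
        xs = [x for x, _ in ring]
        ys = [y for _, y in ring]
        self.bbox = (min(xs), min(ys), max(xs), max(ys))
        # Pair up consecutive vertices, dropping horizontal edges, which a
        # horizontal test ray can never properly cross.
        closed = ring[1:] + ring[:1]
        self.edges = [(a, b) for a, b in zip(ring, closed) if a[1] != b[1]]

    def contains(self, point):
        x, y = point
        minx, miny, maxx, maxy = self.bbox
        if not (minx <= x <= maxx and miny <= y <= maxy):
            return False  # cheap rejection using the precomputed bounding box
        inside = False
        for (x1, y1), (x2, y2) in self.edges:
            if (y1 > y) != (y2 > y):  # edge straddles the ray's y value
                x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
                if x < x_cross:
                    inside = not inside  # ray casting: toggle on each crossing
        return inside


class OneEntryCache:
    """Remembers the most recently prepared polygon (assumed cache policy)."""

    def __init__(self):
        self.key = None
        self.prepared = None

    def get(self, key, ring):
        if key != self.key:  # a different polygon than last time: prepare it
            self.key = key
            self.prepared = PreparedPolygon(ring)
        return self.prepared


def spatial_join(polygons, points):
    """Nested loop join: the outer loop repeats the same polygon on every inner call."""
    cache = OneEntryCache()
    matches = []
    for poly_id, ring in polygons.items():
        for point_id, point in points.items():
            if cache.get(poly_id, ring).contains(point):
                matches.append((poly_id, point_id))
    return matches


# Made-up sample data: two square "census" polygons and four "house" points.
polygons = {
    "a": [(0, 0), (4, 0), (4, 4), (0, 4)],
    "b": [(10, 0), (14, 0), (14, 4), (10, 4)],
}
points = {1: (1, 1), 2: (11, 2), 3: (20, 20), 4: (3, 3)}

if __name__ == "__main__":
    print(spatial_join(polygons, points))
    print("polygons prepared:", PreparedPolygon.build_count)
```

With this sample data the join makes eight containment tests but prepares only two polygons, one per outer-loop iteration. A one-entry cache is enough in this sketch because the outer loop presents the same polygon on consecutive calls.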

The “smart, indexed polygon” is a “PreparedGeometry”, and the concept was implemented in JTS over a year ago. About six months ago, it was ported to GEOS (the C++ port of JTS), but the port still had some nagging memory leaks that left it not ready for production use.

Last month, Zonar Systems, who funded the initial JTS algorithm work, asked me to bring the functionality the rest of the way out to PostGIS. We found a C++ expert who identified and removed the last GEOS leaks, and I cleaned the leaks out of the PostGIS side of the implementation.

The speed difference is impressive!

I have a test data set of two tables: one table of 80 large polygons, and another table of 8000 small polygons. Each large polygon contains about 100 small ones.

Without the prepared geometry, a spatial join using ST_Intersects takes about 40 seconds. With the prepared geometry, the join takes 8 seconds, five times faster. The larger the size difference between your tables, the larger the speed-up will be.

The functions affected by the PreparedGeometry upgrade are ST_Intersects(), ST_Contains(), ST_Covers() and ST_ContainsProperly().

To try out the new functionality, you’ll need to check out and compile the GEOS SVN trunk (http://svn.osgeo.org/geos/trunk) which will become GEOS 3.1.0 in a little while, and the PostGIS 1.3 SVN branch (http://svn.osgeo.org/postgis/branches/1.3), which will become PostGIS 1.3.4 shortly. First compile and install GEOS, then PostGIS, since PostGIS checks the GEOS version during the compile stage to determine whether to activate the functionality.

Major thank you to Zonar Systems for funding the initial work and then stepping up a second time to fund the clean-up and roll-out to production-ready status. Why did they do it? They run a major fleet tracking and data analysis system on PostGIS, and they need lots of speed to handle the huge data volumes generated by their real-time tracking devices.