Friday, October 10, 2008

PostGIS Performance: Prepared Geometry

Spatial joins are a common use case in spatial databases, putting together two tables based on the spatial relationships of their geometry fields. For example:

SELECT census.income, houses.value FROM census JOIN houses ON (ST_Contains(census.geom, houses.geom))

The way this gets evaluated by the database looks something like this:

for each census polygon
  for each house point near the census geom
    run the st_contains test on that pair of geometries


Because the outer loop is driven by the census geometry, you will get repeated calls to the "contains" algorithm that have the same polygon each time. By recognizing this repetition, you can build a shortcut, that creates a smart, indexed polygon, and uses it over and over each time it is repeated in a function call.

The "smart, indexed polygon" is a "PreparedGeometry", and the concept was implemented in JTS over a year ago. About six months ago, it was ported to GEOS (a C++ mirror of JTS), but the port still had some nagging memory leaks which made it unready for production use.

Last month, Zonar Systems, who funded the initial JTS algorithm work, asked me to bring the functionality the rest of the way out to PostGIS. We found a C++ expert who identified and removed the last GEOS leaks, and I cleaned the leaks out of the PostGIS side of the implementation.

The speed difference is impressive!

I have a test data set of two tables: one table of 80 large polygons, and another table of 8000 small polygons. Each large polygon contains about 100 small ones.

Without the prepared geometry, a spatial join using ST_Intersects takes about 40 seconds. With the prepared geometry, the join takes 8 seconds, five times faster. The larger the size difference between your tables, the larger the speed-up you see will be.

The functions effected by the PreparedGeometry upgrade are ST_Intersects(), ST_Contains(), ST_Covers() and ST_ContainsProperly().

To try out the new functionality, you'll need to check out and compile the GEOS SVN trunk (http://svn.osgeo.org/geos/trunk) which will become GEOS 3.1.0 in a little while, and the PostGIS 1.3 SVN branch (http://svn.refractions.net/postgis/branches/1.3), which will become PostGIS 1.3.4 shortly. First compile and install GEOS, then PostGIS, since PostGIS checks the GEOS version during the compile stage to determine whether to activate the functionality.

Major thank you to Zonar Systems for funding the initial work and then stepping up a second time to fund the clean-up and roll-out to production-ready status. Why did they do it? They run a major fleet tracking and data analysis system on PostGIS, and they need lots of speed to handle the huge data volumes generated by their real-time tracking devices.
 

1 comment:

Mateusz Loskot said...

Paul,

This is a great news! I'm glad to hear you have managed to solve the leak problem (I feel less guilty about my run away ;-)).

Looks we are getting close to release GEOS 3.1.0.

Great job!

About Me

My Photo
Victoria, British Columbia, Canada

Followers

Blog Archive

Labels

bc (37) it (29) postgis (19) icm (11) video (11) enterprise IT (10) sprint (9) open source (8) osgeo (8) cio (6) foippa (6) gis (6) management (6) spatial it (6) enterprise (5) foi (5) foss4g (5) mapserver (4) outsourcing (4) politics (4) bcesis (3) oracle (3) COTS (2) architecture (2) boundless (2) esri (2) idm (2) natural resources (2) ogc (2) open data (2) opengeo (2) openstudent (2) postgresql (2) rant (2) technology (2) vendor (2) web (2) 1.4.0 (1) HR (1) access to information (1) accounting (1) agile (1) aspen (1) benchmark (1) buffer (1) build vs buy (1) business (1) business process (1) cathedral (1) cloud (1) code (1) common sense (1) consulting (1) contracting (1) core review (1) crm (1) crockofshit (1) custom (1) data warehouse (1) deloitte (1) design (1) digital (1) email (1) essentials (1) evil (1) exadata (1) fcuk (1) fgdb (1) fme (1) foocamp (1) foss4g2007 (1) ftp (1) gds (1) geocortex (1) geometry (1) geoserver (1) google (1) google earth (1) government (1) grass (1) hp (1) iaas (1) icio (1) industry (1) innovation (1) integrated case management (1) introversion (1) iso (1) isss (1) isvalid (1) javascript (1) jts (1) lawyers (1) mapping (1) mcfd (1) microsoft (1) mysql (1) new it (1) nosql (1) opengis (1) openlayers (1) oss (1) paas (1) pirates (1) policy (1) portal (1) proprietary software (1) qgis (1) rdbms (1) recursion (1) redistribution (1) regression (1) rfc (1) right to information (1) saas (1) salesforce (1) sardonic (1) seibel (1) sermon (1) siebel (1) snark (1) spatial (1) standards (1) svr (1) taxi (1) tempest (1) texas (1) tired (1) transit (1) twitter (1) uber (1) udig (1) uk (1) uk gds (1) verbal culture (1) victoria (1) waterfall (1) wfs (1) where (1) with recursive (1) wkb (1)