PostGIS Performance: Prepared Geometry

Spatial joins are a common use case in spatial databases, putting together two tables based on the spatial relationships of their geometry fields. For example:

SELECT census.income, houses.value
FROM census JOIN houses
ON (ST_Contains(census.geom, houses.geom));

The way this gets evaluated by the database looks something like this:

for each census polygon
  for each house point near the census geom
    run the st_contains test on that pair of geometries

Because the outer loop is driven by the census geometry, the “contains” algorithm gets called repeatedly with the same polygon. Recognizing this repetition allows a shortcut: build a smart, indexed polygon once, and reuse it each time the same polygon shows up again in a function call.
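To make the repetition concrete, here is a minimal Python sketch of the nested loop with a one-slot cache in front of the containment test. It is a toy, not the JTS/GEOS implementation: the names PreparedPolygon, OneSlotCache and spatial_join are hypothetical, and the "preparation" here (a bounding box plus a precomputed edge list) is an assumed stand-in for the real indexing, which is considerably smarter.

```python
class PreparedPolygon:
    """Hypothetical stand-in for a prepared geometry.

    Precomputes a bounding box and an edge list once, so repeated
    containment tests against the same polygon can reuse them.
    """

    def __init__(self, ring):
        ring = list(ring)
        # Pair each vertex with the next one, closing the ring.
        self.edges = list(zip(ring, ring[1:] + ring[:1]))
        xs = [x for x, _ in ring]
        ys = [y for _, y in ring]
        self.bbox = (min(xs), min(ys), max(xs), max(ys))

    def contains(self, point):
        x, y = point
        minx, miny, maxx, maxy = self.bbox
        if not (minx <= x <= maxx and miny <= y <= maxy):
            return False  # cheap rejection using the cached bounding box
        inside = False
        for (x1, y1), (x2, y2) in self.edges:
            # Ray casting: toggle on each edge crossed by a ray toward +x.
            if (y1 > y) != (y2 > y):
                x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
                if x < x_cross:
                    inside = not inside
        return inside


class OneSlotCache:
    """Remembers only the most recently prepared polygon (an assumption)."""

    def __init__(self):
        self.key = None
        self.prepared = None
        self.builds = 0  # how many times a polygon had to be prepared

    def get(self, ring):
        key = tuple(ring)
        if key != self.key:
            self.key = key
            self.prepared = PreparedPolygon(ring)
            self.builds += 1
        return self.prepared


def spatial_join(polygons, points, cache):
    """Nested-loop join: the outer loop repeats the same polygon."""
    matches = []
    for name, ring in polygons:
        for point in points:
            if cache.get(ring).contains(point):
                matches.append((name, point))
    return matches


polygons = [
    ("west", [(0, 0), (10, 0), (10, 10), (0, 10)]),
    ("east", [(10, 0), (20, 0), (20, 10), (10, 10)]),
]
points = [(2, 3), (15, 5), (25, 5)]

cache = OneSlotCache()
result = spatial_join(polygons, points, cache)
print(result, "prepared", cache.builds, "times")
```

In this toy run the containment test is called six times but a polygon is prepared only twice, once per outer-loop iteration. A single slot is enough because the outer loop presents the same polygon many times in a row.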

The “smart, indexed polygon” is a “PreparedGeometry”, and the concept was implemented in JTS over a year ago. About six months ago, it was ported to GEOS (a C++ mirror of JTS), but the port still had some nagging memory leaks which made it unready for production use.

Last month, Zonar Systems, who funded the initial JTS algorithm work, asked me to bring the functionality the rest of the way out to PostGIS. We found a C++ expert who identified and removed the last GEOS leaks, and I cleaned the leaks out of the PostGIS side of the implementation.

The speed difference is impressive!

I have a test data set of two tables: one table of 80 large polygons, and another table of 8000 small polygons. Each large polygon contains about 100 small ones.

Without the prepared geometry, a spatial join using ST_Intersects takes about 40 seconds. With the prepared geometry, the join takes 8 seconds, five times faster. The larger the size difference between your tables, the larger the speed-up you will see.

The functions affected by the PreparedGeometry upgrade are ST_Intersects(), ST_Contains(), ST_Covers() and ST_ContainsProperly().

To try out the new functionality, you’ll need to check out and compile the GEOS SVN trunk (http://svn.osgeo.org/geos/trunk) which will become GEOS 3.1.0 in a little while, and the PostGIS 1.3 SVN branch (http://svn.osgeo.org/postgis/branches/1.3), which will become PostGIS 1.3.4 shortly. First compile and install GEOS, then PostGIS, since PostGIS checks the GEOS version during the compile stage to determine whether to activate the functionality.

Major thank you to Zonar Systems for funding the initial work and then stepping up a second time to fund the clean-up and roll-out to production-ready status. Why did they do it? They run a major fleet tracking and data analysis system on PostGIS, and they need lots of speed to handle the huge data volumes generated by their real-time tracking devices.

Rotten Afternoon

Anybody want a set of wisdom teeth? I’ve got a couple that I won’t be using anymore…

Sol Katz Award

To my astonishment, I received the Sol Katz Award for 2008 today. For the record, here is the acceptance speech I gave via video to the FOSS4G closing plenary session:

This is a big honour for me, to be in the company of people like Frank Warmerdam, Steve Lime and Markus Neteler as a Sol Katz recipient.

Those guys built core pieces of open source software with their bare hands, from the ground up, and that alone marks them out as special, but they also helped build their communities, and that’s a big part of their contribution, too.

In my case, community building is almost my only contribution.

I have added some very small amounts of code to PostGIS and uDig over the years, but until only a few months ago my main contribution was community building, by finding the funding or staff time to develop the projects, providing some design guidelines, and by working on the mailing lists to help people with problems.

So I want to start out by thanking a few of the people who did much of the actual work on the projects I have been identified with over the years, the PostGIS spatial database and the uDig desktop application.

Dave Blasby, a brilliant programmer, who wrote the first versions of PostGIS when he was at Refractions, and who taught me by osmosis many of the technical fundamentals I exercise to this day.

Sandro Santilli, who was so impressive as a volunteer contributor to PostGIS that I hired him sight-unseen to maintain PostGIS, which he did from his home in Rome, for a number of years.

Mark Cave-Ayland, who is still involved in PostGIS, and the “go to” guy when the problems get really hard.

Jody Garnett and Jesse Eichar, who took the uDig project from a sketch in a funding proposal to a working application, and have continued to nurture and improve it up to this day.

Those are just a very few of the people who have contributed to making the PostGIS and uDig projects successful. There are so many more, and I thank them all. Thank you so much!

I hope that my receiving this award will inspire other non-technical members of the open source community. Open source is collaborative in all kinds of ways: not only do we share code, but we share effort and money. For a manager, contributing money or staff time to open source is often a karmic investment: the return is impossible to foresee, and yet, in my experience, there always seems to be a return in the end. You are repaid for your investment many times in many ways, most of which you don’t expect.

I can’t accept an award honoring my investments of time and money in open source without also honoring two men who are largely unknown in the open source GIS community.

Graeme Leeming and Philip Kayal were my business partners for ten years at Refractions Research, during the time we developed PostGIS and uDig, and without their willingness to invest in my crazy schemes, we would never have achieved what we did.

Their willingness to join in my enthusiasms and get off the beaten path of consulting was critical to making the projects successful, and all of us took risks together to make the projects great. So thanks Graeme and Phil, and also all the folks at Refractions.

I hope you have all had a great FOSS4G. I am sorry I could not be with you in person this year, but I’m looking forward to raising a pint with you all in Sydney, Australia next year.

Schuyler and the World of Tomorrow

Direct from FOSS4G 2008, Schuyler Erle on geodata availability.

Game Theory and Congress

How could the bailout fail to pass Congress, when it had been negotiated and promoted by the leadership of both parties? Easy, it’s the tragedy of the commons! If I may enter the head of the Representative from South Jesusland, Phil I. Buster:

“If I vote for this, my constituents who hate Wall Street (and that’s a lot of them) will hate me!”
“But if this fails, the economy could crumble, and they’ll hate me more!”
“But if I vote against it, and it passes, I win both ways! Hate Wall Street? I voted against it! Economy does OK? Who’s going to remember or care how I voted? Economy crumbles? It was a bad plan anyways!”

Multiply by a few hundred Representatives with fingers in the wind and voila! No plan, Dow tanks 700.

No, this isn’t geospatial, but I find it really interesting! Like having my liver removed while fully conscious.