Schuyler and the World of Tomorrow

Direct from FOSS4G2008, Schuyler Erle on geodata availability.

Game Theory and Congress

How could the bailout fail to pass Congress, when it had been negotiated and promoted by the leadership of both parties? Easy, it’s the tragedy of the commons! If I may enter the head of the Representative from South Jesusland, Phil I. Buster:

“If I vote for this, my constituents who hate Wall Street (and that’s a lot of them) will hate me!”
“But if this fails, the economy could crumble, and they’ll hate me more!”
“But if I vote against it, and it passes, I win both ways! Hate Wall Street? I voted against it! Economy does OK? Who’s going to remember or care how I voted? Economy crumbles? It was a bad plan anyways!”

Multiply by a few hundred Representatives with fingers in the wind and voila! No plan, Dow tanks 700.

No, this isn’t geospatial, but I find it really interesting! Like having my liver removed while fully conscious.

Crisis Averted, Situation Normal (Not!)

The news that the markets have roared back today, erasing many of the record losses of Monday, should make me feel better, but the stock markets are a sideshow – the real action is in the credit markets, and the news there still isn’t good.

Want a quick feel for how bad the credit crisis is? Compare how much banks will accept in interest on Treasury bills (basically a risk-free investment) to what they expect to receive in interest on loans to other banks (there is a non-zero risk that the bank might default, say, declare bankruptcy). In normal times, banks don’t go out of business (very often) so the difference is very small. In exceptional times (like now) nobody knows who is going to be in business tomorrow, and the difference is very large.

Right now, the difference (the “TED spread”) is about 3%, or 10 times the prevailing rate before the crisis began. It is as high as it has ever been, twice as high as it was after the collapse of Bear Stearns this spring.

Not feeling a lot of comfort right now.

Point-in-Polygon Shortcuts

The code for spatial predicates in PostGIS is largely dependent on the GEOS topology library, because doing predicate calculations in generality is hard, and GEOS already exists. However, moving geometry from PostGIS into GEOS format incurs a cost. And not all predicate algorithms are hard. Point-in-polygon tests, for example, are relatively easy.

So, for performance reasons, a couple years ago, Mark Leslie (at Refractions at the time) implemented a point-in-polygon test directly in the PostGIS library. Whenever ST_Contains(), ST_Intersects(), etc., are called, the geometries are first checked to see if they are a point and a polygon, and if they are, GEOS is avoided and the calculation is done in PostGIS.
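
To give a feel for why this particular predicate is "relatively easy", here is a minimal ray-casting (crossing number) sketch in Python. It is a generic illustration, not the C code in PostGIS, and the function names `point_in_ring` and `point_in_polygon` are made up for this example. Points that fall exactly on the boundary are not treated specially.

```python
def point_in_ring(px, py, ring):
    """Crossing-number test: True if (px, py) lies inside the ring.

    `ring` is a list of (x, y) tuples; repeating the first vertex at the
    end is allowed but not required.
    """
    inside = False
    n = len(ring)
    for i in range(n):
        x1, y1 = ring[i]
        x2, y2 = ring[(i + 1) % n]
        # Only edges that straddle the horizontal line through the point
        # can be crossed by a ray cast to the right of the point.
        if (y1 > py) != (y2 > py):
            # x coordinate where the edge meets that horizontal line
            x_cross = x1 + (py - y1) * (x2 - x1) / (y2 - y1)
            if px < x_cross:
                inside = not inside
    return inside


def point_in_polygon(px, py, shell, holes=()):
    """Inside the outer shell and not inside any hole."""
    if not point_in_ring(px, py, shell):
        return False
    return not any(point_in_ring(px, py, hole) for hole in holes)
```

Every call walks every edge of every ring, which is the O(NumberOfPolygonEdges) cost per point.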

But, why stop at one shortcut?

What if you are testing hundreds (or thousands) of points against one, or a small number, of polygons? Why iterate through all the segments of the polygon for every point, to carry out the test? By indexing the segments of the polygon, you can reduce the computational effort of doing a point-in-polygon test from O(NumberOfPolygonEdges) to O(log(NumberOfPolygonEdges)). However, for the index to be effective, you have to re-use it for each new point, not re-build it for every polygon/point pair. That means it has to be cached between function calls.
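
A rough sketch of the indexing idea, in Python: build a small one-dimensional interval tree over the y-ranges of the polygon's edges once, then for each point visit only the edges whose y-range covers the point's y value. This is an illustration under my own assumptions, not the structure used in PostGIS; the names `IntervalNode`, `build_edge_index` and `indexed_point_in_ring` are hypothetical.

```python
class IntervalNode:
    """Node of a packed 1-D interval tree over edge y-ranges."""

    def __init__(self, lo, hi, edge=None, left=None, right=None):
        self.lo, self.hi = lo, hi
        self.edge = edge              # (x1, y1, x2, y2), set on leaves only
        self.left, self.right = left, right


def build_edge_index(ring):
    """Build the tree once per ring; `ring` is a list of (x, y) tuples."""
    n = len(ring)
    nodes = []
    for i in range(n):
        (x1, y1), (x2, y2) = ring[i], ring[(i + 1) % n]
        if y1 != y2:  # horizontal edges never count as crossings
            nodes.append(IntervalNode(min(y1, y2), max(y1, y2),
                                      edge=(x1, y1, x2, y2)))
    nodes.sort(key=lambda nd: nd.lo + nd.hi)  # order leaves by midpoint
    while len(nodes) > 1:
        parents = []
        for i in range(0, len(nodes), 2):
            pair = nodes[i:i + 2]
            parents.append(IntervalNode(
                min(p.lo for p in pair),
                max(p.hi for p in pair),
                left=pair[0],
                right=pair[1] if len(pair) == 2 else None))
        nodes = parents
    return nodes[0] if nodes else None


def count_crossings(node, px, py):
    """Count edges to the right of the point that a horizontal ray crosses."""
    if node is None or py < node.lo or py > node.hi:
        return 0  # nothing under this node can straddle y = py
    if node.edge is not None:
        x1, y1, x2, y2 = node.edge
        if (y1 > py) != (y2 > py):
            x_cross = x1 + (py - y1) * (x2 - x1) / (y2 - y1)
            return 1 if px < x_cross else 0
        return 0
    return (count_crossings(node.left, px, py)
            + count_crossings(node.right, px, py))


def indexed_point_in_ring(index, px, py):
    """An odd number of crossings means the point is inside the ring."""
    return count_crossings(index, px, py) % 2 == 1
```

Building the tree costs more than a single brute-force test, so it only pays off when the same `index` is reused for many points, which is why the caching matters.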

A bit over a year ago, Mark implemented a caching version of the point-in-polygon shortcut, with an indexing algorithm from Martin Davis, and that shortcut currently resides in the 1.3 release series. However, it has two drawbacks.

  • First, it only works for POLYGON/POINT combinations, and most people working with polygons actually have MULTIPOLYGON/POINT (though their multi-polygons usually only have one member).
  • Second, it leaks a lot of memory during processing, so the postgres process size can get very large while the computation is running (PostgreSQL reclaims the memory at the end, so no permanent damage is done).

Last week I took a couple days to delve deeply into what this shortcut was doing, and made the following improvements.

  • Removed all the memory leaks, so the process size remains constant throughout the run.
  • Added support for MULTIPOLYGON types.
  • Improved the caching logic slightly, so that a segment index is only built when the same polygon has been seen two times in a row; otherwise the standard non-indexed version of the point-in-polygon algorithm is used.
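
The "seen two times in a row" rule from the last item can be sketched roughly like this in Python. It is only an illustration of the policy, not the PostGIS code: the class name `PolygonIndexCache`, the `key` argument (some cheap way to tell whether two polygons are the same) and the `build_index` callable are all assumptions of mine.

```python
class PolygonIndexCache:
    """Build an index only when the same polygon arrives twice in a row."""

    def __init__(self, build_index):
        self.build_index = build_index  # callable: polygon -> index
        self.last_key = None
        self.index = None

    def get(self, key, polygon):
        """Return an index for `polygon`, or None meaning 'use the plain test'."""
        if key != self.last_key:
            # A different polygon: drop the old index, and do not build a
            # new one yet, since this polygon may never be seen again.
            self.last_key = key
            self.index = None
            return None
        if self.index is None:
            # Second consecutive sighting: the index is now worth building.
            self.index = self.build_index(polygon)
        return self.index
```

A caller would fall back to the plain edge-by-edge test whenever `get()` returns None. A query that keeps alternating between polygons then never pays for an index it will not reuse.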

The point-in-polygon caching shortcut is extremely effective. Using the un-cached code, a spatial join of 8000 points to 80 polygons, where there are an average of 100 points per polygon, takes about 30 seconds on my workstation. With the caching segment indexes, the same join returns in 6 seconds.

The improved code is currently on trunk only, but I will back-port it into 1.3.X next week, so it will be available in the next point release.

Update: These changes have now been back-ported to 1.3 and will be generally available in 1.3.4 when it is released.

Flickr Tag: foss4g2008

I encourage all you lucky folks heading to FOSS4G 2008 in Cape Town next week to upload your photos to Flickr using the tag foss4g2008. We got a great collection of photos from folks last year under the foss4g2007 tag.