Is "Good Enough" Good Enough?

The gift that keeps on giving, for me, is performance work. Users love it, and it tickles my cerebral cortex in a way other work just cannot touch. Some of my first substantive patches to MapServer were performance patches, and I’ve also taken that tack into PostGIS.

One of the things that surprised me while investigating the MapServer QIX index structure was how little a badly built tree impacted real world performance. The QIX file, when built on point files, tends to build in far more depth than it needs, strictly speaking. The extra depth adds theoretical time to traverse the tree for searches and yet… even when I tested quite huge shape files the performance was fine.

Recently I’ve been looking again at internally indexing geometries for high performance testing of things like distance and intersections. We already have some of this for PostGIS, using the GEOS “PreparedGeometry” construction, but I’d like a native implementation, and something that doesn’t necessarily depend on a cached implementation.

There is already a form of internally indexed testing in PostGIS, in the point-in-polygon case, and while reading it over I was struck by the elegance of the underlying assumption: rather than pre-sort the geometry edges, and then build the tree, the implementation simply scans the point array from start to finish and builds parent nodes from each successive pair of nodes. The result is not a perfect tree, by any stretch of the imagination, but… it’s good enough. Because the edges are spatially auto-correlated, the parent nodes end up having good locality.

The beauty of this approach to index building is that since there is no sorting step and no re-balancing of the tree or any of that fancy stuff, you can actually build an “index” in O(n) time. A brute force intersection test is O(nm) time. But if you can build your indexes in O(n) time the cost of doing an intersection starts to get closer to O(n+m) time! (Note, I am not a computer scientist and the O() term for the actual tree-on-tree intersection test is beyond my powers.)

Now, since no sorting is being applied in the building of these “indexes” they could theoretically be terrible indexes. But since we’re indexing GIS data, and the edges have this wonderful autocorrelation, they are actually “good enough”, and obtainable in very little time.

The same tricks apply to building indexes for intersection and distance in geodetic space, which I predict will be in hot demand once people experience just how computationally expensive operations on the new PostGIS 1.5 geography type are!

You get more than what you pay for!

Entchev reflexively repeats one of the most foolish canards ever hurled against collaborative knowledge building (open source, open data, open knowledge)

You get what you pay for.

Was there ever a putdown more easily falsifiable?

You pay nothing for PostGIS. Is the value to you nothing? You pay nothing for Wikipedia. Is the value to you nothing? You pay nothing for Open Street Map, is the value to you… nothing? Clearly, you, and everyone, gets more than what they pay for.

Now, there are many arguments to be made and beers to be drunk over whether what you get is enough for all possible purposes of all possible people. Some think that the trend in open knowledge is so strong that even that goal is within reach. Me, not so much.

But clearly, everyone has access to these resources, for free, and what they get in return is more than what they paid.

The First Rule of Fight Club...

Who knew that ESRI was so hip? From James Fee’s blog:

Note: If you post any specific 9.4 Beta information (such as quoting forums posts on the Beta forums), expect ESRI to personally contact you. They appear to be monitoring this blog post. You’ve been warned.

Palanterra X3

James Fee has a nice rant on mapping interfaces today, but the part the piqued my interest was embedded in the screen shot of the National Map. It said “The National Map Viewer BETA: Powered by Palanterra X3”

Interesting! Who or what is this Palanterra X3 who have managed to get a top-of-the-fold “Powered by” on a major USGS site? Oddly, the Google is unusually terse about this topic. Looks like it’s an NGA project, and yet it’s “Palanterra (TM)”. Intra-governmental entrepreneurship at its finest, a carefully imagined brand name that appears only in meeting agendas and powerpoints.

FOSS4G hearts SoTM

A quiet bit of byplay is going on in the world of extreme geogeekery, as the State of the Map conference, the annual gathering of the OpenStreetMap community, decides where to host their 2010 event. The twist for this year is that an intrepid member of their spanish community has suggested hosting the event in Barcelona, to align with the FOSS4G 2010 event (either right before or right after). The OSM community decides on their site in early December – I’ll add my non-binding “ooooooooh, I hopehopehopehopehope they choose Barcelona!”