Code Sprint 2010: New York City

It’s official! Following the success of the 2009 Toronto Code Sprint a second event will be taking place in New York City, this winter.

This code sprint will bring together the developers for the MapServer, PostGIS, GDAL, GRASS, OGR, LibLAS, QGIS, Proj4 and other projects that are part of the “C Tribe” of open source software.

Timing: Saturday, February 20 to Tuesday, February 23, 9am to 4pm.

Location: OpenGeo Event Room, 148 Lafayette St, New York, New York

More Information: Wiki Page, Mailing List

Sponsor this event! Last year, sponsorship dollars paid for venue and dinners for the sprinters, who gave up their weekends and more to invest a collective 736 man hours in making their open source project stronger. Thanks again to our 2009 sponsors: Rich Greenwood, OSGIS.nl, Coordinate Solutions, LizardTech, SJ Geophysics, qPublic.net, GDAL! If you want to sponsor this event, and raise your profile with the 2 dozen people who make your software sing, please contact me.

Update: Thanks to our first 2010 sponsors, Coordinate Solutions, LizardTech, and qPublic.net! We are still looking for sponsors for 2010!

Perfect

The New York Times political blog has instant commentary on Obama’s Afghanistan speech tonight, and the one-sentence teaser in the RSS feed speaks volumes:

In an address aimed at rank-and-file Americans, Mr. Obama did not call for sacrifices.

Is "Good Enough" Good Enough?

The gift that keeps on giving, for me, is performance work. Users love it, and it tickles my cerebral cortex in a way other work just cannot touch. Some of my first substantive patches to MapServer were performance patches, and I’ve also taken that tack into PostGIS.

One of the things that surprised me while investigating the MapServer QIX index structure was how little a badly built tree impacted real world performance. The QIX file, when built on point files, tends to build in far more depth than it needs, strictly speaking. The extra depth adds theoretical time to traverse the tree for searches and yet… even when I tested quite huge shape files the performance was fine.

Recently I’ve been looking again at internally indexing geometries for high performance testing of things like distance and intersections. We already have some of this for PostGIS, using the GEOS “PreparedGeometry” construction, but I’d like a native implementation, and something that doesn’t necessarily depend on a cached implementation.

There is already a form of internally indexed testing in PostGIS, in the point-in-polygon case, and while reading it over I was struck by the elegance of the underlying assumption: rather than pre-sort the geometry edges, and then build the tree, the implementation simply scans the point array from start to finish and builds parent nodes from each successive pair of nodes. The result is not a perfect tree, by any stretch of the imagination, but… it’s good enough. Because the edges are spatially auto-correlated, the parent nodes end up having good locality.

The beauty of this approach to index building is that since there is no sorting step and no re-balancing of the tree or any of that fancy stuff, you can actually build an “index” in O(n) time. A brute force intersection test is O(nm) time. But if you can build your indexes in O(n) time the cost of doing an intersection starts to get closer to O(n+m) time! (Note, I am not a computer scientist and the O() term for the actual tree-on-tree intersection test is beyond my powers.)

Now, since no sorting is being applied in the building of these “indexes” they could theoretically be terrible indexes. But since we’re indexing GIS data, and the edges have this wonderful autocorrelation, they are actually “good enough”, and obtainable in very little time.

The same tricks apply to building indexes for intersection and distance in geodetic space, which I predict will be in hot demand once people experience just how computationally expensive operations on the new PostGIS 1.5 geography type are!

You get more than what you pay for!

Entchev reflexively repeats one of the most foolish canards ever hurled against collaborative knowledge building (open source, open data, open knowledge)

You get what you pay for.

Was there ever a putdown more easily falsifiable?

You pay nothing for PostGIS. Is the value to you nothing? You pay nothing for Wikipedia. Is the value to you nothing? You pay nothing for Open Street Map, is the value to you… nothing? Clearly, you, and everyone, gets more than what they pay for.

Now, there are many arguments to be made and beers to be drunk over whether what you get is enough for all possible purposes of all possible people. Some think that the trend in open knowledge is so strong that even that goal is within reach. Me, not so much.

But clearly, everyone has access to these resources, for free, and what they get in return is more than what they paid.

The First Rule of Fight Club...

Who knew that ESRI was so hip? From James Fee’s blog:

Note: If you post any specific 9.4 Beta information (such as quoting forums posts on the Beta forums), expect ESRI to personally contact you. They appear to be monitoring this blog post. You’ve been warned.