Googlesoft

It’s been obvious for a while, but it took the release of “Google Latitude” to finally shock me into conscious awareness. Google is now the Microsoft of internet services. (Microsoft remains the Microsoft of desktop software. Apple hopes to be the Microsoft of smart phones.)

I’m not necessarily referring to a full-on bring-in-the-DoJ monopoly here, but to a level of leadership in market consciousness, and the bottomless pockets, that allow for a form of monopolistic behavior. I remember reading articles in the late nineties about the death of innovation in desktop software, and one of the factors blamed was the “Microsoft effect”: investors stopped supporting innovative new desktop software out of fear of Microsoft. Microsoft would allow you, as a third-party developer, to have a neat little application, to write some shareware, but if you invented anything strategic, or anything with a potentially large market, Microsoft would make its own version, inferior at first, that would eventually crush you and take your market.

And now we have Google, doing the same thing. It’s been a while since Google brought out anything truly innovative, but they sure have shown themselves willing to copy the services of upstart companies and try to snatch their markets away. Sometimes they win (Google Docs) and sometimes they lose (Google Video, Orkut) but this kind of aggressive barging in will eventually dry up the investment ecosystem for web services.

PostGIS and the Public Sector

The EU’s Open Source Observatory and Repository reports on the ESLAP conference in Portugal. Looks like some big public agencies are moving to PostGIS as their spatial data store. We already knew about France’s Institut Géographique National, but to that add Portugal’s Instituto Geográfico Português.

Data is a Long Lived Investment

Technology is not.

Double-plus super-terrific hands-on-the-ceiling support to Sean Gorman’s take on the various self-serving NSDI proposals circulating the internet.

Someone posted this link to the story of the rescue of the Canada Land Inventory recently, and I really enjoyed it. Note that, a decade after the program wound down, the value of the technology had declined to zero. Less than zero, actually, since it was now an impediment to accessing the data. Meanwhile, the data, some of it 25 years old, had retained its value.

Here in my neighborhood, British Columbia has no unified parcel inventory / ownership inventory. The ownership database is separate from the parcel information (which is incomplete and out of date). Bits and pieces, hither and yon. Modernization talk always tends to focus on the technology, but the real project is a big, expensive, labor intensive slog through the data, to apply consistency rules to the ownership database, and complete the parcel data. Make sure all the work is done here in BC, and it’s a 100% stimulus that, like building a bridge, will create an asset with a multi-generational life-span.

(Speaking of multi-generational, the Pattullo Bridge in Vancouver recently caught fire and a section burned to the ground. It’s a four-lane freeway bridge; how did this happen? Turns out, it’s really old, older than most people think, and one section was built of wood! In 1936. In the Depression. Call your representatives, tell them you don’t want tax cuts, you want bridges, you want basic scientific knowledge, you want railways, you want power lines, you want dams, things that will still be around when your grandchildren have grandchildren.)

(Much) Faster Unions in PostGIS 1.4

Originally posted at GeoSpiel.

I have had a very geeky week, working on bringing the “cascaded union” functionality to PostGIS.

By way of background, about a year ago, a PostGIS user brought up a question. He had about 30K polygons he was unioning and the process was taking hours. ArcMap could do it in 20 minutes, so what was up? First of all, he had an unbelievably degenerate data set, which really did a great job of exposing the inefficiency of the PostGIS union aggregate. Second, the union aggregate really was inefficient.

This is what his data looked like, before and after union.

The old PostGIS ST_Union() aggregate just naively built the final result from the input table: union rows 1 and 2 together, then add row 3, then row 4, etc. As a result, each new row generally made the interim polygon more complex — more vertices, more parts. In contrast, the “cascaded union” approach first structures the data set into an STR-Tree, then unions the tree from the bottom up. As a result, adjacent bits are merged together progressively, so each stage of the union does the minimum amount of work, and creates an interim result simpler than the input components.
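To make the difference concrete, here is a toy sketch in Python. It is not the GEOS or JTS code: it assumes a one-dimensional stand-in for polygons (a “geometry” is a sorted list of disjoint intervals), uses a simple sort-and-chunk packing as the 1-D analogue of an STR-Tree, and counts the parts fed into each union call as a rough proxy for work. All names (`union`, `naive_union`, `cascaded_union`, `WorkCounter`) are hypothetical.

```python
import random
from typing import List, Tuple

# Toy stand-in for a polygon: a sorted list of disjoint intervals on a line.
Interval = Tuple[int, int]
Geom = List[Interval]


class WorkCounter:
    """Counts interval parts fed into union calls, a rough proxy for cost."""

    def __init__(self) -> None:
        self.parts_processed = 0


def union(a: Geom, b: Geom, counter: WorkCounter) -> Geom:
    """Merge two interval sets, joining parts that overlap or touch."""
    counter.parts_processed += len(a) + len(b)
    merged: Geom = []
    for start, end in sorted(a + b):
        if merged and start <= merged[-1][1]:
            # Overlaps or touches the previous part: extend it.
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


def naive_union(geoms: List[Geom], counter: WorkCounter) -> Geom:
    """Old-style aggregate: fold each row into one growing interim result."""
    result: Geom = []
    for g in geoms:
        result = union(result, g, counter)
    return result


def cascaded_union(geoms: List[Geom], counter: WorkCounter,
                   node_capacity: int = 4) -> Geom:
    """Pack inputs into tree nodes by position, then union bottom-up."""
    if not geoms:
        return []
    # 1-D analogue of STR packing: sort by envelope minimum, then chunk.
    level = sorted(geoms, key=lambda g: g[0][0])
    while len(level) > 1:
        next_level: List[Geom] = []
        for i in range(0, len(level), node_capacity):
            node = level[i:i + node_capacity]
            # Union the (spatially adjacent) children of this node.
            acc = node[0]
            for child in node[1:]:
                acc = union(acc, child, counter)
            next_level.append(acc)
        level = next_level
    return level[0]


if __name__ == "__main__":
    # 1000 touching unit intervals, shuffled like an unordered input table.
    rows: List[Geom] = [[(i, i + 1)] for i in range(1000)]
    random.Random(42).shuffle(rows)

    naive_work, cascaded_work = WorkCounter(), WorkCounter()
    assert naive_union(rows, naive_work) == [(0, 1000)]
    assert cascaded_union(rows, cascaded_work) == [(0, 1000)]
    print("naive parts processed:   ", naive_work.parts_processed)
    print("cascaded parts processed:", cascaded_work.parts_processed)
```

With shuffled input, the naive fold keeps carrying a fragmented interim result with many disconnected parts, while the cascaded version only ever unions neighbours, so each interim result collapses to a single part.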

Implementing this new functionality in PostGIS required a few steps: first, the algorithm had to be ported from JTS in Java to the GEOS C++ computational geometry library; second, the C++ algorithm in GEOS had to be exposed in the public GEOS C API; third, PostGIS functions to call the new GEOS function had to be added.

The difference on the test data set from our user was stark. My first cut brought the execution time in PostGIS from 3.5 hours to 4.5 minutes for the sample data set. That was excellent! But, we knew that the JTS implementation could carry out the same union on the same data in a matter of seconds. Where was the extra 4 minutes going in PostGIS? Some profiling turned up the answer.

Before you can run the cascaded union process, you need to aggregate all the data in memory, so that a tree can be built on it. The PostGIS aggregation was being done using ST_Accum() to build an array of geometry[], then handing that to the union operation. But the ST_Accum() aggregation was incredibly inefficient! Four minutes of overhead isn’t a big deal when your union is taking hours, but now that it was taking seconds, the overhead was swamping the processing.

Running a profiler found the problem immediately. The ST_Accum() aggregate built the geometry[] array in memory, repeatedly memcpy()’ing each interim array. So the array was being copied thousands of times. Fortunately, the upcoming version of PostgreSQL (8.4) had a new array_agg() function, which used a much more efficient approach to array building. I took that code and ported it into PostGIS, for use in all versions of PostgreSQL. That reduced the aggregation overhead to a few seconds.
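The copying problem is easy to reproduce in miniature. The Python sketch below counts element copies for two ways of accumulating rows: rebuilding the whole array for each new row, as the old ST_Accum() effectively did, versus a buffer whose capacity doubles when it fills. The doubling strategy is an illustrative assumption about how a more efficient accumulator can work, not a transcription of PostgreSQL’s array_agg() source, and the function names are hypothetical.

```python
from typing import Any, List, Tuple


def accumulate_by_copying(items: List[Any]) -> Tuple[List[Any], int]:
    """Build an array the slow way: copy the whole interim array for every new row."""
    array: List[Any] = []
    copied = 0
    for item in items:
        new_array = list(array)  # full copy of everything accumulated so far
        copied += len(array)
        new_array.append(item)
        array = new_array
    return array, copied


def accumulate_with_doubling(items: List[Any],
                             initial_capacity: int = 64) -> Tuple[List[Any], int]:
    """Build an array in a buffer whose capacity doubles when it fills up."""
    capacity = initial_capacity
    buffer: List[Any] = [None] * capacity
    size = 0
    copied = 0
    for item in items:
        if size == capacity:
            # Out of room: allocate twice the space and copy the contents once.
            capacity *= 2
            new_buffer = [None] * capacity
            new_buffer[:size] = buffer[:size]
            copied += size
            buffer = new_buffer
        buffer[size] = item
        size += 1
    return buffer[:size], copied


if __name__ == "__main__":
    rows = list(range(5000))
    slow, slow_copies = accumulate_by_copying(rows)
    fast, fast_copies = accumulate_with_doubling(rows)
    assert slow == fast == rows
    print("element copies, copy-per-row:", slow_copies)  # n(n-1)/2
    print("element copies, doubling:    ", fast_copies)
```

For 5000 rows the copy-per-row approach makes n(n-1)/2 = 12,497,500 element copies, while the doubling buffer (starting at 64 slots) makes 8,128. The first count grows quadratically with the number of rows and the second only linearly, which is why the aggregation overhead came to dominate once the union itself got fast.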

Final result: the sample union now takes 26 seconds! A big improvement on the original 3.5 hour time.

Here’s a less contrived result, the 3141 counties in the United States. Using the old ST_Union(), the union takes 42 seconds. Using the new ST_Union() (coming in PostGIS 1.4.0) the union takes 3.7 seconds.
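For scale, here are the speed-ups implied by the timings quoted above. These are simple ratios of the reported numbers (first cut, final sample run, and counties run), nothing more:

$$
\frac{3.5 \times 3600\ \text{s}}{4.5 \times 60\ \text{s}} = \frac{12600}{270} \approx 47\times, \qquad
\frac{12600\ \text{s}}{26\ \text{s}} \approx 485\times, \qquad
\frac{42\ \text{s}}{3.7\ \text{s}} \approx 11\times
$$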

C Code Sprint, Toronto, March 7-10

As I mentioned earlier, there is going to be a code sprint in Toronto from March 7-10, with a particular focus on C/C++-based open source geospatial software. The attendance list is now a veritable who’s who of MapServer, PostGIS, GEOS, proj4 and GDAL developers. Of particular excitement to me, two members of the PostGIS team whom I have not yet met, Mark Cave-Ayland and Regina Obe, are planning to attend.

Thanks to our event sponsors: Rich Greenwood, OSGIS.nl, Coordinate Solutions, LizardTech, and SJ Geophysics. If you or your organization would like to be an event sponsor too, please contact me directly!