Googlesoft Redux

Some might think that there is a subtext (Microsoft is Evil! Google is becoming like Microsoft! Ergo! Google is Evil!) underlying my previous post, so let me dispel that notion now: Microsoft is not Evil. Follow the reasoning from there.

However, Microsoft is Big. And Microsoft has a Lot of Money. And Microsoft is looking for New Sources of Earnings. Because Microsoft makes all of its money in One Place (well, two). Sound familiar?

When an organization gets as successful as Microsoft or Google in one market, it has lots of cash and time on its hands to play around in other markets. Ever see a cat play with a mouse? It’s fun… for the cat.

What twigged me was the Google Latitude announcement (incidentally, thanks, CBC, for the scary technology story last night). Hey, mobile social networking… just like Loopt!

How did it go down? Like this?

Google: Hm, we’ve got maps on Android and iPhone, we can get location, we should use this for social networking. Hmm, Loopt is already doing that.
Google: Hey Loopt! Wanna be bought?
Loopt: Sure, that’ll be $100M.
Google: Are you insane? In this market? Try $10M.
Loopt: But our investors are already into us for $10M…
Google: Sorry, them’s the breaks. We can build it ourselves for less than $10M, if you know what I mean, wink, wink, nudge, nudge, say no more, say no more.
Loopt: sob

Or maybe like this?

Googler A: Let’s build a mobile social networking site!
Googler B: And then play some beach volleyball!
Googler A: Yeah!
Loopt: sob

Really, no matter how the story goes, it ends up with the notional value of Loopt shares falling by 50-80% on the day Google Latitude is announced.

And the investors who dropped their millions into Loopt? What are they going to think the next time someone comes along with a great technology idea, maybe a little ahead of its time, but a chance to build out and be ready to rock when the pieces fall into place? Maybe they’ll think: yeah, this could grow into something huge, but as soon as the market matures enough to be valuable, Google is just going to steal it anyways. Pass.

Googlesoft

It’s been obvious for a while, but it took the release of “Google Latitude” to finally shock me into conscious awareness. Google is now the Microsoft of internet services. (Microsoft remains the Microsoft of desktop software. Apple hopes to be the Microsoft of smart phones.)

I’m not necessarily referring to a full-on bring-in-the-DoJ monopoly here, but to a level of leadership in market consciousness, and the bottomless pockets, that allow for a form of monopolistic behavior. I remember reading articles in the late nineties about the death of innovation in desktop software, and one of the factors blamed was the “Microsoft effect”: investors stopped supporting innovative new desktop software out of fear of Microsoft. Microsoft would allow you, as a third-party developer, to have a neat little application, to write some shareware, but if you invented anything strategic, or with a potentially large market, Microsoft would make its own version, inferior at first, and eventually crush you and take your market.

And now we have Google, doing the same thing. It’s been a while since Google brought out anything truly innovative, but they sure have shown themselves willing to copy the services of upstart companies and try to snatch their markets away. Sometimes they win (Google Docs) and sometimes they lose (Google Video, Orkut) but this kind of aggressive barging in will eventually dry up the investment ecosystem for web services.

PostGIS and the Public Sector

The EU’s Open Source Observatory and Repository reports on the ESLAP conference in Portugal. Looks like some big public agencies are moving to PostGIS as their spatial data store. We already knew about France’s Institut Géographique National, but to that add Portugal’s Instituto Geográfico Português.

Data is a Long-Lived Investment

Technology is not.

Double-plus super-terrific hands-on-the-ceiling support to Sean Gorman’s take on the various self-serving NSDI proposals circulating around the internet.

Someone posted this link to the story of the rescue of the Canada Land Inventory recently, and I really enjoyed it. Note that, a decade after the program wound down, the technology value had declined to zero. Less than zero, actually, since it was now an impediment to accessing the data. Meanwhile, the data, some of it 25 years old, had retained its value.

Here in my neighborhood, British Columbia has no unified parcel inventory / ownership inventory. The ownership database is separate from the parcel information (which is incomplete and out of date). Bits and pieces, hither and yon. Modernization talk always tends to focus on the technology, but the real project is a big, expensive, labor-intensive slog through the data, to apply consistency rules to the ownership database and complete the parcel data. Make sure all the work is done here in BC, and it’s a 100% stimulus that, like building a bridge, will create an asset with a multi-generational life-span.

(Speaking of multi-generational, the Pattullo Bridge in Vancouver recently caught fire and a section burned to the ground. It’s a four-lane freeway bridge; how did this happen? Turns out, it’s really old, older than most people think, and one section was built of wood! In 1936. In the Depression. Call your representatives, tell them you don’t want tax cuts, you want bridges, you want basic scientific knowledge, you want railways, you want power lines, you want dams, things that will still be around when your grandchildren have grandchildren.)

(Much) Faster Unions in PostGIS 1.4

Originally posted at GeoSpeil.

I have had a very geeky week, working on bringing the “cascaded union” functionality to PostGIS.

By way of background: about a year ago, a PostGIS user brought up a question. He had about 30K polygons he was unioning and the process was taking hours. ArcMap could do it in 20 minutes, so what was up? First of all, he had an unbelievably degenerate data set, which really did a great job of exposing the inefficiency of the PostGIS union aggregate. Second, the union aggregate really was inefficient.

This is what his data looked like, before and after union.

The old PostGIS ST_Union() aggregate just naively built the final result from the input table: union rows 1 and 2 together, then add row 3, then row 4, etc. As a result, each new row generally made the interim polygon more complex — more vertices, more parts. In contrast, the “cascaded union” approach first structures the data set into an STR-Tree, then unions the tree from the bottom up. As a result, adjacent bits are merged together progressively, so each stage of the union does the minimum amount of work, and creates an interim result simpler than the input components.
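Here is a minimal sketch of the two strategies against the GEOS C API, assuming GEOS 3.1 or later (the version that grew GEOSUnionCascaded); the three hard-coded squares are stand-ins for a real table of polygons:

```c
/*
 * Naive fold vs. cascaded union via the GEOS C API.
 * Build with: cc union_demo.c -lgeos_c
 */
#include <stdio.h>
#include <stdarg.h>
#include <geos_c.h>

static void handler(const char *fmt, ...)
{
    va_list ap;
    va_start(ap, fmt);
    vfprintf(stderr, fmt, ap);
    va_end(ap);
    fputc('\n', stderr);
}

int main(void)
{
    initGEOS(handler, handler);

    /* Three adjacent unit squares, standing in for a real data set. */
    const char *wkt[] = {
        "POLYGON((0 0, 1 0, 1 1, 0 1, 0 0))",
        "POLYGON((1 0, 2 0, 2 1, 1 1, 1 0))",
        "POLYGON((2 0, 3 0, 3 1, 2 1, 2 0))"
    };

    /* Naive aggregate: fold each row into an ever-growing result,
     * so every step pays for the full complexity of the interim. */
    GEOSGeometry *acc = GEOSGeomFromWKT(wkt[0]);
    for (int i = 1; i < 3; i++) {
        GEOSGeometry *row = GEOSGeomFromWKT(wkt[i]);
        GEOSGeometry *u = GEOSUnion(acc, row);
        GEOSGeom_destroy(acc);
        GEOSGeom_destroy(row);
        acc = u;
    }
    GEOSGeom_destroy(acc);

    /* Cascaded union: hand over the whole collection at once and
     * let GEOS build the tree and union it from the bottom up. */
    GEOSGeometry *geoms[3];
    for (int i = 0; i < 3; i++)
        geoms[i] = GEOSGeomFromWKT(wkt[i]);
    GEOSGeometry *coll =
        GEOSGeom_createCollection(GEOS_MULTIPOLYGON, geoms, 3);
    GEOSGeometry *merged = GEOSUnionCascaded(coll);

    char *out = GEOSGeomToWKT(merged);
    printf("%s\n", out);

    GEOSFree(out);
    GEOSGeom_destroy(merged);
    GEOSGeom_destroy(coll); /* also frees the member geometries */
    finishGEOS();
    return 0;
}
```

With three squares both paths are instant, of course; the payoff comes with thousands of adjacent polygons, where the cascaded version keeps every intermediate union small instead of re-processing one giant accumulator on every row.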

Implementing this new functionality in PostGIS required a few steps: first, the algorithm had to be ported from JTS in Java to the GEOS C++ computational geometry library; second, the C++ algorithm in GEOS had to be exposed in the public GEOS C API; third, PostGIS functions to call the new GEOS function had to be added.
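The middle step, for instance, comes down to exposing a single new entry point in geos_c.h, along these lines (treat the exact declaration as approximate):

```c
/* geos_c.h: union a collection of polygons with the cascaded,
 * tree-based algorithm; returns a newly allocated geometry. */
extern GEOSGeometry GEOS_DLL *GEOSUnionCascaded(const GEOSGeometry *g);
```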

The difference on the test data set from our user was stark. My first cut brought the execution time in PostGIS from 3.5 hours to 4.5 minutes for the sample data set. That was excellent! But we knew that the JTS implementation could carry out the same union on the same data in a matter of seconds. Where was the extra 4 minutes going in PostGIS? Some profiling turned up the answer.

Before you can run the cascaded union process, you need to aggregate all the data in memory, so that a tree can be built on it. The PostGIS aggregation was being done using ST_Accum() to build a geometry[] array, then handing that to the union operation. But the ST_Accum() aggregation was incredibly inefficient! Four minutes of overhead isn’t a big deal when your union is taking hours, but now that it was taking seconds, the overhead was swamping the processing.

The profiler pointed straight at ST_Accum(): it built the geometry[] array by repeatedly memcpy()'ing each interim array, so the array contents were being copied thousands of times. Fortunately, the upcoming version of PostgreSQL (8.4) had a new array_agg() function, which used a much more efficient approach to array building. I took that code and ported it into PostGIS, for use in all versions of PostgreSQL. That reduced the aggregation overhead to a few seconds.
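The difference between the two accumulation strategies looks roughly like this; a generic C sketch with a stand-in item type, not the actual PostgreSQL internals (which shuffle Datums around inside aggregate memory contexts):

```c
#include <stdlib.h>
#include <string.h>

typedef struct { void *geom; } Item; /* stand-in for a geometry datum */

/* Old ST_Accum() style: allocate a new array and memcpy() the whole
 * interim array on every append. Over n rows that is O(n^2) copying. */
Item *append_copy_all(Item *arr, size_t n, Item v)
{
    Item *bigger = malloc((n + 1) * sizeof(Item));
    memcpy(bigger, arr, n * sizeof(Item)); /* copies all n items, every call */
    bigger[n] = v;
    free(arr);
    return bigger;
}

/* array_agg() style: grow capacity geometrically, so each element is
 * copied O(1) times on average, for O(n) copying overall. */
typedef struct { Item *items; size_t n, cap; } Accum;

void append_amortized(Accum *a, Item v)
{
    if (a->n == a->cap) {
        a->cap = a->cap ? a->cap * 2 : 64;
        a->items = realloc(a->items, a->cap * sizeof(Item));
    }
    a->items[a->n++] = v;
}
```

Copy-everything-per-append does quadratic total copying over n rows; geometric growth amortizes to linear. That quadratic-versus-linear difference was the whole four minutes.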

Final result: the sample union now takes 26 seconds! A big improvement on the original 3.5-hour time.

Here’s a less contrived result: the 3141 counties in the United States. Using the old ST_Union(), the union takes 42 seconds. Using the new ST_Union() (coming in PostGIS 1.4.0), the union takes 3.7 seconds.