Buffering Jujutsu

Update: Martin Davis tried out a variant of simplification, and produced results even faster than Michaël’s.

Inquiring about some slow cases in the JTS buffer algorithm, Michaël Michaud asked:

I have no idea why a large buffer should take longer than a small one, and I don’t know if there is a better way to compute a buffer.

Martin Davis and I dragged out the usual hacks for making large buffers faster: simplifying the input geometry, or buffering outwards in stages. Said Martin:

The reason for this is that as the buffer size gets bigger, the “raw offset curve” that is initially created prior to extracting the buffer outline gets more and more complex. Concave angles in the input geometry result in perpendicular “closing lines” being created, which usually intersect other offset lines. The longer the perpendicular line, the more other lines it intersects, and hence the more noding and topology analysis has to be performed.

A couple days later, Michaël comes back with this:

I made some further tests which show there is room for improvement, even with the excellent stuff you recently added.

I created a simple linestring: length = 25000m, number of segments = 1000

Trying to create a buffer at a distance of 10000 m ended with a heap-space exception (it took more than 256 MB!) after more than 4 minutes of intensive computation!

The other way is: decompose into segments (less than 1 second); create individual buffers (OpenJUMP says 8 seconds, but I think that includes redrawing); cascaded union of the resulting buffers (6 seconds).

Holy algorithmic cow, Batman! Creating and merging 1000 wee buffers is orders of magnitude faster than computing one large buffer, if you merge them in the right order (by using the cascaded union). Huge kudos to Michaël for attempting such a counter-intuitive processing path and finding the road to nirvana.
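The decompose-and-merge trick is easy to try at home. Here is a sketch in Python using Shapely (the Python interface to GEOS, itself a port of JTS); Shapely's `unary_union` performs a cascaded-style union internally. The zig-zag line below is made-up test data, not Michaël's geometry:

```python
# Decompose-and-merge buffering: buffer each segment separately,
# then union the pieces in tree order instead of buffering the
# whole line in one shot.
from shapely.geometry import LineString
from shapely.ops import unary_union

# A zig-zag line of ~1000 segments spanning 25000 units
line = LineString([(i * 25.0, (i % 2) * 10.0) for i in range(1001)])

# One-shot buffer: can be slow for large distances on complex inputs
direct = line.buffer(100.0)

# Decompose into individual segments, buffer each, then merge
coords = list(line.coords)
segments = [LineString([a, b]) for a, b in zip(coords[:-1], coords[1:])]
pieces = [seg.buffer(100.0) for seg in segments]
merged = unary_union(pieces)  # cascaded (tree-ordered) union
```

The merged result is geometrically equivalent to the one-shot buffer (up to the usual curve-approximation tolerance); only the amount of noding work needed to get there differs.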

"What if you're looking for a coffee shop nearby..."

If I hear this LBS “use case” one more time, I think my ears are going to start to bleed.

GeoWeb: Day 1/2

I’ve really enjoyed my GeoWeb experience. The official program has been mediocre (I say that and, hey, I’m on it) but the networking has been sublime. Ron Lake has managed to assemble a great cross-section of movers and shakers in the next wave of geospatial in one place, so GeoWeb is the place to schmooze. Of course, being in one of the most lovely cities in North America is an added bonus, so the settings for schmoozing have been great too: the Wosk Center, and last night a boat cruise on the harbour during Vancouver’s summer fireworks festival.

Schmoozing highlights: Ed Katibah, Microsoft SQL Server Spatial eminence (apparently, he likes chocolate) and repository of the complete history of all spatial databases, ever; blogging celebritarian James Fee, who is far less opinionated in person than one might expect; Alex Miller, head of ESRI Canada, auto racer, polymath, business tycoon; and Mike Springer who, apparently, was a professional fire-eater in Vegas a mere 10 years ago. He has since written SketchUp (that piddling little thing) and now heads Google’s whole 3D program, but somehow the whole fire-eating thing keeps distracting me. Fire eating!

GeoWeb: Day 0

I’m off to visit the GeoSpider at the GeoWeb this week!

Today I started with an (unrelated) trip to the dentist (ultra-sonic tooth cleaning! ow! but fast!), and then hopped the ferry to Vancouver. Lovely day for traveling, sun glinting off the waves, etc, etc, then the bus ride from the ferry terminal into downtown Vancouver. Because of construction of the new transit line for the upcoming 2010 Olympics, the bus is running up Knight Street instead of Cambie Street, which affords an opportunity to view four decades of variations on the “Vancouver Special”.

The east side of Vancouver was built up in a bit of a hurry in the post-war years and largely filled with fairly small affordable bungalows (not so affordable anymore).

As time went on, these houses seemed a bit small for modern tastes, so a wave of replacements began, and many took the form of the Vancouver Special. The classic “special” has a particular form and floor plan, but in general the special is a two-story box that maximizes the amount of floor space that can be extracted from the lot without contravening the building code.

The classic special looks like the picture above, but the special lives on in various versions, including a spiffy new 21st-century version, with bay windows and Greek temple door treatments, but the same basic two-story, maximum-lot-fill ethos.


In the evening, I attended the OSGeo British Columbia meeting, ably hosted by Max and Martin at Sierra Systems (and including a stunning view of the harbour from their 25th-story board room, I might add). Topics were eclectic: community course development at the Commonwealth of Learning, and a review of web application security concerns, including some geospatial security issues, from Martin Kyle. For the next meeting Martin has promised a working exploit of a buffer overflow weakness in MapServer versions < 4.10.3. (If you are running MapServer < 4.10.3 … upgrade!)

Counting Squares

One of the last projects I had a substantial hand in formulating and designing while at Refractions was a project for providing provincial-level environmental summaries, using the best possible high resolution data. The goal is to be able to answer questions like:

  • What is the volume of old-growth pine in BC? By timber supply area? In caribou habitat?
  • What young forest areas are on south-facing slopes of less than 10%, within 200 meters of water?
  • What is the volume of fir in areas affected by mountain pine beetle?
  • How much bear habitat is more than 5km from a road but not in an existing protected area? Where is it?

This is all standard GIS stuff, but we wanted to make answering these questions the matter of a few mouse gestures, with no data preparation required, so that a suitably motivated environmental scientist or forester could figure out how to do the analysis with almost no training.

Getting there involves solving two problems: what kind of engine can generate fairly complicated summaries based on arbitrary summary areas, and how do you make that engine maximally usable with minimal interface complexity?

The solution to the engine was double-barreled.

First, to enable arbitrary summary areas, move from vector analysis units to a province-wide raster grid. For simplicity, we chose one hectare (100m x 100m), which means about 90M cells for all the non-ocean area in the jurisdiction. Second, to enable a roll-up engine on those cells, put all the information into a database, in our case PostgreSQL. Data entering the system is pre-processed, rasterized onto the hectare grid, and then saved in a master table that has one row for each hectare. At this point, each hectare in the province has over 100 variables associated with it in the system.
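The roll-up idea can be illustrated in a few lines of Python. The attribute names and values below are hypothetical, not the actual Hectares BC schema; the point is the shape of the model: one record per hectare cell, with summaries computed by filtering and aggregating over an arbitrary selection of cells:

```python
# Toy one-row-per-hectare model: each cell carries its grid position
# plus per-hectare attribute values (names here are made up).
from collections import namedtuple

Hectare = namedtuple("Hectare", "x y pine_volume slope south_facing")

# A tiny four-cell "province"
cells = [
    Hectare(0, 0, 120.0, 5, True),
    Hectare(1, 0, 80.0, 12, True),
    Hectare(0, 1, 0.0, 3, False),
    Hectare(1, 1, 200.0, 8, True),
]

def summarize(cells, predicate, value):
    """Roll up a value over every cell matching an arbitrary filter."""
    chosen = [c for c in cells if predicate(c)]
    return len(chosen), sum(value(c) for c in chosen)

# "Pine volume on south-facing slopes of less than 10%"
n, vol = summarize(
    cells,
    lambda c: c.south_facing and c.slope < 10,
    lambda c: c.pine_volume,
)
```

In the real system the same filter-and-aggregate step runs as SQL against the PostgreSQL master table, with the "arbitrary summary area" expressed as a set of hectare cells, but the logic is the same.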

An example of the 1-hectare grid

To provide a usable interface on the engine, we took the best of breed everywhere we could: Google Web Toolkit as the overall UI framework; OpenLayers as a mapping component; server-side Java and Tomcat for all the application logic. The summary concept was very similar to OLAP query building, so we stole the ideas for the working of that tab from the SQL Server OLAP query interface.

The final result is Hectares BC, which is one of the cooler things I have been involved in, going from a coffee shop “wouldn’t this be useful” discussion to a prototype, to a funding proposal, to the completed pilot in about 24 months.