"What if you're looking for a coffee shop nearby..."

If I hear this LBS “use case” one more time, I think my ears are going to start to bleed.

GeoWeb: Day 1/2

I’ve really enjoyed my GeoWeb experience. The official program has been mediocre (I say that and, hey, I’m on it) but the networking has been sublime. Ron Lake has managed to assemble a great cross section of movers and shakers in the next wave of geospatial in one place, so GeoWeb is the place to schmooze. Of course, being in one of the most lovely cities in North America is an added bonus, so the settings for schmoozing have been great too: the Wosk Center and last night a boat cruise during the summer Vancouver fireworks festival held on the harbour.

Schmoozing highlights: Ed Katibah, Microsoft SQL Server Spatial eminence (apparently, he likes chocolate) and repository of the complete history of all spatial databases, ever; blogging celebritarian James Fee, who is far less opinionated in person than one might expect; Alex Miller, head of ESRI Canada, auto racer, polymath, business tycoon; and Mike Springer who, apparently, was a professional fire-eater in Vegas a mere 10 years ago. He has since written SketchUp (that piddling little thing) and now heads Google’s whole 3D program, but somehow the whole fire-eating thing keeps distracting me. Fire eating!

GeoWeb: Day 0

I’m off to visit the GeoSpider at the GeoWeb this week!

Today I started with an (unrelated) trip to the dentist (ultra-sonic tooth cleaning! ow! but fast!), and then hopped the ferry to Vancouver. Lovely day for traveling, sun glinting off the waves, etc, etc, then the bus ride from the ferry terminal into downtown Vancouver. Because of construction of the new transit line for the upcoming 2010 Olympics, the bus is running up Knight Street instead of Cambie Street, which affords an opportunity to view four decades of variations on the “Vancouver Special”.

The east side of Vancouver was built up in a bit of a hurry in the post-war years and largely filled with fairly small affordable bungalows (not so affordable anymore).

As time went on, these houses seemed a bit small for modern tastes, so a wave of replacements began, and many took the form of the Vancouver Special. The classic “special” has a particular form and floor plan, but in general the special is a two-story box that maximizes the amount of floor space that can be extracted from the lot without contravening the building code.

The classic special looks like the picture above, but the special lives on in various versions, including a spiffy new 21st century version, with bay windows and Greek temple door treatments, but the same basic two-story, maximum-lot-fill ethos.

Anyhow…

In the evening, I attended the OSGeo British Columbia meeting, ably hosted by Max and Martin at Sierra Systems (and including a stunning view of the harbour from their 25th-floor boardroom, I might add). Topics were eclectic: community course development at the Commonwealth of Learning, and a review of web application security concerns, including some geospatial security issues, from Martin Kyle. For the next meeting Martin has promised a working exploit of a buffer overflow weakness in MapServer versions < 4.10.3. (If you are running MapServer < 4.10.3 … upgrade!)

Counting Squares

One of the last projects I had a substantial hand in formulating and designing while at Refractions was a project for providing provincial-level environmental summaries, using the best possible high resolution data. The goal is to be able to answer questions like:

  • What is the volume of old-growth pine in BC? By timber supply area? In caribou habitat?
  • What young forest areas are on south-facing slopes of less than 10%, within 200 meters of water?
  • What is the volume of fir in areas affected by mountain pine beetle?
  • How much bear habitat is more than 5km from a road but not in an existing protected area? Where is it?

This is all standard GIS stuff, but we wanted to make answering these questions a matter of a few mouse gestures, with no data preparation required, so that a suitably motivated environmental scientist or forester could figure out how to do the analysis with almost no training.

Getting there involves solving two problems: first, what kind of engine can generate fairly complicated summaries based on arbitrary summary areas; and second, how do you make that engine maximally usable with minimal interface complexity.

The solution to the engine was double-barreled.

First, to enable arbitrary summary areas, move from vector analysis units to a province-wide raster grid. For simplicity, we chose one hectare (100m x 100m), which means about 90M cells for all the non-ocean area in the jurisdiction. Second, to enable a roll-up engine on those cells, put all the information into a database, in our case PostgreSQL. Data entering the system is pre-processed, rasterized onto the hectare grid, and then saved in a master table that has one row for each hectare. At this point, each hectare in the province has over 100 variables associated with it in the system.
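The key trick is that once everything lives on a common grid, any coordinate can be reduced to a single cell address by integer division. As a rough sketch (the grid origin and numbering here are hypothetical, not the actual Hectares BC scheme):

```shell
# Hypothetical hectare-grid indexing: map a projected coordinate (in meters)
# to a row/column in a 100m x 100m grid. Origin values are illustrative only.
ORIGIN_X=0
ORIGIN_Y=0
CELL=100    # 100m x 100m = 1 hectare

x=1234567
y=987654

# Integer division truncates, dropping the coordinate onto its cell.
col=$(( (x - ORIGIN_X) / CELL ))
row=$(( (y - ORIGIN_Y) / CELL ))

echo "cell: row=$row col=$col"
```

With every input layer rasterized to the same addressing scheme, each hectare becomes one key in the master table, and summarizing any combination of variables is just a grouped query over those rows.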

An example of the 1-hectare grid

To provide a usable interface on the engine, we took best-of-breed components everywhere we could: Google Web Toolkit as the overall UI framework; OpenLayers as a mapping component; server-side Java and Tomcat for all the application logic. The summary concept was very similar to OLAP query building, so we stole the ideas for the workings of that tab from the SQL Server OLAP query interface.

The final result is Hectares BC, which is one of the cooler things I have been involved in, going from a coffee shop “wouldn’t this be useful” discussion to a prototype, to a funding proposal, to the completed pilot in about 24 months.

From 0 to 65 Million in 2 Hours

I’m doing some performance benchmarking for a client this week, so getting a big, real test database is a priority. The US Census Bureau TIGER data is one of the largest uniform data sets around, so I’ve started with that.

I just loaded all the edges, 64,830,691 of them, and it took just under 2 hours! Fortunately, the 2007 data comes in shapefiles, and the schemas are identical for each file, so the load script is as simple as this controller:

find . -name "fe*_edges.zip" -exec ./append_edges.sh {} ';'

And this runner (append_edges.sh):

unzip "$1"
shp2pgsql -W WINDOWS-1252 -D -a -s 4269 $(basename "$1" .zip).shp fe_edges | psql tiger
rm fe*edges.*

Note the use of the -W parameter, to ensure that the high-bit “charàctérs” are handled correctly, and the -a parameter, to append the file contents to the table.
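For a sense of scale, that works out to a sustained rate of roughly 9,000 edges per second. A quick back-of-envelope check, using the count from above and treating the load as a flat 2 hours (it was actually slightly under, so the true rate is a touch higher):

```shell
# Back-of-envelope load rate: 64,830,691 edges in about 2 hours.
edges=64830691
seconds=$(( 2 * 60 * 60 ))    # 7200 seconds
rate=$(( edges / seconds ))   # integer division is plenty accurate here

echo "$rate edges/sec"
```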