Tuesday, July 15, 2008

Counting Squares

One of the last projects I had a substantial hand in formulating and designing while at Refractions was a system for producing provincial-level environmental summaries from the best available high-resolution data. The goal is to be able to answer questions like:
  • What is the volume of old-growth pine in BC? By timber supply area? In caribou habitat?
  • Which young forest areas are on south-facing slopes of less than 10%, within 200 meters of water?
  • What is the volume of fir in areas affected by mountain pine beetle?
  • How much bear habitat is more than 5km from a road but not in an existing protected area? Where is it?

This is all standard GIS stuff, but we wanted answering these questions to be a matter of a few mouse gestures, with no data preparation required, so that a suitably motivated environmental scientist or forester could figure out how to do the analysis with almost no training.

Getting there involves solving two problems: what kind of engine can generate fairly complicated summaries over arbitrary summary areas, and how to make that engine maximally usable with minimal interface complexity.

The solution for the engine was double-barreled.

First, to enable arbitrary summary areas, we moved from vector analysis units to a province-wide raster grid. For simplicity, we chose a one-hectare cell (100 m x 100 m), which means about 90M cells covering all the non-ocean area in the jurisdiction. Second, to enable a roll-up engine on those cells, we put all the information into a database, in our case PostgreSQL. Data entering the system is pre-processed, rasterized onto the hectare grid, and then saved in a master table that has one row for each hectare. At this point, each hectare in the province has over 100 variables associated with it in the system.
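The hectare grid described above can be sketched as a simple mapping from a projected coordinate to one integer cell id and back. This is a minimal illustration under stated assumptions: the post doesn't say how Hectares BC numbers its cells, so the origin, the grid width, and the names `cell_id` and `cell_center` are hypothetical. Coordinates in metres in a projected system, such as BC Albers, are also assumed.

```python
# Hypothetical grid parameters (assumed, not from Hectares BC), in metres.
CELL_SIZE = 100.0      # 100 m x 100 m = 1 hectare
ORIGIN_X = 200_000.0   # hypothetical x of the grid's lower-left corner
ORIGIN_Y = 300_000.0   # hypothetical y of the grid's lower-left corner
NCOLS = 17_000         # hypothetical number of columns across the grid


def cell_id(x: float, y: float) -> int:
    """Return the integer id of the 1 ha cell that contains point (x, y)."""
    col = int((x - ORIGIN_X) // CELL_SIZE)
    row = int((y - ORIGIN_Y) // CELL_SIZE)
    if col < 0 or row < 0 or col >= NCOLS:
        raise ValueError("point lies outside the grid")
    # Row-major numbering: ids increase along a row, then move up one row.
    return row * NCOLS + col


def cell_center(cid: int) -> tuple[float, float]:
    """Return the centre coordinate of cell `cid` (inverse of cell_id)."""
    row, col = divmod(cid, NCOLS)
    return (ORIGIN_X + (col + 0.5) * CELL_SIZE,
            ORIGIN_Y + (row + 0.5) * CELL_SIZE)
```

A single integer key like this lets each hectare be one row in an ordinary table. Every data layer rasterized onto the grid can then be written as columns of that row.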

An example of the 1-hectare grid

To provide a usable interface on the engine, we took the best of breed everywhere we could: Google Web Toolkit as the overall UI framework, OpenLayers as the mapping component, and server-side Java on Tomcat for all the application logic. The summary concept was very similar to OLAP query building, so we stole the ideas for how the summary tab works from the SQL Server OLAP query interface.

The final result is Hectares BC, which is one of the cooler things I have been involved in, going from a coffee shop "wouldn't this be useful" discussion to a prototype, to a funding proposal, to the completed pilot in about 24 months.
 

9 comments:

Brendan Hemens said...

Very, very cool. Threatening (as someone who's been paid to answer these sorts of queries many, many times), but so...sane.

I find the 'Software License' information on the wiki a little confusing...does it pertain to the project itself? Who is 'Canada' in the license sense? I ask as a government employee in another province. I really like how the wiki opens up the project development publicly.

Regina Obe said...

Didn't you once say raster in the database was a stupid idea? Oh, was that meant to be qualified with "It is stupid when everyone else does it"?

Chad said...

Sheesh... I was looking into doing this in OpenLayers a year ago, but another guy and I got screwed on the deal and quit working on it for them. I still have some of the OL code lying about; too much work to just dump it.

dylanb said...

Nice job. The interface is extremely intuitive, with enough query-able layers to make this app very general. Only one suggestion: instead of "Soil Development" consider "Soil Taxonomy".

Paul Ramsey said...

@regina, there are no rasters in the database, there's just a 99M record table that happens to correspond to a 1Ha gridding of the province. I could put little square polygons next to it and say it's vectors, if I wanted :) In fact, one way of proving it's not a real raster-in-database implementation is to note that we aren't storing a square extent... the grid needed to cover BC with 1Ha pixels has 220M squares, and we store just the 90M that are of interest to us, the ones in the province.

It's OLAP, a radically de-normalized model of the data, which makes summary querying much more convenient.
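The storage model described in this reply can be sketched in a few lines. It is a sparse, de-normalized table keyed by cell id, with no rows for cells outside the province. A roll-up is then just a filter plus a grouped sum over that one table. In the real system this is a PostgreSQL table, and the roll-up would be a `WHERE` / `GROUP BY` / `SUM` query. Here a Python dict stands in for it. The toy 4 x 3 grid, column names, timber supply areas, and volumes are made-up values, not the Hectares BC schema.

```python
from collections import defaultdict

# Toy bounding grid of 4 x 3 one-hectare cells (hypothetical numbers).
NCOLS, NROWS = 4, 3

# Sparse, de-normalized "table": one row per cell of interest, keyed by cell id,
# with every attribute stored inline. Cells outside the province get no row.
hectares = {
    1: {"tsa": "Prince George", "species": "pine", "old_growth": True, "caribou": True, "volume_m3": 310.0},
    2: {"tsa": "Prince George", "species": "pine", "old_growth": True, "caribou": False, "volume_m3": 280.0},
    5: {"tsa": "Prince George", "species": "fir", "old_growth": True, "caribou": True, "volume_m3": 400.0},
    6: {"tsa": "Quesnel", "species": "pine", "old_growth": True, "caribou": True, "volume_m3": 150.0},
    9: {"tsa": "Quesnel", "species": "pine", "old_growth": False, "caribou": True, "volume_m3": 90.0},
}


def summarize(rows, where, group_by, measure):
    """Keep rows that pass `where`, then sum `measure` for each `group_by` value."""
    totals = defaultdict(float)
    for row in rows.values():
        if where(row):
            totals[row[group_by]] += row[measure]
    return dict(totals)


# "Volume of old-growth pine in caribou habitat, by timber supply area"
pine_in_caribou = summarize(
    hectares,
    where=lambda r: r["species"] == "pine" and r["old_growth"] and r["caribou"],
    group_by="tsa",
    measure="volume_m3",
)
```

Only 5 of the 12 squares in the bounding grid are stored, mirroring the 90M-of-220M point. Because every attribute already sits on the hectare row, asking a new question means changing the filter and the grouping, not running a new overlay.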

Joe said...

Paul,

I'm really impressed by this.

Especially impressed by the dynamic raster generation/colouring. Is the process for rendering the rasters (roughly):

1) receive query from browser
2) perform query on db and paint to temporary raster
3) use temporary mapserver file to render tiles
4) return id or path to tiles to browser for adding as an openlayers layer.

If not, is the source available somewhere to look at this process?

Paul Ramsey said...

@joe, the overlay colors are written directly to an image in the server-side code; MapServer isn't used. Each incoming tile request references a query id, which corresponds to the query you are building in the UI. That query is run, and the db results are then used to paint the appropriate pixels in the output tile.

License is open source, but I don't think the SVN is public right now. If you request the source, they'll either open up the SVN or ship you a tarball.
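The tile-rendering flow described in the reply above can be sketched roughly as follows: run the query behind a tile request, then paint the matching pixels. This is a hypothetical illustration, not the Hectares BC code. The grid parameters, the tiny 4-pixel tile, and the names `paint_tile`, `results`, and `palette` are all assumptions. `results` stands in for whatever the stored query identified by the request's query id returns.

```python
# Hypothetical grid parameters, assumed for illustration only.
CELL_SIZE = 100.0          # 1 ha cells, in metres
ORIGIN_X, ORIGIN_Y = 0.0, 0.0
NCOLS = 1000               # columns in the hypothetical bounding grid
TILE_PX = 4                # tiny tile so the example is easy to check

TRANSPARENT = (0, 0, 0, 0)


def paint_tile(bbox, results, palette):
    """Return a TILE_PX x TILE_PX grid of RGBA tuples for bbox (minx, miny, maxx, maxy).

    results: cell id -> class value, i.e. the rows returned by the tile's query.
    palette: class value -> RGBA colour.
    Each pixel takes the colour of the cell under its centre, or stays transparent.
    """
    minx, miny, maxx, maxy = bbox
    px_w = (maxx - minx) / TILE_PX
    px_h = (maxy - miny) / TILE_PX
    tile = []
    for j in range(TILE_PX):                      # image rows, top to bottom
        y = maxy - (j + 0.5) * px_h               # centre of this pixel row
        pixels = []
        for i in range(TILE_PX):
            x = minx + (i + 0.5) * px_w           # centre of this pixel column
            col = int((x - ORIGIN_X) // CELL_SIZE)
            row = int((y - ORIGIN_Y) // CELL_SIZE)
            value = None
            if 0 <= col < NCOLS and row >= 0:
                value = results.get(row * NCOLS + col)
            pixels.append(palette.get(value, TRANSPARENT))
        tile.append(pixels)
    return tile
```

For example, `paint_tile((0, 0, 400, 400), {0: "pine"}, {"pine": (0, 128, 0, 255)})` colours only the bottom-left pixel. Writing colours straight into the image like this avoids a round trip through a map server, consistent with the reply's note that MapServer isn't used.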

ubernatural said...

I love seeing an app that's not "half-done" or just a proof of concept. There are so many cool GIS (and web) tools but so few apps that actually make brilliant use of them. Well done Paul (and Refractions!)

etdube said...

Very cool app. I especially like the query builder interface.

Have you thought of the possibility of using a real OLAP engine (such as Mondrian) in a web mapping app? I invite you to take a look at the Google Summer of Code project I'm working on. Also see my "SoC report: Geo-BI dashboards" posts on the OSGeo-SoC mailing list.

Raster-based spatial OLAP cubes are still a research issue, one of my colleagues here did her masters thesis on this subject. No proof-of-concept of this exists yet, though. It may be an area where rasters in the DB could actually make sense...

Etienne

About Me

Victoria, British Columbia, Canada
