Nerds Redux

I had a chance to re-present my FOSS4G 2009 keynote talk at the Rendezvous OSGeo a Quebec last week, and thanks to the good work of the FOSSLC team, there’s now a pretty clean online video of it.

PostGIS @ FOSS4G 2010

One of the things that tickled me about the presentations selected for FOSS4G 2010 was the number of talks in the list that specifically mention PostGIS:

  • Beyond PostGIS - New developments in Open Source Spatial Databases
  • Introducing PostGIS WKT Raster: Seamless raster/vector operations in a spatial database
  • Introduction of flood evacuation route search system using QGIS, PostGIS, GRASS and PgRouting
  • Moving from Oracle/ArcGIS to PostGresql/PostGIS
  • PostGIS meets the third dimension
  • PostGIS WKT Raster. An Open Source alternative to Oracle GeoRaster
  • Running long and complex processes with PostGIS
  • The State of PostGIS
  • Tips for the PostGIS Power User

The last two are mine! And one is about not using PostGIS. But still, some interesting talks on the use and future of my favourite spatial database.

PgSQL on EC2

The theory behind putting a PostgreSQL (and PostGIS) instance on an Amazon EC2 instance with an Elastic Block Store (EBS) file system underneath is pretty straightforward, even for big databases. But when you want those databases to show the kind of properties we have come to expect from our systems, like durability, throughput, and reliability, things get much harder.

This thread on pgsql-general was very illuminating to me. Among the tidbits:

Let’s be clear here, physical I/O is at times terrible. :)

There’s no way we could run this database on a single EBS volume.

We had to fail over to one of our spares twice in the last 1.5 years. Not fun. Both times were due to instance failure.

Basically the assumptions of AWS architecture (virtual instances will be less reliable than real world computers, but that doesn’t matter because getting a new one is really easy) don’t map well with the requirements of running a classic production database.

There are probably some engineering solutions around for this (GlusterFS, for example), but the core PgSQL would need some serious work and would end up looking a lot more like Oracle RAC than the current single-machine set-up.

More Stories from the Future of Computing

From the Jobs keynote of yesterday, a slide with a quote from Theo Gray of Wolfram, regarding the popular “Elements” iPad application:

I earned more on sales of The Elements for iPad in the first day than from the past 5 years of Google ads on periodictable.com.

Quoth Jobs, “That’s what I like to hear from you guys.” Audience whoops.

Right now the walled garden is kicking the jungle’s ass, but for how long? It’s incredibly interesting that, for the moment, the old-school revenue model of application sales is actually besting the new-school free-with-strings (ads) model that we were told was the Future. Perhaps once HTML5 application quality gets up to the level of fit and finish that the current crop of native apps is providing, we will flip back again.

I think, for example, of a stock market application. Would people pay a buck for a really excellent application that “just works” in a clean and uncluttered way for displaying current information, research, blah blah blah, instead of just going to Yahoo! Finance? The information is available for free (just like the periodic table!) but a really excellent encapsulation of that information might be compelling enough to pay for. The walled garden starts to fall apart where it interfaces with the jungle… once the application has to link out to things like company reports and other non-structured pieces of the raw internet, it regains the clunkiness of the old browser experience. So why not start with the browser?

For geo, I think that sites like GeoCommons, which have applied a baseline level of structure to a wide swath of data, are fertile grounds for the “app treatment”. An application that provides superior interactive access to their data archives would be an alternative monetization path for leveraging their growing holdings of structured GIS data.

Interesting times!

Finding Corrupt PostgreSQL Data Files

While PostgreSQL itself will never create corrupt data files, that doesn’t stop other processes or hardware failures from corrupting the files underneath the database, which can cause database crashes. Josh Williams of End Point provides a super rundown of how to track down and repair file corruption.