Why SDIs Fail

My colleague Jody Garnett recently got to participate in a workshop on Spatial Data Infrastructures (SDIs) at the United Nations, and he brought back a nice overview document written by a consultant to the UN. It lays out all the various national and sub-national SDI initiatives and their strategies, and it all seems like very reasonable stuff.

Except that none of them are succeeding.

Some of them have been at it for over 10 years (like the NSDI work in the United States). Some of them have even been backed by some reasonable funding (like the CGDI in Canada). None of them really have public penetration. Not on the level of Google or Yahoo or Mapquest or MS Virtual Earth.

Is this success? It doesn’t feel like it. Even within the cognoscenti of GIS professionals the SDI initiatives have less intellectual traction than the consumer portals.

So perhaps this is failure. Why?

The Missing Incentive

SDIs make the virtuous and correct assertion that if everyone shared their data, as near to the point of creation as possible, using various technical tricks and standards, then distribution and integration costs for everyone would go down. We would all be winners.

It is a classic case of global accounting. Sum up everyone’s bottom line, and behold: in aggregate everyone is better off!

But not everyone is equally better off.

The operational folks, the guys actually building parcel fabrics and tenuring layers and physical infrastructure databases, are not doing it for fun, and they aren’t doing it to share. They are doing it to meet particular operational goals that are specific to themselves and themselves only.

Publishing their data to the rest of the world is a pure cost center for them. It is a small cost center (publishing the data is not hard), but it is still a cost. Publishing provides them no substantial business benefit. It provides other people with a huge benefit, hence the positive global bottom line, but for the most important people in the process, the data creators, it is just a pain in the ass.

I call this problem the “operational/analytical divide”, because it happens everywhere, from accounting registers to airline reservation systems. The needs of the people operationally working with the data do not align with the needs of the people delivering analysis to decision makers. Hence a million data marts bloom, along with various other strategies to give the analysts the data they need without impeding the operational folks.

Note that the folks pushing SDI are almost exclusively coming from the analytical, decision support side of the equation. This is a recipe for frustration, particularly when trying to build something as loosely coupled as an SDI. So most SDI implementations still lack access to some of the most informative, up-to-date information, the operational information maintained and updated regularly by land managers.

This is not a recipe for relevance.

It Gets Worse

Often the local solution to the problem of operational disinterest is for the analytical side to take on the whole cost of integration and provide a single analytical environment for their jurisdiction.

Aha! Solution! These enlightened analysts will then take their integrated data and publish it out to the SDI and thus the world. Except the analysts suffer the same disease as the operational folks, just with larger borders.

In British Columbia, the analytical response to data integration problems was to build, over several years, a data warehouse which holds all the operational and inventory data, updated in real time. Cool stuff, and quite useful to them! But is this data available to the national SDI? Only a handful of layers.

Why? For the British Columbia analytical folks, providing data to the national SDI has little or no return-on-investment. So while they will acknowledge that it is a nice idea (it is a nice idea!), they will also acknowledge that it is not a high priority. What’s in it for them?

Users? What users?

On the technical side, even where data is available, the user experience has not been well served. The implementation of SDI technology thus far still seems “proof of concept”, getting as far as “does this idea work?”, but not as far as “can we make this work well?”

So, even where large quantities of data are published online (for example, the GeoBase Landsat data made available through the Canadian CGDI effort), the access is sufficiently slow to deter all but the most avid users. Specifications for response time (where they exist, and they often do not) are absurdly lax (perhaps they were written in the mid-1990s). Less than 5 seconds to return a map might be OK if I only want one map, but any interactive application will require me to request numerous maps in the course of exploring the data, and those 5 seconds really pile up.
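To see how quickly a sluggish map service becomes unusable, consider a back-of-the-envelope sketch like the one below. It simply times a batch of WMS GetMap requests; the endpoint, layer name, and bounding box are hypothetical placeholders, not a real CGDI service.

```python
# A minimal sketch, assuming a hypothetical WMS endpoint, of how per-request
# latency adds up during ordinary interactive browsing.
import time
import requests

WMS_URL = "https://example.gov/wms"        # hypothetical endpoint
PARAMS = {
    "SERVICE": "WMS",
    "VERSION": "1.1.1",
    "REQUEST": "GetMap",
    "LAYERS": "landsat_mosaic",            # hypothetical layer name
    "SRS": "EPSG:4326",
    "BBOX": "-140,48,-114,60",
    "WIDTH": "800",
    "HEIGHT": "600",
    "FORMAT": "image/png",
}

# A short pan-and-zoom session easily issues a few dozen GetMap requests.
total = 0.0
for _ in range(30):
    start = time.time()
    requests.get(WMS_URL, params=PARAMS, timeout=30)
    total += time.time() - start

print(f"30 map requests took {total:.1f} seconds")
# At 5 seconds per map, that is 150 seconds of waiting in a single session.
```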

Why was Google Maps a sensation, while Mapquest remained a yawn? Google optimized the part of the process most annoying to users, the interactive exploration part. Why did Google do this and not SDIs? One can only assume it was because building the system is the goal of SDI efforts (and, not to be diminished, the system is very hard!), not thinking about the end user.

But (I can hear them yelling) the consumer space is not what SDIs are about! Then who are they for? I cannot think of a market segment well served by SDI efforts to date. Not GIS professionals: they cannot abide the forest of web-mapping obstacles that have been erected in front of various databases over the years. Not general public consumers, as noted above. Not decision makers: they can’t use the tools provided, and the professionals who serve them are still doing the same song-and-dance-and-secret-handshake number they were doing 10 years ago to achieve data access.

Attending a talk this year about the emergency response in British Columbia to bird flu, I was struck that despite all the state-of-the-art technology being used (ArcMap, SDE, LANs and wireless) the fundamental business architecture (madly collect a big pile of data, madly print maps) is no different than if the talk were given 10 years ago. No SDI in sight, despite the near-constant pitch of SDI as a cure-all for integration in critical response settings.

Oh, And the Unpleasant Matter of the Bill

Of course, no discussion of the impotence of SDI efforts could fail to acknowledge the pivotal role of data cost recovery policies in blunting the sharing of information between organizations and levels of government.

The most valuable information for sharing and decision making is the most volatile, because direct close-to-source access assures timeliness and accuracy. But volatile data is also perceived (correctly) as having the most intrinsic value. And so the creators hold it close, and only share it in traditional ways (file shipping, protected download) using traditional business models (pay to play).

Did I Mention Metadata?

And you thought you could get out without being subjected to yet another discussion of metadata.

Metadata is the core of the SDI vision, since a distributed system of resources requires a searchable registry of available resources in order to direct people to the information they need.

But, like basic sharing itself, metadata creation suffers from missing incentives, because metadata is all about other people (not important people, like me). I don’t need to write metadata about my data; I already know all about my data! There are no awards for good metadata writing, except karmic ones, and karma doesn’t pay the bills.

This may sound overly harsh, but a quick review of the contents of the public capabilities files of web services currently online should dispel most people’s notions of metadata quality.
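That review is easy to automate. Here is a rough sketch of the kind of spot check I mean: fetch a WMS GetCapabilities document and count the layers that have no abstract at all. The endpoint below is a placeholder; any public WMS URL could be substituted.

```python
# A quick-and-dirty metadata audit of a single (hypothetical) WMS endpoint.
import requests
import xml.etree.ElementTree as ET

CAPS_URL = "https://example.gov/wms?SERVICE=WMS&REQUEST=GetCapabilities"  # placeholder

xml_text = requests.get(CAPS_URL, timeout=30).text
root = ET.fromstring(xml_text)

def localname(tag):
    """Strip any XML namespace, so this works for WMS 1.1.1 and 1.3.0 alike."""
    return tag.rsplit("}", 1)[-1]

layers = [el for el in root.iter() if localname(el.tag) == "Layer"]
missing = [
    el for el in layers
    if not any(localname(child.tag) == "Abstract" and (child.text or "").strip()
               for child in el)
]

print(f"{len(missing)} of {len(layers)} layers have no usable abstract")
```

Run a check like that against a handful of real servers and judge the karma for yourself.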

Loosely Coupled Things Are Easy To Break

One of the ironies of the SDI-for-crisis-response sales pitch is that it holds up the classic centralized solution as having a “single point of failure” (boo! shame!). Unspoken is the fact that the SDI solution has N points of failure, where N is the number of nodes.

Our company had the opportunity to build a client portal that used SDI resources to provide an integrated viewing and data searching capability. It was “SDI come to life”, and a great learning experience. Though our code worked fine, our delivery was a nightmare.

  • The map application was too slow. Well, that wasn’t our code’s fault, one of the servers we referenced for base mapping was really slow. Eventually we found a faster one.
  • The search service(s) returned really terse results. Well, that was what the metadata server returned, we had no control over that. (Actually we did, we could obtain more verbose results but then the search took several minutes to return. Pick your poison.)
  • The search service was really slow. Again, ask a slow server a question, get a slow answer.
  • And finally the coup de grâce. After delivery, the application kept breaking. Or rather, the services it depended on kept breaking. Or changing their API just a little. But since we were the client-facing part of the whole system, we were the ones who got the service calls.

Reliability is not something any part of the SDI infrastructure takes overall responsibility for, so any component can drag down any of the others. In the end, it all gets delegated to the last link in the loosely coupled chain, the client software. But the architecture is supposed to make getting data easy, and instead it makes it really hard, because the client has to keep track of all the servers and their reliability.

Some SDI server providers make an effort to provide really high uptimes and great performance, but it only takes one crappy server to make the user experience terrible. So the whole system reliability problem gets delegated to the client. That makes for a lot of work on systems which are supposed to be relatively lightweight.
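In practice, that means the client ends up babysitting every upstream dependency. Here is a minimal sketch of what that babysitting looks like, with placeholder URLs standing in for the base mapping, catalogue, and imagery services a portal like ours depended on.

```python
# Poll each upstream SDI service with a short timeout and report which ones
# are answering. The URLs are hypothetical placeholders.
import requests

SERVICES = {
    "base_mapping_wms": "https://example-province.ca/wms?SERVICE=WMS&REQUEST=GetCapabilities",
    "metadata_catalogue": "https://example-national.ca/csw?REQUEST=GetCapabilities",
    "imagery_wms": "https://example-federal.ca/wms?SERVICE=WMS&REQUEST=GetCapabilities",
}

def check(url, timeout=5.0):
    """Return (ok, detail) for a single upstream dependency."""
    try:
        resp = requests.get(url, timeout=timeout)
        return resp.ok, f"HTTP {resp.status_code} in {resp.elapsed.total_seconds():.1f}s"
    except requests.RequestException as exc:
        return False, type(exc).__name__

for name, url in SERVICES.items():
    ok, detail = check(url)
    print(f"{'OK  ' if ok else 'FAIL'} {name}: {detail}")
# Any single FAIL shows up to the end user as "the portal is broken".
```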

We’re Doomed! Doomed!

So, to sum up:

  • Data creators have no incentive to join SDIs;
  • Data analyzers have incentive, but once they have all the data they need, they’re done;
  • The user experience is pretty uniformly worse than what commercial outfits provide;
  • Data pricing policies are locking up much of the “good” data, even if people were interested in sharing;
  • Even the people who are sharing aren’t doing a very good job of it, since their metadata is garbage; and,
  • When put into operation, the whole system is unbelievably brittle, leading to yet more bad user experiences.

Is there no good news at all? Sure, and tomorrow I will write about that.

I Don't Hate Users

Really! I don’t! (Not much. Well. Not all the time.)

I start with something that sounds controversial, but really is just observationally factual: “users do not contribute anything to open source projects”. It sounds like I am putting them down (“darned users!”) but that is not what I am doing. It is just a straightforward observation that if you aren’t adding code, documentation, bug reports, or user support to a project, if you are just downloading and using the software, you are not adding any value to the project. From the project’s point of view, you might as well not exist. If it weren’t for the entry in the server log, they couldn’t prove your existence.

I hope that the above is not actually controversial.

Which brings us to a different idea, being “user focused”: why should I bother catering to people who are effectively invisible to the activities of the project? There is a nice comment from “step” which includes this:

There are a lot of contributors to Firefox (myself included) who never would have become so had they not first been a user.

So, there is a valid “developer focused” reason to be “user focused”. Using is a gateway drug for developers. I find that very persuasive indeed. Hopefully non-controversial as well.

All this gets bound up and confuzzulated with my “firehose of money” comment in my last posting. That comment is controversial, and seems to be the core item riling folks up.

My belief (and I am sure there are plenty of counterexamples to prove me wrong, but then, I am not trying to postulate a law of nature here) is that large, complex, user-facing applications (like, say, office suites or web browsers or GIS desktops) are sufficiently weighty that incubating them, getting them from a “concept” level of quality to a “user focused” level of quality, requires a large initial financial commitment. Or a large initial amount of time (Oops, you caught me! I’m hedging! Creating some wriggle room!). A firehose of money, or a patient drip, drip, drip of a rivulet of time.

Asa Dotzler thinks that, with respect to the Firefox project (and unspecified others), I am full of it. I think that, in ignoring the six years of corporate funding (the proverbial “firehose”) for technology that underlies Firefox, he is gilding the lily. Firefox is great work, but attributing its success to the community effort alone is as unfair as attributing it to corporate money alone. Ooops, I did that. Sorry!

Shades of grey, always shades of grey.

An open source GIS on every Desktop?

Steven Citron-Pousty has a good posting about why he cannot move his shop to an open source basis in the near term. He took a look at uDig (thanks!) but, unsurprisingly, finds that it ain’t ArcGIS quite yet. His prescription is not for the faint of heart.

If you want people to switch you need to make the transition as painless as possible. Firefox got people to switch from IE by

  • Making better software
  • Not making users learn a new UI for interacting with the web
  • Importing all their IE favorites
  • THEN building in cool new features that keep people around

So, all we have to do is make something better than ArcGIS, but not so much better that it is not familiar to the existing user base, that works transparently with all their existing data and presumably their .mxd files too. And give it away for free.

“Just be like Firefox.” There are a couple of problems with this idea.

  • The first problem is the idea that garnering users is “the goal”. It is not. The misunderstanding is reasonable, because for proprietary companies it is the goal – more users implies more licensing dollars. For open source projects, more users just means somewhat more download bandwidth and a slightly higher number of beginner questions on the mailing lists. What open source projects want to attract is not users – it is developers. Developers will make the project stronger, add features, fix bugs, do all the things that end users want, but cannot do for themselves.
  • The second problem is that Firefox is not a normal open source project. The Mozilla Foundation has a lot of employees, most of them working on development, and a deal with Google that nets them millions of dollars each year. They can afford to be end user focused, because they have a paid pool of developers already in house.

It seems like all of the “user” success stories in the open source world (Firefox, Open Office, some of the “desktop Linux” efforts) have at their core one common feature – a large and ongoing firehose of money. Absent the firehose, it is hard to aggregate enough continuous effort to create desktop applications that include both the number of features and quality of finish necessary to entice the otherwise unmotivated “user”.

uDig has enough features and stability to be useful right now to a narrow pool of developers creating custom applications with specific toolsets. Hopefully in the future it will have more features and stability, and the pool will be less narrow. But I predict it will still be dominated by developers.

And that’s fine with me.

If Steven adopted uDig and PostGIS for his shop, it would not do a thing for my bottom line. But if he built an application around it, he would either add a little functionality (which would help me with my clients) or maybe hire us to add a little functionality (which would help the bottom line directly).

Open source is not about users, it is about developers. It is only about users in so far as users become sufficiently engaged in the project that they either become developers themselves, or support developers through careful bug finding or documentation.

The correct models are not Firefox or Open Office (unless someone wants to point a money firehose at me… I won’t object); those projects are aberrations. The correct models are Linux, Apache, Perl, PostgreSQL – not user-friendly, but still very useful.

Case Studies Considered Harmful?

Over the past month, I have been trying to compile a list of good case studies of organizations using PostGIS in their daily business. So far, it seems hard to get people to come out of their shells and say what they are doing – even when you promise to do all the work of writing up the story!

I know from things like the membership of the postgis-users mailing list that there are some big companies using PostGIS. Big names in the geospatial world use it for all kinds of production oriented tasks. But apparently they do not want their stories told.

This is not a problem unique to PostGIS. Other open source projects suffer from the same “shy user” syndrome. I read the postgresql-advocacy list and often see comments to the effect that “my client is a huge company, and they love the performance they are getting from PostgreSQL, but they do not want to be publicly named”.

What will it take to get big organizations to “out” themselves?

ArcSDE comes to PostgreSQL?

It has long been rumoured that ESRI might move their “database neutral” ArcSDE to the ultimate “neutral database”, PostgreSQL. I have heard versions of this idea since around 2003, but I never thought it would come to pass. So, mea culpa to all the people I told “it will never happen”… it has!

Yes, ESRI is currently in the process of developing support for PostgreSQL. We have done all the necessary testing to ensure that this will continue to be a viable product in the future. We plan to release this capability sometime after ArcGIS 9.2.

So, what does this mean for PostGIS? Same thing it means for Oracle Spatial – not very much. ESRI may, or may not, support using PostGIS native spatial geometries as the geometry type in ArcSDE. For Oracle, the default ESRI position has always been that their SDEBINARY format performs better than SDO_GEOMETRY, so it does not sound like using native types holds any particular allure for ESRI.

Even if ArcSDE does support PostGIS types, the ArcSDE versioning model means that all changes to the geometries will have to be done through the SDE API, in order to ensure the versioning metadata remains consistent.

Still, from a read-only perspective, if ArcSDE does support PostGIS as a geometry type, then the following architecture becomes possible, which could represent a big opportunity for some jurisdictions:

  • (DBMS) PostgreSQL Database
  • (ESRI Pound of Flesh) ArcSDE for PostgreSQL using PostGIS geometries
  • (Desktop Editing / Cartography) ArcGIS
  • (Desktop Viewing) QGIS
  • (Analysis Engine) GRASS
  • (Web Map Publishing) Mapserver
  • (Web Feature Publishing) Geoserver
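To make the “read-only” point concrete, here is a minimal sketch of what the open source end of that stack could do, assuming ArcSDE were storing features as native PostGIS geometries. The connection settings and the “parcels” table and column names are hypothetical.

```python
# Read the same rows ArcGIS edits, using ordinary PostGIS SQL and no SDE API.
# Table name, column names, and connection string are all hypothetical.
import psycopg2

conn = psycopg2.connect("dbname=gis user=viewer host=localhost")
cur = conn.cursor()

cur.execute("""
    SELECT parcel_id,
           ST_Area(geom) AS area   -- area in the units of the layer's SRS
    FROM parcels
    WHERE ST_Intersects(
        geom,
        ST_MakeEnvelope(-123.5, 48.4, -123.2, 48.6, 4326)
    )
""")
for parcel_id, area in cur.fetchall():
    print(parcel_id, area)

cur.close()
conn.close()
```

The same SQL works equally well from Mapserver, Geoserver, or QGIS, which is what makes the mixed stack above attractive.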

If, on the other hand, ArcSDE on PostgreSQL only supports SDEBINARY, then this will be a non-event from an open source interoperability point of view. I look forward to hearing some reports from the ESRI UC – someone buttonhole those ArcSDE developers and find out what the plan is!