Wednesday, September 27, 2006

Why SDIs Fail

My colleague Jody Garnett recently got to participate in a workshop on Spatial Data Infrastructures (SDIs) at the United Nations, and he brought back a nice overview document written by a consultant to the UN. It lays out all the various national and sub-national SDI initiatives and their strategies, and it all seems like very reasonable stuff.

Except that none of them are succeeding.

Some of them have been at it for over 10 years (like the NSDI work in the United States). Some of them have even been backed by reasonable funding (like the CGDI in Canada). None of them has achieved real public penetration, not on the level of Google or Yahoo or Mapquest or MS Virtual Earth.

Is this success? It doesn't feel like it. Even among the cognoscenti of GIS professionals, the SDI initiatives have less intellectual traction than the consumer portals.

So perhaps this is failure. Why?

The Missing Incentive

SDIs make the virtuous and correct assertion that if everyone shared their data, nearest to the point of creation, using various technical tricks and standards, then distribution and integration costs for everyone would go down. We would all be winners.

It is a classic case of global accounting. Sum up everyone's bottom line, and behold: in aggregate everyone is better off!

But not everyone is equally better off.

The operational folks, the guys actually building parcel fabrics and tenuring layers and physical infrastructure databases, are not doing it for fun, and they aren't doing it to share. They are doing it to meet particular operational goals that are specific to themselves and themselves only.

Publishing their data to the rest of the world is a pure cost center for them. It is a small cost center (publishing the data is not hard), but it is still a cost. Publishing provides them no substantial business benefit. It provides other people with a huge benefit, hence the positive global bottom line, but for the people most important to the process, the data creators, it is just a pain in the ass.
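To make the "global accounting" point concrete, here is a toy ledger. Every number in it is invented for illustration (one creator, five users, arbitrary cost and benefit units); it only shows that a positive total can hide a negative entry for the one party who has to do the work.

```python
# Invented numbers, for illustration only.
PUBLISH_COST = 10.0     # cost to the creator of publishing (assumed units)
BENEFIT_PER_USER = 4.0  # saving for each downstream user (assumed units)
USERS = 5

ledger = {"creator": -PUBLISH_COST}
for i in range(1, USERS + 1):
    ledger[f"user{i}"] = BENEFIT_PER_USER

print(sum(ledger.values()))  # → 10.0: the global bottom line is positive
print(ledger["creator"])     # → -10.0: the one party doing the work loses
```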

I call this problem the "operational/analytical divide", because it happens everywhere, from accounting registers to airline reservation systems. The needs of the people working with the data operationally do not align with the needs of the people feeding analysis up to decision makers. Hence a million data marts bloom, along with various other strategies to give the analysts the data they need without impeding the operational folks.

Note that the folks pushing SDI come almost exclusively from the analytical, decision-support side of the equation. This is a recipe for frustration, particularly when trying to build something as loosely coupled as an SDI. So most SDI implementations still lack access to some of the most informative, up-to-date information: the operational information maintained and updated regularly by land managers.

This is not a recipe for relevance.

It Gets Worse

Often the local solution to the problem of operational disinterest is for the analytical side to take on the whole cost of integration and provide a single analytical environment for their jurisdiction.

Aha! Solution! These enlightened analysts will then take their integrated data and publish it out to the SDI and thus the world. Except the analysts suffer the same disease as the operational folks, just with larger borders.

In British Columbia, the analytical response to data integration problems was to build, over several years, a data warehouse which holds all the operational and inventory data, updated in real time. Cool stuff, and quite useful to them! But is this data available to the national SDI? Only a handful of layers.

Why? For the British Columbia analytical folks, providing data to the national SDI has little or no return-on-investment. So while they will acknowledge that it is a nice idea (it is a nice idea!), they will also acknowledge that it is not a high priority. What's in it for them?

Users? What users?

On the technical side, even where data is available, the user experience has not been well served. The implementation of SDI technology thus far still seems "proof of concept", getting as far as "does this idea work?", but not as far as "can we make this work well?"

So, even where large quantities of data are published online (for example, the GeoBase Landsat data made available by the Canadian CGDI effort), access is slow enough to deter all but the most avid users. Specifications for response time (where they exist, and they often do not) set absurdly slow targets (perhaps they were written in the mid-1990s). A 5-second limit to return a map might be OK if I only want one map, but any interactive application will require me to request numerous maps in the course of exploring the data, and those 5 seconds really pile up.
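A quick back-of-the-envelope sketch shows how fast the waiting adds up. The 5-second figure comes from the text; the session sizes and the half-second "tuned" latency are assumptions chosen only for comparison, not measurements of any real service.

```python
def total_wait_seconds(requests: int, seconds_per_map: float) -> float:
    """Time spent waiting if every map request takes the full response time."""
    return requests * seconds_per_map


SPEC_LIMIT = 5.0  # the 5-second response-time figure from the text
TUNED = 0.5       # assumed latency of a well-tuned, cached service

for maps in (1, 20, 60):  # assumed numbers of map requests in a session
    print(f"{maps:2d} maps: {total_wait_seconds(maps, SPEC_LIMIT):5.1f} s "
          f"vs {total_wait_seconds(maps, TUNED):4.1f} s")
```

Sixty requests at the 5-second ceiling is five minutes of staring at a progress indicator.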

Why was Google Maps a sensation, while Mapquest remained a yawn? Google optimized the part of the process most annoying to users: the interactive exploration. Why did Google do this and not the SDIs? One can only assume it is because SDI efforts treat building the system as the goal (and that is not to be diminished; the system is very hard to build!) rather than thinking about the end user.

But (I can hear them yelling) the consumer space is not what SDIs are about! Then who are they for? I cannot think of a market segment well served by SDI efforts to date. Not GIS professionals: they cannot abide the forest of web-mapping obstacles that have been erected in front of various databases over the years. Not general public consumers, as noted above. Not decision makers: they can't use the tools provided, and the professionals who serve them are still doing the same song-and-dance-and-secret-handshake number they were doing 10 years ago to achieve data access.

Attending a talk this year about the British Columbia emergency response to bird flu, I was struck that, despite all the state-of-the-art technology being used (ArcMap, SDE, LANs and wireless), the fundamental business architecture (madly collect big pile of data, madly print maps) is no different than if the talk had been given 10 years ago. No SDI in sight, despite the near-constant pitch of SDI as a cure-all for integration in critical response settings.

Oh, And the Unpleasant Matter of the Bill

Of course, no discussion of the impotence of SDI efforts could fail to acknowledge the pivotal role of data cost recovery policies in blunting the sharing of information between organizations and levels of government.

The most valuable information for sharing and decision making is the most volatile, since that is where direct, close-to-source access pays off most in timeliness and accuracy. But volatile data is also perceived (correctly) as having the most intrinsic value. And so the creators hold it close, and only share it in traditional ways (file shipping, protected download) using traditional business models (pay to play).

Did I Mention Metadata?

And you thought you could get out without being subjected to yet another discussion of metadata.

Metadata is the core of the SDI vision, since a distributed system of resources requires a searchable registry of available resources to direct people to the information they need.
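As a rough illustration of why the registry depends on metadata, here is a toy in-memory catalog. The `Record` fields and the `Registry` class are hypothetical, not the API of any real catalog (a production SDI would use something like an OGC Catalogue Service); the sketch matches search terms only against descriptive metadata, so a layer nobody documented simply cannot be found.

```python
from dataclasses import dataclass


@dataclass
class Record:
    """One catalog entry describing a service layer (hypothetical schema)."""
    name: str
    url: str
    title: str = ""
    abstract: str = ""
    keywords: tuple[str, ...] = ()


class Registry:
    """A toy in-memory catalog standing in for an SDI search service."""

    def __init__(self) -> None:
        self._records: list[Record] = []

    def register(self, record: Record) -> None:
        self._records.append(record)

    def search(self, term: str) -> list[Record]:
        # Case-insensitive match against title, abstract and keywords only;
        # a record with all three empty is effectively invisible.
        term = term.lower()
        hits = []
        for r in self._records:
            text = " ".join((r.title, r.abstract, *r.keywords)).lower()
            if term in text:
                hits.append(r)
        return hits


reg = Registry()
reg.register(Record("parcels_v2", "http://example.org/wms",
                    title="Land parcels", keywords=("cadastre",)))
reg.register(Record("lyr_017", "http://example.org/wms"))  # no metadata written
print([r.name for r in reg.search("parcel")])  # → ['parcels_v2']
```

If `lyr_017` also happens to hold parcels, no search will ever surface it.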

But, like basic sharing itself, metadata creation suffers from missing incentives, because metadata is all about other people (not important people, like me). I don't need to write metadata about my data; I already know all about my data! There are no awards for good metadata writing, except karmic ones, and karma doesn't pay the bills.

This may sound overly harsh, but a quick review of the contents of the public capabilities files of web services currently online should dispel most people's illusions about metadata quality.
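For anyone who wants to do that review semi-automatically, here is a minimal sketch that scans a WMS capabilities document for named layers with an empty or missing `Title` or `Abstract`. It uses only Python's standard `xml.etree.ElementTree`; the sample document is made up and heavily abridged, and treating those two fields as the minimum worth checking is an assumption.

```python
import xml.etree.ElementTree as ET


def local(tag: str) -> str:
    """Strip an XML namespace, e.g. '{http://www.opengis.net/wms}Layer' -> 'Layer'."""
    return tag.rsplit("}", 1)[-1]


def audit_layers(capabilities_xml: str) -> list[tuple[str, list[str]]]:
    """Return (layer name, missing fields) for each named layer lacking metadata."""
    root = ET.fromstring(capabilities_xml)
    problems = []
    for elem in root.iter():
        if local(elem.tag) != "Layer":
            continue
        # Only this layer's direct children, not those of nested sub-layers.
        fields = {local(child.tag): (child.text or "").strip() for child in elem}
        name = fields.get("Name")
        if not name:
            continue  # unnamed grouping layers cannot be requested directly
        missing = [f for f in ("Title", "Abstract") if not fields.get(f)]
        if missing:
            problems.append((name, missing))
    return problems


# Made-up, abridged sample; a real document also has a <Service> section.
SAMPLE = """<WMS_Capabilities xmlns="http://www.opengis.net/wms" version="1.3.0">
  <Capability>
    <Layer>
      <Title>Root</Title>
      <Layer><Name>roads</Name><Title>Roads</Title><Abstract>Provincial road network.</Abstract></Layer>
      <Layer><Name>lyr_017</Name><Title></Title></Layer>
    </Layer>
  </Capability>
</WMS_Capabilities>"""

print(audit_layers(SAMPLE))  # → [('lyr_017', ['Title', 'Abstract'])]
```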

Loosely Coupled Things Are Easy To Break

One of the ironies of the SDI-for-crisis-response sales pitch is that it holds up the classic centralized solution as having a "single point of failure" (boo! shame!). Unspoken is the fact that the SDI solution has N points of failure, where N is the number of nodes.
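The arithmetic behind "N points of failure" is simple if you assume node outages are independent and a request needs every node to be up: overall availability is the product of the individual availabilities. The 99% figure below is an assumed value for illustration, not a measured SDI uptime.

```python
import math


def chain_availability(node_availabilities: list[float]) -> float:
    """Chance that every node is up at once, assuming independent outages."""
    return math.prod(node_availabilities)


centralized = chain_availability([0.99])        # one big server
distributed = chain_availability([0.99] * 10)   # ten SDI nodes, all needed
print(f"{centralized:.3f} vs {distributed:.3f}")  # → 0.990 vs 0.904
```

Ten nodes that are each as reliable as the "single point of failure" still give a system that is down almost ten times as often.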

Our company had the good fortune to build a client portal that used SDI resources to provide integrated viewing and data searching. It was "SDI come to life", and a great learning experience. Though our code worked fine, our delivery was a nightmare.
  • The map application was too slow. Well, that wasn't our code's fault, one of the servers we referenced for base mapping was really slow. Eventually we found a faster one.
  • The search service(s) returned really terse results. Well, that was what the metadata server returned, we had no control over that. (Actually we did, we could obtain more verbose results but then the search took several minutes to return. Pick your poison.)
  • The search service was really slow. Again, ask a slow server a question, get a slow answer.
  • And finally the coup de grâce. After delivery, the application kept breaking. Or rather, the services it depended on kept breaking. Or changing their API just a little. But since we were the client-facing part of the whole system, it was us who got the service calls.
Reliability is not something any part of the SDI infrastructure takes overall responsibility for, so any component can drag down any of the others. In the end, it all gets delegated to the last link in the loosely coupled chain, the client software. But the architecture is supposed to make getting data easy, and instead it makes it really hard, because the client has to keep track of all the servers and their reliability.

Some SDI server providers make an effort to provide really high uptimes and great performance, but it only takes one crappy server to make the user experience terrible. So the whole system reliability problem gets delegated to the client. That makes for a lot of work on systems which are supposed to be relatively lightweight.
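Here is a minimal sketch of the bookkeeping this pushes onto the client: a pool of interchangeable servers that tries the historically least-failing one first and falls back on error. `ServerPool` and the fetcher functions are hypothetical; in practice each fetcher might wrap an HTTP call such as `urllib.request.urlopen(url, timeout=5)`, so that a hung server eventually raises an error instead of blocking forever.

```python
from typing import Callable


class ServerPool:
    """Client-side bookkeeping for interchangeable servers (hypothetical sketch)."""

    def __init__(self, fetchers: dict[str, Callable[[], bytes]]) -> None:
        self.fetchers = fetchers
        self.failures: dict[str, int] = {}

    def fetch(self) -> tuple[str, bytes]:
        # Try servers in order of fewest recorded failures; sorted() is stable,
        # so ties keep the order the servers were configured in.
        order = sorted(self.fetchers, key=lambda name: self.failures.get(name, 0))
        for name in order:
            try:
                return name, self.fetchers[name]()
            except Exception:  # outage, timeout, or a quietly changed API
                self.failures[name] = self.failures.get(name, 0) + 1
        raise RuntimeError("every configured server failed")


def broken() -> bytes:
    raise ConnectionError("service moved")


def working() -> bytes:
    return b"<map image>"


pool = ServerPool({"primary": broken, "backup": working})
print(pool.fetch())    # → ('backup', b'<map image>')
print(pool.failures)   # → {'primary': 1}
```

One design caveat: as written, a server that failed once stays demoted until the others fail too, so a real client would also need to age out old failures to give recovered servers another chance. That is yet more logic living in the supposedly lightweight client.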

We're Doomed! Doomed!

So, to sum up:
  • Data creators have no incentive to join SDIs;
  • Data analyzers have incentive, but once they have all the data they need, they're done;
  • The user experience is pretty uniformly worse than what commercial outfits provide;
  • Data pricing policies are locking up much of the "good" data, even if people were interested in sharing;
  • Even the people who are sharing aren't doing a very good job of it, since their metadata is garbage; and,
  • When put into operation, the whole system is unbelievably brittle, leading to yet more bad user experiences.
Is there no good news at all? Sure, and tomorrow I will write about that.

4 comments:

randy said...

A gloomy viewpoint, but unfortunately an accurate assessment. I look forward to hearing your blueprint for success. :)

The key seems to be incentives and the only two I can think of are market incentives and policy mandate incentives. Market incentives are bottom up and way more appealing than top down legal/policy incentives.

Google, MS VE, MapQuest etc seem to be doing a good job of harnessing market incentives mainly through the advertising angle of new media. But how can ad incentives work in the public domain data pools?

Here in the USA we have mandated data sharing by airports via FAA policy but this is far from public. Homeland Security initiatives appear to be headed down the same path for critical infrastructure publishing requirements, but because the data is 'critical' it will obviously not be public.

Perhaps there is some inevitable third way. Community incentives appear to be popular right now à la Web 2.0. But what kind of community incentives apply to shared mapping data? MySpace or Flickr type communities are based on individual relationship incentives coupled with ad $$ for the aggregator. But bureaucracies are a different animal. If a county needs parcels from neighboring counties, perhaps it could be persuaded to throw its parcels into the data pool, but the cost would have to be vanishingly small.

Aggregation costs money. The toll-road, subscription model has a long history, as does the ad model, but tax capital seems more common in public services. (I suppose billboards along the interstate have contributed to the economy, but only tax or toll could make the capital investment of creation possible, and in the end tax was the most effective.)

Looking at bureaucratic incentives could be an alternative. Again, in the US we have decennial census tax dollars funneled through the Census Bureau, which could be used to create a continuously updated aggregation portal for boundary and transportation features. Local government entities could be paid to maintain their data resources in a national repository rather than a local one. This would look like a kind of national service model, a national DOT data repository or a national land ownership service. Local resources for storage, along with technical maintenance costs, could be pushed up the line, while the collection and creation costs of the Census Bureau could be pushed down the line, closer to the source. This is similar to the incentive of the pay-per-use SalesForce model; however, an existing tax-supported program takes the place of the commercial service. I'd like to think WFS-T OGC standards make this kind of model possible.

National service repositories could benefit the overall economic bottom line, but only by subtracting from the budgets of existing federal agencies like the Census Bureau, which is hardly an incentive for a national bureaucracy.

I still wonder if funding NSDI to implement a national data repository service would be a feasible approach. As far as I know, the US NSDI lacks the budget or policy clout to create a useful public repository service. The necessary multi-thousand-node server clusters and service development don’t even seem to be on the radar for the bureaucracy battlefields of NSDI, FGDC, GOS, TNM, HAZUS, NILS, RCAGIS … Existing federal agencies could benefit from participation, but only at the cost of budget loss.

Like I say I am really looking forward to your blueprint for success.

thanks
Randy

Rob Atkinson said...

I agree that current practices do not work, and have been attempting to apply some cold hard logic to the situation. The outcome of the workshop is intended to identify a realistic implementation strategy. From my perspective the current SDI theory and practice are patently inadequate, and for the first time we were able to thrash some key points around the table and get a consensus agreement. My understanding is that this will be used to qualify the business strategy.

I tend to agree with your assessment of requirements. You are a step ahead of me, though: in a separate document tabled at the workshop I was pointing out that there is a massive gap between the SDI motherhood statements and "throw technology at the current business situation".

The consumer portals do not invalidate SDI assumptions; they are merely the private expression of the concept. Some of their technology components are more mature (in particular the engineering side), and some less so, than those of "standards-based" SDIs, but they still show that distributing business data and delivering commodity data from efficient hubs is sensible. None of the portals works adequately outside of the major market areas or thematic areas.

I don't think we're ever going to see Google Water Resource Management to the point where it's going to save lives and avoid conflicts, not without a formal SDI bridge.

I also don't think it's too difficult to integrate commercial components into an SDI. Commodification of data doesn't mean that only governments can serve it, only that they should do so for public-interest applications like planning and emergency response. If they serve it via an outsourced contract to a commercial entity, that's an operational decision.

Much more interesting is why the commercial portals can't (and whether they ever will) maintain large, coherent, diverse thematic data holdings. IMHO this will only be enabled by data standards (de facto or de jure), and I don't mean data formats. This is the reason we need SDIs: not just to ship data around, but to make it possible for data to be used.

What we need is the mechanism by which SDIs can grow (top-down and bottom-up) to bridge that gap, much as DNS provides domain roots and the bind protocol. What we need to do to realise SDI benefits is, as you say, enable massively scalable access to data updates by making life significantly easier at the grass-roots level, but also introduce a level of coherence that enables investment decisions at the national level.

Rob Atkinson

Bruce Westcott said...

Your assessment of the key reason for SDI failure can be summarized as "Cost:Me. Benefit:Them." You twist the knife (IMHO) by focusing on metadata as a specific instance of the above formulation. Bravo!
People actually pay me to be a geospatial metadata specialist, and I have worked with the US/FGDC and its constituents for a long time. My speaking, writing, and training efforts stir around in the geeky details of metadata, but focus on urging metadata advocates to recognize your formulation in actually developing a business plan for metadata. Specifically, we all need to focus on MINIMIZING the costs of creating/maintaining metadata and MAXIMIZING the value of the information contained therein to all our various constituencies, be they internal to the enterprise, or external (SDI). Most enterprises are way heavy on lip service to metadata, and way light on workflow design, resulting in costly and inconsistent metadata. That makes it easy to blow off.

WS said...

Just a very brief comment. From my point of view, this is not a fair comparison between companies delivering GI and SDIs. While the objective, from the user's point of view, may be similar, the approach is totally different.
An SDI should be seen in the long term, since an SDI at any level is a collaborative approach among all of its nodes. As you go "up" the SDI hierarchy, you find and face more organizational problems: a local SDI certainly has fewer organizational problems than a national SDI, and an upper-level SDI inherits the organizational problems of the lower-level SDIs.
Besides these organizational problems, you have to add the technical ones, but that is another story.

The actual role of these companies is to fill the gap SDIs are leaving while they develop. The aim of these companies was to fill the market gap in WebGIS apps and data, focused on inexperienced users, and they succeeded.
SDIs have no market niche to fill, since that is not their aim, and maybe this slows their development.

So, although from the user's point of view (seeing GI, GIS apps, etc.) they may seem the same, they aren't. They share only the goal of showing "GI in a map through the Internet", and it's clear that an SDI is something more.

Cheers,

Walter Simonazzi

About Me

Paul Ramsey
Victoria, British Columbia, Canada