Must SDIs Fail?

Did I say “tomorrow”? I meant “next month”.

Making Technology Work

Good news about the larger SDI vision is hard to come by. The larger vision includes a central catalogue, clients that search the catalogue, find services, and then consume the services. Portions of the vision are still “in progress” from a technological point of view – standardized searching of catalogues for services is still a topic of discussion, not a fully settled issue, so it will be a while before the problem can be delegated to the political realm of getting folks on board.

The good news is that the various parts of the vision are coming together.

  • The WMS specification is fully proven, and now widely implemented. Most importantly, it is implemented in proprietary desktop tools with wide user bases. When you think about the product development and release cycles involved, getting a new standard into a product is a pretty complex dance.
  • The WFS specification is on “phase one” of widespread adoption – a number of vendors have implemented half-assed attempts at WFS clients. The next round of attempts will probably be much better and more useful, which will in turn charge up the relevance of the technology.

Note that I am talking about client implementations primarily when talking about the maturity of the standards. This is because the clients have always lagged behind the servers, perhaps because providing a good user experience of a complex technology is difficult. For example, imagine how much traction the HTTP/HTML server standard would have without a good HTTP/HTML client application (like the one you’re using to read this page).

(As an aside, keep the HTTP/HTML example in mind the next time someone says “The standard has been around for 3 years, and you can buy a server that supports it from 4 different vendors”. The answer is “Yeah, but are there any clients for the server that do not suck rocks? Toy web implementations from the server vendors don’t count.”)

So, back to good news: key client/server standards are moving from the “conceptual” stage to the “works for me” stage. Also, people are getting better at setting up the servers to not be disastrously slow – and as they garner more users, they will get more complaints when they do a bad job, which will self-reinforce the whole system to higher quality. (The NASA WMS administrators may hate what the Worldwind client has done in terms of creating load, but I am sure they would admit that the experience of figuring out how to solve the problem has made their services better.)

Getting Data Online

There are some good examples of organizations getting data online, and some bad examples. The bad ones usually consist of central organizations saying “For the love of Pete, put some data online! If you put data online, I’ll give you a cookie!”

The good ones are more directed, recognize the operational/analytical divide, and work to bridge that divide with face-to-face communication and assistance.

My favorite example is an SDI effort out of North Carolina, NCOneMap. With very little funding from the state government, NCOneMap is stitching together an SDI built on data served up from the county level. By all rights, this should be a colossal failure (herding poorly resourced, highly operational-minded counties into an SDI), but they are making it work, by:

  • Being focused on what they really need. Yeah, it would be nice to get absolutely everything the county has online, to just say “put it all online”, but what the state really needs is what they don’t have, primarily a regularly updated street map and a regularly updated parcel map. So that is what they focus on: just getting each county to publish those two layers via WMS.
  • Being helpful in making it happen. NCOneMap does not have a big budget, but they do have a few dedicated staff. The staff has written very detailed directions about configuring ArcIMS as a WMS server. They give workshops on setting up WMS and writing FGDC standard metadata. They go to the counties and help them one-on-one. In general, they recognize that a low-priority activity like publishing data online is going to require some lubrication to get moving, and are willing to apply it.
  • Establishing exactly what they want. They want a WMS with a couple layers, with certain metadata filled in a certain way, and they will train you in exactly how to make it, and even help you make it if you can’t. The bar is just about as low as it can go.

(I have often thought that the best way to get the bar on the floor would be to actually give a WMS server to each organization you want to publish data, with a filesystem share available through one network interface and another network interface for the internet connections, and have people just copy their files into a directory tree on the server. The server could automatically read the tree and convert it into a WMS/WFS server, and push that information up to the central SDI registry. Given the cost of hardware today and the availability of open source, open standard software, such an idea is inherently achievable.)
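The drop-box idea above can be sketched in a few lines. This is a toy harvester, assuming MapServer as the WMS engine sitting behind the file share; the extension-to-layer-type mapping and the metadata fields are illustrative, not a real deployment:

```python
from pathlib import Path

# Map file extensions to MapServer layer types. This is an assumption for
# illustration: a real harvester would inspect each file's geometry type.
LAYER_TYPES = {".shp": "POLYGON", ".tif": "RASTER"}

def harvest_layers(root):
    """Walk a drop-box directory and emit one MapServer LAYER block per
    recognized file, so the tree becomes a WMS with zero configuration."""
    blocks = []
    for path in sorted(Path(root).rglob("*")):
        layer_type = LAYER_TYPES.get(path.suffix.lower())
        if layer_type is None:
            continue  # ignore files we don't know how to serve
        name = path.stem
        blocks.append(
            "LAYER\n"
            f'  NAME "{name}"\n'
            f"  TYPE {layer_type}\n"
            f'  DATA "{path}"\n'
            "  METADATA\n"
            f'    "wms_title" "{name}"\n'
            "  END\n"
            "END"
        )
    return "\n".join(blocks)
```

Run on a cron job, with the output pushed into the server config and a notification sent up to the central registry, that is pretty much the whole appliance.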

Creating an Example

One of the upsides of the technological maturity of the WMS standard (and upcoming maturity of the WFS standard) is that it is possible for SDI seed organizations to “set a good example” by publishing their large data holdings. Nothing hammers home the virtuous benefits of sharing more than sharing yourself, as St Francis of Assisi demonstrated.

One of the great instances of this concept is the “GeoBase” portion of the Canadian “GeoConnections” program. Unlike most national SDI programs, GeoConnections actually has some non-trivial funding attached to it, and some of that money is put towards the creation of the GeoBase layers. One of the earliest big layers to be made available is a complete Landsat coverage of Canada, compiled from 2000-2005 and published via WMS.

By putting their money where their mouth is, national SDI promoters can prove their commitment to the concept while demonstrating the effectiveness of web services. Note that it is important that the service provided be a very good example: the first time I tried the GeoBase Landsat WMS service it was very, very slow, and that initial impression has lingered ever since.

Summary

We’re doomed! Doomed!

Unfortunately, the good news all adheres to various components of the SDI fabric, but the fabric itself continues to be torn apart by the lack of incentives to “good behavior” on the part of potential SDI participants.

  • The NCOneMap experience shows that careful nurturing and love can bring out better behavior from potential SDI participants, but even then 100% participation is not guaranteed. And maps with holes in them really suck.
  • The technological basis of SDI is continuing to mature, and some of it is particularly workable, but there is still some immaturity in core components of the fabric (like catalogues).
  • National agencies are trying to create good examples, by seeding the field with their own data, but so far the spirit of sharing has not been seeping much lower down the jurisdictional food chain.

It seems to me that failure is not guaranteed, but the equation of SDI participation still needs some fine tuning. Given that people have a low incentive to participate (“it makes me a better person”), what is required is a dramatic lowering of the bar to participation (“it takes longer to blow my nose than to participate in the SDI”).

For a parallel, look at the world of P2P music sharing. I have no incentive to share my music, I only really want to consume other people’s music. But when I pull music from others, the music sharing software automatically shares everything I pull down. Also, sharing my music is as easy as dragging files and dropping them into the application window. A GeoNapster could dramatically improve data sharing, by making the process very very very easy.

(Anyone want to hire us to build GeoNapster? I have the specifications all ready, and sitting in my brain. :)

Why SDIs Fail

My colleague Jody Garnett recently got to participate in a workshop on Spatial Data Infrastructures (SDIs) at the United Nations, and he brought back a nice overview document written by a consultant to the UN. It lays out all the various national and sub-national SDI initiatives, and their strategies, and it all seems like very reasonable stuff.

Except that none of them are succeeding.

Some of them have been at it for over 10 years (like the NSDI work in the United States). Some of them have even been backed by some reasonable funding (like the CGDI in Canada). None of them really have public penetration. Not on the level of Google or Yahoo or Mapquest or MS Virtual Earth.

Is this success? It doesn’t feel like it. Even within the cognoscenti of GIS professionals the SDI initiatives have less intellectual traction than the consumer portals.

So perhaps this is failure. Why?

The Missing Incentive

SDIs make the virtuous and correct assertion that if everyone shared their data, nearest to the point of creation, using various technical tricks and standards, then distribution and integration cost for everyone would go down. We would all be winners.

It is a classic case of global accounting. Sum up everyone’s bottom line, and behold: in aggregate everyone is better off!

But not everyone is equally better off.

The operational folks, the guys actually building parcel fabrics and tenuring layers and physical infrastructure databases, are not doing it for fun, and they aren’t doing it to share. They are doing it to meet particular operational goals that are specific to themselves and themselves only.

Publishing their data to the rest of the world is a pure cost center for them. It is a small cost center, publishing the data is not hard, but it is still a cost. Publishing provides them no substantial business benefit. It provides other people with a huge benefit, hence the positive global bottom line, but for the most important people to the process, the data creators, it is just a pain in the ass.

I call this problem the “operational/analytical divide”, because it happens everywhere, from accounting registers to airline reservation systems. The needs of the people operationally working with the data do not align with the needs of the people delivering analysis to decision makers. Hence a million data marts bloom, and various other strategies to give the analysts the data they need without impeding the operational folks.

Note that the folks pushing SDI are almost exclusively coming from the analytical, decision support side of the equation. This is a recipe for frustration, particularly when trying to build something as loosely coupled as an SDI. So most SDI implementations still lack access to some of the most informative, up-to-date information, the operational information maintained and updated regularly by land managers.

This is not a recipe for relevance.

It Gets Worse

Often the local solution to the problem of operational disinterest is for the analytical side to take on the whole cost of integration and provide a single analytical environment for their jurisdiction.

Aha! Solution! These enlightened analysts will then take their integrated data and publish it out to the SDI and thus the world. Except the analysts suffer the same disease as the operational folks, just with larger borders.

In British Columbia, the analytical response to data integration problems was to build, over several years, a data warehouse which holds all the operational and inventory data, updated in real time. Cool stuff, and quite useful to them! But is this data available to the national SDI? Only a handful of layers.

Why? For the British Columbia analytical folks, providing data to the national SDI has little or no return-on-investment. So while they will acknowledge that it is a nice idea (it is a nice idea!), they will also acknowledge that it is not a high priority. What’s in it for them?

Users? What users?

On the technical side, even where data is available, the user experience has not been well served. The implementation of SDI technology thus far still seems “proof of concept”, getting as far as “does this idea work?”, but not as far as “can we make this work well?”

So, even where large quantities of data are published online (in the case of, for example, the GeoBase Landsat data made available by the Canadian CGDI effort), the access is sufficiently slow to deter all but the most avid users. Specifications for response time (where they exist, and they often do not) are absurdly slow (perhaps they were written in the mid-1990s). Less than 5 seconds to return a map might be OK if I only want one map, but any interactive application will require me to request numerous maps in the course of exploring the data, and those 5 seconds really pile up.
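To make the arithmetic concrete (the request count below is an assumed figure for a typical interactive session, not a measurement):

```python
def session_wait(requests, seconds_per_map):
    """Total time a user spends waiting on the map server in one session."""
    return requests * seconds_per_map

# Thirty interactions (pans, zooms, layer toggles) at the "acceptable"
# 5-second spec: two and a half minutes of staring at a spinner.
slow = session_wait(30, 5.0)   # 150.0 seconds

# The same session against a sub-second service: 15 seconds total.
fast = session_wait(30, 0.5)   # 15.0 seconds
```

A response-time spec written per-map hides this multiplication; the user experiences the session, not the individual request.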

Why was Google Maps a sensation, while Mapquest remained a yawn? Google optimized the part of the process most annoying to users, the interactive exploration part. Why did Google do this and not SDIs? One can only assume it was because building the system is the goal of SDI efforts (and, not to be diminished, the system is very hard to build!), not serving the end user.

But (I can hear them yelling) the consumer space is not what SDIs are about! Then whom are they for? I cannot think of a market segment well served by SDI efforts to date. Not GIS professionals; they cannot abide the forest of web-mapping obstacles that have been erected in front of various databases over the years. Not general public consumers, as noted above. Not decision makers; they can’t use the tools provided, and the professionals who serve them are still doing the same song-and-dance-and-secret-handshake number they were doing 10 years ago to achieve data access.

Attending a talk this year about the emergency response in British Columbia to bird flu, I was struck that despite all the state-of-the-art technology being used (ArcMap, SDE, LANs and wireless) the fundamental business architecture (madly collect big pile of data, madly print maps) is no different than if the talk were given 10 years ago. No SDI in sight, despite the near-constant pitch of SDI as a cure-all for integration in critical response settings.

Oh, And the Unpleasant Matter of the Bill

Of course, no discussion of the impotence of SDI efforts could fail to acknowledge the pivotal role of data cost recovery policies in blunting the sharing of information between organizations and levels of government.

The most valuable information for sharing and decision making is the most volatile, because direct close-to-source access assures timeliness and accuracy. But volatile data is also perceived (correctly) as having the most intrinsic value. And so the creators hold it close, and only share it in traditional ways (file shipping, protected download) using traditional business models (pay to play).

Did I Mention Metadata?

And you thought you could get out without being subjected to yet another discussion of metadata.

Metadata is the core of the SDI vision, since a distributed system of resources requires a searchable registry of available resources in order to direct people to the information they need.

But, like basic sharing itself, metadata creation suffers from missing incentives, because metadata is all about other people (not important people, like me). I don’t need to write metadata about my data, I already know all about my data! There are no awards for good metadata writing, except karmic ones, and karma doesn’t pay the bills.

This may sound overly harsh, but a quick review of the contents of the public capabilities files of web services currently online should dispel most people’s notions of metadata quality.
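That review is easy to automate. A toy audit, run here against a made-up capabilities fragment (the layer names and the fragment itself are hypothetical), lists the layers whose authors never bothered writing an Abstract:

```python
import xml.etree.ElementTree as ET

# A cut-down WMS capabilities fragment showing the usual pattern:
# titles that just repeat the layer name, abstracts missing entirely.
CAPS = """<WMT_MS_Capabilities>
  <Capability>
    <Layer>
      <Layer><Name>roads</Name><Title>roads</Title></Layer>
      <Layer><Name>parcels</Name><Title>Parcels</Title>
        <Abstract>Parcel fabric, updated nightly.</Abstract></Layer>
    </Layer>
  </Capability>
</WMT_MS_Capabilities>"""

def audit(caps_xml):
    """Return the names of layers with no Abstract in a capabilities doc."""
    root = ET.fromstring(caps_xml)
    missing = []
    for layer in root.iter("Layer"):
        name = layer.findtext("Name")
        if name and layer.findtext("Abstract") is None:
            missing.append(name)
    return missing
```

Point the same dozen lines at a real GetCapabilities response and the results are rarely flattering.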

Loosely Coupled Things Are Easy To Break

One of the ironies of the SDI-for-crisis-response sales pitch is that it holds up the classic centralized solution as having a “single point of failure” (boo! shame!). Unspoken is the fact that the SDI solution has N points of failure, where N is the number of nodes.

Our company had the good fortune to build a client portal that used SDI resources to provide an integrated viewing and data searching capability. It was “SDI come to life”, and a great learning experience. Though our code worked fine, our delivery was a nightmare.

  • The map application was too slow. Well, that wasn’t our code’s fault, one of the servers we referenced for base mapping was really slow. Eventually we found a faster one.
  • The search service(s) returned really terse results. Well, that was what the metadata server returned, we had no control over that. (Actually we did, we could obtain more verbose results but then the search took several minutes to return. Pick your poison.)
  • The search service was really slow. Again, ask a slow server a question, get a slow answer.
  • And finally the coup de grâce. After delivery, the application kept breaking. Or rather, the services it depended on kept breaking. Or changing their API just a little. But since we were the client-facing part of the whole system, it was us who got the service calls.

Reliability is not something any part of the SDI infrastructure takes overall responsibility for, so any component can drag down any of the others. In the end, it all gets delegated to the last link in the loosely coupled chain, the client software. But the architecture is supposed to make getting data easy, and instead it makes it really hard, because the client has to keep track of all the servers and their reliability.

Some SDI server providers make an effort to provide really high uptimes and great performance, but it only takes one crappy server to make the user experience terrible. So the whole system reliability problem gets delegated to the client. That makes for a lot of work on systems which are supposed to be relatively lightweight.
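What that delegation looks like in practice: something like the following ends up inside every serious SDI client, because nobody else is keeping score. The scoring rule here (recent failures first, then latency) is a sketch of my own devising, not any standard:

```python
class ServerPool:
    """Pick the healthiest of several equivalent SDI servers.

    A minimal sketch of the burden pushed onto SDI clients: nothing in
    the architecture tracks reliability, so the client has to.
    """

    def __init__(self, urls):
        # Start every server with a clean record and a nominal latency.
        self.stats = {url: {"failures": 0, "latency": 1.0} for url in urls}

    def record(self, url, ok, latency):
        """Update a server's record after each request."""
        s = self.stats[url]
        s["failures"] = 0 if ok else s["failures"] + 1
        s["latency"] = latency

    def best(self):
        """Prefer servers with the fewest recent failures, then the
        lowest last-observed latency."""
        return min(self.stats,
                   key=lambda u: (self.stats[u]["failures"],
                                  self.stats[u]["latency"]))
```

Multiply this by every base map, search service, and metadata server the portal touches, and the “lightweight client” is suddenly doing systems monitoring.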

We’re Doomed! Doomed!

So, to sum up:

  • Data creators have no incentive to join SDIs;
  • Data analyzers have an incentive, but once they have all the data they need, they’re done;
  • The user experience is pretty uniformly worse than what commercial outfits provide;
  • Data pricing policies are locking up much of the “good” data, even if people were interested in sharing;
  • Even the people who are sharing aren’t doing a very good job of it, since their metadata is garbage; and,
  • When put into operation, the whole system is unbelievably brittle, leading to yet more bad user experiences.

Is there no good news at all? Sure, and tomorrow I will write about that.

I Don't Hate Users

Really! I don’t! (Not much. Well. Not all the time.)

I start with something that sounds controversial, but really is just observationally factual: “users do not contribute anything to open source projects”. It sounds like I am putting them down (“darned users!”) but that is not what I am doing. It is just a straightforward observation that if you aren’t adding code, documentation, bug reports, or user support to a project, if you are just downloading and using the software, you are not adding any value to the project. From the project point of view, you might as well not exist. If it weren’t for the entry in the server log, they couldn’t prove your existence.

I hope that the above is not actually controversial.

Which brings us to a different idea, being “user focused”: why should I bother catering to people who are effectively invisible to the activities of the project? There is a nice comment from “step” which includes this:

There are a lot of contributors to Firefox (myself included) who never would have become so had they not first been a user.

So, there is a valid “developer focused” reason to be “user focused”. Using is a gateway drug for developers. I find that very persuasive indeed. Hopefully non-controversial as well.

All this gets bound up and confuzzulated with my “firehose of money” comment in my last posting. That comment is controversial, and seems to be the core item riling folks up.

My belief (and I am sure there are plenty of counterexamples to prove me wrong, but then, I am not trying to postulate a law of nature here) is that large, complex, user-facing applications (like, say, office suites or web browsers or GIS desktops) are sufficiently weighty that incubating them, getting them from a “concept” level of quality to a “user focused” level of quality, requires a large initial financial commitment. Or a large initial amount of time (Ooop, you caught me! I’m hedging! Creating some wriggle room!). A firehose of money, or a patient drip drip drip of a rivulet of time.

Asa Dotzler thinks that, with respect to the Firefox project (and unspecified others), I am full of it. I think that, in ignoring the six years of corporate funding (the proverbial “firehose”) for technology that underlies Firefox, he is gilding the lily. Firefox is great work, but attributing its success to the community effort alone is as unfair as attributing it to corporate money alone. Ooops, I did that. Sorry!

Shades of grey, always shades of grey.

An open source GIS on every Desktop?

Steven Citron-Pousty has a good posting about why he cannot move his shop to an open source basis in the near term. He took a look at uDig (thanks!) but, unsurprisingly, finds that it ain’t ArcGIS quite yet. His prescription is not for the faint of heart.

If you want people to switch you need to make the transition as painless as possible. Firefox got people to switch from IE by

  • Making better software
  • Not making users learn a new UI for interacting with the web
  • Importing all their IE favorites
  • THEN building in cool new features that keep people around

So, all we have to do is make something better than ArcGIS, but not so much better that it is not familiar to the existing user base, that works transparently with all their existing data and presumably their .mxd files too. And give it away for free.

“Just be like Firefox.” There are a couple of problems with this idea.

  • The first problem is the idea that garnering users is “the goal”. It is not. The misunderstanding is reasonable, because for proprietary companies it is the goal – more users implies more licensing dollars. For open source projects, more users just means somewhat more download bandwidth and a slightly higher number of beginner questions on the mailing lists. What open source projects want to attract is not users – it is developers. Developers will make the project stronger, add features, fix bugs, do all the things that end users want, but cannot do for themselves.
  • The second problem is that Firefox is not a normal open source project. The Mozilla Foundation has a lot of employees, most of them working on development, and a deal with Google that nets them millions of dollars each year. They can afford to be end user focused, because they have a paid pool of developers already in house.

It seems like all of the “user” success stories in the open source world (Firefox, Open Office, some of the “desktop Linux” efforts) have at their core one common feature – a large and ongoing firehose of money. Absent the firehose, it is hard to aggregate enough continuous effort to create desktop applications that include both the number of features and quality of finish necessary to entice the otherwise unmotivated “user”.

uDig has enough features and stability to be useful right now to a narrow pool of developers creating custom applications with specific toolsets. Hopefully in the future it will have more features and stability, and the pool will be less narrow. But I predict it will still be dominated by developers.

And that’s fine with me.

If Steven adopted uDig and PostGIS for his shop, it would not do a thing for my bottom line. But if he built an application around it, he would either add a little functionality (which would help me with my clients) or maybe hire us to add a little functionality (which would help the bottom line directly).

Open source is not about users, it is about developers. It is only about users in so far as users become sufficiently engaged in the project that they either become developers themselves, or support developers through careful bug finding or documentation.

The correct models are not Firefox or Open Office (unless someone wants to point a money firehose at me… I won’t object); those projects are aberrations. The correct models are Linux, Apache, Perl, PostgreSQL – not user-friendly, but still very useful.

Case Studies Considered Harmful?

Over the past month, I have been trying to compile a list of good case studies of organizations using PostGIS in their daily business. So far, it seems hard to get people out of their shells and say what they are doing – even when you promise to do all the work of writing up the story!

I know from things like the membership of the postgis-users mailing list that there are some big companies using PostGIS. Big names in the geospatial world use it for all kinds of production oriented tasks. But apparently they do not want their stories told.

This is not a problem unique to PostGIS. Other open source projects suffer from the same “shy user” syndrome. I read the postgresql-advocacy list and often see comments to the effect that “my client is a huge company, and they love the performance they are getting from PostgreSQL, but they do not want to be publicly named”.

What will it take to get big organizations to “out” themselves?