Can WFS Really Work?

Of all the standards that have come out of the OGC in the last few years, few have held as much promise as the Web Feature Server standard.

  • View and edit features over the web
  • Client independent
  • Server independent
  • Format independent
  • Database independent

What is not to like? Nothing!

One of the promises of uDig is to be an “internet GIS”, by which we mean a thick client system capable of consuming and integrating web services in a transparent and low-friction way. The GIS equivalent of a web browser. Web browsers use HTTP and HTML and CSS and Javascript to create a rich and compelling client/server interaction, regardless of the client/server pairing. An internet GIS should use WMS and WFS and SLD to do the same thing, independent of vendor.

So, we have been working long and hard on a true WFS client, one that can connect to any WFS and read/write the features therein without modification. And here’s the thing – it is waaaaaaay harder than it should be.

Here is why:

  • First off, generic GML support is hard. Every WFS exposes its own schema which in turn embeds GML, so a “GML parser” is actually a “generic XML parser that happens to also notice embedded GML”, and the client has to be able to turn whatever odd feature collection the server exposes into its internal model to actually render and process it. However, it is only a hard problem, not an impossible one, and we have solved it.
  • The solution to supporting generic GML is to read the schema advertised by the WFS, and use that to build a parser for the document on the fly. And this is where things get even harder: lots of servers advertise schemas that differ from the instance documents they actually produce.

    • The difference between schema and instance probably traces back to the first point above. Because GML and XML Schema are “hard”, the developers make minor mistakes, and because there have not been generic clients around to heavily test the servers, the mistakes get out into the wild as deployed services.

So, once you have cracked the GML parsing problem (congratulations!) you run headlong into the next problem. Many of the servers have bugs and don’t obey the schema/instance contract – they do not serve up the GML that they say they do.
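
If you want to see the mismatch for yourself, the check is not hard to script. Here is a minimal sketch in Python, leaning on the lxml library; the endpoint URL and type name are hypothetical placeholders, not a recommendation of any particular server. Ask the server to describe a feature type, ask it for some features, and validate the second answer against the first.

    from urllib.request import urlopen
    from lxml import etree

    WFS = "http://example.com/wfs"            # hypothetical WFS endpoint
    TYPENAME = "topp:states"                  # hypothetical feature type

    def fetch(url):
        with urlopen(url) as response:
            return etree.parse(response)

    # 1. What the server says it serves.
    schema_doc = fetch(WFS + "?SERVICE=WFS&VERSION=1.0.0"
                             "&REQUEST=DescribeFeatureType&TYPENAME=" + TYPENAME)

    # 2. What the server actually serves.
    instance_doc = fetch(WFS + "?SERVICE=WFS&VERSION=1.0.0"
                               "&REQUEST=GetFeature&TYPENAME=" + TYPENAME
                               + "&MAXFEATURES=10")

    # 3. Does the instance obey the schema?  (lxml has to be able to resolve
    #    the GML schemas the application schema imports, which generally
    #    means fetching them over the network.)
    schema = etree.XMLSchema(schema_doc)
    if schema.validate(instance_doc):
        print("schema and instance agree")
    else:
        for error in schema.error_log:
            print(error.message)

On a well-behaved server the validation passes; on many deployed servers it does not, and the error log is a preview of the special cases your client will end up coding around.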

And now, if you aren’t just building a university research project, you have a difficult decision. If you want to interoperate with the existing servers, you have to code exceptions around all the previously-deployed bugs.

Unfortunately, our much-loved UMN Mapserver is both (a) one of the most widely deployed WFS programs and (b) the one with the most cases of schema/instance mismatch. Mapserver is not the only law-breaker, though; we have found breakages even in proprietary products that passed the CITE tests.

All this before you even start editing features!

The relative complexity of WFS (compared to, say, WMS) means that the scope of ways implementors can “get it wrong” is much much wider, which in turn radically widens the field of “special cases to handle” that any client must write.

In some ways, this situation recalls the good old days of web browsers, when HTML purists argued that when encountering illegal HTML (like an unclosed tag) browsers should stop and spit up an error, while the browser writers themselves just powered through and tried to do a “best rendering” based on whatever crap HTML they happened to be provided with.

Flame Bait

Why end the evening on a high note, when I can end it rancourously and full of bile!

On the postgis-users mailing list, Stephen Woodbridge writes:

Can you describe what dynamic segmentation is? What is the algorithm? I guess I can google for it …

As with many things, the terminological environment has been muddied by the conflation of specific ESRI terms for particular features with generic terms for similar things. Call it the “Chesterfield effect”.

  • ESRI “Dynamic segmentation” is really just “linear referencing of vectors and associated attributes” (there is a small sketch of the idea just after this list).
  • ESRI “Geodatabase” is “a database with a bunch of extra tables defined by and understood almost exclusively by ESRI”
  • ESRI “Coverage” is a “vector topology that covers an area” (ever wonder why the OGC Web Coverage Server specification is about delivering raster data, not vector topologies? because most people have a different understanding of the word than us GIS brainwashees).
  • ESRI “Topology” is a “middleware enforcement of spatial relationship rules”
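
For the record, the “linear referencing” under all that jargon is nothing more exotic than “locate things by a measure along a line”. A toy sketch, in plain Python with made-up coordinates:

    from math import hypot

    def point_at_measure(line, measure):
        """Return the (x, y) point 'measure' units along a polyline.

        'line' is a list of (x, y) vertices.  This is the whole trick behind
        the jargon: an event like "culvert at kilometre 12.4 of highway 19"
        becomes a point on the geometry without storing a separate layer.
        """
        travelled = 0.0
        for (x1, y1), (x2, y2) in zip(line[:-1], line[1:]):
            seg = hypot(x2 - x1, y2 - y1)
            if seg == 0:
                continue
            if travelled + seg >= measure:
                t = (measure - travelled) / seg
                return (x1 + t * (x2 - x1), y1 + t * (y2 - y1))
            travelled += seg
        return line[-1]  # the measure runs off the end of the line

    # made-up coordinates: 10 units along a two-segment line
    print(point_at_measure([(0, 0), (10, 0), (10, 10)], 10.0))  # (10.0, 0.0)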

ESRI rules the intellectual world of GIS people so thoroughly that they define the very limits of the possible. Just last week someone told me “oh, editing features over the web? the only way to do that is with ArcServer”.

The only way, and said with complete certainty. You don’t want to argue with people like that, it seems almost rude, like arguing with people about religion.

Tiles Tiles Tiles

One of the oddball tasks I came home from the FOSS4G conference with was the job of writing the first draft of a tiling specification. My particular remit was to do a server capable of handling arbitrary projections and scale sets, which made for an interesting design decision: to extend WMS or not?

I mulled it over at the conference, and talked to some of the luminaries like Paul Spencer and Allan Doyle. My concern was that the amount of alteration required to WMS in order to support the arbitrary projections and scales was such that there was not much benefit remaining in using the WMS standard in the first place – existing servers wouldn’t be able to implement it, and existing clients wouldn’t be able to benefit.

On top of that, a number of the client writers wanted something a little more “tiley” in their specification than WMS. Rather than requests in coordinate space, they wanted requests in tile space: “give me tile [4,5]!”
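
To make the difference concrete, here is a little sketch of the two request styles, assuming a simple global-geodetic profile (256×256 pixel tiles, two tiles covering the world at zoom level 0, tile origin at the bottom-left corner); the server URLs and layer name are invented:

    # Assumed profile: global-geodetic, 256x256 pixel tiles, two tiles
    # spanning the world at zoom level 0, tile origin at (-180, -90).
    # URLs and layer name are hypothetical.
    def tile_bbox(x, y, zoom):
        """Bounding box (minx, miny, maxx, maxy) in degrees for tile [x, y]."""
        size = 180.0 / 2 ** zoom             # tile edge in degrees at this zoom
        minx = -180.0 + x * size
        miny = -90.0 + y * size
        return (minx, miny, minx + size, miny + size)

    # Coordinate-space request: the client does the bounding-box arithmetic.
    bbox = ",".join("%g" % v for v in tile_bbox(4, 5, 3))
    wms = ("http://example.com/wms?SERVICE=WMS&VERSION=1.1.1&REQUEST=GetMap"
           "&LAYERS=basemap&STYLES=&SRS=EPSG:4326&WIDTH=256&HEIGHT=256"
           "&FORMAT=image/png&BBOX=" + bbox)

    # Tile-space request: "give me tile [4,5]!" -- the grid is baked into the URL.
    tile = "http://example.com/tiles/basemap/3/4/5.png"

    print(wms)
    print(tile)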

So, I originally set off to write either a GetTile extension for WMS or a Tile Server built on the Open Web Services baseline from the Open Geospatial Consortium.

But then I had an Intellectual Experience, which came from reading Sean Gillies’ blog on REST web services, and his thoughts on how Web Feature Server (WFS) could have been implemented more attractively as a REST interface. I was drawn in by the Abstract Beauty of the whole concept.

So I threw away the half-page of OWS boiler-plate I had started with and began anew, thinking about the tiling problem as a problem of exposing “resources” à la REST.

The result is the Tile Map Service specification, and no, it is not really all that RESTful. That’s because tiles themselves are really boring resources, and completely cataloguing a discovery path from root resource to individual tile would add a lot of cruft to the specification that client writers would never use. So I didn’t.

That was the general guiding principle I tried to apply during the process – what information can client writers use. Rather than writing for an abstract entity, I tried to think of the poor schmuck who would have to write a client for the thing and aim the content at him.

I have put up a reference server at http://mapserver.refractions.net/cgi-bin/tms and there are other servers referenced in the document. My colleague Jody Garnett is working on a client implementation in Java for the GeoTools library, for exposure in the uDig interface. Folks from OpenLayers and WorldKit have already built reference clients. It has been great fun!

Making SDIs Work

I failed to comment on my comments, which makes me a Bad Blogger. It is all about reprocessing content after all, so here goes…

Incentives and Costs

In response to the “Why SDIs Fail” posting, “randy” comments:

The key seems to be incentives and the only two I can think of are market incentives and policy mandate incentives. Market incentives are bottom up and way more appealing than top down legal/policy incentives.

And I agree. Incentives are lacking now, except for the “good karma” incentive. However, low incentives alone are not a barrier to action, it is low incentives in conjunction with a higher cost of action that cause inaction. The karmic incentive to not litter is relatively low, but the cost of not littering is also very low… hey, there’s a garbage can right over there.

So we can defeat SDI inaction through two possible routes: increase the incentives to participation, or decrease the costs.

Randy raises a number of possible external incentives, such as legal mandates, to push public organizations into the SDI arena. In particular for areas where there is a strong argument for mandated participation (public safety) this approach may have legs. But we know how much lower orders of government love unfunded mandates.

I personally think that decreasing the costs has better potential in the short term, by examining the data sharing networks that have succeeded – the illegal music and movie distribution networks. Everyone has an incentive to take and no one has an incentive to give, yet the content is out there. There are technical approaches to enhancing sharing in sharing-averse communities which can be scavenged from this arena and brought into ours.

Even Better Technology

Rob Atkinson looks into the future and sees that the tools we have now are not equipped for doing effective data sharing.

What we need is the mechanism by which SDIs can grow (from top-down and bottom-up) to bridge that gap. Much like DNS provides domain roots and the bind protocol. What we need to do to realise SDI benefits is, as you say, enable massively scalable access to data updates by making life significantly easier at the grass roots level, but also by introducing a level of coherence to enable investment decisions at the national level.

I agree that many “real” data sharing applications are going to need some super-amazing technology to bind together content. Ontology and deep metadata. But in the meantime, looser, more human-mediated approaches are required to bridge the gap.

As Rob says, life needs to be easier for the grass roots. That is job one. Once the data is flowing, the coneheads in the central agencies can figure out techno-magic to stitch it all together, but until the data starts flowing the whole discussion is just so much intellectual masturbation.

I Propose…

That job one is to get the data flowing. There needs to be a single, user-facing application, a GeoNapster, that makes sharing data and finding data ludicrously simple. So simple that there is no excuse not to do it except sheer bullheadedness. Get the data flowing and then worry about how to integrate it.

Recognize that data at the lowest levels of government is created by one or two people. Pitch the tool and approach to that level. Make it search and find just as well as it shares. Integrate it with the desktop, even with the major vendor software, if that makes it work more easily.

The data sets that are “corporately” managed by state and federal bureaucracies may have to wait, or be brought online in the mode of NCOneMap, with careful one-on-one cajoling. But the SDI builders have to know what they want, what is of value, and be strategic, not shotgun, in gathering those contributors.

Being strategic means making hard decisions about what will be used, and what is useful, given the current technology available. Imagery is widely useful with the current technology. Complex inventory data usually is not (would you like to see the forests by stand age? species? do the different jurisdictions use the same inventory techniques? are these apples or oranges or both?) so do not waste money or time on it.

Get out to the operational GIS level, meet the people who are going to use these services (in theory) and feed in new information (in theory), and figure out how to get involved in their day-to-day work. How can an SDI become as ingrained in the daily workflow of GIS technicians as Google is in our techno-lives?

Put the strategic diagrams, the boxes and arrows, in a drawer for a while. They will still be there later, when the time comes.

Must SDIs Fail?

Did I say “tomorrow”? I meant “next month”.

Making Technology Work

Good news about the larger SDI vision is hard to come by. The larger vision includes a central catalogue, clients that search the catalogue, find services, and then consume the services. Portions of the vision are still “in progress” from a technological point of view – standardized searching of catalogues for services is still a topic of discussion, not a fully settled issue, so it will be a while before the problem can be delegated to the political realm of getting folks on board.

The good news is that the various parts of the vision are coming together.

  • The WMS specification is fully proven, and now widely implemented. Most importantly, it is implemented in proprietary desktop tools with wide user bases. When you think about the product development and release cycles involved, getting a new standard into a product is a pretty complex dance.
  • The WFS specification is on “phase one” of widespread adoption – a number of vendors have implemented half-assed attempts at WFS clients. The next round of attempts will probably be much better and useful, which will in turn charge up the relevance of the technology.

Note that I am talking about client implementations primarily when talking about the maturity of the standards. This is because the clients have always lagged behind the servers, perhaps because providing a good user experience of a complex technology is difficult. For example, imagine how much traction the HTTP/HTML server standard would have without a good HTTP/HTML client application (like the one you’re using to read this page).

(As an aside, keep the HTTP/HTML example in mind the next time someone says “The standard has been around for 3 years, and you can buy a server that supports it from 4 different vendors”. The answer is “Yeah, but are there any clients for the server that do not suck rocks? Toy web implementations from the server vendors don’t count.”)

So, back to good news: key client/server standards are moving from the “conceptual” stage to the “works for me” stage. Also, people are getting better at setting up the servers to not be disastrously slow – and as they garner more users, they will get more complaints when they do a bad job, which will self-reinforce the whole system to higher quality. (The NASA WMS administrators may hate what the Worldwind client has done in terms of creating load, but I am sure they would admit that the experience of figuring out how to solve the problem has made their services better.)

Getting Data Online

There are some good examples of organizations getting data online, and some bad examples. The bad ones usually consist of central organizations saying “For the love of Pete, put some data online! If you put data online, I’ll give you a cookie!”

The good ones are more directed, recognize the operational/analytical divide, and work to bridge that divide with face-to-face communication and assistance.

My favorite example is an SDI effort out of North Carolina, NCOneMap. With very little funding from the state government, NCOneMap is stitching together an SDI built on data served up from the county level. By all rights, this should be a colossal failure (herding poorly resourced, highly operational-minded counties into an SDI), but they are making it work, by:

  • Being focussed on what they really need. Yeah, it would be nice to get absolutely everything the county has online, to just say “put it all online”, but what the state really needs is what they don’t have, primarily a regularly updated street map and a regularly updated parcel map. So that is what they focus on. Just getting each county to publish those two layers via WMS.
  • Being helpful making it happen. NCOneMap does not have a big budget, but they do have a few dedicated staff. The staff has written very detailed directions about configuring ArcIMS as a WMS server. They give workshops on setting up WMS and writing FGDC standard metadata. They go to the counties and help them one-on-one. In general, they recognize that a low-priority activity like publishing data online is going to require some lubrication to get moving, and are willing to apply it.
  • Establishing exactly what they want. They want a WMS with a couple layers, with certain metadata filled in a certain way, and they will train you in exactly how to make it, and even help you make it if you can’t. The bar is just about as low as it can go.

(I have often thought that the best way to get the bar on the floor would be to actually give a WMS server to each organization you want publishing data, with a filesystem share exposed on one network interface and the internet connection on another, and have people just copy their files into a directory tree on the server. The server could automatically read the tree, publish it via WMS/WFS, and push that information up to the central SDI registry. Given the cost of hardware today and the availability of open source, open standard software, such an idea is inherently achievable.)
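
To be concrete about that parenthetical, the heart of such an appliance is almost embarrassingly small. A hypothetical sketch (the drop directory, the naming scheme, and what you feed the resulting layer list into are all invented):

    # Hypothetical sketch of the "copy your files into the share" appliance:
    # walk the drop directory and emit one layer entry per shapefile or
    # GeoTIFF found.  The drop path and what consumes the result (a mapfile,
    # a registry notification) are placeholders.
    import os

    DROP_DIR = "/data/incoming"               # the filesystem share

    def scan_layers(root):
        layers = []
        for dirpath, _, filenames in os.walk(root):
            for name in filenames:
                base, ext = os.path.splitext(name)
                if ext.lower() in (".shp", ".tif", ".tiff"):
                    layers.append({
                        "name": base,
                        "path": os.path.join(dirpath, name),
                        "kind": "vector" if ext.lower() == ".shp" else "raster",
                    })
        return layers

    if __name__ == "__main__":
        for layer in scan_layers(DROP_DIR):
            # A real appliance would regenerate the WMS/WFS configuration
            # here and notify the central SDI registry.
            print("%(kind)s layer %(name)s -> %(path)s" % layer)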

Creating an Example

One of the upsides of the technological maturity of the WMS standard (and upcoming maturity of the WFS standard) is that it is possible for SDI seed organizations to “set a good example” by publishing their large data holdings. Nothing hammers home the virtuous benefits of sharing more than sharing yourself, as St Francis of Assisi demonstrated.

One of the great instances of this concept is the “GeoBase” portion of the Canadian “GeoConnections” program. Unlike most national SDI programs, GeoConnections actually has some non-trivial funding attached to it, and some of that money is put towards the creation of the GeoBase layers. One of the earliest big layers to be made available is a complete Landsat coverage of Canada, compiled from 2000-2005 and published via WMS.

By putting their money where their mouth is, national SDI promoters can simultaneously prove their commitment to the concept while demonstrating the effectiveness of web services. Note that it is important that the service provided be a very good example: the first time I tried the GeoBase Landsat WMS service it was very, very slow, and that initial impression has lingered ever since.

Summary

We’re doomed! Doomed!

Unfortunately, the good news all attaches to individual components of the SDI fabric, while the fabric itself continues to be torn apart by the lack of incentives for “good behavior” on the part of potential SDI participants.

  • The NCOneMap experience shows that careful nurturing and love can bring out better behavior from potential SDI participants, but even then 100% participation is not guaranteed. And maps with holes in them really suck.
  • The technological basis of SDI is continuing to mature, and some of it is particularly workable, but there is still some immaturity in core components of the fabric (like catalogues).
  • National agencies are trying to create good examples, by seeding the field with their own data, but so far the spirit of sharing has not been seeping much lower down the jurisdictional food chain.

It seems to me that failure is not guaranteed, but the equation of SDI participation still needs some fine tuning. Given that people have a low incentive to participate (“it makes me a better person”), what is required is a dramatic lowering of the bar to participation (“it takes longer to blow my nose than to participate in the SDI”).

For a parallel, look at the world of P2P music sharing. I have no incentive to share my music, I only really want to consume other people’s music. But when I pull music from others, the music sharing software automatically shares everything I pull down. Also, sharing my music is as easy as dragging files and dropping them into the application window. A GeoNapster could dramatically improve data sharing, by making the process very very very easy.

(Anyone want to hire us to build GeoNapster? I have the specifications all ready, and sitting in my brain. :)