Friday, December 01, 2006

Can WFS Really Work?

Of all the standards that have come out of the OGC in the last few years, few have had the promise of the Web Feature Service (WFS) standard.
  • View and edit features over the web
  • Client independent
  • Server independent
  • Format independent
  • Database independent
What is not to like? Nothing!

One of the promises of uDig is to be an "internet GIS", by which we mean a thick client system capable of consuming and integrating web services in a transparent and low-friction way. The GIS equivalent of a web browser. Web browsers use HTTP and HTML and CSS and Javascript to create a rich and compelling client/server interaction, regardless of the client/server pairing. An internet GIS should use WMS and WFS and SLD to do the same thing, independent of vendor.

So, we have been working long and hard on a true WFS client, one that can connect to any WFS and read/write the features therein without modification. And here's the thing -- it is waaaaaaay harder than it should be.

Here is why:
  1. First off, generic GML support is hard. Every WFS exposes its own schema which in turn embeds GML, so a "GML parser" is actually a "generic XML parser that happens to also notice embedded GML", and the client has to be able to turn whatever odd feature collection the server exposes into its internal model to actually render and process it. However, it is only a hard problem, not an impossible one, and we have solved it.
  2. The solution to supporting generic GML is to read the schema advertised by the WFS, and use that to build a parser for the document on the fly. And this is where things get even harder: lots of servers advertise schemas that differ from the instance documents they actually produce.

    • The difference between schema and instance probably traces back to point #1 above. Because GML and XML schema are "hard", the developers make minor mistakes, and because there have not been generic clients around to heavily test the servers, the mistakes get out into the wild as deployed services.
So, once you have cracked the GML parsing problem (congratulations!) you run headlong into the next problem. Many of the servers have bugs and don't obey the schema/instance contract -- they do not serve up the GML that they say they do.
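To make the schema/instance mismatch concrete, here is a minimal sketch in Python. It is not how uDig does it. The schema, the instance document, the `http://example.org/ex` namespace, and the function names are all made up for illustration, and the schema is far simpler than what a real WFS advertises. The idea is just to list the properties the schema declares for a feature type, list the properties the instance actually carries, and report the difference.

```python
import xml.etree.ElementTree as ET

XSD_NS = "http://www.w3.org/2001/XMLSchema"

# Hypothetical schema, standing in for what a server might advertise.
SCHEMA = """<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema"
    xmlns:gml="http://www.opengis.net/gml"
    xmlns:ex="http://example.org/ex"
    targetNamespace="http://example.org/ex">
  <xsd:element name="road" type="ex:roadType"/>
  <xsd:complexType name="roadType">
    <xsd:sequence>
      <xsd:element name="name" type="xsd:string"/>
      <xsd:element name="lanes" type="xsd:integer"/>
      <xsd:element name="geom" type="gml:LineStringPropertyType"/>
    </xsd:sequence>
  </xsd:complexType>
</xsd:schema>"""

# Hypothetical feature as the same server might actually emit it:
# note the upper-cased NAME and the differently named geometry property.
INSTANCE = """<ex:road xmlns:ex="http://example.org/ex"
    xmlns:gml="http://www.opengis.net/gml">
  <ex:NAME>Main St</ex:NAME>
  <ex:lanes>2</ex:lanes>
  <ex:the_geom>
    <gml:LineString><gml:coordinates>0,0 1,1</gml:coordinates></gml:LineString>
  </ex:the_geom>
</ex:road>"""


def declared_properties(schema_xml, type_name):
    """Property names the schema declares inside the named complexType."""
    root = ET.fromstring(schema_xml)
    for ctype in root.iter(f"{{{XSD_NS}}}complexType"):
        if ctype.get("name") == type_name:
            return [el.get("name") for el in ctype.iter(f"{{{XSD_NS}}}element")]
    return []


def actual_properties(instance_xml):
    """Local names of the direct children of the feature element."""
    root = ET.fromstring(instance_xml)
    return [child.tag.split("}", 1)[-1] for child in root]


def mismatches(schema_xml, type_name, instance_xml):
    """Return (declared but absent, present but undeclared), both sorted."""
    declared = set(declared_properties(schema_xml, type_name))
    actual = set(actual_properties(instance_xml))
    return sorted(declared - actual), sorted(actual - declared)


missing, unexpected = mismatches(SCHEMA, "roadType", INSTANCE)
print("declared but missing:", missing)
print("present but undeclared:", unexpected)
```

A parser built strictly from the schema would go looking for `name` and `geom` and find neither, which is exactly the kind of breakage a client ends up coding around.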

And now, if you aren't just building a university research project, you have a difficult decision. If you want to interoperate with the existing servers, you have to code exceptions around all the previously-deployed bugs.

Unfortunately, our much-loved UMN MapServer is both (a) one of the most widely deployed WFS programs and (b) the one with the most cases of schema/instance mismatch. MapServer is not the only law-breaker, though: we have found breakages even in proprietary products that passed the CITE tests.

All this before you even start editing features!

The relative complexity of WFS (compared to, say, WMS) means that the scope of ways implementors can "get it wrong" is much, much wider, which in turn radically widens the field of special cases that any client must handle.

In some ways, this situation evokes the good old days of web browsers, when HTML purists argued that when encountering illegal HTML (like an unclosed tag) browsers should stop and spit up an error, while the browser writers themselves just powered through and tried to do a "best rendering" based on whatever crap HTML they happened to be provided with.


Andrea said...

Well, CITE conformance should be all you need to interoperate.
If this does not work, it means CITE alone is failing.

Maybe we should have something like the W3C validator, but for WFS stuff.
CITE tests a predefined set of data. The validator would work with any kind of data, checking that it really respects the formal rules: you enter the capabilities URL of your server and a WFS request, and the service checks both your request and the server's answer for structural validity.

Something for OGC to consider as an amendment to the WFS testing harness?

Kristian Thy said...

Before you start implementing hacks for non-conformant WFS's, think about what happened to the web. We're still reeling from the effects of the browser wars.

Since one of the main culprits seems to be open source, wouldn't it make more sense to submit patches for MapServer rather than spend your time making workarounds for the same in uDig?

Paul Ramsey said...

We do report bugs and contribute some patches to MapServer, but that does not save us from systems that are already deployed.

Given a choice between a browser which choked for apparently random reasons (from a user point of view) and one which just showed the pages, which would you choose? Same thing goes for any client software ... you're the one closest to the user, and therefore most likely to get the blame for broken things.

King Timely said...

While I'm full of admiration for your goals (a universal WFS client with automatic run-time configuration by reading the XML Schema), I'm skeptical that it is realistic.

My view of WFS is that, like GML, it is best thought of as a toolkit for configurators.

A "Universal WFS Client" cannot have very rich functionality, because that depends on the semantics of the information.
Semantics is the domain of communities.
So a realistic scenario for non-trivial WFS(*) is that
(a) communities develop GML Application Schemas that support their data transfer needs,
(b) the community schema gets published as a standard "feature type catalogue",
(c) and WFS services and clients are configured *in advance* to support the community schema.
No run-time schema parsing.

(*) I understand that when people in the GIS community refer to a "universal client" they usually mean "2-D map portrayal client".
While this is useful, even these are not usually automatic as they require the user to at least select layers and symbolization, etc, based on "hints" coming (in this case) from XML tag names.
More interesting clients do all sorts of other processing tasks, which require much more configuration.
By subscribing to community schemas this configuration can be done "in advance".
The key to software success in this scenario is for its configurability to be developer-friendly.

Mikel said...

WFS Simple? Seems like an obtainable and useful subset of WFS.

Paul Ramsey said...

In fairness, this is not so much a WFS problem as a GML problem. The potential complexity of any arbitrary GML document means that you really need a schema document to parse it in a performant manner (so you can set up an event parser before reading the features from the document). And WFS inherits that necessary pairing. Or maybe it's not a GML problem, maybe it's an XML Schema problem -- people find XML Schema hard enough that they make mistakes translating their schemas into valid instances. But in any event, it's a problem.

Here's a mundane challenge: write a program that accepts "GML" as an input file type. Now, what happens when someone hands you a .gml file, with no .xsd file and a schema header that points to an unresolvable internet address? Either you choke ("your file might be full of interesting data you love, but I refuse to read it") or you try your best ("ok, I will prescan your file and try to intuit its structure that way").

The second way is the way that is "nice" to your end user, but it sure raises the question "why do we have all this fancy schema infrastructure if we end up coding around it in the end anyways?". Having some kind of flat shapefile-in-xml schema as a default encoding sure would have made things more predictable in implementation land, even if "full GML" remained a potential option for more involved data exchange.
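For what it is worth, the "try your best" prescan can be sketched in a few lines of Python. This is only an illustration, not how any particular client does it. It assumes the file follows the common `gml:featureMember` convention, and the sample document, the `http://example.org/ex` namespace, and the function names are made up.

```python
import xml.etree.ElementTree as ET

GML_NS = "http://www.opengis.net/gml"

# Hypothetical schema-less input: two features of the same type that do
# not carry exactly the same properties.
SAMPLE = """<wfs:FeatureCollection xmlns:wfs="http://www.opengis.net/wfs"
    xmlns:gml="http://www.opengis.net/gml"
    xmlns:ex="http://example.org/ex">
  <gml:featureMember>
    <ex:road><ex:name>Main St</ex:name><ex:lanes>2</ex:lanes></ex:road>
  </gml:featureMember>
  <gml:featureMember>
    <ex:road><ex:name>Side St</ex:name><ex:surface>gravel</ex:surface></ex:road>
  </gml:featureMember>
</wfs:FeatureCollection>"""


def local_name(tag):
    """Strip the '{namespace}' prefix that ElementTree puts on tags."""
    return tag.split("}", 1)[-1]


def guess_structure(gml_text):
    """Map each feature type name to the property names seen, in first-seen order."""
    root = ET.fromstring(gml_text)
    structure = {}
    for member in root.iter(f"{{{GML_NS}}}featureMember"):
        for feature in member:
            props = structure.setdefault(local_name(feature.tag), [])
            for prop in feature:
                name = local_name(prop.tag)
                if name not in props:
                    props.append(name)
    return structure


print(guess_structure(SAMPLE))
```

The guessed structure is the union of whatever properties turn up, so it says nothing about types, cardinality, or which child is the geometry. That is the information the schema was supposed to provide.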
