PostGIS for SDE

One of the interesting nuggets to come out of the ESRI User Conference this year was the news that ESRI was going to support ArcSDE on PostgreSQL “sometime soon”. Which, to PostGIS people like ourselves, suggests the question: “implemented how?”

  • One possibility would be basically a cut’n’paste of their existing SQLServer code, with the SQLServer quirks replaced with PostgreSQL quirks, using SDEBINARY as the spatial type.
  • Another possibility would be to use the PostGIS spatial objects as the underlying storage mechanism, in the same way ArcSDE supports using SDO_GEOMETRY in Oracle.
  • A third possibility would be ESRI implementing their own spatial type in PostgreSQL and then using that.

Sounds strange, doesn’t it? Writing a whole new spatial type, when one already exists. Ordinarily I would dismiss the idea – except that ESRI has already done it for Oracle!

The ST_GEOMETRY type in ArcSDE 9.1 and up is a native Oracle type (built using the Oracle type-extension mechanism) provided, and recommended, by ESRI for use by ArcSDE.

Why would ESRI do this?

The cynical explanation (get this out of the way first) is that it helps break the growing Oracle momentum in tools supporting SDO_GEOMETRY, and confuses the marketplace further about what the “right type” to use is in Oracle for spatial work.

The practical explanation is that ESRI’s ST_GEOMETRY for Oracle implements the same semantics and function signatures as the ST_GEOMETRY objects in DB2 and Informix (coincidentally, also implemented in part by ESRI). This allows ArcSDE to expose a uniform “raw spatial SQL” to clients while still maintaining its position as the man-in-the-middle of client/server interaction. Adding ST_GEOMETRY further reinforces the “database neutral” aspect of ArcSDE by allowing spatial SQL without exposing the differences between the SDO_GEOMETRY function signatures and the ST_GEOMETRY ones.

So where does that leave PostGIS? Working to remove the practical excuses for not using PostGIS as the underlying geometry type, as fast as possible. We have looked up the function signatures used by ArcSDE and implemented them for the 1.1.7 release.
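For a flavour of what matching those signatures means: the ArcSDE functions follow the SQL/MM ST_* conventions, so the queries it issues can run unmodified against PostGIS 1.1.7. A minimal sketch of composing such a query (the “roads” table and “geom” column are hypothetical, and actually executing it would need a live PostGIS connection, which is not shown):

```python
# Sketch: composing a SQL/MM-style spatial query of the kind ArcSDE
# issues. The "roads"/"geom" names are hypothetical; executing the query
# would require a PostGIS 1.1.7+ connection (e.g. via psycopg2), which
# is assumed rather than shown here.
def st_intersects_query(table, geom_col, wkt, srid):
    """Build an ST_* (SQL/MM) intersection query string."""
    return (
        "SELECT ST_AsText(%s) FROM %s "
        "WHERE ST_Intersects(%s, ST_GeomFromText('%s', %d))"
        % (geom_col, table, geom_col, wkt, srid)
    )

print(st_intersects_query("roads", "geom", "POINT(0 0)", 4326))
```

The point of the shared signatures is exactly this: the same query text works against DB2, Informix, Oracle-with-ST_GEOMETRY, or PostGIS.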

If anyone on the ArcSDE team reads this and wants to talk about what else is needed to make PostGIS the default geometry type for ArcSDE-on-PostgreSQL, get in touch. We aim to please.

Can WFS Really Work?

Of all the standards that have come out of the OGC in the last few years, few have had the promise of the Web Feature Server standard.

  • View and edit features over the web
  • Client independent
  • Server independent
  • Format independent
  • Database independent

What is not to like? Nothing!

One of the promises of uDig is to be an “internet GIS”, by which we mean a thick client system capable of consuming and integrating web services in a transparent and low-friction way. The GIS equivalent of a web browser. Web browsers use HTTP and HTML and CSS and Javascript to create a rich and compelling client/server interaction, regardless of the client/server pairing. An internet GIS should use WMS and WFS and SLD to do the same thing, independent of vendor.

So, we have been working long and hard on a true WFS client, one that can connect to any WFS and read/write the features therein without modification. And here’s the thing – it is waaaaaaay harder than it should be.

Here is why:

  • First off, generic GML support is hard. Every WFS exposes its own schema which in turn embeds GML, so a “GML parser” is actually a “generic XML parser that happens to also notice embedded GML”, and the client has to be able to turn whatever odd feature collection the server exposes into its internal model to actually render and process it. However, it is only a hard problem, not an impossible one, and we have solved it.
  • The solution to supporting generic GML is to read the schema advertised by the WFS, and use that to build a parser for the document on the fly. And this is where things get even harder: lots of servers advertise schemas that differ from the instance documents they actually produce.

    • The difference between schema and instance probably traces back to the first point above. Because GML and XML Schema are “hard”, developers make minor mistakes, and because there have not been generic clients around to heavily test the servers, the mistakes get out into the wild as deployed services.

So, once you have cracked the GML parsing problem (congratulations!) you run headlong into the next problem. Many of the servers have bugs and don’t obey the schema/instance contract – they do not serve up the GML that they say they do.
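To make the “generic XML parser that happens to notice embedded GML” problem concrete, here is a minimal sketch: the feature elements live in a server-specific namespace (the “myns” namespace below is made up), while the geometry lives in the GML namespace, so a generic client has to walk elements it has never seen and recognize only the GML ones.

```python
# Sketch of generic GML handling: feature schema is server-specific and
# unknown in advance ("myns" here is invented), so the client scans for
# elements in the GML namespace rather than for known feature tags.
import xml.etree.ElementTree as ET

GML = "http://www.opengis.net/gml"

DOC = """<wfs:FeatureCollection
    xmlns:wfs="http://www.opengis.net/wfs"
    xmlns:gml="http://www.opengis.net/gml"
    xmlns:myns="http://example.com/myns">
  <gml:featureMember>
    <myns:road>
      <myns:name>Main St</myns:name>
      <myns:the_geom>
        <gml:Point><gml:coordinates>-123.1,49.2</gml:coordinates></gml:Point>
      </myns:the_geom>
    </myns:road>
  </gml:featureMember>
</wfs:FeatureCollection>"""

def extract_points(xml_text):
    """Find every GML point in a document whose feature schema is unknown."""
    root = ET.fromstring(xml_text)
    points = []
    for coords in root.iter("{%s}coordinates" % GML):
        x, y = coords.text.split(",")
        points.append((float(x), float(y)))
    return points

print(extract_points(DOC))  # [(-123.1, 49.2)]
```

A real client cannot stop there, of course: it also has to reconcile what it finds against the advertised schema, which is where the schema/instance mismatches bite.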

And now, if you aren’t just building a university research project, you have a difficult decision. If you want to interoperate with the existing servers, you have to code exceptions around all the previously-deployed bugs.

Unfortunately, our much loved UMN Mapserver is both (a) one of the most widely deployed WFS programs and (b) the one with the most cases of schema/instance mismatch. Mapserver is not the only law-breaker, though: we have found breakages even in proprietary products that passed the CITE tests.

All this before you even start editing features!

The relative complexity of WFS (compared to, say, WMS) means that the scope of ways implementors can “get it wrong” is much much wider, which in turn radically widens the field of “special cases to handle” that any client must write.

In some ways, this situation recalls the good old days of web browsers, when HTML purists argued that a browser encountering illegal HTML (like an unclosed tag) should stop and spit up an error, while the browser writers themselves just powered through and tried to do a “best rendering” of whatever crap HTML they happened to be given.

Flame Bait

Why end the evening on a high note, when I can end it rancorously and full of bile!

On the postgis-users mailing list, Stephen Woodbridge writes:

Can you describe what dynamic segmentation is? What is the algorithm? I guess I can google for it …

As with many things, the terminological environment has been muddied by the conflation of specific ESRI terms for particular features with generic terms for similar things. Call it the “Chesterfield effect”.

  • ESRI “Dynamic segmentation” is really just “linear referencing of vectors and associated attributes”.
  • ESRI “Geodatabase” is “a database with a bunch of extra tables defined by and understood almost exclusively by ESRI”
  • ESRI “Coverage” is a “vector topology that covers an area” (ever wonder why the OGC Web Coverage Server specification is about delivering raster data, not vector topologies? because most people have a different understanding of the word than us GIS brainwashees).
  • ESRI “Topology” is a “middleware enforcement of spatial relationship rules”
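For what it’s worth, “linear referencing of vectors” is a small algorithm, not a product feature: walk the vertices of a line accumulating length, and interpolate the point at a given measure. A minimal sketch of the idea (this is my illustration, not ESRI’s or anyone else’s implementation):

```python
# Sketch of linear referencing: locate the point a given distance
# (the "measure") along a polyline. Illustrative only.
import math

def locate_point(line, measure):
    """Return the (x, y) at 'measure' units along 'line' (list of vertices)."""
    travelled = 0.0
    for (x1, y1), (x2, y2) in zip(line, line[1:]):
        seg = math.hypot(x2 - x1, y2 - y1)
        if travelled + seg >= measure:
            t = (measure - travelled) / seg
            return (x1 + t * (x2 - x1), y1 + t * (y2 - y1))
        travelled += seg
    return line[-1]  # measure past the end: clamp to the last vertex

print(locate_point([(0, 0), (10, 0), (10, 10)], 15))  # (10.0, 5.0)
```

“Dynamic segmentation” is this, plus attaching attribute rows to measure ranges along the line.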

ESRI rules the intellectual world of GIS people so thoroughly that they define the very limits of the possible. Just last week someone told me “oh, editing features over the web? the only way to do that is with ArcServer”.

The only way, and said with complete certainty. You don’t want to argue with people like that, it seems almost rude, like arguing with people about religion.

Tiles Tiles Tiles

One of the oddball tasks I came home from the FOSS4G conference with was the job of writing the first draft of a tiling specification. My particular remit was to do a server capable of handling arbitrary projections and scale sets, which made for an interesting design decision: to extend WMS or not?

I mulled it over at the conference, and talked to some of the luminaries like Paul Spencer and Allan Doyle. My concern was that the amount of alteration required to WMS in order to support the arbitrary projections and scales was such that there was not much benefit remaining in using the WMS standard in the first place – existing servers wouldn’t be able to implement it, and existing clients wouldn’t be able to benefit.

On top of that, a number of the client writers wanted something a little more “tiley” in their specification than WMS. Rather than requests in coordinate space, they wanted requests in tile space: “give me tile [4,5]!”
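The arithmetic behind “give me tile [4,5]” is simple: given an origin and a per-level resolution, tile indices map directly to a bounding box, so the client never has to compute coordinate-space requests at all. A sketch under assumed parameters (256-pixel tiles, an arbitrary origin; not the normative formula from any spec):

```python
# Sketch of the tile-space request: tile indices plus a known origin and
# per-level resolution determine the bounding box that a WMS-style
# request would otherwise spell out. Parameters here are illustrative.
TILE_SIZE = 256  # pixels

def tile_bbox(x, y, resolution, origin=(0.0, 0.0)):
    """Bounding box (minx, miny, maxx, maxy) of tile [x, y] at 'resolution'
    map-units-per-pixel, counting tiles up and right from 'origin'."""
    span = TILE_SIZE * resolution  # ground width of one square tile
    minx = origin[0] + x * span
    miny = origin[1] + y * span
    return (minx, miny, minx + span, miny + span)

print(tile_bbox(4, 5, resolution=1.0))  # (1024.0, 1280.0, 1280.0, 1536.0)
```

Fixing the origin and the resolution set server-side is what makes the requests cacheable: every client asks for exactly the same boxes.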

So, I originally set off to write either a GetTile in WMS or a Tile Server using the Open Web Services baseline from the Open Geospatial Consortium.

But then I had an Intellectual Experience, which came from reading Sean Gillies’ blog on REST web services, and his thoughts on how Web Feature Server (WFS) could have been implemented more attractively as a REST interface. I was drawn in by the Abstract Beauty of the whole concept.

So I threw away the half-page of OWS boilerplate I had started with and began anew, thinking about the tiling problem as a problem of exposing “resources” à la REST.

The result is the Tile Map Service specification, and no, it is not really all that RESTful. That’s because tiles themselves are really boring resources, and completely cataloguing a discovery path from root resource to individual tile would add a lot of cruft to the specification that client writers would never use. So I didn’t.
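Concretely, the tile-as-resource idea reduces each tile to a plain URL, so a client needs string formatting rather than a query protocol. A sketch of the addressing (the version/layer/z/x/y.ext layout follows the TMS draft, but treat the exact scheme and the example host as assumptions of this sketch):

```python
# Sketch of tile-as-resource addressing: every tile is just a URL.
# The path layout (version/layer/z/x/y.ext) follows the TMS draft;
# the host and layer names here are made up for illustration.
def tile_url(base, layer, z, x, y, ext="png"):
    return "%s/1.0.0/%s/%d/%d/%d.%s" % (base, layer, z, x, y, ext)

print(tile_url("http://example.com/tms", "basemap", 4, 4, 5))
# http://example.com/tms/1.0.0/basemap/4/4/5.png
```

That is the whole client contract for fetching: anything that can assemble a string and issue an HTTP GET can be a TMS client.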

That was the general guiding principle I tried to apply during the process – what information can client writers use. Rather than writing for an abstract entity, I tried to think of the poor schmuck who would have to write a client for the thing and aim the content at him.

I have put up a reference server, and there are other servers referenced in the document. My colleague Jody Garnett is working on a client implementation in Java for the GeoTools library, for exposure in the uDig interface. Folks from OpenLayers and WorldKit have already built reference clients. It has been great fun!

Making SDIs Work

I failed to comment on my comments, which makes me a Bad Blogger. Blogging is all about reprocessing content, after all, so here goes…

Incentives and Costs

In response to the “Why SDIs Fail” posting, “randy” comments:

The key seems to be incentives and the only two I can think of are market incentives and policy mandate incentives. Market incentives are bottom up and way more appealing than top down legal/policy incentives.

And I agree. Incentives are lacking now, except for the “good karma” incentive. However, low incentives alone are not a barrier to action, it is low incentives in conjunction with a higher cost of action that cause inaction. The karmic incentive to not litter is relatively low, but the cost of not littering is also very low… hey, there’s a garbage can right over there.

So we can defeat SDI inaction through two possible routes: increase the incentives to participation, or decrease the costs.

Randy raises a number of possible external incentives, such as legal mandates, to push public organizations into the SDI arena. In particular for areas where there is a strong argument for mandated participation (public safety) this approach may have legs. But we know how lower orders of government love unfunded mandates.

I personally think that decreasing the costs has better potential in the short term, by examining the data sharing networks that have succeeded – the illegal music and movie distribution networks. Everyone has an incentive to take and no one has an incentive to give, yet the content is out there. There are technical approaches to enhancing sharing in sharing-averse communities which can be scavenged from this arena and brought into ours.

Even Better Technology

Rob Atkinson looks into the future and sees that the tools we have now are not equipped for doing effective data sharing.

What we need is the mechanism by which SDIs can grow (from top-down and bottom-up) to bridge that gap. Much like DNS provides domain roots and the bind protocol. What we need to do to realise SDI benefits is, as you say, enable massively scalable access to data updates by making life significantly easier at the grass roots level, but also by introducing a level of coherence to enable investment decisions at the national level.

I agree that many “real” data sharing applications are going to need some super-amazing technology to bind together content. Ontology and deep metadata. But in the meantime, looser, more human-mediated approaches are required to bridge the gap.

As Rob says, life needs to be easier for the grass roots. That is job one. Once the data is flowing, the coneheads in the central agencies can figure out techno-magic to stitch it all together, but until the data starts flowing the whole discussion is just so much intellectual masturbation.

I Propose…

That job one is to get the data flowing. There needs to be a single, user-facing application, a GeoNapster, that makes sharing data and finding data ludicrously simple. So simple that there is no excuse not to do it except sheer bullheadedness. Get the data flowing and then worry about how to integrate it.

Recognize that data at the lowest levels of government is created by one or two people. Pitch the tool and approach to that level. Make it search and find just as well as it shares. Integrate it with the desktop, even with the major vendor software, if that makes it work more easily.

The data sets that are “corporately” managed by state and federal bureaucracies may have to wait, or be brought online in the mode of NCOneMap, with careful one-on-one cajoling. But the SDI builders have to know what they want, what is of value, and be strategic, not shot-gun, in gathering those contributors.

Being strategic means making hard decisions about what will be used, and what is useful, given the current technology available. Imagery is widely useful with the current technology. Complex inventory data usually is not (would you like to see the forests by stand age? species? do the different jurisdictions use the same inventory techniques? are these apples or oranges or both?) so do not waste money or time on it.

Get out to the operational GIS level, meet the people who are going to use these services (in theory) and feed in new information (in theory), and figure out how to get involved in their day-to-day work. How can an SDI become as ingrained in the daily workflow of GIS technologists as Google is in our techno-lives?

Put the strategic diagrams, the boxes and arrows, in a drawer for a while. They will still be there later, when the time comes.