Tuesday, April 24, 2007

REST Feature Server

Piling on this meme! Pile!

Chris Holmes has taken a stab at some of the semantics of a REST feature server, and of course, Chris Schmidt has already written one. (!!!!!)

I would like to take issue with one of Chris Holmes' design points:
The results of a query can then just return those urls, which the client may already be caching

http://sigma.openplans.org/geoserver/major_roads?bbox=0,0,10,10

returns something like

<html>
<a href=’http://sigma.openplans.org/geoserver/major_roads/5′>5</a>
<a href=’http://sigma.openplans.org/geoserver/major_roads/1′>1</a>
<a href=’http://sigma.openplans.org/geoserver/major_roads/3′>3</a>
<a href=’http://sigma.openplans.org/geoserver/major_roads/8′>8</a>
</html>
Ouch! Resolving a query that return 100 features would require traversing 100 URLs to pull in the resources. What about if we include the features themselves in the response? Then we are potentially sending the client objects it already has. What is the solution?

It is already here! Tiling redux! Break the spatial plane up into squares, and assign features to every square they touch. You get nice big chunks of data, that are relatively stable, so they can be cached. Each feature can also be referenced singly by URL, just as Chris suggests, but the tiled access allows you to pull more than one at a time.

What about duplication? Some features will fall in more than one tile. What about it? Given the choice between pulling 1000 features individually, and removing a few edge duplicates as new tiles come in, I know what option I would choose. Because each feature in the tile includes the URL of the feature resource, it is easy to identify dupes and drop them as the tiles are loaded into the local cache.

Tiling on the brain, tiling on the brain...

8 comments:

cholmes said...

Still figuring out how I feel about this. What kinds of tiles do you propose to divide the world in to? I mean, if we have zoom levels like in the TMS spec, are we supposed to generalize at some levels? Or do we only get features if we're at the right zoom level? Or just get duplicates all the way down?

The alternative would perhaps be to divide the world up in to one grid, but then coarse datasets will get tons of duplicates, since they'll span many grids, or granular datasets will fill tiles with up to million of features.

And how do I return tiles in response to a specific query? Like num_lanes < 5? I calculate new tiles that just have those features in them?

I assume I'm missing some obvious concept here?

Paul Ramsey said...

The obvious concept is how to restfully communicate the tiling basis to the client, and the answer is, using a structure just like KML <Regions>. The root resource rectangle (full dataset bounding box) has children, which have children, and some of them have actual features in them. Wonderfully, the rectangles don't even have to be square, just hierarchically nested.

Do we get features at every level? No, just at the bottom level. How many levels we have is determined to some extent by practicality. We could just have two, a root, and below that all the data rectangles. But that might suffer from data volume problems.

Once we get to query, I think we're into another use case, and start to get all nervous, because I have yet to hear a discussion of a RESTful feature server with query that isn't very similar to a garden variety WFS.

Andrea Aime said...

Not all geometries will play well with a single unimform tiling.
I mean, as long as you're thinking parcels, or county boundaries, or points, it works.
When you have isolines, long streams, shorelines, and so on, that is, whatever is not cut in small pieces, it does not.

I've been thinking a little about this some time ago when I wrote a wiki page about datastore caching (http://geotools.codehaus.org/Datastore+caching),
and I'd say a solution is not to repeat, but have multiple tile levels, that is, a pyramid of tiles.

Each feature goes in the smallest tile that fully contains it. When you ask for data chunks, you ask not only for the base level, but for all higher levels of the pyramid too in the same area. Some will be empty, but some will give a nice place to store just once those huge (and hopefully rare) features that won't play nice with small tiles.

This is more complex, I know, but allows for better caching on the client side, since the higher levels of the pyramid will be asked for just once.

It's a way to trade more complexity in the protocol and have less complexity in the feature handling code (since you don't have to figure out which features have been duplicated anymore). Oh, and btw, this provides a natural and easy to use spatial index, too, which never hurts (in fact in that wiki page I was thinking about a spatial cache, not about REST).

Sean said...

Paul, I understand your discomfort about queries that return lists of URLs to be deferenced. It's a bit like one of your least favorite MapServer patterns (query for ids, then get feature by id) all over again. But it's how the web works. It's how KML network links work. KML regions help you avoid following links needlessly, and we'll certainly want to do something very much like that whenever possible.

Jeremy Cothran said...

Lately I've given up on the protocol/service based approaches in favor of document/report based ones which I think is more in line with REST. WMS is good and simple, but the rest of it (WFS, OGC SOS, etc) seems to lead to more redundant, complex middleware and discussions than is necessary if metadata/data was formatted/tiled/partitioned to be queried and downloaded efficiently based on the dominant use cases.

Paul Ramsey said...

I like Andrea's approach a lot. It solves the tile-crossing problem very nicely, and properly replicates an index structure.

Bill said...

I somehow lost my original comment. Here goes the shorter, angier version... my apologies for the tone.


Feature bundling != spatial indexing.

Don't build system that return results exclusively with all their spatial brethren. This would break attribute query performance.


Equal area vector data tiling != equal information density tiling.

Feature bundling sounds fine to me, but do it based on information density. Send 100kb or so feature bundles, with potentially multiple bundles per query.

Lawrence said...

... and don't forget that the world is not a plane. There are more than two dimensions folks ...

About Me

My Photo
Paul Ramsey
Victoria, British Columbia, Canada
View my complete profile

Blog Archive