Monday, November 30, 2009

Is "Good Enough" Good Enough?

The gift that keeps on giving, for me, is performance work. Users love it, and it tickles my cerebral cortex in a way other work just cannot touch. Some of my first substantive patches to MapServer were performance patches, and I've also taken that tack into PostGIS.

One of the things that surprised me while investigating the MapServer QIX index structure was how little a badly built tree impacted real world performance. The QIX file, when built on point files, tends to build in far more depth than it needs, strictly speaking. The extra depth adds theoretical time to traverse the tree for searches and yet... even when I tested quite huge shape files the performance was fine.

Recently I've been looking again at internally indexing geometries for high performance testing of things like distance and intersections. We already have some of this for PostGIS, using the GEOS "PreparedGeometry" construction, but I'd like a native implementation, and something that doesn't necessarily depend on a cached implementation.

There is already a form of internally indexed testing in PostGIS, in the point-in-polygon case, and while reading it over I was struck by the elegance of the underlying assumption: rather than pre-sort the geometry edges, and then build the tree, the implementation simply scans the point array from start to finish and builds parent nodes from each successive pair of nodes. The result is not a perfect tree, by any stretch of the imagination, but... it's good enough. Because the edges are spatially auto-correlated, the parent nodes end up having good locality.

The beauty of this approach to index building is that since there is no sorting step and no re-balancing of the tree or any of that fancy stuff, you can actually build an "index" in O(n) time. A brute force intersection test is O(nm) time. But if you can build your indexes in O(n) time the cost of doing an intersection starts to get closer to O(n+m) time! (Note, I am not a computer scientist and the O() term for the actual tree-on-tree intersection test is beyond my powers.)

Now, since no sorting is being applied in the building of these "indexes" they could theoretically be terrible indexes. But since we're indexing GIS data, and the edges have this wonderful autocorrelation, they are actually "good enough", and obtainable in very little time.

The same tricks apply to building indexes for intersection and distance in geodetic space, which I predict will be in hot demand once people experience just how computationally expensive operations on the new PostGIS 1.5 geography type are!
 

Monday, November 23, 2009

You get more than what you pay for!

Entchev reflexively repeats one of the most foolish canards ever hurled against collaborative knowledge building (open source, open data, open knowledge)
You get what you pay for.
Was there ever a putdown more easily falsifiable?

You pay nothing for PostGIS. Is the value to you nothing? You pay nothing for Wikipedia. Is the value to you nothing? You pay nothing for Open Street Map, is the value to you... nothing? Clearly, you, and everyone, gets more than what they pay for.

Now, there are many arguments to be made and beers to be drunk over whether what you get is enough for all possible purposes of all possible people. Some think that the trend in open knowledge is so strong that even that goal is within reach. Me, not so much.

But clearly, everyone has access to these resources, for free, and what they get in return is more than what they paid.
 

Friday, November 13, 2009

The First Rule of Fight Club...

Who knew that ESRI was so hip? From James Fee's blog:
Note: If you post any specific 9.4 Beta information (such as quoting forums posts on the Beta forums), expect ESRI to personally contact you. They appear to be monitoring this blog post. You’ve been warned.

 

Palanterra X3

James Fee has a nice rant on mapping interfaces today, but the part the piqued my interest was embedded in the screen shot of the National Map. It said "The National Map Viewer BETA: Powered by Palanterra X3"

Interesting! Who or what is this Palanterra X3 who have managed to get a top-of-the-fold "Powered by" on a major USGS site? Oddly, the Google is unusually terse about this topic. Looks like it's an NGA project, and yet it's "Palanterra (TM)". Intra-governmental entrepreneurship at its finest, a carefully imagined brand name that appears only in meeting agendas and powerpoints.
 

Wednesday, November 11, 2009

FOSS4G hearts SoTM

A quiet bit of byplay is going on in the world of extreme geogeekery, as the State of the Map conference, the annual gathering of the OpenStreetMap community, decides where to host their 2010 event. The twist for this year is that an intrepid member of their spanish community has suggested hosting the event in Barcelona, to align with the FOSS4G 2010 event (either right before or right after). The OSM community decides on their site in early December – I'll add my non-binding "ooooooooh, I hopehopehopehopehope they choose Barcelona!"
 

Thursday, November 05, 2009

State of PostGIS

I gave a 30 minute talk about the past present and future of PostGIS at FOSS4G two weeks ago, and thanks to the efforts of the folks at FOSSLC it's now online as a video.
 

Wednesday, November 04, 2009

Geography and MapServer

Can you use the new PostGIS GEOGRAPHY type with MapServer? Yes! Just make sure your LAYER declares a geographic projection (e.g. "init=epsg:4326", or "proj=lonlat") so the correct coordinates are passed in. For simple DATA definitions(e.g. DATA "thegeog from thetable"), that's all you have to do. I haven't tested out more complex DATA statements yet, but I am pretty sure they should work fine.
 

Tuesday, November 03, 2009

PostGIS gets Spherical (Directors Cut)

Update: Installation instructions have changed slightly since this post was written.

For a business oriented discussion of the new GEOGRAPHY type, see my post on the OpenGeo blog.

So you want to try the new GEOGRAPHY type and see what it can do? Alright then!

If you are running Windows, please follow the directions on the Windows experimental binaries download page on the PostGIS site. Note that these builds might not be the absolute latest versions available.

If you are running Linux, fetch the latest code from SVN (svn checkout http://svn.osgeo.org/postgis/trunk postgis-svn) and then follow the install instructions.

After installing, you should find the usual postgis.sql file which contains the old geometry and now the new geography features too. Install PostGIS and spatial reference information as usual:

createdb mydb
psql -d mydb -f postgis.sql
psql -d mydb -f spatial_ref_sys.sql


I find it useful for testing to load a shape file in the usual way, then add a geography column to it.

shp2pgsql -s 26910 -g geom taxlots.shp taxlots | psql mydb
psql mydb


Once you have a table to play with, it's time to add a geography column, and build indexes. You have to transform the geometry into EPSG:4326 before casting to geography. When you build the index on the geography column, it builds a special index over the sphere which respects the poles and dateline.

alter table taxlots add column geog geography;
update taxlots set geog = geography(st_transform(tgeom,4326));
create index roads_geom_idx on taxlots using gist (geom);
create index roads_geog_idx on taxlots using gist (geog);


How does this magic index work? The "trick" is to be as lazy as possible. I didn't want to write a whole new indexing scheme, and I already had a serviceable R-Tree index. What I needed to do was convert the lat/lon coordinates to a domain where the problems of the singularities at the poles and dateline would go away. The answer is to convert from spherical coordinates (lat/lon) relative to Greenwich into geocentric coordinates (x/y/z) relative to the center of earth. It's easy then to build a 3D R-Tree on the geocentric bounds of the features. Calculating the bounds in 3D is tricky, because the curvature of the features over the spherical surface must be respected, but once that is done, the index performs admirably.

Now you can compare properties calculated on the plane and on the spheroid.

select st_area(geom) as geomarea, st_area(geog) as geogarea from taxlots limit 10;

Note that the units returned by the geography functions are metric. Square meters for area and linear meters for distances and lengths.

How about a quick spatial self-join, to test the indexes?

\timing
-- geography test
select sum(st_area(geog)) from taxlots a, taxlots b where st_dwithin(a.geog, b.geog, 100.0) and b.gid = 1;
-- geometry test
select sum(st_area(geom)) from taxlots a, taxlots b where st_dwithin(a.geom, b.geom, 100.0) and b.gid = 1;


In testing, I have been finding the geography index slightly faster than the geometry index, which is perhaps because I was able to write the geography index binding from scratch, trying to apply the lessons I have taken from reviewing the existing geometry index. In PostGIS 1.5 (coming Christmastime or sooner) the geography and geometry types will coexist, but use different disk serializations and index bindings. In PostGIS 2.0 (fall 2010) the geometry will also be swapped over to a new serialization and index binding and should become as fast (faster) than the geography index.

Finally, try out the transparent integration of the ST_Buffer() function from geometry with the geography type.

select st_area(st_buffer(geog, 2.0)) from taxlots limit 1;

By carefully transforming geometries out to appropriate planar spatial reference systems, you can use the functions built for geometry to provide operations on geographies. The definition for ST_Buffer(geography, double) looks like this:

CREATE OR REPLACE FUNCTION ST_Buffer(geography, float8)
RETURNS geography
AS 'SELECT geography(ST_Transform(ST_Buffer(ST_Transform(geometry($1), _ST_BestSRID($1)), $2), 4326))'
LANGUAGE 'SQL' IMMUTABLE STRICT;


Note that ST_Transform() appears twice, once to transform from geographics to a planar system and again to transform back after the buffer operation is complete. Also note the _ST_BestSRID() function, which analyzes the geography and provides a best guess planar system suitable for carrying out the operation. Right now the system picks an appropriate UTM zone, or a polar stereographic, or falls back to a mercator if there is no good choice.

Acknowledgements: Of course, top billing goes to the funder, who has chosen to remain anonymous. Also, it would have been impossible for me to build the ST_Area() or ST_Distance() functions on the sphere and spheroid without the contributions of David Skea (if you check the PostGIS source code, you'll find he has contributed mathematical magic in the past also). And finally, Regina Obe who has been testing and documenting my work as it is committed, resulting in very effective on-the-spot debugging and fixing of issues as they occur. Thanks everyone!
 

About Me

My Photo
Victoria, British Columbia, Canada

Followers

Blog Archive

Labels