PostGIS gets Spherical (Directors Cut)

Update: Installation instructions have changed slightly since this post was written.

For a business oriented discussion of the new GEOGRAPHY type, see my post on the OpenGeo blog.

So you want to try the new GEOGRAPHY type and see what it can do? Alright then!

If you are running Windows, please follow the directions on the Windows experimental binaries download page on the PostGIS site. Note that these builds might not be the absolute latest versions available.

If you are running Linux, fetch the latest code from SVN (svn checkout http://svn.osgeo.org/postgis/trunk postgis-svn) and then follow the install instructions.

After installing, you should find the usual postgis.sql file which contains the old geometry and now the new geography features too. Install PostGIS and spatial reference information as usual:

createdb mydb
psql -d mydb -f postgis.sql
psql -d mydb -f spatial_ref_sys.sql

I find it useful for testing to load a shape file in the usual way, then add a geography column to it.

shp2pgsql -s 26910 -g geom taxlots.shp taxlots | psql mydb
psql mydb

Once you have a table to play with, it’s time to add a geography column, and build indexes. You have to transform the geometry into EPSG:4326 before casting to geography. When you build the index on the geography column, it builds a special index over the sphere which respects the poles and dateline.

alter table taxlots add column geog geography;
update taxlots set geog = geography(st_transform(tgeom,4326));
create index roads_geom_idx on taxlots using gist (geom);
create index roads_geog_idx on taxlots using gist (geog);

How does this magic index work? The “trick” is to be as lazy as possible. I didn’t want to write a whole new indexing scheme, and I already had a serviceable R-Tree index. What I needed to do was convert the lat/lon coordinates to a domain where the problems of the singularities at the poles and dateline would go away. The answer is to convert from spherical coordinates (lat/lon) relative to Greenwich into geocentric coordinates (x/y/z) relative to the center of earth. It’s easy then to build a 3D R-Tree on the geocentric bounds of the features. Calculating the bounds in 3D is tricky, because the curvature of the features over the spherical surface must be respected, but once that is done, the index performs admirably.

Now you can compare properties calculated on the plane and on the spheroid.

select 
  st_area(geom) as geomarea, 
  st_area(geog) as geogarea 
from taxlots limit 10;

Note that the units returned by the geography functions are metric. Square meters for area and linear meters for distances and lengths.

How about a quick spatial self-join, to test the indexes?

\timing

-- geography test
select sum(st_area(geog)) 
from taxlots a, taxlots b 
where st_dwithin(a.geog, b.geog, 100.0) and b.gid = 1;

-- geometry test
select sum(st_area(geom)) 
from taxlots a, taxlots b 
where st_dwithin(a.geom, b.geom, 100.0) and b.gid = 1;

In testing, I have been finding the geography index slightly faster than the geometry index, which is perhaps because I was able to write the geography index binding from scratch, trying to apply the lessons I have taken from reviewing the existing geometry index. In PostGIS 1.5 (coming Christmastime or sooner) the geography and geometry types will coexist, but use different disk serializations and index bindings. In PostGIS 2.0 (fall 2010) the geometry will also be swapped over to a new serialization and index binding and should become as fast (faster) than the geography index.

Finally, try out the transparent integration of the ST_Buffer() function from geometry with the geography type.

select st_area(st_buffer(geog, 2.0)) from taxlots limit 1;

By carefully transforming geometries out to appropriate planar spatial reference systems, you can use the functions built for geometry to provide operations on geographies. The definition for ST_Buffer(geography, double) looks like this:

CREATE OR REPLACE FUNCTION ST_Buffer(geography, float8)
  RETURNS geography
  AS '
    SELECT 
      geography(
        ST_Transform(
          ST_Buffer(
            ST_Transform(geometry($1), _ST_BestSRID($1)), 
            $2
          ), 
          4326
        )
      )'
  LANGUAGE 'SQL' IMMUTABLE STRICT;

Note that ST_Transform() appears twice, once to transform from geographics to a planar system and again to transform back after the buffer operation is complete. Also note the _ST_BestSRID() function, which analyzes the geography and provides a best guess planar system suitable for carrying out the operation. Right now the system picks an appropriate UTM zone, or a polar stereographic, or falls back to a mercator if there is no good choice.

Acknowledgements: Of course, top billing goes to the funder, who has chosen to remain anonymous. Also, it would have been impossible for me to build the ST_Area() or ST_Distance() functions on the sphere and spheroid without the contributions of David Skea (if you check the PostGIS source code, you’ll find he has contributed mathematical magic in the past also). And finally, Regina Obe who has been testing and documenting my work as it is committed, resulting in very effective on-the-spot debugging and fixing of issues as they occur. Thanks everyone!

Googalytics

If you haven’t read Paul Bissett’s take on the Goog’s entrance into the data collection and LBS businesses, you must, he’s nailing it. Start with “GOOG = BORG, and I Mean That in the Nicest Way”, and then do “Google Will Open Source National Parcel Map”.

Climate @ FOSS4G 2009

I found the keynote by climate scientist Andy Pitman at FOSS4G very interesting, on a couple levels.

Firstly on the climate level, he started off with a graph of some of the predictions from 1990, when it was becoming clear that climate change was something that policy makers needed to Do Something About. At that time, the scientists put together a range of scenarios, places we would be based on different policy responses, ranging from radical changes in energy production and efficiency to doing SFA. And amazingly, we have managed to do worse than the worst-case scenario envisioned in 1990. With the polar ice on the way out, and glacial (hah hah) progress in getting policy change China and the USA (the only two that matter, if they lead the world will follow), it was very sobering to see just how poorly we are doing at averting the very predictable end of our civilization.

Secondly on the open source level, he was very clear that open source technology (C, Java, etc) and open source programmers (knowing nothing about climate science) probably couldn’t be directly useful to the modelling effort. But he did think that the modellers should aspire to a more open source ethic in their science. Most of the models are closed source, and there is little of the open source ethic of communications going on. My main concern about an open source climate model is that the politicization of the whole discussion would destroy any efforts at community building. One of the things projects have to occasionally deal with is “poisonous people”, and an open climate model would have to deal with orders of magnitude more of those. It’s draining and distracting from the real business at hand.

I hope the climate modellers can figure out how to square the circle of more openness without poisonous people, but in my mind modelling is icing on the cake at this point. We know we need to de-carbonize, it’s obvious that changing the chemical balance of our environment, whether locally with things like PCBs or DDT or globally with CO2 is a dangerous crapshoot. At this point, we need the force of will to do it.

(BTW, I think we should be taxing carbon at the source, not taxing emissions (too many tailpipes) but extraction. It’s not carbon per se that is the problem, it’s extra carbon, the kind we dig and pump out of the ground, that is baking us. Fewer points of taxation and control, easier to enforce and manage, let’s tax oil and coal and natural gas at the point of production (or, if exporting jurisdictions refuse to do so, at our borders.)

FOSS4G 2009 Keynote

Lots of people here have been asking if my keynote is going to be posted online. So, here it is (31MB). Be forewarned, it’s a big file. There is also a handheld video available on YouTube. There was a professional video shot too, and hopefully that will go online at the conference site in the next month or so.

Slashdot Explains Sexism in Open Source

In the spirit of beer commercials and sports highlight shows, Slashdot’s editors blithely follow a post about sexism in open source with, just 45 minutes later, a post on a Marge Simpson spread in Playboy.

Slashdot Effect

Let no one say Slashdot does not know its audience. At least, the audience they have, not the one they’re missing out on.