PostGIS Code Re-org

Most commit messages are pretty terse affairs (“Fix for issue #142.”, “Remove compile warnings.”) but this morning, Mark Cave-Ayland posted this novel along with his code changes:

r3224 /trunk/loader/ (pgsql2shp.c wkb.h)

Switch pgsql2shp over to using liblwgeom.

There are few commits that can be as satisfying as one which involves the removal of ~1200 lines of code. By using the liblwgeom parser instead of the in-built parser, we have now achieved the following:

i) all parsers within PostGIS, shp2pgsql and pgsql2shp are now the same which means they all follow the same rules. Also extended error reporting information including error text and position information is available.

ii) the complexity of the shp2pgsql/pgsql2shp is considerably reduced.

The slightly unfortunate cost is the overall executable size is larger, since we are linking with liblwgeom. However, from both a consistency and maintainability point of view, this is a big win. Note that while there may be a difference in behaviour in some corner cases, all regression tests pass here.

Some extra explanation is in order. Mark’s goal for the 1.4 release of PostGIS is to clean up the underlying code and make it easier to develop on. That has resulted in some major reorganization under the covers.

The geometry framework (liblwgeom) under PostGIS was written to be separable from the rest of the PgSQL-specific code, but historically was managed right alongside it, in the same directory and build chain. Mark has broken it out into its own directory, with a separate build that produces a true library file.

Now that the geometry framework is a true library, it can be used in other places, not just the back-end. So hooking the data loader/dumpers into it is a first step, and as he notes, improves the code immensely.

Mapserver/PostGIS Performance Tips

I’m working on re-writing the PostGIS driver in Mapserver to clean it up a little and hopefully make it faster. Having traced the flow of control, I can see a couple of ways users of the existing driver can improve performance with small configuration changes. The simplest syntax for defining a PostGIS layer in Mapserver is just:

DATA "the_geom from the_table"

Very simple, but: how does Mapserver know what primary key to use in queries? And what SRID to use when creating the bounding box selection for drawing maps? The answer is, it asks the database for that information. With two extra queries. Every time it processes the layer.

However, if you are explicit about your unique key and SRID in configuration, Mapserver can, and does, skip querying the back-end for that information.

DATA "the_geom from the_table using unique gid using srid=4326"
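
For context, here is a rough sketch of how that DATA line might sit inside a complete layer definition. The layer name, geometry type, connection parameters and styling are all placeholders I made up for illustration; only the DATA line comes from the example above.

```
# Sketch only: layer name, type, connection string and styling are assumptions.
LAYER
  NAME "the_layer"
  TYPE POLYGON
  STATUS ON
  CONNECTIONTYPE POSTGIS
  CONNECTION "host=localhost dbname=gis user=mapserver"
  # Explicit unique key and SRID, so the driver does not have to ask the database
  DATA "the_geom from the_table using unique gid using srid=4326"
  CLASS
    STYLE
      COLOR 200 200 200
      OUTLINECOLOR 0 0 0
    END
  END
END
```

The key and SRID you declare need to match what is actually in the table, since Mapserver will take your word for it.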

Also, if you have more than one PostGIS layer in your map file, you should turn on the Mapserver connection pool, even if you’re not running in FastCGI mode. That’s because the pool will allow all the layers to reuse the same connection. If you have seven PostGIS layers, at 15ms per connection, that’s 90ms saved (you still pay 15ms for the first connection).

Add this line at the end of each PostGIS layer to tell Mapserver to leave the connection open for future layers:

PROCESSING "CLOSE_CONNECTION=DEFER"

Go fast, fast, fast!

New Regime

Today my wife went back to work, which means the “new regime” is in effect!

I admit, I’ve been taking it pretty easy the last 18 months; I haven’t cooked a lot of breakfasts or meals. Really, I’ve been a bum.

But mornings now, my wife has to get ready for work, which means I have to feed and water the kids (toss some slops in the bin, hose them down afterwards, you know the drill) and get them ready to be carted off to daycare. Wake up and get that game face on! It’s really very nice, and reminds me of my year at home taking care of my daughter when she was one – dealing with the little details of kid life is very gratifying (in moderate doses).

And the pay-off? Once they head out, the house descends into silence, blessed silence. Super-productive morning today.

Picking up the gauntlet

Mike Pumphrey over at the Geoserver blog has written a short post about this year’s Geoserver-vs-Mapserver comparison. I hope we can maintain this study as an annual event, and even get someone with an ArcServer license to join in the fun. Each iteration finds new areas that need work and raises the bar a little higher.

Basically, there are some differences that are small and ignorable, and there are some differences that are really anomalous. At the end of the day, both systems are doing the same thing, so order-of-magnitude performance differences are cries for help.

I’ve been focussing on the Mapserver side. Last year, the study by Brock and Justin found an odd quirk where Mapserver got progressively worse at shape file rendering as the shape files got bigger. I found the issue and fixed it this spring, and (w00t!) Mapserver won the shape file race this year.

But… this year found that the PostGIS performance in Mapserver was (while fast) about half as fast as Geoserver. Hmmmm. So I know what I’ll be working on this month. I have some guesses, but they will need to be tested.

Andrea added some aesthetic tests this year, and brought them to the attention of the Mapserver team, and as a result the next release of Mapserver will include more attractive labeling results and line width control.

Any development team that’s willing to swallow their pride (because for every test you win, there’s one you’ll lose) can get a lot of benefit in joining in this benchmarking exercise.

Keep your friends close...

And your enemies closer. It seems ESRI has yet to learn that particular piece of the wisdom of Sun-tzu, and that’s too bad. By excluding “competitors” that are very small compared to the overall marketplace, ESRI is being penny-wise and pound-foolish. Sure, open source will steal a few accounts here and there, but the real prize is to co-opt them into your ecosystem, where you can keep an eye on them, a lesson Microsoft has clearly learned.