Friday, October 31, 2008

Mapserver/PostGIS Performance Tips

I'm working on rewriting the PostGIS driver in Mapserver to clean it up a little and, hopefully, make it faster. Having traced its flow of control, I can see a couple of ways users of the existing driver can improve performance with small configuration changes. The simplest syntax for defining a PostGIS layer in Mapserver is just:

DATA "the_geom from the_table"

Very simple, but how does Mapserver know what primary key to use in queries? And what SRID to use when creating the bounding box selection for drawing maps? The answer is, it asks the database for that information. With two extra queries. Every time it processes the layer.

However, if you are explicit about your unique key and SRID in configuration, Mapserver can, and does, skip querying the back-end for that information.

DATA "the_geom from the_table using unique gid using srid=4326"
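To make the cost concrete, here is a minimal Python sketch of the decision the driver has to make for each layer. It is not Mapserver's C code: the function name `metadata_queries_needed` and the regular expressions are hypothetical, and the model simply assumes one extra lookup for each of the two clauses that is missing from the DATA string.

```python
import re

def metadata_queries_needed(data: str) -> int:
    """Count the extra metadata lookups a layer would trigger (hypothetical model).

    One lookup for the primary key if 'using unique <column>' is absent,
    and one for the SRID if 'using srid=<number>' is absent.
    """
    has_unique = re.search(r"\busing\s+unique\s+\w+", data, re.IGNORECASE)
    has_srid = re.search(r"\busing\s+srid\s*=\s*-?\d+", data, re.IGNORECASE)
    return (0 if has_unique else 1) + (0 if has_srid else 1)

print(metadata_queries_needed("the_geom from the_table"))  # 2
print(metadata_queries_needed("the_geom from the_table using unique gid using srid=4326"))  # 0
```

Since the lookups happen every time the layer is processed, the savings add up on every map request, not just once.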

Also, if you have more than one PostGIS layer in your map file, you should turn on the Mapserver connection pool, even if you're not running in FastCGI mode. The pool lets all the layers in a single request reuse the same connection. If you have seven PostGIS layers, at 15ms per connection, that's 90ms saved: you still pay 15ms for the first connection, but the other six layers reuse it.
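The savings arithmetic, as a tiny Python helper. The function name is made up for illustration; the 15ms figure is the one quoted above, and real connection times will depend on your network and server.

```python
def pooled_savings_ms(num_layers: int, connect_ms: float) -> float:
    """Connection time saved by sharing one pooled connection across layers.

    Without the pool every layer opens its own connection; with it, only
    the first layer pays the connection cost and the rest reuse it.
    """
    if num_layers < 1:
        return 0.0
    without_pool = num_layers * connect_ms
    with_pool = connect_ms
    return without_pool - with_pool

print(pooled_savings_ms(7, 15))  # 90
```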

Add this line at the end of each PostGIS layer to tell Mapserver to leave the connection open for future layers:

PROCESSING "CLOSE_CONNECTION=DEFER"

Go fast, fast, fast!
 

Wednesday, October 29, 2008

New Regime

Today my wife went back to work, which means the "new regime" is in effect!

I admit I've been taking it pretty easy the last 18 months; I haven't cooked many breakfasts or meals. Really, I've been a bum.

But mornings now, my wife has to get ready for work, which means I have to feed and water the kids (toss some slops in the bin, hose them down afterwards, you know the drill) and get them ready to be carted off to daycare. Wake up and get that game face on! It's really very nice, and reminds me of my year at home taking care of my daughter when she was one – dealing with the little details of kid life is very gratifying (in moderate doses).

And the pay-off? Once they head out, the house descends into silence, blessed silence. Super-productive morning today.
 

Wednesday, October 22, 2008

Picking up the gauntlet

Mike Pumphrey over at the Geoserver blog has written a short post about this year's Geoserver-vs-Mapserver comparison. I hope we can maintain this study as an annual event, and even get someone with an ArcServer license to join in the fun. Each iteration finds new areas that need work and raises the bar higher every year.

Basically, there are some differences that are small and ignorable, and there are some differences that are really anomalous. At the end of the day, both systems are doing the same thing, so order-of-magnitude performance differences are cries for help.

I've been focussing on the Mapserver side. Last year, the study by Brock and Justin found an odd quirk where Mapserver got progressively worse at shape file rendering as the shape files got bigger. I found the issue and fixed it this spring, and (w00t!) Mapserver won the shape file race this year.

But... this year found that Mapserver's PostGIS performance, while fast, was only about half that of Geoserver. Hmmmm. So I know what I'll be working on this month. I have some guesses, but they will need to be tested.

Andrea added some aesthetic tests this year, and brought them to the attention of the Mapserver team, and as a result the next release of Mapserver will include more attractive labeling results and line width control.

Any development team that's willing to swallow their pride (because for every test you win, there's one you'll lose) can get a lot of benefit in joining in this benchmarking exercise.
 

Friday, October 17, 2008

Keep your friends close...

And your enemies closer. It seems ESRI has yet to learn that particular piece of the wisdom of Sun-tzu, and that's too bad. By excluding "competitors" that are very small compared to the overall marketplace, ESRI is being penny-wise and pound-foolish. Sure, open source will steal a few accounts here and there, but the real prize is to co-opt them into your ecosystem, where you can keep an eye on them, a lesson Microsoft has clearly learned.
 

Thursday, October 16, 2008

uDig 1.1.0

I'm a cowboy. I like to just slap a brand on the cattle and push them out the gate. Sometimes this gets me in trouble.

Jesse Eichar, the uDig project lead, is not a cowboy. The 1.1.0 release comes after a series of 14 (fourteen) "RC" versions and three "SC" versions. Congratulations to Jesse, and to Jody and Andrea and other uDig team members, on "going gold" with the 1.1 release. Remember, if things aren't perfect, there's always 1.1.1!

One thing watching the uDig development process has taught me over the years is how much harder user-facing applications are than server-side ones. The number of places you can "get it wrong" is orders of magnitude greater. The number of ways you can fine tune and fine tune and fine tune a particular piece of interaction is almost infinite (the editing tools are on something like their fourth major revision since the project started, and I'm sure there will still be things to change and fiddle with, given the hyper-modality and hyper-interactivity of editing). It has given me a lot more respect for the people writing web browsers and word processors and all the other virtual tools that we use every day. And now I automatically quadruple estimates that involve user interfaces, instead of merely doubling them as I used to.

Update: A timely review of uDig, posted at the Linux Journal.
 

Bailout? What Bailout?

The "cost of the bailout" has been a big election meme south of the border, and I continue to be flabbergasted at how primitive the media discussion of the issue has been. The first debate event began with a question that essentially said "given the $700B cost of the bailout, what parts of your campaign platform would you cut to pay for it?".

How about: none of it. How about: tell you what, I'm going to spend more.

The US of A is going to sell $700B worth of Treasury Bills (bonds) to various countries, institutions and people – China, Saudi princes, sovereign wealth funds, foreign banks, and so on. For short, we'll call them "the Suckers". These bonds are going to pay the Suckers some absurdly low interest rate, like 2% or less.

The US of A is then going to turn around and exchange the $700B it got from the Suckers for preferred shares in the banks, which will pay 5% for the first five years, and 10% after that.

So, rather than costing the benighted taxpayers of the USA anything, this "bailout" will actually net the Treasury the 3% spread between the 5% the banks pay and the 2% the Suckers receive: $700B * 3% = $21B a year. The only people with anything to complain about might be the Suckers, and it's not like anyone is forcing them to buy Treasury Bills.

Why is this rather elementary fact not finding its way into the political discussion of the "bailout"? Too much math?
 

Tuesday, October 14, 2008

Sponsor GEOS, Make PostGIS Faster

Martin Davis just posted about his improvements to the JTS buffering routines, speeding up buffering by a mere factor of 20 or so.

Martin has also added some improvements in the area of unions for large sets of geometries, a technique he calls "cascaded union". It too is good for orders-of-magnitude performance improvements.
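The post doesn't describe how cascaded union works internally, so treat the following as a toy illustration of the general divide-and-conquer idea, not Martin's JTS implementation. It uses 1-D intervals in place of polygons (an assumption made purely to keep the sketch self-contained): instead of folding each geometry into one ever-growing accumulated result, it unions neighbouring pieces in pairs, then pairs of pairs, so most union operations work on small inputs. All function names here are hypothetical.

```python
from typing import List, Tuple

Interval = Tuple[float, float]

def union_two(a: List[Interval], b: List[Interval]) -> List[Interval]:
    """Union two sets of intervals into a sorted, non-overlapping list."""
    merged: List[Interval] = []
    for start, end in sorted(a + b):
        if merged and start <= merged[-1][1]:
            # Overlaps (or touches) the previous interval: extend it.
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged

def cascaded_union(pieces: List[List[Interval]]) -> List[Interval]:
    """Union many pieces by recursively unioning halves (a binary tree of unions)."""
    if not pieces:
        return []
    if len(pieces) == 1:
        return union_two(pieces[0], [])
    mid = len(pieces) // 2
    return union_two(cascaded_union(pieces[:mid]), cascaded_union(pieces[mid:]))

def sequential_union(pieces: List[List[Interval]]) -> List[Interval]:
    """The naive approach: fold every piece into one growing result."""
    result: List[Interval] = []
    for piece in pieces:
        result = union_two(result, piece)
    return result

# Sort by position so neighbouring pieces land in the same subtree, standing
# in for the spatial grouping a real geometry library would do.
pieces = sorted(([(i, i + 1.5)] for i in range(8)), key=lambda p: p[0][0])
print(cascaded_union(pieces))  # [(0, 8.5)]
```

Both approaches give the same answer; the difference is that the sequential version keeps unioning against the largest, most complex intermediate result, which is where the cost piles up for real polygons.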

Do you have PostGIS queries of this form:

SELECT [...], ST_Buffer(the_geom,1000) FROM [...] WHERE [...]

or

SELECT ST_Union(the_geom) FROM mytable WHERE [...] GROUP BY [...]

If you do, then getting Martin's JTS algorithms ported to GEOS (the C++ geometry library used by PostGIS) will make your database run faster. Lots faster.

How can you help that happen? Become an OSGeo “Project Sponsor” for GEOS. Project sponsors commit a modest sum to the ongoing maintenance of the code, which is generally used to hire a maintainer to ensure that patches are properly integrated, that tests are added for reliability, and that upgrades like the ones Martin has created get folded into the code base in a timely manner.

If you're interested in sponsoring GEOS development, please get in touch with me. If you are using PostGIS in your business, it is money well spent.
 

Voting Day in Canada

Happy voting day, Canadians. I'm just about to head out to my local polling place to nullify somebody's vote by casting an equal, opposite vote (I like to identify them in line, for maximum enragement. "Oh, you're voting Conservative? I'm voting NDP, so you might as well have not even bothered coming.")

I wouldn't be a Canadian if I didn't take this opportunity to point out how this day demonstrates (yet again) how superior we are to our cousins to the south. We started our election later than them (September 9), but we're still finishing first. I will not have to spend three hours in line to vote. I will vote using a pencil and paper, not a touch screen doohickey. We have four national parties to choose from (five, if you're from Quebec and think Quebec is a nation) instead of two. We also have lots of other great options on the ballot, for those who like to get off the beaten path. And we don't just have hockey moms running for office, but honest-to-god hockey players.
 

Friday, October 10, 2008

Credit Crisis and Trade

I found this item today particularly chilling:

The credit crisis is spilling over into the grain industry as international buyers find themselves unable to come up with payment, forcing sellers to shoulder often substantial losses.

Before cargoes can be loaded at port, buyers typically must produce proof they are good for the money. But more deals are falling through as sellers decide they don't trust the financial institution named in the buyer's letter of credit, analysts said.

I think everyone should take a deep breath and go read Roosevelt's first inaugural address, both for perspective on how much worse things can get, and for the mindset needed to face them.

So, first of all, let me assert my firm belief that the only thing we have to fear is fear itself--nameless, unreasoning, unjustified terror which paralyzes needed efforts to convert retreat into advance. In every dark hour of our national life a leadership of frankness and vigor has met with that understanding and support of the people themselves which is essential to victory. I am convinced that you will again give that support to leadership in these critical days.

In such a spirit on my part and on yours we face our common difficulties. They concern, thank God, only material things. Values have shrunken to fantastic levels; taxes have risen; our ability to pay has fallen; government of all kinds is faced by serious curtailment of income; the means of exchange are frozen in the currents of trade; the withered leaves of industrial enterprise lie on every side; farmers find no markets for their produce; the savings of many years in thousands of families are gone.

And it goes on... For something delivered 75 years ago, it feels surprisingly topical.
 

PostGIS Performance: Prepared Geometry

Spatial joins are a common use case in spatial databases, putting together two tables based on the spatial relationships of their geometry fields. For example:

SELECT census.income, houses.value FROM census JOIN houses ON (ST_Contains(census.geom, houses.geom))

The way this gets evaluated by the database looks something like this:

for each census polygon
  for each house point near the census geom
    run the st_contains test on that pair of geometries


Because the outer loop is driven by the census geometry, you will get repeated calls to the "contains" algorithm with the same polygon each time. By recognizing this repetition, you can build a shortcut: create a smart, indexed polygon once, and reuse it for every call in which that polygon repeats.
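Here is a minimal Python sketch of that shortcut, under stated assumptions: polygons are plain lists of (x, y) vertices without a repeated closing vertex, the "preparation" is just a bounding box plus a list of edges sorted by their lowest y value (a crude stand-in for the much smarter indexes in JTS and GEOS), and the cache remembers only the most recently seen polygon. The class and function names are hypothetical, not the GEOS or PostGIS API. The point is the structure: the preparation runs once per outer-loop polygon, and every inner-loop call with the same polygon reuses it.

```python
from bisect import bisect_right
from typing import List, Optional, Tuple

Point = Tuple[float, float]

class PreparedPolygon:
    """A polygon plus a precomputed index, built once and reused for many tests."""

    def __init__(self, ring: List[Point]):
        xs = [x for x, _ in ring]
        ys = [y for _, y in ring]
        self.bbox = (min(xs), min(ys), max(xs), max(ys))
        # Edges sorted by their lowest y, so a query only scans edges that start low enough.
        edges = [(ring[i], ring[(i + 1) % len(ring)]) for i in range(len(ring))]
        self.edges = sorted(edges, key=lambda e: min(e[0][1], e[1][1]))
        self.edge_min_y = [min(a[1], b[1]) for a, b in self.edges]

    def contains(self, p: Point) -> bool:
        x, y = p
        minx, miny, maxx, maxy = self.bbox
        if not (minx <= x <= maxx and miny <= y <= maxy):
            return False  # cheap rejection using the cached bounding box
        inside = False
        # Ray casting: only edges whose lowest vertex is at or below y can cross the ray.
        for (x1, y1), (x2, y2) in self.edges[: bisect_right(self.edge_min_y, y)]:
            if (y1 > y) != (y2 > y):
                x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
                if x < x_cross:
                    inside = not inside
        return inside

class PreparedCache:
    """Remembers the last polygon seen and its prepared form."""

    def __init__(self):
        self.last_ring: Optional[List[Point]] = None
        self.prepared: Optional[PreparedPolygon] = None
        self.preparations = 0  # how many times the preparation cost was paid

    def contains(self, ring: List[Point], p: Point) -> bool:
        # Comparing vertices is cheaper than rebuilding the index.
        if self.prepared is None or ring != self.last_ring:
            self.prepared = PreparedPolygon(ring)
            self.last_ring = ring
            self.preparations += 1
        return self.prepared.contains(p)

def spatial_join(polygons, points, cache: PreparedCache):
    """Nested-loop join: for each polygon, test each point (no spatial index here)."""
    pairs = []
    for pid, ring in polygons:
        for hid, pt in points:
            if cache.contains(ring, pt):
                pairs.append((pid, hid))
    return pairs

squares = [("A", [(0, 0), (10, 0), (10, 10), (0, 10)]),
           ("B", [(10, 0), (20, 0), (20, 10), (10, 10)])]
houses = [(1, (2, 3)), (2, (15, 5)), (3, (25, 5))]
cache = PreparedCache()
print(spatial_join(squares, houses, cache))  # [('A', 1), ('B', 2)]
print(cache.preparations)  # 2
```

Only two preparations serve six containment tests here; with hundreds of points per polygon the preparation cost is spread even more thinly, which is one way to see why the speed-up grows with the size difference between the tables.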

The "smart, indexed polygon" is a "PreparedGeometry", and the concept was implemented in JTS over a year ago. About six months ago, it was ported to GEOS (a C++ mirror of JTS), but the port still had some nagging memory leaks that made it unfit for production use.

Last month, Zonar Systems, who funded the initial JTS algorithm work, asked me to carry the functionality the rest of the way into PostGIS. We found a C++ expert who identified and removed the last GEOS leaks, and I cleaned the leaks out of the PostGIS side of the implementation.

The speed difference is impressive!

I have a test data set of two tables: one table of 80 large polygons, and another table of 8000 small polygons. Each large polygon contains about 100 small ones.

Without the prepared geometry, a spatial join using ST_Intersects takes about 40 seconds. With the prepared geometry, the join takes 8 seconds, five times faster. The larger the size difference between your tables, the larger the speed-up you will see.

The functions affected by the PreparedGeometry upgrade are ST_Intersects(), ST_Contains(), ST_Covers() and ST_ContainsProperly().

To try out the new functionality, you'll need to check out and compile the GEOS SVN trunk (http://svn.osgeo.org/geos/trunk) which will become GEOS 3.1.0 in a little while, and the PostGIS 1.3 SVN branch (http://svn.refractions.net/postgis/branches/1.3), which will become PostGIS 1.3.4 shortly. First compile and install GEOS, then PostGIS, since PostGIS checks the GEOS version during the compile stage to determine whether to activate the functionality.

Major thank you to Zonar Systems for funding the initial work and then stepping up a second time to fund the clean-up and roll-out to production-ready status. Why did they do it? They run a major fleet tracking and data analysis system on PostGIS, and they need lots of speed to handle the huge data volumes generated by their real-time tracking devices.
 

Wednesday, October 08, 2008

Rotten Afternoon

Anybody want a set of wisdom teeth? I've got a couple that I won't be using anymore...
 

Thursday, October 02, 2008

Sol Katz Award

To my astonishment, I received the Sol Katz Award for 2008 today. For the record, here is the acceptance speech I gave via video to the FOSS4G closing plenary session:
This is a big honour for me, to be in the company of people like Frank Warmerdam, Steve Lime and Markus Neteler as a Sol Katz recipient.

Those guys built core pieces of open source software with their bare hands, from the ground up, and that alone marks them out as special, but they also helped build their communities, and that's a big part of their contribution, too.

In my case, community building is almost my only contribution.

I have added some very small amounts of code to PostGIS and uDig over the years, but until only a few months ago my main contribution was community building: finding the funding or staff time to develop the projects, providing some design guidelines, and working on the mailing lists to help people with problems.

So I want to start out by thanking a few of the people who did much of the actual work on the projects I have been identified with over the years, the PostGIS spatial database and the uDig desktop application.

Dave Blasby, a brilliant programmer, who wrote the first versions of PostGIS when he was at Refractions, and who taught me by osmosis many of the technical fundamentals I exercise to this day.

Sandro Santilli, who was so impressive as a volunteer contributor to PostGIS that I hired him sight-unseen to maintain PostGIS, which he did from his home in Rome, for a number of years.

Mark Cave-Ayland, who is still involved in PostGIS, and the "go to" guy when the problems get really hard.

Jody Garnett and Jesse Eichar, who took the uDig project from a sketch in a funding proposal to a working application, and have continued to nurture and improve it up to this day.

Those are just a very few of the people who have contributed to making the PostGIS and uDig projects successful; there are so many more, and I thank them all. Thank you so much!

I hope that my receiving this award will inspire other non-technical members of the open source community. Open source is collaborative in all kinds of ways: not only do we share code, but we share effort and money. For a manager, contributing money or staff time to open source is often a karmic investment – the return is impossible to foresee, and yet, in my experience, there always seems to be a return in the end. You are repaid for your investment many times over, in many ways, most of which you don't expect.

I can't accept an award honoring my investments, in time and money, to open source, without also honoring two men who are largely unknown in the open source GIS community.

Graeme Leeming and Philip Kayal were my business partners for ten years at Refractions Research, during the time we developed PostGIS and uDig, and without their willingness to invest in my crazy schemes, we would never have achieved what we did.

Their willingness to join in my enthusiasms and get off the beaten path of consulting was critical to making the projects successful, and all of us took risks together to make the projects great. So thanks Graeme and Phil, and also all the folks at Refractions.

I hope you have all had a great FOSS4G. I am sorry I could not be with you in person this year, but I'm looking forward to raising a pint with you all in Sydney, Australia next year.
 

About Me

Victoria, British Columbia, Canada
