"What if you're looking for a coffee shop nearby..."

If I hear this LBS “use case” one more time, I think my ears are going to start to bleed.

GeoWeb: Day 1/2

I’ve really enjoyed my GeoWeb experience. The official program has been mediocre (I say that and, hey, I’m on it) but the networking has been sublime. Ron Lake has managed to assemble a great cross section of movers and shakers in the next wave of geospatial in one place, so GeoWeb is the place to schmooze. Of course, being in one of the most lovely cities in North America is an added bonus, so the settings for schmoozing have been great too: the Wosk Center and last night a boat cruise during the summer Vancouver fireworks festival held on the harbour.

Schmoozing highlights: Ed Katibah, Microsoft SQL Server Spatial eminence (apparently, he likes chocolate) and repository of the complete history of all spatial databases, ever; blogging celebritarian James Fee, who is far less opinionated in person than one might expect; Alex Miller, head of ESRI Canada, auto racer, polymath, business tycoon; and Mike Springer who, apparently, was a professional fire-eater in Vegas a mere 10 years ago. He has since written SketchUp (that piddling little thing) and now heads Google’s whole 3D program, but somehow the whole fire-eating thing keeps distracting me. Fire eating!

GeoWeb: Day 0

I’m off to visit the GeoSpider at the GeoWeb this week!

Today I started with an (unrelated) trip to the dentist (ultra-sonic tooth cleaning! ow! but fast!), and then hopped the ferry to Vancouver. Lovely day for traveling, sun glinting off the waves, etc, etc, then the bus ride from the ferry terminal into downtown Vancouver. Because of construction of the new transit line for the upcoming 2010 Olympics, the bus is running up Knight Street instead of Cambie Street, which affords an opportunity to view four decades of variations on the “Vancouver Special”.

The east side of Vancouver was built up in a bit of a hurry in the post-war years and largely filled with fairly small affordable bungalows (not so affordable anymore).

As time went on, these houses seemed a bit small for modern tastes, so a wave of replacements began, and many took the form of the Vancouver Special. The classic “special” has a particular form and floor plan, but in general the special is a two-story box that maximizes the amount of floor space that can be extracted from the lot without contravening the building code.

The classic special looks like the picture above, but the special lives on in various versions, including a spiffy new 21st century version, with bay windows and Greek temple door treatments, but the same basic two-story, maximum-lot-fill ethos.

Anyhow…

In the evening, I attended the OSGeo British Columbia meeting, ably hosted by Max and Martin at Sierra Systems (and including a stunning view of the harbour from their 25th-floor boardroom, I might add). Topics were eclectic: community course development at the Commonwealth of Learning, and a review of web application security concerns, including some geospatial security issues, from Martin Kyle. For the next meeting Martin has promised a working exploit of a buffer overflow weakness in MapServer versions < 4.10.3. (If you are running MapServer < 4.10.3 … upgrade!)

Counting Squares

One of the last projects I had a substantial hand in formulating and designing while at Refractions was a project for providing provincial-level environmental summaries, using the best possible high resolution data. The goal is to be able to answer questions like:

  • What is the volume of old-growth pine in BC? By timber supply area? In caribou habitat?
  • What young forest areas are on south-facing slopes of less than 10%, within 200 meters of water?
  • What is the volume of fir in areas affected by mountain pine beetle?
  • How much bear habitat is more than 5km from a road but not in an existing protected area? Where is it?

This is all standard GIS stuff, but we wanted to make answering these questions a matter of a few mouse gestures, with no data preparation required, so that a suitably motivated environmental scientist or forester could figure out how to do the analysis with almost no training.

Getting there involves solving two problems: first, what kind of engine can generate fairly complicated summaries based on arbitrary summary areas; and second, how do you make that engine maximally usable with minimal interface complexity.

The solution to the engine was double-barreled.

First, to enable arbitrary summary areas, move from vector analysis units to a province-wide raster grid. For simplicity, we chose one hectare (100m x 100m), which means about 90M cells for all the non-ocean area in the jurisdiction. Second, to enable a roll-up engine on those cells, put all the information into a database, in our case PostgreSQL. Data entering the system is pre-processed, rasterized onto the hectare grid, and then saved in a master table that has one row for each hectare. At this point, each hectare in the province has over 100 variables associated with it in the system.
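The key trick is that once everything lives on a common grid, any coordinate can be reduced to a single cell address by integer division. As a rough sketch (the grid origin and numbering here are hypothetical, not the actual Hectares BC scheme):

```shell
# Hypothetical hectare-grid indexing: map a projected coordinate (in meters)
# to a row/column in a 100m x 100m grid. Origin values are illustrative only.
ORIGIN_X=0
ORIGIN_Y=0
CELL=100    # 100m x 100m = 1 hectare

x=1234567
y=987654

# Integer division truncates, dropping the coordinate onto its cell.
col=$(( (x - ORIGIN_X) / CELL ))
row=$(( (y - ORIGIN_Y) / CELL ))

echo "cell: row=$row col=$col"
```

With every input layer rasterized to the same addressing scheme, each hectare becomes one key in the master table, and summarizing any combination of variables is just a grouped query over those rows.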

An example of the 1-hectare grid

To provide a usable interface on the engine, we took best-of-breed components everywhere we could: Google Web Toolkit as the overall UI framework; OpenLayers as a mapping component; server-side Java and Tomcat for all the application logic. The summary concept was very similar to OLAP query building, so we stole the ideas for the workings of that tab from the SQL Server OLAP query interface.

The final result is Hectares BC, which is one of the cooler things I have been involved in, going from a coffee shop “wouldn’t this be useful” discussion to a prototype, to a funding proposal, to the completed pilot in about 24 months.

From 0 to 65 Million in 2 Hours

I’m doing some performance benchmarking for a client this week, so getting a big, real test database is a priority. The US Census Bureau TIGER data is one of the largest uniform data sets around, so I’ve started with that.

I just loaded all the edges, 64,830,691 of them, and it took just under 2 hours! Fortunately, the 2007 data comes in shapefiles, and the schemas are identical for each file, so the load script is as simple as this controller:

find . -name "fe*_edges.zip" -exec ./append_edges.sh {} ';'

And this runner (append_edges.sh):

unzip "$1"
shp2pgsql -W WINDOWS-1252 -D -a -s 4269 $(basename "$1" .zip).shp fe_edges | psql tiger
rm fe*edges.*

Note the use of the -W parameter, to ensure that the high-bit “charàctérs” are handled correctly, and the -a parameter, to append the file contents to the table.
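For a sense of scale, that works out to a sustained rate of roughly 9,000 edges per second. A quick back-of-envelope check, using the count from above and treating the load as a flat 2 hours (it was actually slightly under, so the true rate is a touch higher):

```shell
# Back-of-envelope load rate: 64,830,691 edges in about 2 hours.
edges=64830691
seconds=$(( 2 * 60 * 60 ))    # 7200 seconds
rate=$(( edges / seconds ))   # integer division is plenty accurate here

echo "$rate edges/sec"
```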