Monday, July 06, 2009

FOSS4G 2009

I'm going! Are you?

No One Ever Got Fired for Buying Linux

For a while there, Microsoft made a lot of hay about the London Stock Exchange using Windows in their trading system.



As it turns out, too much hay. The LSE is now going to abandon their Windows trading system. As the author points out, IT failures aren't all that rare; what is rare is learning about them. Usually the principals bury the body and move on to "Phase II". In this case the principal was fired, and her replacement is hanging out the dirty laundry.

Another thing that is rare is for a dominant vendor to shoulder any blame for these kinds of failures. The usual principle is that, if everyone is doing it, it can't possibly be stupid.

Did you buy an expensive web mapping server and then have to put it on a nightly re-boot cycle to avoid service degradation? Don't worry, everyone is doing it, it doesn't reflect badly on you.

Is all your e-mail locked in binary file archives, where a small corruption can render the entire archive irretrievable? Don't worry, everyone is doing it, it doesn't reflect badly on you.

It's not an IT thing, really. It's called "culture": our shared beliefs and idiosyncrasies.

Did you start your day by repeatedly accelerating and decelerating a 4000lb metal box, holding only yourself and a cup of coffee, over a hot tar field, place your box in another hot tar field, and then hike over the tar field to a large glass box, enter, and place yourself inside a further fabric-covered box? Don't worry, everyone is doing it, it doesn't reflect badly on you.
 

Thursday, July 02, 2009

Lies, Damn Lies...

"Green shoots..." ah, for the good old days of only two weeks ago, when green shoots were in our future...

Job Losses

I never really understood why decreases in the rate of change of unemployment were considered such great news. "Good news, the second derivative has gone positive! We're plunging into the abyss slightly less quickly!" Only in a world of rampant, congenital optimism – or statistics-induced myopia – could four months in which 18,300 Americans lost their jobs every day be described as a period of "improving conditions".
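To make the "second derivative" joke concrete, here is a tiny sketch using made-up employment levels (they are illustrative only, not real statistics). The second difference is positive in every period, yet jobs are still being lost in every period.

```python
# Hypothetical employment levels, in thousands of jobs, one per month.
# These numbers are invented for illustration; they are NOT real data.
employment = [100000, 99300, 98700, 98200, 97800]

# First difference: monthly change in employment (negative = jobs lost).
first_diff = [b - a for a, b in zip(employment, employment[1:])]

# Second difference: change in the monthly change.
second_diff = [b - a for a, b in zip(first_diff, first_diff[1:])]

print("monthly change:  ", first_diff)
print("change of change:", second_diff)
```

A positive second difference only says the losses are shrinking; the first difference is what tells you whether anyone is actually being hired.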
 

Wednesday, July 01, 2009

Working in the Cathedral

In February, at the Toronto Code Sprint, the PostGIS team looked each other in the eye (for the first time) and committed to getting the 1.4 release out by late April.

Well, it's late June now. It seems very likely that I will get to cut 1.4.0RC1 tomorrow morning.

My personal preference has always been to release early and often. In the hacker ethic, this sounds like a good thing: it's the "bazaar" model that Eric Raymond promoted over the "cathedral" model of development. In the bazaar, you dump out regular releases, and let the community dictate whether they are of quality ("don't use 2.31.2a, it's garbage!"). I still remember being told by a more knowledgeable Linux user that I could upgrade to 1.1.53 (?), but not any further than that, because the succeeding releases were unstable. In the cathedral, you release no wine before its time, aiming for a polished diamond of a release.

So, 1.4.0 has taken much longer than expected. That is the result of two things coming together: a development team that is now unwilling to accept the existence of any "crasher" bugs at all (no matter how unlikely they are to be exercised) and a growing comprehensiveness in the test suite, which now covers all the functions, in most every combination of inputs. Because of the enhanced testing, we discovered crashers we didn't know we had – and then we had to fix them.

Despite chafing to release! release! release! I have come to appreciate our new conservatism. Among my favorite feedback on PostGIS is from the users who say "it just works, install it and forget about it, rock solid". That feels good, and to keep things that way, our new austerity is only going to help.

The maturation of PostGIS into a product you can just "install and forget" has been multi-stage.

Prior to the 1.0 release, Sandro Santilli added the first regression tests. These tests have been growing ever since and have been invaluable in ensuring that old bugs don't re-enter the code base, and that new features don't break old features.

For the 1.4 release, the documentation was upgraded substantially, by adding a great deal of extra structuring to the reference section. Regina Obe discovered that a side effect of the extra structure was that she could automatically generate a test for most every documented function using XSLT on the DocBook XML. This new "garden test" found a number of previously undetected bugs that have since been removed.
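To give a feel for how structured documentation can be turned into tests, here is a rough sketch. It is not Regina's actual approach: it swaps XSLT for Python's `xml.etree.ElementTree`, and the XML fragment, the `SAMPLES` table and the `generate_tests` helper are all made up for illustration. The idea is simply to read each documented function prototype and emit one SQL call per combination of sample inputs.

```python
import xml.etree.ElementTree as ET
from itertools import product

# Simplified, made-up fragment in the spirit of a DocBook reference section.
DOC = """
<reference>
  <refentry>
    <funcprototype>
      <funcdef>float <function>ST_Area</function></funcdef>
      <paramdef><type>geometry</type> <parameter>g1</parameter></paramdef>
    </funcprototype>
  </refentry>
  <refentry>
    <funcprototype>
      <funcdef>geometry <function>ST_Buffer</function></funcdef>
      <paramdef><type>geometry</type> <parameter>g1</parameter></paramdef>
      <paramdef><type>float</type> <parameter>radius</parameter></paramdef>
    </funcprototype>
  </refentry>
</reference>
"""

# Hypothetical sample inputs, a short list per documented argument type.
SAMPLES = {
    "geometry": ["'POINT(1 2)'::geometry", "'LINESTRING(0 0, 1 1)'::geometry"],
    "float": ["0", "1.5"],
}


def generate_tests(doc_xml):
    """Return one SQL statement per function and combination of sample inputs."""
    root = ET.fromstring(doc_xml)
    statements = []
    for proto in root.iter("funcprototype"):
        name = proto.find("funcdef/function").text
        arg_types = [p.find("type").text for p in proto.findall("paramdef")]
        # Cartesian product of the sample values for each argument.
        for args in product(*(SAMPLES[t] for t in arg_types)):
            statements.append("SELECT %s(%s);" % (name, ", ".join(args)))
    return statements


statements = generate_tests(DOC)
for sql in statements:
    print(sql)
```

Feeding the generated statements to a database and watching for crashes is the whole test: nobody has to guess in advance which odd combination of inputs will cause trouble.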

For the 1.4 release, I added the start of a CUnit test suite that exercises the PostGIS functions without requiring a database back-end. Even in its early state, it has saved me from a couple booboos already. For future releases, this extra regression suite is going to help keep things stable.

For the 1.4 release, Mark Cave-Ayland re-worked the logging and debugging infrastructure, to make the coding cleaner and easier to maintain during debugging cycles. He also split out the underlying geometry implementations, which are now used in the loader/dumper utilities, for a more consistent approach to geometry handling.

These are all under-the-covers improvements that end-users never see. But they all contribute to that "it just works, it just runs" end-user experience that I have come to treasure even more than the sensation of slamming out a point release at 2am. I hope everyone tries out RC1 so that we can slay any remaining bugs before the 1.4.0 release!
 

Tuesday, June 16, 2009

MySQL vs PostGIS

Did I say I would publish my performance results? I did. Here they are.
 

Friday, June 12, 2009

MySQL Snark #2

I am doing a little benchmarking as a learning experience with JMeter and I will publish the throughput numbers in a few days, after I run the full suite I have developed on the various combinations of concurrency and insert/select ratios.

Because MySQL has so few functions that actually do anything (see the note here) there's not a great deal to test beyond raw performance. The early throughput results seem to indicate MySQL is comparable to PostGIS for simple CRUD on one table, but for anything non-trivial it falls down.

Here's a basic spatial join: pull 23 roads from a 3.4M row line table and spatially join to a 66K row tract polygons table, calculating the sum of the areas of tract polygons found. There are spatial indexes on both tables.

mysql> select sum(area(t.geom)) 
from tiger_roads_texas r, tiger_tracts t
where
mbrintersects(r.geom, GeomFromText('LINESTRING(453084 -1650742,452384 -1650442)'))
and
mbrintersects(r.geom,t.geom);

+-------------------+
| sum(area(t.geom)) |
+-------------------+
|  1260394420.00453 |
+-------------------+
1 row in set (9.43 sec)


And in PostGIS:

tiger=# select sum(area(t.geom)) 
from tiger_roads_texas r, tiger_tracts t
where r.geom && GeomFromText('LINESTRING(453084 -1650742,452384 -1650442)',2163)
and r.geom && t.geom;

sum
------------------
1260394420.00684
(1 row)

Time: 5.574 ms


Those are both "hot cache" results, after running them a couple times each.
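For readers who have not met them, both `mbrintersects` and `&&` compare bounding rectangles rather than exact shapes, so the two queries are asking the same question. A minimal sketch of that test follows. The `bbox` and `bboxes_overlap` helpers and the two road geometries are made up for illustration, and neither database implements it this way internally (they answer it through a spatial index).

```python
def bbox(coords):
    """Return (xmin, ymin, xmax, ymax) for a list of (x, y) points."""
    xs = [x for x, _ in coords]
    ys = [y for _, y in coords]
    return (min(xs), min(ys), max(xs), max(ys))


def bboxes_overlap(a, b):
    """True when two (xmin, ymin, xmax, ymax) rectangles share any point."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


# The query line from the spatial join above.
query_box = bbox([(453084, -1650742), (452384, -1650442)])

# Two hypothetical roads: one near the query line, one far to the east.
near_road = bbox([(452000, -1650500), (452500, -1650600)])
far_road = bbox([(460000, -1650500), (461000, -1650600)])

print(query_box)
print(bboxes_overlap(query_box, near_road))
print(bboxes_overlap(query_box, far_road))
```

The rectangle test is just four comparisons per pair, so what the timings above really measure is how well each database uses its indexes to avoid comparing every road against every tract.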
 

MySQL Snark

OK, this one I have to share. Here are two queries, the first with a syntax error in the WKT (oops!) and the second one correct.

First, as processed by MySQL:

mysql> select count(*) from tiger_roads_texas 
where mbrintersects(geom,
GeomFromText('LINESTRING(452284 -1651542, 452484 -1651342'));
+----------+
| count(*) |
+----------+
|        0 |
+----------+
1 row in set (0.00 sec)

mysql> select count(*) from tiger_roads_texas
where mbrintersects(geom,
GeomFromText('LINESTRING(452284 -1651542, 452484 -1651342)'));
+----------+
| count(*) |
+----------+
|        1 |
+----------+
1 row in set (0.06 sec)


Now as processed by PostGIS:

tiger=# select count(*) from tiger_roads_texas 
where geom &&
GeomFromText('LINESTRING(452284 -1651542, 452484 -1651342',2163);
ERROR: parse error - invalid geometry
HINT: "...RING(452284 -1651542, 452484 -1651342" <-- parse error at position 43 within geometry
CONTEXT: SQL function "geomfromtext" statement 1

tiger=# select count(*) from tiger_roads_texas
where geom &&
GeomFromText('LINESTRING(452284 -1651542, 452484 -1651342)',2163);
count
-------
1
(1 row)


Can you spot the difference? Snark! Another one for the list.
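The behaviour I would rather have is "fail loudly on bad input". Here is a minimal sketch of it: a deliberately tiny parser that handles only two-dimensional LINESTRING text and raises an error on anything malformed. The `parse_linestring` function is made up for illustration and is not how either database parses WKT.

```python
import re

_NUMBER = r"-?\d+(?:\.\d+)?"
_POINT = _NUMBER + r"\s+" + _NUMBER
# The whole string must match, including the closing parenthesis.
_LINESTRING = re.compile(
    r"\s*LINESTRING\s*\(\s*(" + _POINT + r"(?:\s*,\s*" + _POINT + r")*)\s*\)\s*$",
    re.IGNORECASE,
)


def parse_linestring(wkt):
    """Return a list of (x, y) floats, or raise ValueError on malformed input."""
    match = _LINESTRING.match(wkt)
    if match is None:
        raise ValueError("parse error - invalid geometry: %r" % wkt)
    points = []
    for pair in match.group(1).split(","):
        x, y = pair.split()
        points.append((float(x), float(y)))
    return points


print(parse_linestring("LINESTRING(452284 -1651542, 452484 -1651342)"))

try:
    parse_linestring("LINESTRING(452284 -1651542, 452484 -1651342")
except ValueError as err:
    print("rejected:", err)
```

An error costs the user a few seconds of embarrassment. A silent zero costs them a wrong answer they may never notice.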
 

Tuesday, June 09, 2009

Wanted: OK Corral

A WMS performance benchmark has been a staple of FOSS4G conferences for some time. In 2005, it was IMS vs MapServer. In 2007 it was MapServer vs GeoServer. And in 2008, a grudge MapServer vs GeoServer re-match.

For 2009, we hope to continue the MapServer vs GeoServer tradition, and are inviting other WMS servers to join the fray. We are hoping to have ArcGIS Server in the mix, perhaps MapGuide, perhaps DeeGree. The participants are assembling on a benchmarking listserv.



However, right now we are stuck trying to find a location for our gunfight – we need an OK Corral. Our preferred corral would have the following characteristics:

  • One or more dual-core processors
  • 4GB or more of RAM
  • CentOS or RHEL
  • Remote ssh access for participants
  • Root access or sudo for participants
  • Not virtualized


In addition, we will need a second server on the same network segment for generating load (would still need remote access, but would not need a beefy machine). Due to the nature of the participants (global) and the timelines (several months) we would need sole use of the corral until the testing is complete in September.

If you have a corral you can donate for the shoot-out, let me know!

Update: We have received a generous offer from the US Army Corps.
 

Tuesday, June 02, 2009

ESRI "Free" Web Services

I'm a nice guy. I often raise ESRI's web services (formerly ArcWeb Services, now ArcGIS Online) when talking to clients about options for things like map services, geocodes and routes. It's my way of rooting for the scrappy underdog, the old paleogeographic home team, going up against the Google and Microsoft Bing behemoths.



But someone, please, tap the Redlands team with the clue stick... check out the fabulous new "free" services ESRI is offering to lure developers to their ecosystem!

Free geocoding! Yes! Free! And as many as 1000 geocodes per year. You read that right, kids, per year. Also routing! 5000 per year!

Compare with Yahoo!'s (aside, something about putting an apostrophe after an exclamation mark feels wrong) free API, which offers 5000 geocodes per day (Google offers 15000).

There's a punch-line in here somewhere, but I'm not sure where.

Update: Ray from ESRI notes in the comments that "... the limit of 1,000 geocodes is for geocodes done in BATCH MODE (ie: a request involving more than one address at a time). Place-finding, single address geocoding and single address reverse geocoding are not limited." I may have had it completely backwards: ESRI is not being too stingy, they are being too generous. I'm pretty sure there are lots of people who can script their computers into running lots of sequential individual geocoding requests ... in a "batch", as it were.

Update 2: Ray from ESRI further clarifies the meaning of "batch": "Batch geocoding really means that you are storing the results of your request locally, so you can use them again." So the "batchness" of your request is not governed by the size of the request, but by what you do with the request. (Wait, I've heard that somewhere before...) Comparing to the Yahoo! terms of use we find a similar restriction, which means the ESRI offering is the-same-only-better (fewer restrictions on non-"batch" requests). Better put away the clue-stick, nothing to see here, move along, move along.
 

About Me

Paul Ramsey
Victoria, British Columbia, Canada