MySQL vs PostGIS

Did I say I would publish my performance results? I did. Here they are.

MySQL Snark #2

I am doing a little benchmarking as a learning experience with JMeter and I will publish the throughput numbers in a few days, after I run the full suite I have developed on the various combinations of concurrency and insert/select ratios.

Because MySQL has so few functions that actually do anything (see the note here) there’s not a great deal to test beyond raw performance. The early throughput results seem to indicate it’s comparable for simple CRUD on one table, but for anything non-trivial it falls down.

Here’s a basic spatial join: pull 23 roads from a 3.4M row line table and spatially join to a 66K row tract polygons table, calculating the sum of the areas of tract polygons found. There are spatial indexes on both tables.

mysql> select sum(area(t.geom)) 
from tiger_roads_texas r, tiger_tracts t 
where 
  mbrintersects(r.geom, GeomFromText('LINESTRING(453084 -1650742,452384 -1650442)')) 
and 
  mbrintersects(r.geom,t.geom);

+-------------------+
| sum(area(t.geom)) |
+-------------------+
|  1260394420.00453 | 
+-------------------+
1 row in set (9.43 sec)

And in PostGIS:

tiger=# select sum(area(t.geom)) 
from tiger_roads_texas r, tiger_tracts t 
where r.geom && GeomFromText('LINESTRING(453084 -1650742,452384 -1650442)',2163) 
and r.geom && t.geom;

       sum        
------------------
 1260394420.00684
(1 row)

Time: 5.574 ms</pre>

Those are both “hot cache” results, after running them a couple times each.

MySQL Snark

OK, this one I have to share. Here’s two queries, the first with a syntax error in the WKT (oops!) and the second one correct.

First, as processed by MySQL:

mysql> select count(*) from tiger_roads_texas 
  where mbrintersects(geom, 
    GeomFromText('LINESTRING(452284 -1651542, 452484 -1651342'));
+----------+
| count(*) |
+----------+
|        0 | 
+----------+
1 row in set (0.00 sec)

mysql> select count(*) from tiger_roads_texas 
  where mbrintersects(geom, 
    GeomFromText('LINESTRING(452284 -1651542, 452484 -1651342)'));
+----------+
| count(*) |
+----------+
|        1 | 
+----------+
1 row in set (0.06 sec)

Now as processed by PostGIS:

tiger=# select count(*) from tiger_roads_texas 
where geom && 
  GeomFromText('LINESTRING(452284 -1651542, 452484 -1651342',2163);
ERROR:  parse error - invalid geometry
HINT:  "...RING(452284 -1651542, 452484 -1651342" <-- parse error at position 43 within geometry
CONTEXT:  SQL function "geomfromtext" statement 1

tiger=# select count(*) from tiger_roads_texas 
  where geom && 
    GeomFromText('LINESTRING(452284 -1651542, 452484 -1651342)',2163);
 count 
-------
     1
(1 row)

Can you spot the difference? Snark! Another one for the list.

Wanted: OK Corral

A WMS performance benchmark has been a staple of FOSS4G conferences for some time. In 2005, it was IMS vs Mapserver. In 2007 it was MapServer vs Geoserver. And in 2008, a grudge MapServer vs Geoserver re-match.

For 2009, we hope to continue the MapServer vs Geoserver tradition, and are inviting other WMS servers to join the fray. We are hoping to have ArcGIS Server in the mix, perhaps MapGuide, perhaps DeeGree. The participants are assembling on a benchmarking listserv.

However, right now we are stuck trying to find a location for our gunfight – we need an OK Corral. Our preferred corral would have the following characteristics:

  • One or more dual-core processors
  • 4Gb of more of RAM
  • Centos or RHEL
  • Remote ssh access for participants
  • Root access or sudo for participants
  • Not virtualized

In addition, we will need a second server on the same network segment for generating load (would still need remote access, but would not need a beefy machine). Due to the nature of the participants (global) and the timelines (several months) we would need sole use of the corral until the testing is complete in September.

If you have a corral you can donate for the shoot-out, let me know!

Update: We have received a generous offer from the US Army Corps.

ESRI "Free" Web Services

I’m a nice guy, I often raise ESRI’s web services (formerly ArcWeb Services, now ArcGIS Online) when talking to clients about options for things like map services, geocodes and routes. It’s my way of rooting for the scrappy underdog, the old paleogeographic home team, going up against the Google and Microsoft Bing behemoths.

But someone, please, tap the Redlands team with the clue stick… check out the fabulous new “free” services ESRI is offering to lure developers to their ecosystem!

Free geocoding! Yes! Free! And as many as 1000 geocodes per year. You read that right, kids, per year. Also routing! 5000 per year!

Compare with Yahoo!’s (aside, something about putting an apostrophe after an exclamation mark feels wrong) free API, which offers 5000 geocodes per day (Google offers 15000).

There’s a punch-line in here somewhere, but I’m not sure where.

Update: Ray from ESRI notes in the comments that “… the limit of 1,000 geocodes is for geocodes done in BATCH MODE (ie: a request involving more than one address at a time). Place-finding, single address geocoding and single address reverse geocoding are not limited.” I may have had it completely backwards, ESRI is not being too stingy, they are being too generous. I’m pretty sure there’s lots of people who can script their computers into running lots of sequential individual geocoding requests … in a “batch”, as it were.

Update 2: Ray from ESRI further clarifies the meaning of “batch”: “Batch geocoding really means that you are storing the results of your request locally, so you can use them again.” So the “batchness” of your request is not governed by the size of the request, but by what you do with the request. (Wait, I’ve heard that somewhere before…) Comparing to the Yahoo! terms of use we find a similar restriction, which means the ESRI offering is the-same-only-better (fewer restrictions on non-“batch” requests). Better put away the clue-stick, nothing to see here, move along, move along.