Friday, June 12, 2009

MySQL Snark #2

I am doing a little benchmarking as a learning experience with JMeter and I will publish the throughput numbers in a few days, after I run the full suite I have developed on the various combinations of concurrency and insert/select ratios.

Because MySQL has so few functions that actually do anything (see the note here) there's not a great deal to test beyond raw performance. The early throughput results seem to indicate it's comparable for simple CRUD on one table, but for anything non-trivial it falls down.

Here's a basic spatial join: pull 23 roads from a 3.4M row line table and spatially join to a 66K row tract polygons table, calculating the sum of the areas of tract polygons found. There are spatial indexes on both tables.

mysql> select sum(area(t.geom)) 
from tiger_roads_texas r, tiger_tracts t
where
mbrintersects(r.geom, GeomFromText('LINESTRING(453084 -1650742,452384 -1650442)'))
and
mbrintersects(r.geom,t.geom);

+-------------------+
| sum(area(t.geom)) |
+-------------------+
| 1260394420.00453 |
+-------------------+
1 row in set (9.43 sec)


And in PostGIS:

tiger=# select sum(area(t.geom)) 
from tiger_roads_texas r, tiger_tracts t
where r.geom && GeomFromText('LINESTRING(453084 -1650742,452384 -1650442)',2163)
and r.geom && t.geom;

sum
------------------
1260394420.00684
(1 row)

Time: 5.574 ms


Those are both "hot cache" results, after running them a couple times each.
 

4 comments:

Regina Obe said...

Did you do an explain on both to verify both are using the spatial indexes?

Did you run ANALYZE TABLE on the MySQL table and vacuum analyze on the postgresql one?

Paul Ramsey said...

No change in time after analyzing everything. Explain on MySQL I can't read, but here it is:

mysql> explain select sum(area(t.geom)) from tiger_roads_texas r, tiger_tracts t where mbrintersects(r.geom, GeomFromText('LINESTRING(453084 -1650742,452384 -1650442)')) and mbrintersects(r.geom,t.geom);
+----+-------------+-------+-------+----------------------------+------+---------+------+-------+--------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-------+-------+----------------------------+------+---------+------+-------+--------------------------------+
| 1 | SIMPLE | r | range | geom,tiger_roads_texas_gix | geom | 34 | NULL | 1 | Using where |
| 1 | SIMPLE | t | ALL | geom,tiger_tracts_gix | NULL | NULL | NULL | 66437 | Using where; Using join buffer |
+----+-------------+-------+-------+----------------------------+------+---------+------+-------+--------------------------------+
2 rows in set (0.02 sec)

It's certainly a bad plan, but that's the point innit? :)

Regina Obe said...

Yap from the plan looks like its using the spatial index on tiger_roads and a table scan on tiger_tracts.

Just wanted to be fair. From experience MySQL does poorly at optimizing any query that is marginally complicated.

Though 6.0 (or is it 5.3 I forget) is supposed to be better. Which version are you running?

Regina Obe said...

I meant to say 5.4 -- but doesn't look like the planner optimization changes are mentioned in 5.4 release notes so assume its still the same sucky planner

http://dev.mysql.com/doc/mysql-5.4-features/en/index.html

I think its in 6.0
http://dev.mysql.com/doc/refman/6.0/en/mysql-nutshell.html
See the part about
"Optimizer enhancements for faster subqueries and joins, including batched index access of table rows for sequences of disjoint ranges by the MyISAM and InnoDB storage engines. "

Ooh and I read somewhere that Foreign keys might actually do something in MyISAM in 6.0. What a brilliant idea. Though can't remember where I read that.

About Me

My Photo
Paul Ramsey
Victoria, British Columbia, Canada
View my complete profile

Followers

Blog Archive

Labels