Thursday, February 23, 2012

PostGIS vs Oracle Spatial

"That's a nice database you have there, but how does it compare to Oracle?"

A fair question. On the one hand we have an open source database, with a core development community of a few dozens and a spatial development community of ... a few! On the other hand we have a multi-billion dollar IT behemoth with a client list of Fortune 100 companies. On our biases alone, one would expect Oracle to perform much better.

And we'd have to go on our biases, because there haven't been any Oracle vs PostGIS comparisons available in the wild. Until now.

The Advanced Research Lab for Geospatial Information Science and Engineering of the prestigious Indian Institute of Technology has put a study online, comparing PostGIS to Oracle Spatial across a number of spatial join operations, run in both "cold" (empty cache) and "hot" (pre-seeded cache) modes.

The tests require the database to correctly plan a self-join that includes both spatial and attribute clauses, and then execute. The queries generally seem to require a medium to large quantity of spatial objects to be evaluated in spatial predicate tests. So this is a step up from the usual test of bulk bounding box operations that most benchmarks have gotten bogged down in.

And the results? The authors say:

From the experimental results that we saw, we can conclude that Postgres performs better than Oracle 11g both in the Cold Phase and Warm Phase. Though in few queries Oracle 11g performed better but on the whole Postgres overpowered Oracle 11g. In the warm phase in 3 out of 4 queries Postgres performed significantly well, from this we can conclude that Postgres has better automatic memory management capabilities and page replacement policies... On the whole it is the open-source that wins the game!

Methodologically there are two obvious issues: one is that the Oracle database was on Windows while the PostGIS database was on Linux; the other is that neither database got any tuning, they were both installed and run with default parameters. However, this is one of the nicer comparisons I have read: concise, focussed and with enough technical detail to evaluate what's going on.

Based on that detail, I can also take a stab at guessing why PostGIS did not win every test: the two slower tests used the touches relationship, which is not optimized in PostGIS using a prepared geometry approach. And the non-optimized predicates in PostGIS are quite inefficient, they calculate far more topological information than is strictly necessary to answer a true/fale question about a single topological relationship. So, more room for improvement!

Thanks to IIT for carrying out and sharing this research, truly invaluable stuff.


Warren J. Medernach, BGIS, GISP said...

That is very interesting Paul, and great news for the open source communities.

It doesn't make sense though that they used different OS's for the test?

Aren't both of the DBs supported on Windows and Ubuntu?

Wouldn't the test be more valid if the same OS was used for both DBs?

Just some thoughts...

Paul Ramsey said...

I agree. The authors do address this up front, saying, "Since this project is basically a comparison between Open-source and Proprietary products, That is why
the combination of Oracle 11g on Windows 7 and Postgres/Postgis on Ubuntu 10.10 have been chosen
for the experiment. " However, since database and OS are conflated it's harder to draw concrete conclusions. But it was ever so in benchmarks, there's always a confounding variable, or a tuning parameter missed.

Paolo Corti said...

Another valid reason not to use Windows for Oracle would have been the fact that this combination is extremely exotic in production.
Anyway, excellent results for PostGIS, but even without this benchmark I already had the suspect :)

John Z said...

Of course, I'd love to see these tests re-run on 64-bit Oracle Linux (where we put our Oracle). Then you'd be able to say the Oracle was running on its home turf and be able to a true apples:apples comparison. Come to think of it, I'd even like to see speed tests of PostGIS on different Linux distros...

Do you think test results like this would change much under PostGIS 2.0

Paul Ramsey said...

No, PostGIS 2.0 isn't any faster (or slower) it just has more new features.

Tomáš Vondra said...

Well, the fact that they did no tuning more or less runins the whole benchmark IMHO. It's like speed-testing cars without releasing a hand-brake - it's difficult to predict actual performance.

Paul Ramsey said...

They didn't tune either database (and the default PgSQL tunings are ... extra crapulescent), but actually since (IMO) the tests were mostly CPU bound I don't think it would be a huge change. I also noted this morning that the machines tested on had "only" 2GB of memory, which means Oracle might not have been able to get out of its own way (which should be a valid demerit against it, but so often is allowed to slide).

Michael Smith said...

reading the study, it appears that they did not know very much about setting up the queries properly for oracle spatial. The did not use the /*+ordered*/ hint to use the filters before the spatial operation and they did not add indexes on the column that they were filtering on. I think you would get significantly different results is the queries were properly constructed for Oracle

About Me

My Photo
Victoria, British Columbia, Canada

Blog Archive


bc (44) it (35) postgis (24) video (15) enterprise IT (11) icm (11) gis (9) sprint (9) foi (8) open source (8) osgeo (8) enterprise (7) cio (6) foippa (6) foss4g (6) management (6) politics (6) spatial it (6) outsourcing (5) mapserver (4) bcesis (3) boundless (3) email (3) opengeo (3) oracle (3) rant (3) COTS (2) architecture (2) cartodb (2) deloitte (2) esri (2) hp (2) idm (2) javascript (2) natural resources (2) ogc (2) open data (2) openstudent (2) oss (2) postgresql (2) technology (2) vendor (2) web (2) 1.4.0 (1) HR (1) access to information (1) accounting (1) agile (1) archive (1) aspen (1) bcpoli (1) benchmark (1) buffer (1) build vs buy (1) business (1) business process (1) c (1) career (1) cathedral (1) client (1) cloud (1) code (1) common sense (1) consulting (1) contracting (1) core review (1) crm (1) crockofshit (1) cunit (1) custom (1) data science (1) data warehouse (1) design (1) development (1) digital (1) environment (1) essentials (1) evil (1) exadata (1) fcuk (1) fgdb (1) fme (1) foocamp (1) foss4g2007 (1) ftp (1) gdal (1) gds (1) geocortex (1) geometry (1) geoserver (1) geotiff (1) google (1) google earth (1) government (1) grass (1) hadoop (1) iaas (1) icio (1) imagery (1) industry (1) innovation (1) integrated case management (1) introversion (1) iso (1) isss (1) isvalid (1) jpeg (1) jts (1) lawyers (1) mapping (1) mcfd (1) media (1) microsoft (1) money (1) mysql (1) new it (1) nosql (1) nrs transformation (1) oipc (1) opengis (1) openlayers (1) paas (1) pgconfsv (1) pirates (1) policy (1) portal (1) proprietary software (1) public accounts (1) qgis (1) r (1) rdbms (1) recursion (1) redistribution (1) regression (1) rfc (1) right to information (1) saas (1) salesforce (1) sardonic (1) scandal (1) seibel (1) sermon (1) server (1) siebel (1) snark (1) spatial (1) standards (1) statistics (1) svr (1) taxi (1) tempest (1) texas (1) tired (1) transit (1) tripledelete (1) twitter (1) uber (1) udig (1) uk (1) uk gds (1) verbal culture (1) victoria (1) waterfall (1) wfs (1) where (1) with recursive (1) wkb (1)