PgSQL on EC2

The theory behind running PostgreSQL (and PostGIS) on an Amazon EC2 instance with Elastic Block Store (EBS) volumes underneath is pretty straightforward, even for big databases. But when you want those databases to show the kinds of properties we have come to expect from our systems, like durability, throughput, and reliability, things get much harder.

This thread on pgsql-general was very illuminating to me. Among the tidbits:

“Let’s be clear here, physical I/O is at times terrible. :)”

“There’s no way we could run this database on a single EBS volume.”

“We had to fail over to one of our spares twice in the last 1.5 years. Not fun. Both times were due to instance failure.”

Basically the assumptions of AWS architecture (virtual instances will be less reliable than real world computers, but that doesn’t matter because getting a new one is really easy) don’t map well with the requirements of running a classic production database.

There are probably some engineering solutions around for this (GlusterFS, for example), but the core PgSQL would need some serious work and would end up looking a lot more like Oracle RAC than the current single-machine set-up.