Thursday, September 30, 2010

PostGIS Back-up / Restore

A very common question on PostGIS in production is "how do I upgrade", which is actually a variant on "how do I backup and restore"?

First, for patch version increases (e.g. X.Y.Z -> X.Y.(Z+1)) in PostgreSQL and PostGIS you do not need to do anything at all other than install the new software. The data can remain in place and everything will Just Work.

For minor version increases in PostgreSQL (e.g. X.Y.Z -> X.(Y+1).Z) you need to dump and restore. For minor version increases in PostGIS, you need to do a "soft upgrade", which means leaving the data in place, but running the upgrade scripts (e.g. postgis_upgrade_14_to_15.sql) after you install the software update.

Finally, for major version increases in PostgreSQL and PostGIS (e.g. X.Y.Z -> (X+1).Y.Z) you need to dump and restore.

Which brings me to the actual point I want to make: you can ensure the greatest ease in doing dump and restore of PostGIS data if you ensure that you store no data in the "public" schema.

The "public" schema is where the PostGIS functions and system tables get installed, so if you dump that schema you get all those definitions in your dump. If those definitions are mixed in amongst your data, loading them into a fresh database gets tricky: are the paths to the libraries the same? are there function name clashes? (The utils/postgis_restore.pl script attempts manfully to strip out PostGIS components from a dump file to allow a clean restore, but it is hard to get 100% performance.)

If, on the other hand, all your data is neatly separated into its own schema, you can neatly backup just that schema and avoid having PostGIS system information mixed in with your data. That means you can easily restore your data into any version of PostGIS and PostgreSQL that you like. So upgrades are easy easy easy.

Remember: Store your data in a schema other than "public".

"But Paul," you say, "I already have a full database dump, does that mean I'm SOL?" No, but you will need a strong stomach. First, set up your new PostgreSQL. Create a blank database, load PostGIS into it. Now, load your backup file into that database. You will see lots of errors. However, these errors will be caused by old PostGIS function and type definitions from your dump file conflicting with the existing type definitions in your database. And since you want the new definitions, not the old ones, that's OK. Your load, despite all the noise and errors, should actually work. Once it's done, you can move your data into a nice separate schema, so that next time you can do a clean, error-free restore.
 

6 comments:

Matthijs Laan said...

After I create a database, I like to issue the following commands:

create schema data;
alter database "mydb" set search_path=data,public;

This means any created tables will automatically be placed in the data schema, so any applications or toolkits don't explicitly need to be told to use the schema.

This is also useful for transferring between Linux and Windows, even on the same versions the references to .dll or .so are different (not sure if that is still the case).

Christian said...

This post actually had a great effect on my afternoon: convinced by your argument, I spent some time moving my data away from the "public" schema. And you're right: pg_restore is way quieter that way. Thanks for the tip!

By the way I saw your presentation at the last OSGeo conf in Chicoutimi: it was very inspiring!

Joe said...

should this also go the other way and have postgis installed in a different schema?

I understand there may be migration difficulties, but it would be good if this was the case.

Mike said...

Using a separate schema for your data is definitely good...I hadn't really thought about it in terms of easing PostGIS upgrades.

However, the geometry_columns table is still in the public schema, is it not? Unless there's a situation where it wouldn't be, then if anyone has an application that relies on the contents of geometry_columns, you'll want to be sure you dump/restore the contents of that table as well. Also, don't forget about any custom SRIDs you may have added to the spatial_ref_sys table (not very common, I know).

Paul Ramsey said...

@Christian, glad this was helpful!

@Joe, perhaps. The primary reason has been backward compatibility, and the tendency of folks to hot have their search_paths set right. I can imagine a rash of "I installed PostGIS, but it's not there" questions.

@Mike, I should add an addendum: after restoring your schema, run the SELECT Populate_Geometry_Columns() function to fill out geometry_columns.

strk said...

Worth sourcing the _new_ postgis.sql once again after the restore, to override further any function which may have been overridden by the restore

About Me

My Photo
Victoria, British Columbia, Canada

Followers

Blog Archive

Labels

bc (32) it (26) postgis (17) icm (11) enterprise IT (9) sprint (9) open source (8) osgeo (8) video (8) management (6) cio (5) enterprise (5) foippa (5) gis (5) spatial it (5) foi (4) mapserver (4) outsourcing (4) bcesis (3) foss4g (3) oracle (3) politics (3) COTS (2) architecture (2) boundless (2) esri (2) idm (2) natural resources (2) ogc (2) open data (2) opengeo (2) openstudent (2) postgresql (2) rant (2) technology (2) vendor (2) web (2) 1.4.0 (1) HR (1) access to information (1) accounting (1) agile (1) aspen (1) benchmark (1) buffer (1) build vs buy (1) business (1) business process (1) cathedral (1) cloud (1) code (1) common sense (1) consulting (1) contracting (1) core review (1) crm (1) custom (1) data warehouse (1) deloitte (1) design (1) digital (1) email (1) essentials (1) evil (1) exadata (1) fcuk (1) fgdb (1) fme (1) foocamp (1) foss4g2007 (1) ftp (1) gds (1) geocortex (1) geometry (1) geoserver (1) google (1) google earth (1) government (1) grass (1) hp (1) iaas (1) icio (1) industry (1) innovation (1) integrated case management (1) introversion (1) iso (1) isss (1) isvalid (1) javascript (1) jts (1) lawyers (1) mapping (1) mcfd (1) microsoft (1) mysql (1) new it (1) nosql (1) opengis (1) openlayers (1) oss (1) paas (1) pirates (1) policy (1) portal (1) proprietary software (1) qgis (1) rdbms (1) recursion (1) regression (1) rfc (1) right to information (1) saas (1) salesforce (1) sardonic (1) seibel (1) sermon (1) siebel (1) snark (1) spatial (1) standards (1) svr (1) tempest (1) texas (1) tired (1) transit (1) twitter (1) udig (1) uk (1) uk gds (1) verbal culture (1) victoria (1) waterfall (1) wfs (1) where (1) with recursive (1) wkb (1)