From 0 to 65 Million in 2 Hours

I’m doing some performance benchmarking for a client this week, so getting a big, real test database is a priority. The USGS TIGER data is one of the largest uniform data sets around, so I’ve started with that.

I just loaded all the edges, 64,830,691 of them, and it took just under 2 hours! Fortunately, the 2007 data comes in shape files, and the schemas are identical for each file, so the load script is as simple as this controller:

find . -name "fe*_edges.zip" -exec ./append_edges.sh {} ';'

And this runner (append_edges.sh):

unzip $1
shp2pgsql -W WINDOWS-1252 -D -a -s 4269 \`basename $1 .zip\`.shp fe_edges | psql tiger
rm fe*edges.*

Note the use of the -W parameter, to ensure that the high-bit “charàctérs” are handled correctly, and the -a parameter, to append the file contents to the table.