Valgrinding PostGIS

UPDATE 2021/09/07: So, notwithstanding the below, running a PostgreSQL backend through valgrind will only very rarely tell you anything useful. The only reason I found it useful, back in 2008, is that the memory errors I was tracking down were generated by GEOS, which allocates using the system memory allocator.

PostGIS/PostgreSQL, on the other hand, allocates using a “hierarchical memory manager” inside PgSQL, which is why the code is full of allocations using palloc() and not malloc(). The PgSQL memory manager allocates big “memory context” blocks, and then when cleaning up after a query or other big piece of work, drops the whole context at once. This results in all sorts of good things for PgSQL: memory locality, fewer calls to system allocator functions like free(), and faster allocations of small objects.

However, since only the allocation of the context and the freeing of the context use system memory calls, valgrind has no visibility into the small allocations your application code might be making. It only sees these big allocations / frees, and will never see the specific allocations / frees in your application code.

Clever developers might be thinking: “I know, I’ll just override palloc() and pfree() in the PgSQL code, so that all allocations go via the system, and then valgrind will have to pick them up.” Except that, because of the memory context system, lots of code doesn’t bother to pfree() its allocations, since the system will just free the whole context at the end.

As a result, if you try to push all allocations to malloc() and free() and then run valgrind, valgrind will say that PgSQL leaks like a sieve, and any mistakes you might want to find in your application code will be drowned out in the avalanche of un-freed memory valgrind is now finding.

So, you want to be a super-hero? How about tracking down memory leaks and otherwise using valgrind to make PostGIS a cleaner and better system? Be warned, though: getting PostGIS into valgrind can be a bit tricky.

First of all, what is valgrind? It’s a tool for finding memory leaks and other memory issues in C/C++ code. It runs mainly under Linux, so you do need to have sufficiently portable code to run it there. Many memory checking tools rely on “static code analysis”: they look at what your code says it does and check whether you have made any mistakes.

These kinds of tools have to be very clever, since they not only need to understand the language, they have to understand the structure of your code. Valgrind takes the opposite tack – rather than inspect your code for what it says it does, it runs your code inside an emulator, and sees what it actually does. Running inside valgrind, every memory allocation and deallocation can be tracked and associated with a particular code block, making valgrind a very effective memory debugger.

In order to get the most useful reports, you have to compile your code with minimal optimization flags, and with debugging turned on. To grind both GEOS and PostGIS simultaneously, compile GEOS and PostGIS with the correct flags:

# Make GEOS:
CXXFLAGS="-g -O0" ./configure
make clean
make install

# Make PostGIS:
CFLAGS="-g -O0" ./configure --with-pgconfig=/usr/local/pgsql/bin/pg_config
make clean
make install

Once you have your debug-enabled code in place, you are ready to run valgrind. Here things get interesting! Usually, PostgreSQL is run in multi-user mode, with each back-end process spawned automatically as connections come in. But, in order to use valgrind, we have to run our process inside the valgrind emulator. How to do this?

Fortunately, PostgreSQL supports a “single user mode”. Shut down your database instance (pg_ctl -D /path/to/database stop) first. Then invoke a postgres backend in single-user mode:

echo "select * from a, b where st_intersects(a.the_geom, b.the_geom)" | \
  valgrind \
    --leak-check=yes \
    --log-file=valgrindlog \
    /usr/local/pgsql/bin/postgres --single -D /usr/local/pgdata postgis

So, here we are echoing the SQL statement we want tested, and piping it into a valgrind-wrapped instance of single-user PostgreSQL. Everything will run much slower than usual, and valgrind will output a report to the valgrindlog file detailing where memory blocks are orphaned by the code.
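Two small helpers can make this routine a little less error-prone. This is only a sketch: the function names server_running and leak_summary are made up for illustration and are not part of PostgreSQL, PostGIS, or valgrind. The first assumes that a running server keeps a postmaster.pid lock file in its data directory. The second assumes the log ends with valgrind's usual LEAK SUMMARY block, where each line starts with an ==PID== marker.

```shell
#!/bin/sh
# Hypothetical helpers for the valgrind run described above.
# Neither function ships with PostgreSQL, PostGIS, or valgrind.

# server_running DATADIR
# Succeeds if DATADIR contains postmaster.pid, the lock file a running
# server keeps in its data directory. A crashed server can leave a stale
# file behind, so treat success as a hint rather than proof.
server_running() {
    [ -f "$1/postmaster.pid" ]
}

# leak_summary LOGFILE
# Prints the "definitely / indirectly / possibly lost" totals from the
# LEAK SUMMARY section of a valgrind log, with the leading ==PID==
# marker and indentation stripped.
leak_summary() {
    grep -E '(definitely|indirectly|possibly) lost:' "$1" |
        sed -E 's/^==[0-9]+==[[:space:]]*//'
}
```

With these defined, running server_running /usr/local/pgdata before the valgrind command is a quick reminder to stop the database first, and leak_summary valgrindlog afterwards pulls out the headline numbers. The full log still has the stack traces you need to find the guilty code.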