Waiting for PostGIS 3.1: Vector tile improvements

This is a guest post from Raúl Marín, a core PostGIS contributor and a former colleague of mine at Carto. Raúl is an amazing systems engineer and has been cruising through the PostGIS code base making things faster and more efficient. You can find the original of this post at his new personal tech blog. – Paul


I’m not big on creating new things; I would rather work on improving something that’s already in use and has proven its usefulness. So whenever I’m thinking about what I should do next, I tend to look for projects or features that are widely used, where the balance between development and runtime costs favors a more in-depth approach.

Upon reviewing the changes in the upcoming PostGIS 3.1 release, it shouldn’t come as a surprise, then, that most of my contributions are focused on performance. When in doubt, just make it faster.

Since CARTO, the company that pays for my lunch, uses PostGIS’ vector tile functions as its backend for dynamic vector maps, any improvement there will have a clear impact on the platform. This is why, since the MVT functions first appeared in PostGIS 2.4, they’ve been enhanced in each major release, and 3.1 wasn’t going to be any different.

On this occasion the main reason behind the changes wasn’t the usual me looking for trouble, but the other way around. Since ST_AsMVT makes it really easy to extract information from the database and into the browser, a common pitfall is to use SELECT * to extract all available columns, which can move a lot of data unnecessarily and generate extremely big tiles. The easy solution to this problem is to select only the properties needed for the visualization, but that is hard to apply retroactively once the application is in production and already depends on the inefficient design.
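
To make that concrete, here is a minimal sketch of the recommended style of tile query; the table, column names and tile coordinates are made up for illustration, and the geometry is assumed to already be in web mercator:

    -- Select only the properties the map actually needs,
    -- rather than SELECT t.*, which drags every column into the tile.
    WITH bounds AS (
        SELECT ST_TileEnvelope(12, 1205, 1540) AS env
    ), mvtgeom AS (
        SELECT ST_AsMVTGeom(t.geom, bounds.env) AS geom,
               t.name, t.category
        FROM mytable t, bounds
        WHERE t.geom && bounds.env
    )
    SELECT ST_AsMVT(mvtgeom.*) FROM mvtgeom;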

So there I was, looking into why the OOM killer was stopping databases, and discovering queries using a massive amount of resources to generate tiles 50-100 times bigger than they should be (the recommendation is to stay under 500 KB). In this case, the bad design of extracting all columns was made worse by the fact that it was being applied to a large dataset; this triggered PostgreSQL parallelism, requiring extra resources to generate chunks in parallel and later merge them together. In PostGIS 3.1 I introduced several changes to improve the performance of these two steps: the parallel processing and the merge of intermediate results.

The changes

Without getting into too much detail, the main benefit comes from changing the vector tile .proto such that a feature can only hold one value at a time. This is what the specification says, but not what the .proto enforced, so the internal library was allocating memory that it never used.

There are other additional changes, such as improving how values are merged between parallel workers, so feel free to have a look at the final commit itself if you want more details.

Performance comparison

The best way to see the impact of these changes is through some examples. In both cases I am generating the same tile, on the exact same server and with the same dependencies; the only change was to swap the PostGIS library, since going from 3.0 to 3.1 doesn’t require a database upgrade.

In the first example the tile contains all the columns of the 287k points in it. As I’ve mentioned before, this is discouraged, but it is the simplest query to generate.

And for the second example, I’m generating the same tile but now only including the minimal columns for the visualization:

We can see, both in 3.0 and 3.1, that adding only the necessary properties makes things 10 times as fast as with the full data, and also that PostGIS 3.1 is 30-40% faster in both situations.

Memory usage

Aside from speed, this change also greatly reduces the amount of memory used to generate a tile.

To see it in action, we monitor the PostgreSQL process while it’s generating the tile with all the properties. In 3.0, memory usage grows steadily over time until it reaches around 2.7 GB at the end of the transaction.

We now monitor the same request on a server using PostGIS 3.1. In this case the server uses around a third of the memory it did in 3.0 (1 GB vs 2.7 GB) and, instead of growing linearly, the memory is returned to the system as soon as possible.

To sum it all up: PostGIS 3.1 is faster and uses less memory when generating large vector tiles.

BC IT Outsourcing 2019/20

Way back in the 1980s, the Vander Zalm government privatized BC highways maintenance. They let out big contracts to private companies (fortunately this was before the era of international infrastructure firms, so they were local companies) and sold off the government road building machinery. The government employees union was naturally apoplectic and very much wanted the decision reversed. An NDP government took power at the next election, and… nothing changed.

The omelette was unscramblable. The expense and disruption of re-constituting the old highways maintenance department outweighed the benefits. The machinery was all sold. The government had other fish to fry. It didn’t happen then. It hasn’t happened yet. Highways maintenance is still outsourced.

With that out of the way, here’s the latest data on BC information technology outsourcing.

The political economy of changing the trend was never good. On the side of change was a line in the Minister of Citizens’ Services’ 2017 mandate letter:

Institute a cap on the value and the length of government IT contracts to save money, increase innovation, improve competition and help our technology sector grow.

On the side of more of the same are:

  • Existing relationships between civil servants and service providers.
  • Contractual obligations that still have multiple years to run.
  • Fear of change, including risk of service disruptions amidst big organizational changes.
  • Comfort of stasis, just renew contracts and keep slowly growing budgets.
  • Deep lack of political sexiness of internal IT issues.
  • No one at the political level who is invested in the issue. (See previous.)

The UK IT reform experience, which is still ongoing in fits and starts, got off to a fast start because of the enthusiastic backing of the Cabinet Office Minister, Sir Francis Maude. It quickly hit the rocks after Maude retired and they lost his political cover.

Getting out from under these contracts is the right thing to do, but it’s the right thing to do in very abstract and theoretical ways:

  • We want government to be more nimble and able to react to changes in society and policy.
  • The machinery of government runs on information.
  • The closer the people who work on the information are to the people working the policy, the easier it is for them to understand and react to their needs.
  • Outsourced IT places a contractual buffer between the people who need the information services and the people who provide those services. It optimizes for the predictable and charges heavily for the novel.

There’s no easy big guaranteed win, and no press conference, and no glory. Just more reactive information services with incentives that align more closely to those of the people trying to deliver value to the people of BC. It’s technocratic, dull, worth doing, and probably not going to happen.

Anyways, back to the horse race.

The big surprise for me continues to be Maximus. Still billing strong, even with MSP premiums phased out? The MSP premium elimination date was January 1, 2020, so maybe this year will finally be the one we see a big drop in Maximus billings.

The continued growth of local companies is a positive trend. These aren’t Mom’n’Pop shops (I don’t track those), but companies with revenues north of several million dollars. Altogether they are still doing less business than IBM alone, but it’s a solid 10% of the total now.

It’s possible I’m being overly pessimistic. The biggest components of the IT outsourcing budget are the Telus and ESIT contracts, both of which end in 2021. But the easy thing is to just renew and move on, and in a world of Covid and recessions and many other issues much nearer to the daily lives of citizens I would not be surprised to see this issue languish.

Talking PostGIS on Podcasts

Here in the Covid-times, I haven’t been able to keep up my previous schedule of speaking at conferences, but I have managed to participate in a couple of episodes of the MapScaping Podcast, hosted by Daniel O’Donohue.

Podcast

Daniel is a great interviewer and really puts together a tight show. So far I’ve been on two, and I quietly hope to join him again some time in the future.

Developers Diary 2

Have you ever watched a team of five-year-olds play soccer? The way the mass of children chases the ball around in a group? I think programmers do that too.

Get the ball!

There’s something about working on a problem together that is so much more rewarding than working separately that we cannot help but get drawn into other people’s problems. There’s a lot of gratification to be had in finding a solution to a shared difficulty!

Even better, different people bring different perspectives to a problem, and illuminate different areas of improvement.

Maximum Inscribed Circle

A couple months ago, my colleague Martin Davis committed a pair of new routines into JTS, to calculate the largest circles that can fit inside a polygon or in a collection of geometries.

Maximum Inscribed Circle

We want to bring all the algorithmic goodness of JTS to PostGIS, so I took up the first step, and ported “maximum inscribed circle” to GEOS and to PostGIS.
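
The result is exposed in PostGIS 3.1 as ST_MaximumInscribedCircle; here is a minimal sketch of what calling it looks like (the polygon is just an illustration):

    -- Returns the circle center, the nearest boundary point, and the radius.
    SELECT radius, ST_AsText(center) AS center
    FROM ST_MaximumInscribedCircle(
        'POLYGON((0 0, 100 0, 100 50, 0 50, 0 0))'::geometry
    );
    -- radius is approximately 25 for this 100 x 50 rectangle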

When I ported the GEOS test cases, I turned up some odd performance problems. The calculation seemed to be taking inordinately long for larger inputs. What was going on?

The “maximum inscribed circle” algorithm leans heavily on a routine called IndexedFacetDistance to calculate distances between polygon boundaries and candidate circle-centers while converging on the “maximum inscribed circle”. If that routine is slow, the whole algorithm will be slow.

Dan Baston, who originally ported the “IndexedFacetDistance” class, got interested and started looking at some test cases of his own.

He found he could improve his old implementation using better memory management that he’d learned in the meantime. He also found some short-circuits to envelope distance calculation that improved performance quite a bit.

In fact, they improved performance so much that Martin ported them back to JTS, where he found that for some cases he could get a 10x performance improvement in distance calculations.

There’s something alchemical about the whole thing.

  • There was a bunch of long-standing code nobody was looking at.
  • I ported an unrelated algorithm which exercised that code.
  • I wrote a test case and reported some profiling information.
  • Other folks with more knowledge were intrigued.
  • They fed their knowledge back and forth and developed more tests.
  • Improvements were found that made everything faster.

I did nothing except shine a light in a dark hole, and everyone else got very excited and things happened.

Toast Caching Redux

In a similar vein, as I described in my last diary entry, a long-standing performance issue in PostGIS was the repeated reading of large geometries during spatial joins.

Much of the problem was solved by dropping a very small “TOAST cache” into the process by which PostGIS reads geometries in functions frequently used in spatial joins.

TOAST

I was so happy with the improvement the TOAST cache provided that I just stopped. Fortunately, my fellow PostGIS community member Raúl Marín was more stubborn.

Having seen my commit of the TOAST cache, and having done some work in other caching parts of PostGIS, he took up the challenge and integrated the TOAST cache with the existing index caches.

The integrated system now uses TOAST identifiers to recognize identical repeated inputs and avoid both unnecessary reads off disk and unnecessary checks of the index cache.

The result is that, for spatial joins over large objects, PostGIS 3.1 will be as much as 60x faster than PostGIS 3.0.

I prepared a demo for a bid proposal this week and found that an example query that took 800ms on my laptop took a full minute on the beefy 16-core demo server. What had I done wrong? Ah! My laptop is running the latest PostGIS code (which will become 3.1) while the cloud server was running PostGIS 2.4. Mystery solved!

Port, Port, Port

I may have mentioned that I’m not a very good programmer.

My current task is definitely exercising my imposter syndrome: porting Martin’s new overlay code from JTS to GEOS.

I knew it would take a long time, and I knew it would be a challenge; but knowing and experiencing are quite different things.

The challenges, as I’ve experienced them, are:

  • Moving from Java’s garbage-collected memory model to C++’s manually managed memory model means that I have to understand the object life-cycle, which is implicit in Java, and make it explicit in C++, all while avoiding accidentally introducing a lot of memory churn and data copying into the GEOS port. Porting isn’t a simple matter of transcribing and papering over syntactic idiom; it involves first understanding the actual JTS algorithms.
  • The age of the GEOS code base, and number of contributors over time, mean that there are a huge number of different patterns to potentially follow in trying to make a “consistent” port to GEOS. Porting isn’t a matter of blank-slate implementation of the JTS code – the ported GEOS code has to slot into the existing GEOS layout. So I have to spend a lot of time learning how previous implementations chose to handle life cycles and call patterns (pass reference, or pointer? yes. Return value? or void return and output parameter? also yes.)
  • My lack of C++ idiom means I spend an excessive amount of time looking up core functions and methods associated with them. This is the only place I’ve felt myself measurably get better over the past weeks.

I’m still only just getting started, having ported some core data structures, and little pieces of dependencies that the overlay needs. The reward will be a hugely improved overlay code for GEOS and thus PostGIS, but I anticipate the debugging stage of the port will take quite a while, even when the code is largely complete.

Wish me luck, I’m going to need it!

If you would like to test the new JTS overlay code, it resides on this branch.
If you would like to watch me suffer as I work on the port, the GEOS branch is here.

Developers Diary 1

I’m not a particularly good developer.

I don’t plan well, I tend to hack first and try and find the structure afterwards. I am easily distracted. It takes me an exceedingly long time to marshal a problem in my head enough to attack it.

That said, the enforced slow-down from pandemic time has given me the opportunity to sit and look at code, knowing nothing else is coming down the pipe. There are no talks to prepare, no big-think keynotes to draft. I enjoy those things, and I really enjoy the ego-boost of giving them, but the preparation of them puts me in a mental state that is not conducive to doing code work.

So the end of travel has been good, for at least one aspect of my professional work.

The Successful Failure

Spatial operations against large objects have always been a performance hot spot.

The first problem is that large objects are … large. So if you have algorithms that scale O(n^2) on the number of vertices, large objects will kill you. Guess what? Distance, intersects tests, and so on are all O(n^2) in their basic implementations.

We solved this problem a long time ago in PostGIS by putting in an extra layer of run-time indexing.

INDEXING!

During a query (for those functions where it makes sense) if we see the same object twice in a row, we build an index on the edges of that object and keep the index in memory, for the life of the query. This gives us O(log(n)) performance for intersects, point-in-polygon, and so on. For joins in particular, this pattern of “seeing the same big thing multiple times” is very common.
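
A spatial join is the canonical example of that access pattern. In a query like the following sketch (table and column names are made up for illustration), each large polygon is compared against many points in a row, so its edge index gets built once per query and reused:

    -- Each neighbourhood polygon is tested against many points in a row,
    -- so the in-memory edge index built for it pays off repeatedly.
    SELECT n.name, count(*)
    FROM neighbourhoods n
    JOIN points p
      ON ST_Intersects(n.geom, p.geom)
    GROUP BY n.name;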

This one small trick is one reason PostGIS is so much faster than “the leading brands”.

However, in order to “see the same object twice” we have to, for each function call in the query, retrieve the whole object, in order to compare it against the one we are holding in memory, to see if it is the same.

Here we run into an issue with our back-end.

PostgreSQL deals with large objects by (a) compressing them and (b) cutting the compressed object into slices and storing them in a side table. This all happens in the background, and is why you can store 1GB objects transparently in a database that has only an 8KB page size.

It’s quite computationally expensive, though. So much so that I found that simply bypassing the compression part of this feature could provide 5x performance gains on our spatial join workload.
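
For reference, the compression step can be switched off per column using a standard PostgreSQL storage setting; a sketch of that workaround, with an illustrative table name:

    -- Store the geometry column out of line but uncompressed (EXTERNAL),
    -- instead of the default compressed-and-out-of-line EXTENDED storage.
    ALTER TABLE neighbourhoods
        ALTER COLUMN geom SET STORAGE EXTERNAL;
    -- Note: existing rows keep their old storage until they are rewritten.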

Faster

At a code sprint in 2018, the PostGIS team agreed on the necessary steps to work around this long-standing performance issue.

  • Enhance PostgreSQL to allow partial decompression. This would allow the PostGIS caching system to retrieve just the first little bit of a large object and use that part to determine whether the object was already in the cache.
  • Enhance the PostGIS serialization scheme to add a hashcode at the front of each large object. This way “is this a new object” could be answered with just a few bytes of hash, instead of checking the whole object.
  • Actually update the caching code to use the hash code and avoid unnecessary object retrievals.

Since this involved a change in PostgreSQL, which runs on an annual release cycle, and a change to the PostGIS serialization scheme, which is a major release marker, the schedule for this work was… long term.

Long Term

Still, I managed to slowly chip away at it, goal in mind: the partial decompression support landed in PostgreSQL, and PostGIS 3 shipped its new serialization scheme.

That left adding the hash code to the front of the objects, and using that code in the PostGIS statement cache.

And this is where things fall apart.

Things Fall Apart

The old statement cache was focussed on ensuring the in-memory indexes were in place. It didn’t kick in until the object had already been retrieved. So avoiding retrieval overhead was going to involve re-working the cache quite a bit, to handle both object and index caching.

I started on the work, which still lives on in this branch, but the many possible states of the cache (do I have part of an object? a whole object? an indexed object?) and the fact that it was used in multiple places by different indexing methods (geography tree, geometry tree, GEOS tree), made the change worrisomely complex.

And so I asked a question, that I should have asked years ago, to the pgsql-hackers list:

… within the context of a single SQL statement, will the Datum values for a particular object remain constant?

Basically, could I use the datum values as unique object keys without retrieving the whole object? That would neatly remove any need to retrieve full objects in order to determine if the cache needed to be updated. As usual, Tom Lane had the answer:

Jeez, no, not like that.

Oh, “good news”, I guess, my work is not in vain. Except wait, Tom included a codicil:

The case where this would actually be worth doing, probably, is where you are receiving a toasted-out-of-line datum. In that case you could legitimately use the toast pointer ID values (va_valueid + va_toastrelid) as a lookup key for a cache, as long as it had a lifespan of a statement or less.

Hm. So for a subset of objects, it was possible to generate a unique key without retrieving the whole object.

Facepalm

And that subset – the “toasted-out-of-line” datums – was in fact the set of objects causing the hot spot: objects large enough to have been compressed and then stored in a side table in 8KB chunks.

What if, instead of re-writing my whole existing in-memory index cache, I left that in place and just added a simple new cache that only worried about object retrieval, and only cached the objects it could obtain unique keys for, these “toasted-out-of-line” objects? Would that improve performance?

It did. By 20 times on my favourite spatial join benchmark. It improved things by 5 times on a join where only 10% of the objects were large ones. And on joins where none of the objects were large, the new code did not reduce performance at all.

And here’s the punch line: I’ve known about the large object hot spot for at least 5 years. Probably longer. I put off working on it because I thought the solution involved core changes to PostgreSQL and PostGIS, so first I had to put those changes in, which took a long time.

Once I started working on the “real problem”, I spent a solid week:

  • First on a branch to add hash codes, using the new serialization mechanisms from PostGIS 3.
  • Then on a unified caching system to replace the old in-memory index cache.

And then I threw all that work away, and in about 3 hours, wrote and tested the final patch that gave a 20x performance boost.

So, was this a success or a failure?

Stupid

I’ve become inured to the huge mismatch in “time spent versus code produced”, particularly when debugging. Spending 8 hours stepping through a debugger to generate a one-line patch is pretty routine.

But something about the mismatch between my grandiose and complex solution (partial retrieval! hash code!) and the final solution (just ask! try the half-measure! see if it’s better!) has really gotten on my nerves.

I like the win, but the path was a long and winding one, and PostGIS users have had slower queries than necessary for years because I failed to pose my simple question to the people who had an answer.

The Successful Success

In contrast to that story of the past couple of weeks, this week has been a raging success. I keep pinching myself and waiting for something to go wrong.

A number of years ago, JTS got an improvement to robustness in some operations by doing determinant calculations in higher precision than the default IEEE double precision.

Those changes didn’t make it into GEOS. There was an experimental branch that Mateusz Loskot put together, and it sat unmerged for years, until I picked it up last fall, rebased it and merged it. I did so thinking that was the fastest way, and probably it was, but it included a dependency on an arbitrary-precision math library, ttmath, which I added to our tree.

Stupid

Unfortunately, ttmath is basically unmaintained now.

And ttmath is arbitrary precision, while we really only need “higher precision”. JTS just uses a “double double” implementation that uses the register space of two doubles for higher-precision calculations.

And ttmath doesn’t support big-endian platforms (like Sparc, Power, and other chips), which was the real problem. We couldn’t go on into the future without support for these niche-but-not-uncommon platforms.

And ttmath includes some fancy assembly language that makes the build system more complex.

Fortunately, the JTS DD is really not that large, and it has no endian assumptions in it, so I ported it and tested it out against ttmath.

It’s smaller.

It’s faster. (About 5-10%. Yes, even though it uses no special assembly tricks, probably because it doesn’t have to deal with arbitrary precision.)

And here’s the huge surprise: it caused zero regression failures! It has exactly the same behaviour as the old implementation!

Perfect

So needless to say, once the branch was stable, I merged it in and stood there in wonderment. It seems implausible that something as foundational as the math routines could be swapped out without breaking something.

The whole thing took just a few days, and it was so painless that I’ve also made a patch to the 3.8 stable series to bring the new code back for big-endian platform support in the meantime.

The next few days I’ll be doing ports of JTS features and fixes that are net-new to GEOS, contemplative work that isn’t too demanding.

Some days everything is easy.

Some days everything is hard.

Don’t let the hard days hold you back!

Perfect