Optimize, Optimize, Optimize!

I have been trying to squeeze every last erg of performance out of a Mapserver installation for a client. Their workload is very raster-based, pulling from JPEG-compressed, internally-tiled TIFF files. We found one installation mistake, which was good for a big performance boost (do not fill your directories with too many files, or the operating system will spend all its time searching for files). But once that was fixed, only the raster algorithms themselves were left to optimize.

Firing up Shark, my new favorite tool, I found that about 60% of the cycles were being spent in JPEG decompression and 30% in the Mapserver average re-sampling. Storing the files uncompressed did make things faster, but at a 10x file size penalty – unacceptable. Removing the average re-sampling did make things faster, but with a very visible aesthetic penalty – unacceptable.

Using two tools from Intel, I have been able to cut about 40% off the render time, without changing any code at all!

First, Intel publishes the “Integrated Performance Primitives”, basically a big catch-all library of multi-media and mathematics routines, built to take maximum advantage of the extended instruction sets in modern CPUs: things like SSE, SSE2, SSE3, SSE4 and so on.

Ordinarily, this would only be of use if I had a couple of weeks to examine the code and fit the primitives into the existing code base. Fortunately for me, one of the example programs Intel ships with IPP is “ijg”, a port of the Independent JPEG Group reference library that uses IPP acceleration wherever possible. So I could simply compile the “ijg” library and slip it into place instead of the standard “libjpeg”.

Second, Intel also publishes their own C/C++ compiler, “icc”, which uses a deep understanding of the x86 architecture and the extensions referenced above to produce optimized code. I used “icc” to compile both Mapserver and GDAL, and saw performance improve noticeably.

So, amazingly, between simply re-compiling the software with “icc” and using the “ijg” replacement library to speed up the JPEG decompression, I’ve managed to extract about a 40% performance improvement, without touching the code of Mapserver, GDAL, or LibJPEG at all.

Strong Like Ox!

I’ve put my Mapserver slides online from my recent gig at GeoWeb. We’ll be in town until Friday! Try the veal!

Buffering Jujutsu

Update: Martin Davis tried out a variant of simplification, and produced results even faster than Michaël’s.

Inquiring about some slow cases in the JTS buffer algorithm, Michaël Michaud asked:

I have no idea why a large buffer should take longer than a small one, and I don’t know if there is a better way to compute a buffer.

Martin Davis and I dragged out the usual hacks for making large buffers faster: simplifying the input geometry, or buffering outwards in stages. Said Martin:

The reason for this is that as the buffer size gets bigger, the “raw offset curve” that is initially created prior to extracting the buffer outline gets more and more complex. Concave angles in the input geometry result in perpendicular “closing lines” being created, which usually intersect other offset lines. The longer the perpendicular line, the more other lines it intersects, and hence the more noding and topology analysis has to be performed.
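The two "usual hacks" can be sketched with Shapely, the Python binding for GEOS (the C++ port of JTS). This is a minimal illustration, assuming Shapely is installed; the input geometry, buffer distances, and simplification tolerance are all made up for the example:

```python
# Sketch of the two classic large-buffer workarounds, using Shapely (GEOS).
# The jagged input line and all distances are illustrative only.
from shapely.geometry import LineString

line = LineString([(x, (x % 50) * 2.0) for x in range(0, 1000, 5)])

# Hack 1: simplify the input first, so the raw offset curve has fewer
# segments to node against each other.
simplified_buffer = line.simplify(10.0).buffer(500.0)

# Hack 2: buffer outwards in stages; each intermediate buffer smooths
# away concavities, so later stages have less topology to resolve.
staged_buffer = line.buffer(250.0).buffer(250.0)

# Both approximate the direct, one-shot result:
direct = line.buffer(500.0)
```

Since buffering is a Minkowski sum with a disc, buffering by 250 twice is mathematically the same as buffering by 500 once; the differences come only from arc discretization, while the simplification hack trades a small, tolerance-bounded change in the outline for a much cheaper offset curve.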

A couple of days later, Michaël came back with this:

I made some further tests which show there is room for improvement with the excellent stuff you recently added.

I created a simple linestring: length = 25000m, number of segments = 1000

Trying to create a buffer at a distance of 10000 m ended with a heap space exception (it took more than 256 MB!) after more than 4 minutes of intensive computation!

The other way is: decompose the line into segments (less than 1 second); create individual buffers (OpenJUMP says 8 seconds, but I think that includes redrawing); cascaded union of the resulting buffers (6 seconds).

Holy algorithmic cow, Batman! Creating and merging 1000 wee buffers is orders of magnitude faster than computing one large buffer, if you merge them in the right order (by using the cascaded union). Huge kudos to Michaël for attempting such a counter-intuitive processing path and finding the road to nirvana.
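Michaël's counter-intuitive path is easy to reproduce with Shapely, whose `unary_union` applies the same cascaded-union strategy under the hood. A minimal sketch, assuming Shapely is installed; the zigzag line below is just a stand-in for his 1000-segment test case:

```python
# Decompose-buffer-union: the processing path Michaël found.
# Shapely's unary_union merges the small buffers in a good order
# (cascaded union), rather than naively one at a time.
from shapely.geometry import LineString
from shapely.ops import unary_union

# A 1000-segment zigzag line, standing in for the real test geometry.
coords = [(i * 25.0, (i % 2) * 10.0) for i in range(1001)]
line = LineString(coords)

# One wee buffer per segment...
segment_buffers = [
    LineString([coords[i], coords[i + 1]]).buffer(1000.0)
    for i in range(len(coords) - 1)
]

# ...then merge them all in one call; unary_union picks the merge order.
merged = unary_union(segment_buffers)
```

Because a round-capped buffer of a line is exactly the union of round-capped buffers of its segments, `merged` matches `line.buffer(1000.0)` up to arc discretization, while sidestepping the enormous raw offset curve of the one-shot buffer.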

"What if you're looking for a coffee shop nearby..."

If I hear this LBS “use case” one more time, I think my ears are going to start to bleed.

GeoWeb: Day 1/2

I’ve really enjoyed my GeoWeb experience. The official program has been mediocre (I say that and, hey, I’m on it) but the networking has been sublime. Ron Lake has managed to assemble a great cross section of movers and shakers in the next wave of geospatial in one place, so GeoWeb is the place to schmooze. Of course, being in one of the most lovely cities in North America is an added bonus, so the settings for schmoozing have been great too: the Wosk Center and, last night, a boat cruise on the harbour during Vancouver’s summer fireworks festival.

Schmoozing highlights: Ed Katibah, Microsoft SQL Server Spatial eminence (apparently, he likes chocolate) and repository of the complete history of all spatial databases, ever; blogging celebritarian James Fee, who is far less opinionated in person than one might expect; Alex Miller, head of ESRI Canada, auto racer, polymath, business tycoon; and Mike Springer who, apparently, was a professional fire-eater in Vegas a mere 10 years ago. He has since written SketchUp (that piddling little thing) and now heads Google’s whole 3D program, but somehow the whole fire-eating thing keeps distracting me. Fire eating!