Sprint Day #3

It was “it all comes together” day today at the Toronto Code Sprint.

On the PostGIS front, the open bug list for our 1.4 pre-release shrank to the vanishing point. I closed off one [1], Mark Cave-Ayland closed off two [2, 3], and Olivier Courtin closed off one [4] and started on GML support for CURVE types. We should be able to kick out a testing version of 1.4 tomorrow, no problem.

On the Mapserver front, our goal of getting faster and faster and making Andrea Aime eat his liver is well in sight. Frank Warmerdam skipped our foray to the Best Beer Palace in Toronto last night, went home instead, and added a memory cache to Proj4, which I tested this morning. The cache has removed the EPSG code lookup overhead from Mapserver, which was far and away the largest performance overhead in all modes.

Also today, Julien-Samuel Lacroix from MapGears completed his changes to Mapserver shape file access, which knocked a further 20% off the response time when rendering from large shape files. We also identified a performance hit in outlined label drawing in GD, for which he wrote a patch several years ago; unfortunately, the patch requires changes to GD as well, so it might take a while to work its way into general Mapserver usage.

Profiling of image access in GDAL/Mapserver has galvanized Frank to rework image access in Mapserver so that it no longer pulls data from images one band at a time. For formats like MrSID and ECW, this alone should improve performance by a factor of three.
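To see where a factor of three could come from, here is a toy Python model. It assumes, hypothetically, that a MrSID/ECW-style decoder has to decompress the whole window every time any band is requested; `HypotheticalWaveletReader` is an invented stand-in, not GDAL's API. Under that assumption, reading an RGB image band by band costs three decodes, while reading all bands in one request costs one.

```python
from typing import List


class HypotheticalWaveletReader:
    """Invented stand-in for a wavelet-compressed image decoder.

    Assumption: decompressing a window always produces every band at once,
    so each read costs one full decode no matter how many bands are kept.
    """

    def __init__(self, width: int, height: int, bands: int = 3):
        self.width, self.height, self.bands = width, height, bands
        self.decode_calls = 0

    def _decode_window(self) -> List[List[int]]:
        self.decode_calls += 1  # the expensive step we want to minimize
        n = self.width * self.height
        # Fake pixel data: band b holds the value b in every pixel.
        return [[b] * n for b in range(self.bands)]

    def read_band(self, band: int) -> List[int]:
        # Band-at-a-time access: decode everything, keep one band.
        return self._decode_window()[band]

    def read_all_bands(self) -> List[List[int]]:
        # All-bands access: decode once, keep everything.
        return self._decode_window()


def render_band_by_band(reader: HypotheticalWaveletReader) -> List[List[int]]:
    return [reader.read_band(b) for b in range(reader.bands)]


def render_all_at_once(reader: HypotheticalWaveletReader) -> List[List[int]]:
    return reader.read_all_bands()
```

Both functions return the same pixels; only the number of decodes differs.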

Meanwhile, Steve Lime has finalized his plans to make querying performance in Mapserver work faster, and written an RFC on his approach. I hope to alter the PostGIS driver to support the “new way of being” tomorrow.

Finally, I profiled the hex and base64 encoding options in the Mapserver PostGIS driver and decided on hex, because it performed much faster. That change was committed today as well. All in all, a good day for the “need for speed” Code Sprint.
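The trade-off between the two encodings can be sketched in Python; the driver's timing comparison is not reproduced here. The hand-written `decode_hex` (a hypothetical helper, not Mapserver code) shows why hex is simple to decode: every output byte comes from exactly two input characters via a table lookup, with none of the bit-shuffling across four-character groups that base64 needs. The price is size, since hex doubles the payload while base64 grows it by roughly a third.

```python
import base64

# Lookup table mapping an ASCII hex digit to its 4-bit value.
_HEX_VALUE = {c: int(c, 16) for c in "0123456789abcdefABCDEF"}


def decode_hex(text: str) -> bytes:
    """Decode hex text: each output byte comes from exactly two characters,
    so no state is carried from one byte to the next."""
    if len(text) % 2:
        raise ValueError("hex string must have an even length")
    out = bytearray(len(text) // 2)
    for i in range(len(out)):
        hi = _HEX_VALUE[text[2 * i]]
        lo = _HEX_VALUE[text[2 * i + 1]]
        out[i] = (hi << 4) | lo
    return bytes(out)


def encoded_sizes(payload: bytes) -> dict:
    """Compare the transfer size of the two encodings for one payload."""
    return {
        "raw": len(payload),
        "hex": len(payload.hex()),
        "base64": len(base64.b64encode(payload)),
    }
```

For a 21-byte WKB point, hex needs 42 characters and base64 needs 28.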

After a long day of coding (I had to break up an intense discussion of handling Open Web Services configuration in Mapserver to get people to head to dinner), we headed out for our final group meal at Jack Astor’s on Front Street, where a good time was had by all. Tomorrow will be less intense, as folks leave for flights at various times around mid-day. I head home late in the evening, chasing the sun back to Victoria on the sole daily non-stop YYZ->YYJ flight.

Thanks to OSGIS.NL and LizardTech for supporting today’s activities.

Sprint Day #2

Another productive day at the Toronto Code Sprint! Today was “sprint forward” day, so everyone got an hour less sleep than they expected, but by 10am everyone was back in the room, coding hard.

I’m proud to have closed two bugs [1] [2] today (both of the “one hour of study per one line of altered code” sort), moving us towards our PostGIS goal of releasing what Mark Cave-Ayland has been eloquently calling a 1.4 “beater” for the PostGIS community to test before we cut an official numbered 1.4 release candidate. I think we’ll have all the critical bugs closed, and that beater out on the road mowing down pedestrians, before the week is out.

Towards the end of the afternoon, I fired up Shark, provided some profiling information for the Mapserver team, and talked through some profiles that Julien from MapGears had generated earlier. Julien is going to enhance some work I did last year on large shape files, to make traversing the shape file selection set even faster. The larger the shape file, the bigger the benefit: by traversing the selection set with a 4-byte (int) pointer instead of a 1-byte (char) pointer, he should be able to extract some good gains.
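A rough Python model of the selection-set idea, assuming (hypothetically) that the selection set is a bit array with one bit per shape: scanning it one 32-bit word at a time lets a single zero test skip 32 unselected shapes, versus 8 with byte-sized words. The function names are invented for illustration and Mapserver's C code differs; in Python the benefit shows up as fewer loop iterations rather than as real speed.

```python
from typing import Iterator, List


def make_bitarray(n_shapes: int, selected: List[int], word_bits: int) -> List[int]:
    """Pack a selection set into words of `word_bits` bits each
    (8 mimics a char-based array, 32 an int-based one)."""
    words = [0] * ((n_shapes + word_bits - 1) // word_bits)
    for i in selected:
        words[i // word_bits] |= 1 << (i % word_bits)
    return words


def iter_selected(words: List[int], word_bits: int) -> Iterator[int]:
    """Yield selected shape indexes in order, skipping empty words whole."""
    for w_index, word in enumerate(words):
        if word == 0:
            continue  # one comparison rules out `word_bits` shapes
        base = w_index * word_bits
        while word:
            low = word & -word                   # isolate the lowest set bit
            yield base + low.bit_length() - 1    # convert it to a shape index
            word ^= low                          # clear that bit and go on
```

For a million-shape file, the 8-bit version tests 125,000 words and the 32-bit version 31,250 to find the same selected shapes.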

Steve Lime continues to contemplate the problem of improving the MapServer query work flow, to make WFS and large query performance on database back-ends faster. It sounds like we are rapidly converging on an answer, so hopefully tomorrow is the day we do an implementation and tie some back-ends into the solution.

At 6:30, we all trooped to the Baton Rouge for dinner, which was excellent. Thanks to Coordinate Solutions and SJ Geophysics for supporting today’s activities!

Sprint Day #1

Ready, set, go!

The first day of the Toronto Code Sprint 2009 was today, as 20 hackers from the geospatial “C Tribe” gathered in a smallish room at the Radisson hotel on the Toronto waterfront.

The weather was warm and clear yesterday, but cold and wet today, not that we noticed until the evening.

I spent my day huddled with the Largest Group of PostGIS Developers Ever Assembled: myself, Regina Obe, Mark Cave-Ayland, Leo Hsu, Olivier Courtin and Pierre Racine. Day one was mostly talking about our goals for PostGIS 1.4 (get it out the door, close the bugs and release a candidate) and PostGIS 2.0 (room for more types, strict support of ISO SQL/MM outputs, geodetic support, 3D objects, geometry typemods, and WKT Raster). We got a lot decided in a short period, and are ready to buckle down for some coding tomorrow in aid of getting 1.4 out the door.

Because I was huddled with PostGIS people, I didn’t get to participate in the Mapserver planning session. But over dinner I was told that directions for XML map files were discussed and decided on (a standard schema will be developed, along with an XSLT transform for map->XML), performance issues were targeted (Proj!), and a plan for one-pass querying (to speed up WFS mode) was settled on.

This evening, we watched the Hershey Bears best the Toronto Marlies of the AHL, featuring some fine offense (on the part of the Bears) and the requisite number of fights. Then we battled the weather (that cold, cold rain) on the way to dinner at East Sid Joe’s downtown. Thanks to our sponsors, qPublic.net and Rich Greenwood, for supporting today’s activities.

Talkie Talkie

Everyone who does talks regularly has their own approach to building up the necessary content and flow. Dave Bouwman builds his talks up from post-it notes. A nice approach.

For brand new topics, I take a long walk, a couple of hours or more around Beacon Hill Park, and roll ideas around. I carry a notepad so I can jot them down and then forget them again, keeping my brain loose. Those first couple of hours of walking hopefully yield the kernel of the talk and a handful of interesting ideas, perhaps a dozen lines of text. Then I sit on the couch with a text editor and write the presentation like an essay. Outline at a high level. Fill in the details, move those bits around, then actually write the talk, like an article, fully written out. When I get to around 5000 words, I know I have about an hour of material. I put ideas for graphics and slides inline in the text, set off with <<>> characters. Once I’m happy with the story, I sit down in a separate session and build the actual slide deck. Google Images is a pretty useful source of pictures for almost any topic.

It’s an absurdly time-consuming process, but for keynotes and other occasions where I’m monopolizing the attention of hundreds of people at a time, it’s a fair bargain. They are giving me their time en masse, so they deserve a polished product: if I spend 5 minutes saying “um” or “ahhh” in front of 400 people, I’ve just wasted 33 hours of aggregate time.
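The aggregate-time figure is simple arithmetic: five wasted minutes, multiplied across the whole audience.

```latex
\[
5\ \text{min} \times 400\ \text{people}
  = 2000\ \text{person-minutes}
  = \tfrac{2000}{60}\ \text{person-hours}
  \approx 33.3\ \text{person-hours}
\]
```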

FastCGI Hint

I’m preparing to benchmark and profile Mapserver and PostGIS for the upcoming code sprint in Toronto, and setting up Mapserver as a FastCGI application is a requirement for getting good profiling results. The JMeter benchmarks run multiple threads of load, so having multiple Mapserver processes running makes things faster.

However, getting “FastCgiConfig” to dynamically spawn the required instances was a real pain. Setting “-updateInterval” nice and low made extra Mapserver processes come online a little faster, but in a chunky way: it seemed to kill the existing process before flipping on the new ones. The config line looked like this:

FastCgiConfig -appConnTimeout 60 -idle-timeout 60 -init-start-delay 0 -minProcesses 2 -maxClassProcesses 6 -startDelay 5 -restart-delay 1 -killInterval 30 -singleThreshold 5 -updateInterval 1

In the end, I opted to statically start the number of processes that made sense for my dual-core system (4, by my estimate) using the “FastCgiServer” directive. The config line is blissfully simple:

FastCgiServer /Users/pramsey/Sites/cgi-bin/mapserv.fcgi -processes 4

Throughput for simple tests (style-free roads from PostGIS, 4 threads of execution) is running as high as 48 maps per second.
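The benchmarks themselves use JMeter; as a rough stand-in, the same measurement (several threads hammering Mapserver, counting completed maps per second) can be sketched with Python's standard library. The URL, map file path and layer name in `MAP_URL` are placeholders, not the real test setup, and `fetch_map` and `maps_per_second` are hypothetical helpers.

```python
import time
import urllib.request
from concurrent.futures import ThreadPoolExecutor
from typing import Callable

# Placeholder request against a local FastCGI Mapserver; the map file
# path and layer name are hypothetical and must be adapted to a real setup.
MAP_URL = ("http://localhost/cgi-bin/mapserv.fcgi?map=/path/to/roads.map"
           "&mode=map&layers=roads")


def fetch_map(url: str = MAP_URL) -> int:
    """Request one map image and return its size in bytes."""
    with urllib.request.urlopen(url, timeout=30) as resp:
        return len(resp.read())


def maps_per_second(fetch: Callable[[], object], threads: int = 4,
                    requests_per_thread: int = 25) -> float:
    """Run `threads` workers, each issuing `requests_per_thread` requests,
    and return overall throughput in completed requests per second."""
    def worker() -> int:
        for _ in range(requests_per_thread):
            fetch()
        return requests_per_thread

    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=threads) as pool:
        done = sum(pool.map(lambda _: worker(), range(threads)))
    elapsed = time.perf_counter() - start
    return done / elapsed
```

Calling `maps_per_second(fetch_map, threads=4)` approximates the 4-thread test described above, with one client thread per statically started FastCGI process.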