Geospatial: A Microcosm of Open Source
16 Mar 2009
Me, in French.
11 Mar 2009
Update. The pictures from the event are being uploaded.
So, this week was my first code sprint, and I feel like I learned a few things, which can hopefully be applied to future events. (Theoretically, I organized and participated in the 2007 FOSS4G event, but in reality I was too burned out to do more than watch people work with a dizzy smile on my face.)
Length. I think we accidentally hit on the perfect length for a sprint: four days. Day one was “talking day”: the teams had a year of piled-up discussions and decisions to make, and relatively little code was cut, but group cohesion and unity of purpose were achieved. Day two was “start coding” day, as people bit off bugs and features. Day three was “wrap up coding” day, as larger features and bugs were completed. And day four was “polish and depart” day, as many people did a bit of work and then left for flights mid-day.
A shorter event would compress away coding time, not leaving enough duration to attempt larger features or bugs. A longer event would burn people out and by day four they would be slogging. Four days is just right for a self-contained event. For a sprint tacked onto the back of a larger event, that would have to be re-evaluated, as presumably people would already be pretty tired.
Venue. I think our choice of a travel hub in the off season worked out well. Most people had fairly direct travel options, and we were able to get very good hotel rates for a large city. Arranging block hotel rooms to keep everyone together was also good, as it made social organizing very simple (“meet in the lobby!”).
Using the hotel meeting facilities was less optimal. I would really like a room with windows next time; it would make the atmosphere much lighter. And banquet catering is extortionately expensive, to the extent that we didn’t use it. If you don’t like the meeting room, and you’re not using the services, why use a hotel venue at all? So next time, I will spend a little longer looking for an alternate venue. Our needs are simple: 1000 sq ft, 25 chairs, tables, and internet connectivity. It’s a code sprint, not a wedding.
Social. Circumstances (our inability to afford hotel catering) actually forced us into an enjoyable alternate plan: people fended for themselves during the day, and then everyone ate together in the evenings at sponsored dinners. We were also able to use sponsorship money for an evening of minor hockey, which was a fun cultural event. It turned out well: our money went further, the beer flowed freely, and we had more non-coding time to interact than if everyone had split up for dinner in the evenings. Another cultural event would be a good addition next year.
Effectiveness. One of the interesting side effects of our community overlaps (PostGIS developers are Mapserver developers are GDAL developers are libLAS developers) is that on “talking day” people had to make some hard choices about how to self-identify. Because I was concentrating on PostGIS, I mostly missed out on the Mapserver sprint, which is too bad.
Otherwise, as I mentioned in the day four summary, it was incredibly effective. It takes a long time to reach a decision over e-mail compared to face-to-face. That can be a benefit (more time to consider options without getting stampeded into a decision) but also a drawback (more time taken, period).
Effort. Not hard at all. It was a very easy event to organize, with relatively few moving parts and lots of help from locals like Tom Kralidis and Jeff McKenna. I’d do it again. In fact, I will do it again: I’m going to start planning for the 2010 event this fall.
10 Mar 2009
Today we wrapped up the first annual code sprint for the “C tribe” of the open source geospatial community.
The final results in the code were great. We got some major performance improvements into Mapserver (shape file speed, projection lookup speed, query speed); we got PostGIS 1.4 ready for release; GDAL performance was investigated and work started on big speed-ups.
The final results for the community were also great. With everyone around the table, the PostGIS community agreed on development priorities for the 2.0 cycle. Mapserver got in good long discussions about long-range plans such as XML mapfile, and OWS services access control. I didn’t see the results, but the libLAS developers also had lots of time coordinating.
The face-to-face communications bandwidth is so much higher that problems fall by the wayside at a great rate. I saw Frank Warmerdam reviewing MrSID code in GDAL with Michael Gerlek, and Michael being able to quickly point out performance mistakes: “don’t call that function every time”, “these functions are costly, but only the first time through”. I saw Steve Lime mentally model several approaches to query improvement, trying them out on different community members before settling on the final solution. I implemented that solution in the PostGIS driver this morning, and it has improved performance in WFS-on-PostGIS by a factor of 24 (!!!).
Thanks to the GDAL project for sponsoring today’s activities.
And thanks to all the attendees, who paid their way to Toronto, and their accommodation, for the privilege of spending four days in a hotel conference room hacking on open source. You are wonderful, weird, wonderful people!
10 Mar 2009
It was “it all comes together” day today at the Toronto Code Sprint.
On the PostGIS front, the open bug list for our 1.4 pre-release shrank down to the vanishing point. I closed off one [1], Mark Cave-Ayland closed off two [2, 3], and Olivier Courtin closed off one [4] and started on GML support for CURVE types. We should be able to kick out a testing version of 1.4 tomorrow, no problem.
On the Mapserver front, our goal of getting faster and faster and making Andrea Aime eat his liver is well in sight. Frank Warmerdam skipped our foray to the Best Beer Palace in Toronto last night, went home instead, and added a memory cache to Proj4, which I tested this morning. The cache has removed the EPSG code look-up overhead from Mapserver, which was far-and-away the largest performance overhead in all modes.
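To make the idea of that cache concrete, here is a minimal sketch in Python of memoizing a code-to-definition lookup so the definition file is scanned only once per code. It is not Frank's Proj4 change: the names `EPSG_FILE`, `scan_epsg_file`, `lookup_epsg`, and `stats` are hypothetical, and the inline text is only a simplified stand-in for the on-disk definition file.

```python
import io

# Hypothetical, simplified stand-in for the on-disk EPSG definition file.
EPSG_FILE = """\
<4326> +proj=longlat +datum=WGS84 +no_defs <>
<4269> +proj=longlat +datum=NAD83 +no_defs <>
<26917> +proj=utm +zone=17 +datum=NAD83 +units=m +no_defs <>
"""

# Counts how often the "file" is really scanned, to show the cache working.
stats = {"scans": 0}


def scan_epsg_file(code):
    """Linear scan of the definition file: the expensive step."""
    stats["scans"] += 1
    prefix = "<%d>" % code
    for line in io.StringIO(EPSG_FILE):
        if line.startswith(prefix):
            # Drop the leading "<code>" and the trailing "<>" terminator.
            return line[len(prefix):].replace("<>", "").strip()
    raise KeyError(code)


_cache = {}


def lookup_epsg(code):
    """Return the definition for a code, scanning the file only on a miss."""
    if code not in _cache:
        _cache[code] = scan_epsg_file(code)
    return _cache[code]
```

A map request that touches the same projection for every layer would then pay for the scan once and hit the dictionary afterwards, which is the kind of saving described above.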
Also today, Julien-Samuel Lacroix from MapGears completed his changes to Mapserver shape file access, which knocked a further 20% off the response time for the case of rendering from large shape files. We also identified a performance hit in outlined label drawing in GD. He wrote a patch for it several years ago, but unfortunately it requires patching GD as well, so it might take a while to work its way down into general Mapserver usage.
Through profiling of image access in GDAL/Mapserver, Frank has been galvanized to re-work image access in Mapserver to no longer pull data from images one band at a time. For formats like MrSID and ECW, this alone should improve performance by a factor of three.
Meanwhile, Steve Lime has finalized his plans to make querying in Mapserver faster, and has written an RFC on his approach. I hope to alter the PostGIS driver to support the “new way of being” tomorrow.
Finally, I profiled hex-vs-base64 encoding options in the Mapserver PostGIS driver and decided on using hex, based on much faster performance. That change was committed today as well. All in all a good day for the “need for speed” Code Sprint.
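For anyone who wants to repeat that kind of comparison, here is a rough sketch using only the Python standard library. It is not the profiling I ran on the C driver: the random `payload` is a hypothetical stand-in for a geometry's binary bytes, and timings in Python will not necessarily rank the two encodings the same way.

```python
import base64
import binascii
import os
import timeit

# Hypothetical payload standing in for a geometry's binary representation.
payload = os.urandom(64 * 1024)

# The two text encodings a database client could request.
hex_text = binascii.hexlify(payload)
b64_text = base64.b64encode(payload)


def decode_hex():
    """Decode the hex text back to raw bytes."""
    return binascii.unhexlify(hex_text)


def decode_base64():
    """Decode the base64 text back to raw bytes."""
    return base64.b64decode(b64_text)


def time_decoders(repeats=200):
    """Return the seconds spent decoding each encoding `repeats` times."""
    return {
        "hex": timeit.timeit(decode_hex, number=repeats),
        "base64": timeit.timeit(decode_base64, number=repeats),
    }


if __name__ == "__main__":
    for name, seconds in time_decoders().items():
        print("%-6s %.4f s" % (name, seconds))
```

Keep in mind the trade-off being measured: hex doubles the size of the data on the wire, while base64 grows it by about a third, so decode speed is only one side of the comparison.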
After a long day of coding (I had to break up an intense discussion of handling Open Web Services configuration in Mapserver to get people to head to dinner) we headed out for our final group meal at Jack Astor’s on Front Street, where a good time was had by all. Tomorrow will be less intense as folks head off for flights at various times around mid-day. I head home late in the evening, chasing the sun back to Victoria on the sole daily non-stop YYZ->YYJ flight.
Thanks to OSGIS.NL and LizardTech for supporting today’s activities.
09 Mar 2009
Another productive day at the Toronto Code Sprint! Today was “sprint forward” day, so everyone got an hour less sleep than they expected, but by 10am everyone was back in the room, coding hard.
I’m proud to have closed two bugs [1] [2] today (both of the “one hour of study per one line of altered code” sort), moving towards our PostGIS goal of releasing what Mark Cave-Ayland has been eloquently calling a 1.4 “beater” for the PostGIS community to test prior to cutting an official numbered 1.4 release candidate. I think we’ll have all the critical bugs closed and have that beater out on the road, mowing down pedestrians, before the week is out.
Towards the end of the afternoon, I fired up the Shark and provided some profiling information for the Mapserver team, and talked about some profiles that Julien from MapGears had generated earlier. Julien is going to enhance some work I did last year on large shape files, to make traversing the shape file selection set even faster. By traversing the selection set with a 4-byte (int) pointer instead of a 1-byte (char) pointer he should be able to extract some good gains, and the larger the shape file, the higher the benefit.
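The wider-pointer trick can be sketched as follows, assuming the selection set is a bit array with one bit per shape (an assumption made for this illustration). The function names are hypothetical and this is Python, not the Mapserver C code: the point is only that an empty 32-bit word can be skipped with one comparison instead of 32 bit tests.

```python
def make_bitmap(n_bits, set_indices):
    """Build a bit array padded to a whole number of 32-bit words."""
    n_bytes = ((n_bits + 31) // 32) * 4
    bitmap = bytearray(n_bytes)
    for i in set_indices:
        bitmap[i >> 3] |= 1 << (i & 7)
    return bitmap


def next_set_bit_bitwise(bitmap, start):
    """Test one bit at a time from `start`; return its index or -1."""
    for i in range(start, len(bitmap) * 8):
        if bitmap[i >> 3] & (1 << (i & 7)):
            return i
    return -1


def next_set_bit_wordwise(bitmap, start):
    """Same result, but skip each empty 32-bit word with one comparison."""
    # View the same buffer as native unsigned ints (assumed to be 4 bytes).
    words = memoryview(bitmap).cast("I")
    assert words.itemsize == 4, "sketch assumes 4-byte unsigned ints"
    total_bits = len(bitmap) * 8
    i = start
    while i < total_bits:
        if i % 32 == 0 and words[i // 32] == 0:
            i += 32  # nothing selected in this word: jump over all 32 bits
            continue
        if bitmap[i >> 3] & (1 << (i & 7)):
            return i
        i += 1
    return -1
```

With a sparse selection over a large file most words are zero, so the word-wise scan does far fewer tests, which fits the observation that bigger shape files benefit more.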
Steve Lime continues to contemplate the problem of improving the MapServer query work flow, to make WFS and large query performance on database back-ends faster. It sounds like we are rapidly converging on an answer, so hopefully tomorrow is the day we do an implementation and tie some back-ends into the solution.
At 6:30, we all trooped to the Baton Rouge for dinner, which was excellent. Thanks to Coordinate Solutions and SJ Geophysics for supporting today’s activities!