Too Big to Explain

I spent the morning yesterday at an Oracle Technical Session, lots of government employees and contractors crammed into a ballroom listening to Oracle reps talk about the latest-and-greatest offerings from the beast.

The best part: after a one-hour presentation on “Oracle Fusion Middleware” by quite a polished speaker, he asked for questions, and someone said:

“Thanks for your presentation, but, I still have no idea what Fusion Middleware does.”

Ouch.

Fair comment, too: the presentation was all market-speak, about how data was “integrated”, decisions “made more quickly”, and so on. Clarity is not aided by the fact that “Oracle Fusion Middleware” is itself a suite of a dozen different bits.

To quote Oracle’s web site, it is a “portfolio of customer-proven software that spans from portals and process managers to application infrastructure, developer tools, and business intelligence”.

Some cone-head in the Oracle marketing department has decided that all these bits and pieces will be easier to sell if they are all wrapped under one product brand, “Fusion Middleware”. But really, pretending it is all one thing has made the product too big to explain.

What does it do? Everything. Nothing. It depends.

It brings to mind The Elephant and the Blind Men.

See!

After an enlightening start picking up C, I spent a fair bit of time in April working on the Mapserver code base. All my April work is now committed, so it will be available in the upcoming 5.2 release.

Large shapefile performance

This has been a problem for as long as Mapserver has been around, but Mapserver has been so damn fast that for the most part the performance fall-off as files got larger was ignored (if you can render your map in 0.12s off a 2M record file, that’s still pretty acceptable).

However, during FOSS4G2007, Brock Anderson reported that Mapserver was actually several times slower than Geoserver for the particular use case of rendering a small map off a large file.

This could not be borne.

The problem turned out to be the way Mapserver handled the SHX file, loading it all into memory for each render. For a very large file, loading the whole SHX file just to pull less than 1% of the records out is a very bad performance bargain. So I re-wrote the SHX handling to lazily load just the bits of the SHX file needed for the features being rendered.
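The idea can be sketched in a few lines of C. This is a hypothetical illustration of lazy index lookup, not the actual Mapserver code; it relies only on the documented SHX layout (a 100-byte header followed by one 8-byte big-endian record per shape, giving each shape’s offset and content length in 16-bit words):

```c
/* Hypothetical sketch of lazy SHX lookup (not the actual Mapserver patch).
   The .shx file has a 100-byte header, then an 8-byte big-endian record
   per shape: [offset in 16-bit words][content length in 16-bit words]. */
#include <stdio.h>
#include <stdint.h>

static uint32_t read_be32(const unsigned char *b) {
    return ((uint32_t)b[0] << 24) | ((uint32_t)b[1] << 16) |
           ((uint32_t)b[2] << 8)  |  (uint32_t)b[3];
}

/* Fetch the .shp offset/length for one record, reading only 8 bytes
   instead of slurping the whole index into memory. */
int shx_record(FILE *shx, long recno, uint32_t *shp_offset_bytes,
               uint32_t *length_bytes) {
    unsigned char buf[8];
    if (fseek(shx, 100 + recno * 8, SEEK_SET) != 0) return -1;
    if (fread(buf, 1, 8, shx) != 8) return -1;
    *shp_offset_bytes = read_be32(buf) * 2;      /* words -> bytes */
    *length_bytes     = read_be32(buf + 4) * 2;  /* words -> bytes */
    return 0;
}
```

For a render that touches a few hundred shapes, this turns “read the whole multi-megabyte index” into a few hundred tiny seeks and reads.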

A secondary problem was that Mapserver kept the list of “features to draw” in a bitmap with as many entries as the shape file had records. Then it iterated through that list, at least twice for each render. Counting to several million twice when you only want a couple hundred features is a waste of time. Replacing the bitmap would have been a lot of work, so I replaced the iteration with one about 10 times faster.
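One way to speed up that kind of scan, sketched here as an assumption rather than the actual patch, is to test the bitmap a word at a time and skip empty words entirely, rather than checking every record’s bit one by one:

```c
/* Hypothetical sketch of a faster bitmap scan (assumed technique, not the
   actual Mapserver patch): when most records are not drawn, most 32-bit
   words of the bitmap are zero and can be skipped in a single compare. */
#include <stdint.h>
#include <stddef.h>

typedef void (*draw_fn)(size_t shape_index, void *ctx);

void scan_bitmap(const uint32_t *bits, size_t nwords, draw_fn draw, void *ctx) {
    for (size_t w = 0; w < nwords; w++) {
        uint32_t word = bits[w];
        if (word == 0) continue;            /* skip 32 empty records at once */
        for (int b = 0; b < 32; b++) {
            if (word & (1u << b))
                draw(w * 32 + b, ctx);      /* callback for each set bit */
        }
    }
}
```

With a couple hundred features wanted out of millions of records, almost every word is zero, so the loop body that actually inspects bits runs rarely.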

The net result was a several-times improvement in speed for small maps rendered off big files. My reference render of 20 features from a 1.8M record file went from a respectable 0.120s to a screaming 0.037s.

Tile-based map access

“How do I put my Mapserver layers into Google Maps?”

A fair question. Here’s this great mapping user interface, and this great map renderer, they should go together like chocolate and peanut butter. It’s possible to do with a relatively thin script on top of Mapserver, but requires some extra configuration steps.

This upgrade cuts the steps down to:

  • author map file; and
  • author Google Maps HTML page.

See the tile mode howto for some examples. It boils down to using the GTileLayer and setting the tileUrlTemplate to point at a tile-enabled Mapserver.
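As a sketch of what that points at (the parameter names and placeholder form here are assumptions; check the tile mode howto for the real ones), the template resolves to a Mapserver CGI URL along these lines:

```
http://yourserver/cgi-bin/mapserv?map=/path/to/your.map
    &layers=yourlayer
    &mode=tile
    &tilemode=gmap
    &tile={X}+{Y}+{Z}
```

Google Maps substitutes the tile column, row, and zoom into the placeholders, and Mapserver renders just that 256×256 tile.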

WMS client URL encoding

These were minor patches, but issues that had been bugging me for a while.

The WMS client URL encoding patch brings Mapserver into strict compliance with the WMS specification, which will allow it to work with strict servers, of which the ER Mapper Image Server is one.

HTTP Cache-control headers

The HTTP patch allows the user to configure Mapserver to send a Cache-control: max-age=nnnn header with WMS responses. For clients like OpenLayers, which fetch images in a tiled manner, this should promote more cache-friendly behavior and faster performance.
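The mechanics are simple enough to sketch in C. This is a hypothetical helper, not the actual Mapserver patch: it just formats the CGI response headers, adding the Cache-control line when a max-age has been configured:

```c
/* Hypothetical sketch (not the actual Mapserver patch): format CGI
   response headers, optionally including a Cache-control line. */
#include <stdio.h>
#include <stddef.h>

/* Write the headers into buf; returns the number of bytes written
   (as snprintf does). A max_age_seconds of 0 means "don't send it". */
int format_headers(char *buf, size_t buflen, int max_age_seconds) {
    if (max_age_seconds > 0)
        return snprintf(buf, buflen,
            "Content-type: image/png\r\n"
            "Cache-control: max-age=%d\r\n\r\n", max_age_seconds);
    return snprintf(buf, buflen, "Content-type: image/png\r\n\r\n");
}
```

A tiled client that re-requests the same tile within the max-age window can then serve it from its local cache instead of hitting the server again.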

I'd Like to Thank the Academy...

And my publicist and stylist, oh and Mom and Dad…

But mostly Howard Butler for nominating me and the rest of the Mapserver PSC for accepting me as a Mapserver committer. I guess my crazy ideas and cockeyed schemes didn’t scare them off!

You like me! You really like me!

My Trip to the Consulate

I took a day on Wednesday to travel to Vancouver and apply for a US passport. The US passport form is pretty straightforward, but the consulate experience is anything but.

Step 1: Getting an appointment. Last month, before my last trip to Vancouver, I thought I might combine the trip and take care of the passport application at the same time. No dice, appointments for passports are booked up a month in advance!

Step 2: Getting in. I arrive at the building, which has a security checkpoint on the ground floor (consulate is on floor 20):

  • He: “I am sorry sir, you cannot bring your laptop or cell phone into the consulate.”
  • Me: “Oh, OK, can I leave them with you?”
  • He: “No sir, you may not.”
  • Me: “Uh….”

Necessity is the mother of invention, so I run across the street and ask the counter-lady in a dime store to hold my laptop and phone. She graciously agrees. Now electronics-free, I return, and am allowed in.

Step 3: Going up. Me and a group of visa applicants (pity the poor visa applicants) wait for the secure elevator. The doors open, and there inside is a delivery guy with a pallet of Dell computers! OK, we squeeze in, and the security guard swipes his card and presses the button for 20th, then gets out. We go up one floor. Someone gets on from the general building population! We go up to 17, and the Dell guy gets off with his computers. At 20, we get off, having traversed the world’s most porous security cordon. However, it does explain…

Step 4: Getting in, again. Despite having gone through a screening on the ground floor, you get screened once more on 20! No doubt because the ground floor screening simply lets you back out into the general building population.

Step 5: Waiting. Even though my appointment is for 10am, the more experienced people with me say that they have waited as much as two hours in the past to be served. I am fortunate, and only wait 20 minutes.

Step 6: The envelope please. The staffer who takes my papers and walks through them is very helpful, but at the end he has a strange request. In order for me to get my passport, they have to mail it to me. However, they have cancelled their old courier contract. Would I mind going to the building across the street, buying an ExpressPost envelope, and returning it to him, so he can mail the passport?

Step 7: Out, down, buy, back, in, up, in. Getting out and back in is faster now that I know the drill. Rather than asking me my business, the security guards just look at my purchase, nod sagely and say “Ahhh. Envelope.” I am joined on the elevator by two other applicants, envelopes in hand.

Step 8: Done. Back on the street, I put my belt back on, recover my electronics from the dime store and tip the nice lady, and head out.

There is a nice business opportunity available for anyone who wants to stand outside the US Consulate in Vancouver and run a phone check business for $1-per-phone. You could probably sell ExpressPost envelopes while you were at it.

Into the Clouds

One of my favorite software articles ever is Joel Spolsky’s “Law of Leaky Abstractions”, which is about the (unavoidable) dangers of building on software abstractions. Unavoidable, because the whole edifice of programming is built on layer upon layer of abstractions. Dangerous, because not having an understanding of what is happening below your working abstraction can lead to unintentionally terrible mistakes.

The release of Google’s App Engine and earlier releases of various components of Amazon Web Services (storage, queueing, database, computing) serve as a reminder that the process of adding abstraction has not come to a stop, but it has migrated for the moment to a new field. Instead of adding a programming layer, Google and Amazon have added a deployment layer of abstraction – you no longer need to know or care what machine your application is running on, or where that machine is.

As with other layers of abstraction, this new deployment abstraction will introduce new (yet to be discovered) programming pitfalls, but it will also liberate developers (and the businesses that hire them) to spend less time (and money) mucking with operating system set-up, database tuning, fail-over and replication systems, and other necessary details of server administration. The tasks involved in setting up a reliable server farm are both irrelevant to most aspects of application development and highly repetitive – ripe for being abstracted away, in other words.

As with previous abstractions (microcode, higher level languages, operating systems, object/relational mappings) the “platform as a service” (PaaS) abstraction removes a category of complication and replaces it with a new choice: what web service platform (abstraction) shall I use for my application?

Do I tie myself to Google? Amazon? Sun? Microsoft?

If all this sounds vaguely familiar, that’s because it is exactly the same decision process involved in choosing which implementation of a persistence abstraction (Oracle? MySQL? PostgreSQL?) or process management/filesystem abstraction (Linux? Solaris? Windows?) or O/R abstraction (Hibernate? JPOX?) you are going to use for your application.

And the same trade-offs apply. Do I like the implementation of this abstraction? Do I trust the vendor (to not screw me, to not go out of business)? Can I afford it?

If there is one thing missing from the PaaS tapestry so far (not counting Microsoft’s no-doubt-forthcoming entry to the field), it is a strong “open source” thread. Unlike open source software, open source PaaS can’t be replicated at zero cost (servers must be purchased, plugged in, cooled, etc.), but PaaS can go “open sourceish” via standard service APIs that let users migrate easily from provider to provider, and via standardization on open source components that fit the PaaS model (like Hadoop and Linux virtualization, as already demonstrated by AWS).

Open source tends to be fast-follower, so I expect third-party deployable versions of the App Engine and AWS APIs will come soon enough. To me, the last couple years feel like 1995 all over again – just when you think you understand the structures of computing, the core premises are overthrown and everything is fresh again. In 1995 it was the internet and Linux shaking the foundations of the Windows hegemony; this time it is the cloud, wiping away the last vestiges of local computing context.