Into the Clouds

One of my favorite software articles ever is Joel Spolsky’s “Law of Leaky Abstractions”, which is about the (unavoidable) dangers of building on software abstractions. Unavoidable, because the whole edifice of programming is built on layer upon layer of abstractions. Dangerous, because not having an understanding of what is happening below your working abstraction can lead to unintentionally terrible mistakes.

The release of Google’s App Engine, and the earlier releases of various Amazon Web Services components (storage, queueing, database, computing), serve as a reminder that the process of adding abstraction has not stopped; it has simply migrated, for the moment, to a new field. Instead of adding a programming layer, Google and Amazon have added a deployment layer of abstraction: you no longer need to know or care what machine your application is running on, or where that machine is.

As with other layers of abstraction, this new deployment abstraction will introduce new (yet to be discovered) programming pitfalls, but it will also liberate developers (and the businesses that hire them) to spend less time (and money) mucking with operating system setup, database tuning, fail-over and replication systems, and other necessary details of server administration. The tasks involved in setting up a reliable server farm are both irrelevant to most aspects of application development and highly repetitive: ripe, in other words, for being abstracted away.

As with previous abstractions (microcode, higher-level languages, operating systems, object/relational mappings), the “platform as a service” (PaaS) abstraction removes a category of complication and replaces it with a new choice: what web service platform (abstraction) shall I use for my application?

Do I tie myself to Google? Amazon? Sun? Microsoft?

If all this sounds vaguely familiar, that’s because it is exactly the same decision process involved in choosing which implementation of a persistence abstraction (Oracle? MySQL? PostgreSQL?) or process management/filesystem abstraction (Linux? Solaris? Windows?) or O/R abstraction (Hibernate? JPOX?) you are going to use for your application.

And the same trade-offs apply. Do I like the implementation of this abstraction? Do I trust the vendor (to not screw me, to not go out of business)? Can I afford it?

If there is one thing missing from the PaaS tapestry so far (not counting Microsoft’s no-doubt-forthcoming entry to the field), it is a strong “open source” thread. Unlike open source software, open source PaaS can’t be replicated at zero cost (servers must be purchased, plugged in, cooled, etc.), but PaaS can go “open sourceish” in two ways: through standard service APIs, which let users migrate easily from provider to provider; and through standardization on open source components that fit the PaaS model (such as Hadoop and Linux virtualization, as AWS has already demonstrated).
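To make the “standard service API” idea concrete, here is a minimal Python sketch. It assumes a hypothetical blob-storage API (`BlobStore` with `put`, `get` and `keys`) that every provider implements; `InMemoryStore` is a stand-in for any one vendor's implementation, not a real provider SDK. Because application code talks only to the shared interface, switching providers reduces to copying the data across.

```python
from abc import ABC, abstractmethod
from typing import Dict, List


class BlobStore(ABC):
    """Hypothetical standard storage API that every provider implements."""

    @abstractmethod
    def put(self, key: str, data: bytes) -> None: ...

    @abstractmethod
    def get(self, key: str) -> bytes: ...

    @abstractmethod
    def keys(self) -> List[str]: ...


class InMemoryStore(BlobStore):
    """Stand-in for one provider's implementation (not a real vendor SDK)."""

    def __init__(self, provider: str) -> None:
        self.provider = provider
        self._objects: Dict[str, bytes] = {}

    def put(self, key: str, data: bytes) -> None:
        self._objects[key] = data

    def get(self, key: str) -> bytes:
        return self._objects[key]

    def keys(self) -> List[str]:
        return list(self._objects)


def migrate(src: BlobStore, dst: BlobStore) -> int:
    """Copy every object between providers; works because both share one API."""
    copied = 0
    for key in src.keys():
        dst.put(key, src.get(key))
        copied += 1
    return copied


# Application code depends only on BlobStore, never on a specific provider.
old, new = InMemoryStore("provider-a"), InMemoryStore("provider-b")
old.put("report.csv", b"id,total\n1,42\n")
print(migrate(old, new), new.get("report.csv"))
```

The design choice is the whole argument in miniature: lock-in lives in provider-specific calls, so a shared interface is what makes the exit cheap.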

Open source tends to be a fast follower, so I expect third-party deployable versions of the App Engine and AWS APIs to arrive soon enough. To me, the last couple of years feel like 1995 all over again: just when you think you understand the structures of computing, the core premises are overthrown and everything is fresh again. In 1995 it was the internet and Linux shaking the foundations of the Windows hegemony; this time it is the cloud, wiping away the last vestiges of local computing context.

Malware? Schmidt?

Very odd. This evening I went to read Chris Schmidt’s latest blog post, and this is what I saw:

What? Apparently Chris is distributing “badware”. I’ll be interested to see how this shakes out: whether Chris’ site got schmutzed, or whether the “anti-phishing site” Firefox is aligned with made a mistake.

What is very odd is that Firefox resolutely refuses to take me to Chris’ site. Safari, on the other hand, cannot display anything at all from the site, which perhaps means “bad things afoot”. Glad I am not Chris’ sysadmin tonight (or Chris, assuming sysadmin == Chris).

Mapserver Debug Logging

Daniel Morissette spills the beans on the mapserver-users list:

IIRC, LOG only logs some info on the mapserv request status at the end of its execution. I don’t use it much and don’t know much about it.

To get debugging output, with MapServer 5.0+, set:

  CONFIG "MS_ERRORFILE" "/var/tmp/ms.log"

… and then set DEBUG level (ON, or a number between 1 and 5) at the top level of the mapfile and in each layer for which you want debugging output.

More details are available in RFC-28:

If there is something definitively “bad” about modern Mapserver, it is the migration of configuration directives into “magic string” blocks of the map file, which are much less well documented than the “official” elements of the file.

CONFIG, PROCESSING, METADATA, that’s right, I’m looking at you.

Mapserver and Lat/Lon

One of the problems with open source is how much interesting stuff hides beneath the surface, only visible to those willing to read the source code… interesting features you do not even know are there!

On the bright side, you can find these Easter Eggs, if you look.

For example, today I found a case where Mapserver renders projected maps even when the extents you send in are in lon/lat!

My map file looks like this (note the output projection is defined as Mercator):

  SHAPEPATH "/Users/pramsey/Code/mapserver/msworldtest/"
    NAME continent
    DATA continent
      OUTLINECOLOR 10 10 10
      COLOR 200 200 200

My request URL looks like this (note the mapext coordinates are lon/lat):


And the output looks like this:

So my request was in geographic coordinates, but my output was still in Mercator.

This is, of course, a brutal bug-in-waiting for someone with a projected coordinate system that happens to include valid requests in the range (-180,-90 180,90). Mercator does, but a 360 by 180 meter patch of the Atlantic Ocean will probably never be zoomed in on; if it is, the user will suddenly see the whole world, to their great surprise.
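A minimal Python sketch of the kind of guess that would produce this behaviour. This is not Mapserver's actual code: the `looks_geographic` test, the spherical Mercator radius and the 85 degree latitude clamp are all assumptions. The point is the ambiguity: any extent that fits inside (-180,-90 180,90) gets treated as degrees, even when it was meant as metres.

```python
import math

# Hypothetical reconstruction of the observed behaviour, NOT Mapserver's code.
EARTH_RADIUS = 6378137.0  # metres; spherical Mercator is an assumption here


def looks_geographic(minx, miny, maxx, maxy):
    """Guess 'lon/lat' when the extent fits inside (-180,-90 180,90)."""
    return -180 <= minx <= maxx <= 180 and -90 <= miny <= maxy <= 90


def lonlat_to_mercator(lon, lat):
    """Forward spherical Mercator; lat must be strictly between -90 and 90."""
    x = EARTH_RADIUS * math.radians(lon)
    y = EARTH_RADIUS * math.log(math.tan(math.pi / 4 + math.radians(lat) / 2))
    return x, y


def resolve_extent(minx, miny, maxx, maxy, max_lat=85.0):
    """Return the extent in Mercator metres, converting it if it 'looks' lon/lat."""
    if not looks_geographic(minx, miny, maxx, maxy):
        return minx, miny, maxx, maxy
    # Clamp latitudes, because Mercator y goes to infinity at the poles.
    miny, maxy = max(miny, -max_lat), min(maxy, max_lat)
    x0, y0 = lonlat_to_mercator(minx, miny)
    x1, y1 = lonlat_to_mercator(maxx, maxy)
    return x0, y0, x1, y1


# Intended as lon/lat: converted to a whole-world extent in metres.
print(resolve_extent(-180, -90, 180, 90))
# Intended as a 200 x 100 metre Mercator box near (0, 0): also converted,
# so the user gets a huge swath of the world instead of a patch of ocean.
print(resolve_extent(-100, -50, 100, 50))
```

Extents that fall outside the lon/lat envelope pass through untouched, which is why the problem only bites for small requests near the projection origin.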

That's Billion with a "B"

This article on scaling PostgreSQL to support Skype’s operations is well worth a read for anyone running a high-end PostgreSQL installation.

PostgreSQL is used “as the main DB for most of [Skype’s] business needs.” Their approach is to put a traditional stored procedure interface in front of the data, and to layer proxy servers on top of that interface; the proxies hash SQL requests across a set of database servers, which actually carry out the queries. The result is a horizontally partitioned system that they believe will scale to handle 1 billion users.
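A minimal Python sketch of the hash-to-partition routing step, under stated assumptions: the function name `get_user_email`, the connection strings and the MD5-based hash are hypothetical, and no database is contacted. Skype's own proxy layer, PL/Proxy, runs inside PostgreSQL and hashes a function argument (for example with `hashtext()`); this only mimics the idea of picking a partition from a key.

```python
import hashlib


class PartitionRouter:
    """Route stored-procedure calls to one of N database partitions by key."""

    def __init__(self, partitions):
        n = len(partitions)
        # A power-of-two partition count lets us pick a partition with a bit mask.
        if n == 0 or n & (n - 1):
            raise ValueError("partition count must be a power of two")
        self.partitions = list(partitions)
        self.mask = n - 1

    @staticmethod
    def stable_hash(key):
        # Python's built-in hash() is salted per process, so use a digest instead.
        digest = hashlib.md5(key.encode("utf-8")).digest()
        return int.from_bytes(digest[:4], "big")

    def partition_for(self, key):
        """The same key always maps to the same partition."""
        return self.partitions[self.stable_hash(key) & self.mask]

    def call(self, function_name, key, *args):
        """Describe the call the proxy would forward (no real database here)."""
        target = self.partition_for(key)
        placeholders = ", ".join(["%s"] * (1 + len(args)))
        return target, f"SELECT {function_name}({placeholders})", (key, *args)


# Hypothetical connection strings for four partitions.
router = PartitionRouter([f"dbname=userdb_p{i}" for i in range(4)])
print(router.call("get_user_email", "alice"))
```

Because every data access goes through a stored procedure keyed on something like a user name, the proxy never has to parse arbitrary SQL; it only has to hash the key.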