Wednesday, April 30, 2008

See!

After an enlightening start picking up C, I spent a fair bit of time in April working on the Mapserver code base. All my April work is now committed, so it will be available in the upcoming 5.2 release.

 Large shapefile performance

This has been a problem for as long as Mapserver has been around, but Mapserver has been so damn fast that for the most part the performance fall-off as files got larger was ignored (if you can render your map in 0.12s on a 2M record file, that's still pretty acceptable).

However, during FOSS4G2007, Brock Anderson reported that Mapserver was actually several times slower than Geoserver for the particular use case of rendering a small map off a large file.

This could not be borne.

The problem turned out to be the way Mapserver handled the SHX file, loading it all into memory for each render. For a very large file, loading the whole SHX file just to pull less than 1% of the records out is a very bad performance bargain. So I re-wrote the SHX handling to lazily load just the bits of the SHX file needed for the features being rendered.
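For the curious, here is a rough sketch of the lazy idea, not the code that went into Mapserver. The SHX file is a 100-byte header followed by fixed 8-byte records, each holding a big-endian offset and a big-endian content length counted in 16-bit words, so the entry for record N can be fetched with one seek and one small read. The names shx_entry, shx_decode_entry and shx_read_entry are made up for illustration.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

#define SHX_HEADER_BYTES 100  /* fixed-size file header */
#define SHX_RECORD_BYTES 8    /* one offset + one length per shape */

/* Hypothetical holder for one index entry, converted to bytes. */
typedef struct {
    long offset_bytes;  /* where the shape record starts in the SHP file */
    long length_bytes;  /* length of the shape record contents */
} shx_entry;

/* Read a big-endian 32-bit value from four raw bytes. */
static uint32_t read_be32(const unsigned char *b)
{
    return ((uint32_t)b[0] << 24) | ((uint32_t)b[1] << 16) |
           ((uint32_t)b[2] << 8)  |  (uint32_t)b[3];
}

/* Decode one raw 8-byte index record; values are stored in 16-bit words. */
shx_entry shx_decode_entry(const unsigned char *raw)
{
    shx_entry e;
    e.offset_bytes = (long)read_be32(raw) * 2;
    e.length_bytes = (long)read_be32(raw + 4) * 2;
    return e;
}

/* Fetch only the entry for `record` (0-based). Returns 0 on success, -1 on failure. */
int shx_read_entry(FILE *shx, long record, shx_entry *out)
{
    unsigned char raw[SHX_RECORD_BYTES];

    assert(shx != NULL && out != NULL);
    if (record < 0)
        return -1;
    if (fseek(shx, SHX_HEADER_BYTES + record * SHX_RECORD_BYTES, SEEK_SET) != 0)
        return -1;
    if (fread(raw, 1, SHX_RECORD_BYTES, shx) != SHX_RECORD_BYTES)
        return -1;
    *out = shx_decode_entry(raw);
    return 0;
}
```

Rendering 20 features this way touches 20 small index records instead of reading millions of them up front. A real implementation would likely read the index in larger pages to cut down on seeks.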

A secondary problem was that Mapserver kept the list of "features to draw" in a bitmap with as many entries as the shape file had records. Then it iterated through that list, at least twice for each render. Counting to several million twice when you only want a couple hundred features is a waste of time. Replacing the bitmap would have been a lot of work, so I replaced the iteration with one about 10 times faster.

The net result was a several-times improvement in speed for small maps rendered on big files. My reference render of 20 features from 1.8M went from a respectable 0.120s to a screaming 0.037s.

 Tile-based map access

"How do I put my Mapserver layers into Google Maps?"

A fair question. Here's this great mapping user interface, and this great map renderer, they should go together like chocolate and peanut butter. It's possible to do with a relatively thin script on top of Mapserver, but requires some extra configuration steps.

This upgrade cuts the steps down to:
  • author map file; and
  • author Google Maps HTML page.

See the tile mode howto for some examples. It boils down to using the GTileLayer and setting the tileUrlTemplate to point at a tile-enabled Mapserver.
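As a minimal sketch of the arithmetic a tile server has to do for each request, assuming the usual Google Maps scheme (spherical Mercator, tile 0,0 at the top-left corner, 2^zoom tiles per side), here is the conversion from tile coordinates to a map extent. The names tile_extent and tile_to_extent are invented for this example; this is not Mapserver's tile code.

```c
#include <assert.h>

/* Half the width of the spherical Mercator world, in metres. */
#define MERC_MAX 20037508.342789244

typedef struct {
    double minx, miny, maxx, maxy;
} tile_extent;

/* Convert Google-style tile coordinates to a Mercator bounding box. */
tile_extent tile_to_extent(int x, int y, int zoom)
{
    tile_extent e;
    double size;

    assert(zoom >= 0 && zoom < 31);
    /* Each zoom level halves the tile size. */
    size = (2.0 * MERC_MAX) / (double)(1UL << zoom);

    e.minx = -MERC_MAX + x * size;
    e.maxx = e.minx + size;
    /* Tile rows count downward from the top of the map. */
    e.maxy = MERC_MAX - y * size;
    e.miny = e.maxy - size;
    return e;
}
```

At zoom 0 the single tile covers the whole Mercator world; each step up in zoom quarters the area of a tile.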

 WMS client URL encoding

These were minor patches, but issues that had been bugging me for a while.

The WMS client URL encoding brings Mapserver into strict compliance with the WMS specification and that will allow it to work with strict servers, of which the ER Mapper Image Server is one.
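To make "URL encoding" concrete, here is a minimal sketch that assumes "strict" means percent-encoding every character in a parameter value that is not in the RFC 3986 unreserved set (letters, digits, and - _ . ~). The function name url_encode is hypothetical and this is not the patch itself.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/*
 * Percent-encode `in` into `out` (capacity `outsize`, including the
 * terminating NUL). Returns 0 on success, -1 if `out` is too small.
 * Assumes the default "C" locale, where isalnum() is ASCII-only.
 */
int url_encode(const char *in, char *out, size_t outsize)
{
    static const char hex[] = "0123456789ABCDEF";
    size_t n, i, o = 0;

    assert(in != NULL && out != NULL);
    n = strlen(in);
    for (i = 0; i < n; i++) {
        unsigned char c = (unsigned char)in[i];
        if (isalnum(c) || c == '-' || c == '_' || c == '.' || c == '~') {
            /* Unreserved: copy through, leaving room for the NUL. */
            if (o + 1 >= outsize)
                return -1;
            out[o++] = (char)c;
        } else {
            /* Everything else becomes %XX. */
            if (o + 3 >= outsize)
                return -1;
            out[o++] = '%';
            out[o++] = hex[c >> 4];
            out[o++] = hex[c & 15];
        }
    }
    if (o >= outsize)
        return -1;
    out[o] = '\0';
    return 0;
}
```

With this, a value like image/png goes over the wire as image%2Fpng, which is the kind of thing a strict server insists on.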

 HTTP Cache-control headers

The HTTP patch allows the user to configure Mapserver to send a Cache-control: max-age=nnnn header with WMS responses. For clients like OpenLayers, that fetch images in a tiled manner, this should hopefully promote a more cache-friendly behavior, and faster performance.
 

Sunday, April 27, 2008

I'd Like to Thank the Academy...

And my publicist and stylist, oh and Mom and Dad...

But mostly Howard Butler for nominating me and the rest of the Mapserver PSC for accepting me as a Mapserver committer. I guess my crazy ideas and cockeyed schemes didn't scare them off!

You like me! You really like me!

Friday, April 25, 2008

My Trip to the Consulate

I took a day on Wednesday to travel to Vancouver and apply for a US passport. The US passport form is pretty straightforward, but the consulate experience is anything but.

Step 1: Getting an appointment. Last month, before my last trip to Vancouver, I thought I might combine the trip and take care of the passport application at the same time. No dice, appointments for passports are booked up a month in advance!

Step 2: Getting in. I arrive at the building, which has a security checkpoint on the ground floor (consulate is on floor 20):
  • He: "I am sorry sir, you cannot bring your laptop or cell phone into the consulate."
  • Me: "Oh, OK, can I leave them with you?"
  • He: "No sir, you may not."
  • Me: "Uh...."
Necessity is the mother of invention, so I run across the street and ask the counter-lady in a dime store to hold my laptop and phone. She graciously agrees. Now electronics-free, I return, and am allowed in.

Step 3: Going up. Me and a group of VISA applicants (pity the poor VISA applicants) wait for the secure elevator. The doors open, and there inside is a delivery guy with a pallet of Dell computers! OK, we squeeze in, and the security guard swipes his card and presses the button for 20th, then gets out. We go up one floor. Someone gets on from the general building population! We go up to 17, and the Dell guy gets off with his computers. At 20, we get off, having traversed the world's most porous security cordon. However, it does explain...

Step 4: Getting in, again. Despite having gone through a screening on the ground floor, you get screened once more on 20! No doubt because the ground floor screening simply lets you back out into the general building population.

Step 5: Waiting. Even though my appointment is for 10am, the more experienced people with me say that they have waited for as much as 2 hours in the past in order to be served. I am fortunate, and only wait 20 minutes.

Step 6: The envelope please. The staffer who takes my papers and walks through them is very helpful, but at the end he has a strange request. In order for me to get my passport, they have to mail it to me. However, they have cancelled their old courier contract. Would I mind going to the building across the street, buying an ExpressPost envelope and returning it to him, so he can mail the passport?

Step 7: Out, down, buy, back, in, up, in. Getting out and back in is faster now that I know the drill. Rather than asking me my business, the security guards just look at my purchase, nod sagely and say "Ahhh. Envelope." I am joined on the elevator by two other applicants, envelopes in hand.

Step 8: Done. Back on the street, I put my belt back on, recover my electronics from the dime store and tip the nice lady, and head out.

There is a nice business opportunity available for anyone who wants to stand outside the US Consulate in Vancouver and run a phone check business for $1-per-phone. You could probably sell ExpressPost envelopes while you were at it.
 

Monday, April 21, 2008

Into the Clouds

One of my favorite software articles ever is Joel Spolsky's "Law of Leaky Abstractions", which is about the (unavoidable) dangers of building on software abstractions. Unavoidable, because the whole edifice of programming is built on layer upon layer of abstractions. Dangerous, because not having an understanding of what is happening below your working abstraction can lead to unintentionally terrible mistakes.

The release of Google's App Engine and earlier releases of various components of Amazon Web Services (storage, queueing, database, computing) serve as a reminder that the process of adding abstraction has not come to a stop, but it has migrated for the moment to a new field. Instead of adding a programming layer, Google and Amazon have added a deployment layer of abstraction – you no longer need to know or care what machine your application is running on, or where that machine is.

As with other layers of abstraction, this new deployment abstraction will introduce new (yet to be discovered) programming pitfalls, but it will also liberate developers (and the businesses that hire them) to spend less time (and money) mucking with operating system set-up, database tuning, fail-over and replication systems, and other necessary details of server administration. The tasks involved in setting up a reliable server farm are both irrelevant to most aspects of application development and highly repetitive – ripe for being abstracted away, in other words.

As with previous abstractions (microcode, higher level languages, operating systems, object/relational mappings) the "platform as a service" (PaaS) abstraction removes a category of complication and replaces it with a new choice: what web service platform (abstraction) shall I use for my application?

Do I tie myself to Google? Amazon? Sun? Microsoft?

If all this sounds vaguely familiar, that's because it is exactly the same decision process involved in choosing which implementation of a persistence abstraction (Oracle? MySQL? PostgreSQL?) or process management/filesystem abstraction (Linux? Solaris? Windows?) or O/R abstraction (Hibernate? JPOX?) you are going to use for your application.

And the same trade-offs apply. Do I like the implementation of this abstraction? Do I trust the vendor (to not screw me, to not go out of business)? Can I afford it?

If there is one thing missing from the PaaS tapestry so far (not counting Microsoft's no-doubt-forthcoming entry to the field), it is a strong "open source" thread. Unlike open source software, open source PaaS can't be replicated at zero cost (servers must be purchased, plugged in, cooled, etc) but PaaS can go "open sourceish" via: standard service APIs, allowing users to migrate easily from provider to provider; standardization on some open source components that fit the PaaS model (like Hadoop and Linux virtualization as already demonstrated by AWS).

Open source tends to be fast-follower, so I expect third-party deployable versions of the App Engine and AWS APIs will come soon enough. To me, the last couple years feel like 1995 all over again – just when you think you understand the structures of computing, the core premises are overthrown and everything is fresh again. In 1995 it was the internet and Linux shaking the foundations of the Windows hegemony; this time it is the cloud, wiping away the last vestiges of local computing context.

Friday, April 18, 2008

Malware? Schmidt?

Very odd. This evening, I wanted to read Chris Schmidt's latest blog post, and what I saw was this:



What? Apparently Chris is distributing "badware". I'll be interested to see how this shakes out, if Chris' site got schmutzed or if the "anti-phishing site" Firefox is aligned with made a mistake.

What is very odd is that Firefox resolutely refuses to take me to Chris' site. Safari, on the other hand, cannot display anything at all from the site, which perhaps means "bad things afoot". Glad I am not Chris' sysadmin tonight (or Chris, assuming sysadmin == Chris).

Mapserver Debug Logging

Daniel Morissette spills the beans on the mapserver-users list:
IIRC, LOG only logs some info on the mapserv request status at the end of its execution. I don't use it much and don't know much about it.

To get debugging output, with MapServer 5.0+, set:

CONFIG "MS_ERRORFILE" "/var/tmp/ms.log"

... and then set DEBUG level (ON, or number between 1 and 5) at the top-level in the mapfile and in each layer for which you want debugging output.

More details are available in RFC-28: http://mapserver.gis.umn.edu/development/rfc/ms-rfc-28
If there is something definitively "bad" about modern Mapserver it is the migration of configuration directives into "magic string" blocks of the map file, which are much less well documented than the "official" elements of the file.

CONFIG, PROCESSING, METADATA, that's right, I'm looking at you.

Wednesday, April 09, 2008

Mapserver and Lat/Lon

One of the problems with open source is how much interesting stuff hides beneath the surface, only visible to those willing to read the source code... interesting features you do not even know are there!

On the bright side, you can find these Easter Eggs, if you look.

For example, today I found a case where Mapserver renders projected maps even when the extents you send in are in lon/lat!

My map file looks like this (note the output projection is defined as Mercator):
MAP

  SHAPEPATH "/Users/pramsey/Code/mapserver/msworldtest/"
  IMAGETYPE GIF

  PROJECTION
    "proj=merc"
  END

  LAYER
    NAME continent
    PROJECTION
      "init=epsg:4326"
    END
    TYPE POLYGON
    DATA continent
    STATUS DEFAULT
    CLASS
      OUTLINECOLOR 10 10 10
      COLOR 200 200 200
    END
  END

END

My request URL looks like this (note the mapext coordinates are lon/lat):

http://localhost/cgi-bin/mapserv?map=~/Code/mapserver/msworldtest/reproj.map&mode=map&layers=continent&mapext=-90+45+0+80&imgsize=500+250

And the output looks like this:



So my request was in geographic coordinates, but my output was still in Mercator.

This is, of course, a brutal bug-in-waiting for someone with a projected coordinate system that happens to include valid requests in the range of (-180,-90 180,90). Mercator does, but a 180x180 meter patch of the Atlantic ocean will probably never be zoomed in on – if it is, the user will suddenly see the whole world, to their great surprise.
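Seen from the outside, the behaviour amounts to a range check on the incoming extent. Here is a sketch of that check as I understand it; extent_looks_geographic is a made-up name, and the real test in the Mapserver source may differ in detail.

```c
#include <assert.h>

/*
 * Guess whether an extent is lon/lat purely from its numeric range.
 * Returns 1 if every corner fits inside (-180,-90 180,90), else 0.
 */
int extent_looks_geographic(double minx, double miny,
                            double maxx, double maxy)
{
    assert(minx <= maxx && miny <= maxy);
    return minx >= -180.0 && maxx <= 180.0 &&
           miny >= -90.0  && maxy <= 90.0;
}
```

The lon/lat extent from my request, (-90, 45, 0, 80), passes the check, and so does a genuine Mercator extent of (-90, -90, 90, 90) metres. The check cannot tell them apart, which is exactly the bug-in-waiting.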

Tuesday, April 08, 2008

That's Billion with a "B"

This article on scaling PostgreSQL to support Skype's operations is well worth a read for anyone running a high-end PostgreSQL installation.
PostgreSQL is used "as the main DB for most of [Skype's] business needs." Their approach is to use a traditional stored procedure interface for accessing data and on top of that layer proxy servers which hash SQL requests to a set of database servers that actually carry out queries. The result is a horizontally partitioned system that they think will scale to handle 1 billion users.

Snapping Points in PostGIS

Fun question on the #postgis IRC channel today, just hard enough to be interesting and just easy enough to not be overwhelming:
Given a table of points and a table of lines, snap all the points within 10 metres of the lines to the lines.
My first thought was "PostGIS doesn't have that snapping function", but it actually does, hidden in the linear-referencing functions: ST_Line_Locate_Point(line, point).

OK, that returns a measure along the line, but I want a point! No problem, ST_Line_Interpolate_Point(line, measure) returns a point from a measure.

Great, so now all I need are, for each point within 10 metres of the lines, the nearest line. Yuck, finding the minimum. However, with the PostgreSQL DISTINCT ON syntax and some ordering, it all pops out:
SELECT
  DISTINCT ON (pt_id)
  pt_id,
  ln_id,
  ST_AsText(
    ST_Line_Interpolate_Point(
      ln_geom,
      ST_Line_Locate_Point(ln_geom, pt_geom)
    )
  )
FROM
  (
    SELECT
      ln.the_geom AS ln_geom,
      pt.the_geom AS pt_geom,
      ln.id AS ln_id,
      pt.id AS pt_id,
      ST_Distance(ln.the_geom, pt.the_geom) AS d
    FROM
      point_table pt,
      line_table ln
    WHERE
      ST_DWithin(pt.the_geom, ln.the_geom, 10.0)
    ORDER BY
      pt_id, d
  ) AS subquery;

The sub-query finds all the points/line combinations that meet the 10 meter tolerance rule, and returns them in sorted order, by point id and distance. The outer query then strips off the first entry for each distinct point id and runs the LRS functions on it to derive the new snapped point.
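For anyone wondering what the two LRS functions do when composed, here is a plain C sketch of the geometry: project the point onto each segment of the line, clamp to the segment ends, and keep the closest candidate. It assumes planar coordinates and uses invented names (pt2d, snap_to_line); it is not PostGIS code.

```c
#include <assert.h>
#include <stddef.h>

typedef struct {
    double x, y;
} pt2d;

/* Squared distance; enough for comparing candidates. */
static double dist2(pt2d a, pt2d b)
{
    double dx = a.x - b.x, dy = a.y - b.y;
    return dx * dx + dy * dy;
}

/* Return the point on the polyline `line` (npts vertices) closest to p. */
pt2d snap_to_line(const pt2d *line, size_t npts, pt2d p)
{
    pt2d best;
    double best_d2;
    size_t i;

    assert(line != NULL && npts >= 1);
    best = line[0];
    best_d2 = dist2(p, best);

    for (i = 0; i + 1 < npts; i++) {
        double dx = line[i + 1].x - line[i].x;
        double dy = line[i + 1].y - line[i].y;
        double len2 = dx * dx + dy * dy;
        double t = 0.0, d2;
        pt2d c;

        if (len2 > 0.0) {
            /* Fraction along this segment of the perpendicular foot. */
            t = ((p.x - line[i].x) * dx + (p.y - line[i].y) * dy) / len2;
            if (t < 0.0) t = 0.0;   /* clamp to the segment start */
            if (t > 1.0) t = 1.0;   /* clamp to the segment end */
        }
        c.x = line[i].x + t * dx;
        c.y = line[i].y + t * dy;

        d2 = dist2(p, c);
        if (d2 < best_d2) {
            best = c;
            best_d2 = d2;
        }
    }
    return best;
}
```

The "locate" half is finding t on the winning segment, and the "interpolate" half is turning that back into a coordinate. PostGIS expresses the measure as a fraction of the whole line rather than of one segment, but the snapped point comes out the same.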

Snapperiffic!

Thursday, April 03, 2008

See?

One of the things I wanted to do after moving on from Refractions was get back into technology in a "hands on" way again, and the place I most want to get my hands dirty is with PostGIS. It's all very nice to be a technology evangelist, but very frustrating to have to depend entirely on others to get things implemented. I have to be my own staff now, and that means if I want to play with the guts of PostGIS, I have to learn C.

So that's what I'm doing. I have my book. I work through exercises. I read the PostGIS code. It's a slow process, but rewarding as my understanding grows.

For those of you who, like me, have mostly worked in higher level languages, I want to share my C "wow" moment for the week. C has arrays. The syntax is the same as (surprise) all those other languages (Java, Perl, Javascript, PHP) that ape C syntax. Want to iterate through an array? No problem, very familiar, we print out the contents of our array:

for( i = 0; i < sizeof(array) / sizeof(array[0]); i++ ) {
  printf( "%d\n", array[i] );
}


Now, I knew C pointers were much less abstract than Java pointers, they actually point to memory addresses. Even so, there's knowing and then there is KNOWING. This routine, that also prints the contents of the array, blew my mind:

for( i = 0; i < sizeof(array) / sizeof(array[0]); i++ ) {
  printf( "%d\n", *(array + i) );
}


WTF!?!

First, it turns out that the value of the bare "array" variable is just a pointer to the front of the array (how efficient). But the icing on the cake is that you can do math on the pointers! I add 1 to the pointer, and now it's pointing at the next element, so when I dereference the pointer (with that *) out pops the next value!
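Here is a small self-contained sketch of the same trick, with names of my own choosing (ARRAY_LEN, sum_by_index, sum_by_pointer). One wrinkle worth knowing: sizeof on an array gives its size in bytes, so the element count is the total size divided by the size of one element, and once the array has been handed to a function as a pointer the length has to be passed along separately.

```c
#include <assert.h>
#include <stddef.h>

/* Element count of a true array (not a pointer): total bytes / bytes per element. */
#define ARRAY_LEN(a) (sizeof(a) / sizeof((a)[0]))

/* Sum using ordinary subscripts. */
long sum_by_index(const int *array, size_t n)
{
    long total = 0;
    size_t i;

    assert(array != NULL);
    for (i = 0; i < n; i++)
        total += array[i];
    return total;
}

/* Same sum, walking a pointer from the first element to one past the last. */
long sum_by_pointer(const int *array, size_t n)
{
    long total = 0;
    const int *p;

    assert(array != NULL);
    for (p = array; p < array + n; p++)
        total += *p;   /* p + 1 moves by one int, not one byte */
    return total;
}
```

Called as `sum_by_pointer(values, ARRAY_LEN(values))` on an int array named values, both functions return the same total, because array[i] is just another way of writing *(array + i).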

All you CompSci majors can have a laugh at my expense ("technopeasant!"), but I'm self-taught, and I have been living in other people's (Perl, Java, PHP, Avenue (!!!), Javascript) interpreters for many years. This stuff is too cool.
