That's Billion with a "B"

This article on scaling PostgreSQL to support Skype’s operations is well worth a read for anyone running a high-end PostgreSQL installation.

PostgreSQL is used “as the main DB for most of [Skype’s] business needs.” Their approach is to use a traditional stored procedure interface for accessing data and on top of that layer proxy servers which hash SQL requests to a set of database servers that actually carry out queries. The result is a horizontally partitioned system that they think will scale to handle 1 billion users.

Snapping Points in PostGIS

Fun question on the #postgis IRC channel today, just hard enough to be interesting and just easy enough to not be overwhelming:

Given a table of points and a table of lines, snap all the points within 10 metres of the lines to the lines.

My first thought was “PostGIS doesn’t have that snapping function”, but it actually does, hidden in the linear-referencing functions: ST_Line_Locate_Point(line, point).

OK, that returns a measure along the line, but I want a point! No problem, ST_Line_Interpolate_Point(line, measure) returns a point from a measure.

Great, so now all I need are, for each point within 10 metres of the lines, the nearest line. Yuck, finding the minimum. However, with the PostgreSQL DISTINCT ON syntax and some ordering, it all pops out:

            ST_line_locate_point(ln_geom, vgeom)
        ln.the_geom AS ln_geom,
        pt.the_geom AS pt_geom, AS ln_id, AS pt_id, 
        ST_Distance(ln.the_geom, pt.the_geom) AS d
        point_table pt, 
        line_table ln 
        ST_DWithin(pt.the_geom, ln.the_geom, 10.0) 
    ORDER BY pt_id, d
) AS subquery;

The sub-query finds all the points/line combinations that meet the 10 meter tolerance rule, and returns them in sorted order, by point id and distance. The outer query then strips off the first entry for each distinct point id and runs the LRS functions on it to derive the new snapped point.



One of the things I wanted to do after moving on from Refractions was get back into technology in a “hands on” way again, and the place I most want to get my hands dirty is with PostGIS. It’s all very nice to be a technology evangelist, but very frustrating to have to depend entirely on others to get things implemented. I have to be my own staff now, and that means if I want to play with the guts of PostGIS, I have to learn C.

So that’s what I’m doing. I have my book. I work through exercises. I read the PostGIS code. It’s a slow process, but rewarding as my understanding grows.

For those of you who, like me, have mostly worked in higher level languages, I want to share my C “wow” moment for the week. C has arrays. The syntax is the same as (surprise) all those other languages (Java, Perl, Javascript, PHP) that ape C syntax. Want to iterate through an array? No problem, very familiar, we print out the contents of our array:

for( i = 0; i < sizeof(array); i++ ) {
   printf( "%d\n", array[i] );

Now, I knew C pointers were much less abstract than Java pointers, they actually point to memory addresses. Even so, there’s knowing and then there is KNOWING. This routine, that also prints the contents of the array, blew my mind:

for( i = 0; i < sizeof(array); i++ ) {
     printf( "%d\n", *(array + i) );


First, it turns out that the value of the bare “array” variable is just a pointer to the front of the array (how efficient). But the icing on the cake is that you can do math on the pointers! I add 1 to the pointer, and now it’s pointing at the next element, so when I dereference the pointer (with that *) out pops the next value!

All you CompSci majors can have a laugh at my expense (“technopeasant!”), but I’m self-taught, and I have been living in other people’s (Perl, Java, PHP, Avenue (!!!), Javascript) interpreters for many years. This stuff is too cool.

As my junior high school buddy once said...

…you know something’s really funny when Coke sprays out of your nose.

I’m coming to it late, I know, but Eric Schmidt’s Serenity Prayer made me laugh out loud, disconcerting the other coffee shop patrons.


From the New Republic:

Clintonism is a political strategy that assumes a skeptical public; Obamaism is a way of actualizing a latent ideological majority.

Nice to know that jargon cuts across professional lines. Now, back to that RESTful interface to DE9IM topological relations…