Snapping Points in PostGIS

Fun question on the #postgis IRC channel today, just hard enough to be interesting and just easy enough to not be overwhelming:

Given a table of points and a table of lines, snap all the points within 10 metres of the lines to the lines.

My first thought was “PostGIS doesn’t have that snapping function”, but it actually does, hidden in the linear-referencing functions: ST_Line_Locate_Point(line, point).

OK, that returns a measure along the line, but I want a point! No problem, ST_Line_Interpolate_Point(line, measure) returns a point from a measure.
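Chained together, the two calls snap a point to a line in one expression. A minimal sketch, with throwaway geometries just for illustration:

SELECT ST_AsText(
    ST_Line_Interpolate_Point(line, ST_Line_Locate_Point(line, pt))
)
FROM (
    SELECT
        ST_GeomFromText('LINESTRING(0 0, 10 0)') AS line,
        ST_GeomFromText('POINT(5 3)') AS pt
) AS geoms;
-- Returns POINT(5 0), the closest spot on the line to the point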

Great, so now all I need is the nearest line for each point within 10 metres of the lines. Yuck, finding the minimum. However, with PostgreSQL’s DISTINCT ON syntax and some ordering, it all pops out:

SELECT DISTINCT ON (pt_id)
    pt_id,
    ln_id,
    -- locate the point's measure on the line, then
    -- interpolate the snapped point back from that measure
    ST_AsText(
        ST_Line_Interpolate_Point(
            ln_geom,
            ST_Line_Locate_Point(ln_geom, pt_geom)
        )
    )
FROM
(
    SELECT
        ln.the_geom AS ln_geom,
        pt.the_geom AS pt_geom,
        ln.id AS ln_id,
        pt.id AS pt_id,
        ST_Distance(ln.the_geom, pt.the_geom) AS d
    FROM
        point_table pt,
        line_table ln
    WHERE
        -- only consider point/line pairs within the 10 metre tolerance
        ST_DWithin(pt.the_geom, ln.the_geom, 10.0)
) AS subquery
-- DISTINCT ON keeps the first row per pt_id, so sorting by
-- distance makes that first row the nearest line
ORDER BY pt_id, d;

The sub-query finds all the point/line combinations that meet the 10 metre tolerance rule; the outer query sorts them by point id and distance, strips off the first (nearest) entry for each distinct point id, and runs the LRS functions on it to derive the new snapped point.
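And if you want to actually move the points rather than just report where they would land, the same pattern can drive an UPDATE. A sketch only, assuming the same point_table / line_table schema and id columns as above:

UPDATE point_table
SET the_geom = snapped.geom
FROM (
    SELECT DISTINCT ON (pt.id)
        pt.id AS pt_id,
        -- snapped location of the point on its nearest line
        ST_Line_Interpolate_Point(
            ln.the_geom,
            ST_Line_Locate_Point(ln.the_geom, pt.the_geom)
        ) AS geom
    FROM point_table pt, line_table ln
    WHERE ST_DWithin(pt.the_geom, ln.the_geom, 10.0)
    ORDER BY pt.id, ST_Distance(ln.the_geom, pt.the_geom)
) AS snapped
WHERE point_table.id = snapped.pt_id;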

Snapperiffic!

See?

One of the things I wanted to do after moving on from Refractions was to get back into technology in a “hands on” way again, and the place I most want to get my hands dirty is with PostGIS. It’s all very nice to be a technology evangelist, but very frustrating to have to depend entirely on others to get things implemented. I have to be my own staff now, and that means if I want to play with the guts of PostGIS, I have to learn C.

So that’s what I’m doing. I have my book. I work through exercises. I read the PostGIS code. It’s a slow process, but rewarding as my understanding grows.

For those of you who, like me, have mostly worked in higher-level languages, I want to share my C “wow” moment for the week. C has arrays. The syntax is the same as in (surprise) all those other languages (Java, Perl, Javascript, PHP) that ape C syntax. Want to iterate through an array? No problem, it's all very familiar; we print out the contents of our array:

int array[] = { 10, 20, 30, 40 };
int i;
/* sizeof(array) is a byte count, so divide by the element size */
for( i = 0; i < sizeof(array) / sizeof(array[0]); i++ ) {
   printf( "%d\n", array[i] );
}

Now, I knew C pointers were much less abstract than Java references: they actually hold memory addresses. Even so, there’s knowing and then there is KNOWING. This routine, which also prints the contents of the array, blew my mind:

/* the bare "array" decays to a pointer to its first element,
   so *(array + i) is exactly equivalent to array[i] */
for( i = 0; i < sizeof(array) / sizeof(array[0]); i++ ) {
     printf( "%d\n", *(array + i) );
}

WTF!?!

First, it turns out that the value of the bare “array” variable is just a pointer to the front of the array (how efficient). But the icing on the cake is that you can do math on the pointers! I add 1 to the pointer and the compiler scales it by the size of an element, so now it’s pointing at the next entry, and when I dereference the pointer (with that *) out pops the next value!
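You can actually watch the scaling happen by printing the addresses themselves. Here’s a throwaway program of my own (not one of the book exercises) that shows the jump:

#include <stdio.h>

int main(void)
{
    int array[] = { 10, 20, 30, 40 };
    /* adding 1 to an int pointer advances it by sizeof(int) bytes,
       not by 1 byte -- the compiler scales pointer math for you */
    printf( "array     lives at  %p\n", (void*)array );
    printf( "array + 1 points at %p\n", (void*)(array + 1) );
    printf( "the jump was %ld bytes\n",
        (long)((char*)(array + 1) - (char*)array) );
    return 0;
}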

All you CompSci majors can have a laugh at my expense (“technopeasant!”), but I’m self-taught, and I have been living in other people’s (Perl, Java, PHP, Avenue (!!!), Javascript) interpreters for many years. This stuff is too cool.

As my junior high school buddy once said...

…you know something’s really funny when Coke sprays out of your nose.

I’m coming to it late, I know, but Eric Schmidt’s Serenity Prayer made me laugh out loud, disconcerting the other coffee shop patrons.

Bloviationism

From the New Republic:

Clintonism is a political strategy that assumes a skeptical public; Obamaism is a way of actualizing a latent ideological majority.

Nice to know that jargon cuts across professional lines. Now, back to that RESTful interface to DE9IM topological relations…

Benchmarks, Damn Benchmarks and Statistics

Seen on the #mapserver IRC channel:

“hey guys I have mapserver running fine on a vehicle tracking application, what I want to know is the requirements for mapserver. Let say 100 connections on the same time. I have 2 GB RAM , Dual Core 3GHz procesor, what do you say, Will it be enough?”

Well, enough for what? At least one variable is missing, and that’s the expected response time for each request. If I am allowed to take a week per response, I can really lower the hardware requirements!

Benchmarking an application is a tricky business, and there are lots of ways to quantify its robustness. My favorite is a holistic method that takes into account the fact that most of the time the load generators are human beings. This won’t work for pure “web services”, where the requests can be generated automatically by any number of different clients.

Step one is to generate a baseline of what human load looks like. Working through your test plan is one way to achieve this, though you might want to game out in your head what a “typical” session looks like rather than a “complete” session that hits every piece of functionality once. Call this your “human workload”.

  1. Empty your web server logs.
  2. Sit down and run the “human workload” yourself, at a reasonable speed. You know the application, so you probably click faster than an average user; no matter, it doesn’t hurt to bias a little in the fast direction. When you are done with your session, note the elapsed time: this is your “human workload time”.
  3. Now, take your web server logs and run them back against the server using a tool like curl. This generates all the requests from your human session, but forces the server to execute them as fast as possible. When it finishes, note the elapsed time: this is your “cpu workload time”.
  4. Finally, divide your “cpu workload time” by your “human workload time”. The result is how many computers like the one you just ran your test on are needed to support each human. If the answer is 0.2, then you can support 5 humans on your test machine (a worked example follows below).
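To make the arithmetic concrete with some invented numbers: say your human session took 10 minutes of clicking around (600 seconds), and the curl replay burned through the same requests in 2 minutes (120 seconds). Then 120 / 600 = 0.2 machines per user, which is the same as saying one machine can keep 600 / 120 = 5 simultaneous users happy.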

Obviously, this is a very simple test metric, but it has the advantage of extreme ease of application and a tight binding between what is being measured and what the real world will finally hit the application with.