Performance and Contains()

The reason I was thinking about performance improvements, and how billing by CPU usage gives vendors no incentive to work on them, is that we have been thinking about a particular PostGIS use case recently.

Suppose you have a very large candidate table of smallish things, 10s of millions of them, and you want to find all of the smallish things that are contained by a largish polygon.

The spatial index will be very useful for quickly winnowing down the 10s of millions of things to the 10s of thousands of things that might fall inside the polygon. However, after that, you’re left testing each of the smallish things individually for containment. And a majority of the smallish things will be unambiguously inside the large polygon, not even close to the edge, so a great deal of computation will be wasted. The same issue applies to Intersects() and, inversely, to Within().

The trick, clearly, is to provide some kind of short circuit, so that the “easy” cases can be trivially dealt with and only the boundary cases need a full test.

A nice approach for generally convex polygons would be a “maximum inscribed rectangle” (MIR): any small thing whose minimum bounding rectangle (MBR) fits in the MIR is definitely contained. However, then you have to calculate a MIR, which is itself costly.

A variation on the MIR approach is just to superimpose the polygon on a grid and find all those squares that are fully contained in the polygon. Any smallish feature whose MBR is fully contained by “inside grid squares” is itself fully contained.
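To make the grid idea concrete, here is a rough sketch in plain Python. It is not PostGIS code. It assumes a simple polygon with no holes, given as a list of (x, y) vertices, and a square grid anchored at the origin; the function names and the cell size are all made up for illustration. A cell counts as "inside" only if no polygon edge touches it and its centre is inside the polygon, so the test errs on the side of "not sure".

```python
import math

def point_in_polygon(x, y, ring):
    """Ray-casting test; ring is a list of (x, y) vertices (not closed)."""
    inside = False
    n = len(ring)
    for i in range(n):
        x1, y1 = ring[i]
        x2, y2 = ring[(i + 1) % n]
        # Only edges that straddle the horizontal line through y matter.
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside

def segment_hits_box(p, q, xmin, ymin, xmax, ymax):
    """Liang-Barsky style clip: True if segment p-q touches the closed box."""
    t0, t1 = 0.0, 1.0
    dx, dy = q[0] - p[0], q[1] - p[1]
    for d, lo, hi, start in ((dx, xmin, xmax, p[0]), (dy, ymin, ymax, p[1])):
        if d == 0:
            if start < lo or start > hi:
                return False  # parallel to this slab and outside it
        else:
            ta, tb = (lo - start) / d, (hi - start) / d
            if ta > tb:
                ta, tb = tb, ta
            t0, t1 = max(t0, ta), min(t1, tb)
            if t0 > t1:
                return False
    return True

def build_inside_cells(ring, cell):
    """Set of (col, row) cells that lie wholly inside the polygon."""
    xs = [x for x, _ in ring]
    ys = [y for _, y in ring]
    edges = list(zip(ring, ring[1:] + ring[:1]))
    inside = set()
    for col in range(math.floor(min(xs) / cell), math.floor(max(xs) / cell) + 1):
        for row in range(math.floor(min(ys) / cell), math.floor(max(ys) / cell) + 1):
            xmin, ymin = col * cell, row * cell
            xmax, ymax = xmin + cell, ymin + cell
            if any(segment_hits_box(p, q, xmin, ymin, xmax, ymax) for p, q in edges):
                continue  # the boundary passes through (or touches) this cell
            if point_in_polygon(xmin + cell / 2, ymin + cell / 2, ring):
                inside.add((col, row))
    return inside

def mbr_definitely_inside(mbr, inside_cells, cell):
    """True if every cell overlapped by the MBR is an inside cell."""
    xmin, ymin, xmax, ymax = mbr
    cols = range(math.floor(xmin / cell), math.floor(xmax / cell) + 1)
    rows = range(math.floor(ymin / cell), math.floor(ymax / cell) + 1)
    return all((c, r) in inside_cells for c in cols for r in rows)

square = [(0, 0), (10, 0), (10, 10), (0, 10)]
cells = build_inside_cells(square, 1.0)
```

A False from mbr_definitely_inside only means "inconclusive": the candidate still needs the full containment test. The grid is built once per large polygon, so its cost is spread over all the candidates.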

What it looks like we’ll do first is speed up the general calculation of containment by caching a topologized version of the large polygon. The topologized version will have an index on all the edge segments, for quickly testing whether a given candidate crosses the boundary, and an index for fast point-in-polygon testing. The idea is that you first check whether the candidate crosses the boundary. If it does not, it must be either fully inside or fully outside, so a point-in-polygon test on one of its end points tells you which.
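The two-step logic can be sketched as follows. This is only an illustration of the idea, not the PostGIS implementation: the CachedPolygon class and its method names are hypothetical, the polygon is assumed to be simple with no holes, the candidate is assumed to be a linestring, and the "index" is just a cached edge list scanned linearly where a real version would use a proper segment index.

```python
def orient(a, b, c):
    """Sign of the cross product (b-a) x (c-a): 1 left, -1 right, 0 collinear."""
    v = (b[0] - a[0]) * (c[1] - a[1]) - (b[1] - a[1]) * (c[0] - a[0])
    return (v > 0) - (v < 0)

def on_segment(a, b, c):
    """True if c (already known to be collinear with a-b) lies on segment a-b."""
    return (min(a[0], b[0]) <= c[0] <= max(a[0], b[0])
            and min(a[1], b[1]) <= c[1] <= max(a[1], b[1]))

def segments_touch(p1, p2, q1, q2):
    """True if segments p1-p2 and q1-q2 cross or touch."""
    o1, o2 = orient(p1, p2, q1), orient(p1, p2, q2)
    o3, o4 = orient(q1, q2, p1), orient(q1, q2, p2)
    if o1 != o2 and o3 != o4:
        return True
    return ((o1 == 0 and on_segment(p1, p2, q1))
            or (o2 == 0 and on_segment(p1, p2, q2))
            or (o3 == 0 and on_segment(q1, q2, p1))
            or (o4 == 0 and on_segment(q1, q2, p2)))

class CachedPolygon:
    """Hypothetical stand-in for the cached, topologized large polygon."""

    def __init__(self, ring):
        self.ring = ring
        # Built once and reused for every candidate.
        self.edges = list(zip(ring, ring[1:] + ring[:1]))

    def boundary_touches(self, line):
        """Step 1: does any candidate segment cross or touch the boundary?"""
        return any(segments_touch(a, b, p, q)
                   for a, b in zip(line, line[1:])
                   for p, q in self.edges)

    def contains_point(self, pt):
        """Step 2: ray-casting point-in-polygon test."""
        x, y = pt
        inside = False
        for (x1, y1), (x2, y2) in self.edges:
            if (y1 > y) != (y2 > y):
                if x < x1 + (y - y1) * (x2 - x1) / (y2 - y1):
                    inside = not inside
        return inside

    def classify(self, line):
        if self.boundary_touches(line):
            return "boundary"  # needs the full containment test
        return "inside" if self.contains_point(line[0]) else "outside"

big = CachedPolygon([(0, 0), (10, 0), (10, 10), (0, 10)])
```

With this sketch, big.classify([(2, 2), (5, 5), (7, 3)]) reports "inside" after a single point-in-polygon test, and only candidates reported as "boundary" fall through to the expensive path.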

All in all, it is a lot of complexity for what seems like a very common hard-to-index case: test a large number of candidates for containment.

Ow, That Stings!

There’s a certain backhanded snarkiness to the ESRI UC Q&A item on PostGIS:

Will ESRI support the PostGIS open-source spatial extension for PostgreSQL?

Yes, ESRI will provide our customers with the option of using either the ISO/OGC spatial type or the PostGIS spatial extension.

So they’ll support the standard type or that PostGIS thingymabob. Of course, PostGIS is also an ISO/OGC spatial type, but somehow that fact is a little submerged in the phraseology.

Perverse Incentives

I am sure others have beaten this horse before, but I just have to take a whack at it.

Oracle, ESRI and others license their server-side technology on the basis of dollars-per-processing-unit, usually in the form of Constant * NumberOfCores * Price. For example, the base price for Oracle Enterprise (which you need to do high-end processing, like, say, computing a buffer (snort)) is 0.5 * NumberOfCores * $40,000.
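To put numbers on the formula, here is a small worked example. The 0.5 factor and the $40,000 price are the figures quoted above; the function name and the core counts are made up for illustration.

```python
def licence_cost(cores, price=40_000, constant=0.5):
    """Constant * NumberOfCores * Price, using the figures quoted above."""
    return constant * cores * price

eight_cores = licence_cost(8)     # 0.5 * 8 * 40,000
sixteen_cores = licence_cost(16)  # doubling the cores doubles the bill
```

Under these figures, every slow query "solved" by moving from 8 to 16 cores is worth another $160,000 to the vendor.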

OK, time for the quiz-show section of our show: Let’s say you buy yourself some Oracle, and start using it. You find a particular use case that is slow, but important to you. You call up your Oracle representative. What will his answer be: (a) we’ll make it faster or (b) you need more processing power? Remember, this is not a trick question, and he does earn commissions on sales.

Take it up a level. From a strategic financial point of view, all R&D dollars spent on performance improvements actually constitute a double cost: the cost of doing the development, and the cost of lost revenue due to fewer upgrades. If I am the CEO, do I encourage my managers to spend money on performance tweaks, which will reduce upgrade revenue, or on new features, which will drive new sales?

Safari: SyntaxError - Parse error

I have now worked my way past this particularly opaque Safari error message twice, which is one time too many, so I am putting the explanation for my error case online. Safari seems to have a few cases where it throws up this utterly useless error message. It does include a line number reference, and double-clicking the error line in the log will take you to the offending line, but heaven help you if it is not clear what is wrong at that point.

I found another reference to this problem on the web, but it wasn’t the problem I had.

My problem was that I used “abstract” as a variable name. Because I am working on web pages for viewing presentation abstracts, this seemed natural. However, I guess “abstract” is a reserved word of some kind in Safari’s JavaScript implementation. Firefox showed no errors and happily ran code that Safari could not even parse.
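For what it is worth, “abstract” is on the list of future reserved words in the third edition of the ECMAScript specification, which would be consistent with what I saw, though whether that is exactly what Safari’s parser is doing is an assumption on my part. Here is a minimal sketch of a check for this kind of name clash. It is written in Python rather than JavaScript so it can run outside the browser, the function name is made up, and the regular expression only catches simple var and function declarations, not parameters or object properties.

```python
import re

# Future reserved words from ECMAScript, 3rd edition.
ES3_FUTURE_RESERVED = frozenset("""
    abstract boolean byte char class const debugger double enum export
    extends final float goto implements import int interface long native
    package private protected public short static super synchronized
    throws transient volatile
""".split())

# Names introduced by "var name" or "function name" (deliberately simplistic).
DECLARATION = re.compile(r"\b(?:var|function)\s+([A-Za-z_$][\w$]*)")

def risky_identifiers(js_source):
    """Return declared names that clash with an ES3 future reserved word."""
    names = DECLARATION.findall(js_source)
    return sorted({n for n in names if n in ES3_FUTURE_RESERVED})

sample = 'var abstract = getText(); var title = "x"; function final() {}'
flagged = risky_identifiers(sample)
```

Renaming the variable (to something like abstractText) was enough to make the parse error go away in my case.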

Cognitive Dissonance

Trying to hold these two concepts in my head at once… Ed Parsons, at the Open Street Map conference… Ed Parsons, at the Open Street Map conference… Ed Parsons, at the Open Street Map conference…

Just as in Hollywood, casting against type is sure to bring in the punters.

Mud

Ed Parsons, former CTO of the Ordnance Survey (organization dedicated to the monetization of proprietary data), current geospatial evangelist for Google (organization dedicated to the monetization of proprietary data), speaking to Open Street Map (organization dedicated to free public domain data). Will there be a mud wrestling pit too?