FOSS4G Workshops in Demand

It is still early days for registration at FOSS4G, but one trend is showing up loud and clear: workshops are popular. Most of the registrants so far have chosen to attend the Monday hands-on workshops. If that trend holds, the popular workshops will fill up fast, so putting off registration is a good way to miss out on them.

If you want your pick of the workshop crop, register now! Space is literally limited: we only have 240 lab seats to work with.

Karma Refill

Some commenters have noted that I am turning into something of a negative nelly. So, time to fill up the karma gas tank and accentuate the positive!

What I like about ESRI:

  • Corporate environmentalism and the “big picture” corporate attitude it implies (there’s more to life than software).
  • The old school AML folks, and the roll-up-your-sleeves, make-things-work attitude they have. Nothing empowers scientists like flexible tools, and ESRI delivered.
  • ArcMap. Bar none the most powerful single bundle of editing, analytics, and cartography out there. No other single install puts so much stuff under your mouse in one go.

What I like about Oracle:

  • SQL Developer. The hard core swear by TOAD, but for me, SQL Developer is just right.
  • OTN and the culture of free downloads for developers. Oracle knows you have to put the tools in front of the users if you want them to try them and recommend them. I recently downloaded both Oracle and DB2 to put up test servers, and the experience was day and night. I still have achy muscles from jumping over all the DB2 hurdles.
  • Seven-letter executable names. TNSLSNR, I love you!
  • Online documentation. Love them or hate them, you have no excuse to be ignorant of them. The docs are good, they are complete, and they are all there.
  • Buying open source companies. It gives a guy hope, you know?
  • Xavier Lopez (Oracle Spatial product manager). Extremely gracious man, willing to put up with a lot of guff from open source folks (like, er, me) at FOSS4G06 and stay positive. Hope to see him in ‘07.

Performance and Contains()

The reason I was thinking about performance improvements — and how billing by CPU usage provides vendors with no incentive to work on them — is because we have been thinking about a particular PostGIS use case recently.

Suppose you have a very large candidate table of smallish things, 10s of millions of them, and you want to find all of the smallish things that are contained by a largish polygon.

The spatial index will be very useful for quickly winnowing down the 10s of millions of things to the 10s of thousands of things that might fall inside the polygon. However, after that, you’re left testing each of the smallish things individually for containment. And a majority of the smallish things will be unambiguously inside the large polygon, not even close to the edge, so a great deal of computation will be wasted. The same issue applies to Intersects() and inversely to Within().

The trick, clearly, is to provide some kind of short circuit, so that the “easy” cases can be trivially dealt with and only the boundary cases need a full test.

A nice approach for generally convex polygons would be a “maximum inscribed rectangle” (MIR) — any small thing whose MBR fits in the MIR is definitely contained. However, then you have to calculate a MIR, which is itself costly.
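
The short circuit itself is cheap once a MIR exists. Here is a minimal sketch, assuming the MIR has already been computed somehow (the expensive part, not shown) and that boxes are plain (xmin, ymin, xmax, ymax) tuples. The names `Box`, `box_contains` and `contains_with_shortcut` are made up for illustration and are not PostGIS functions.

```python
from typing import Callable, NamedTuple


class Box(NamedTuple):
    """Axis-aligned rectangle; used for both the MIR and a candidate's MBR."""
    xmin: float
    ymin: float
    xmax: float
    ymax: float


def box_contains(outer: Box, inner: Box) -> bool:
    """True when inner lies entirely within outer (edges may touch)."""
    return (outer.xmin <= inner.xmin and outer.ymin <= inner.ymin
            and inner.xmax <= outer.xmax and inner.ymax <= outer.ymax)


def contains_with_shortcut(mir: Box, candidate_mbr: Box,
                           full_test: Callable[[], bool]) -> bool:
    """Skip the exact test when the candidate's MBR fits inside the MIR."""
    if box_contains(mir, candidate_mbr):
        return True          # easy case: definitely contained
    return full_test()       # boundary case: pay for the exact test
```

The `full_test` callable stands in for whatever exact containment test you already have; it only runs for candidates that stray outside the inscribed rectangle.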

A variation on the MIR approach is just to superimpose the polygon on a grid and find all those squares that are fully contained in the polygon. Any smallish feature whose MBR is fully contained by “inside grid squares” is itself fully contained.
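
A rough sketch of the grid idea, under some loud assumptions: the polygon is a single ring of (x, y) tuples with no holes, the grid is square and axis-aligned, and a cell counts as "inside" only when no polygon edge touches it and its centre is inside the ring. All of the function names are hypothetical, and a real implementation would not brute-force every edge against every cell.

```python
import math


def point_in_polygon(pt, ring):
    """Ray-casting test; ring is a list of (x, y) vertices, not closed."""
    x, y = pt
    inside = False
    n = len(ring)
    for i in range(n):
        (x0, y0), (x1, y1) = ring[i], ring[(i + 1) % n]
        if (y0 > y) != (y1 > y):                      # edge straddles the ray
            x_cross = x0 + (y - y0) * (x1 - x0) / (y1 - y0)
            if x < x_cross:
                inside = not inside
    return inside


def segment_hits_box(p, q, box):
    """Slab test: does segment p-q touch the closed box (xmin, ymin, xmax, ymax)?"""
    xmin, ymin, xmax, ymax = box
    t0, t1 = 0.0, 1.0
    for start, delta, lo, hi in ((p[0], q[0] - p[0], xmin, xmax),
                                 (p[1], q[1] - p[1], ymin, ymax)):
        if delta == 0:
            if start < lo or start > hi:
                return False
        else:
            ta, tb = (lo - start) / delta, (hi - start) / delta
            if ta > tb:
                ta, tb = tb, ta
            t0, t1 = max(t0, ta), min(t1, tb)
            if t0 > t1:
                return False
    return True


def inside_cells(ring, origin, size, nx, ny):
    """Return the set of (i, j) grid cells lying fully inside the ring."""
    ox, oy = origin
    n = len(ring)
    cells = set()
    for i in range(nx):
        for j in range(ny):
            box = (ox + i * size, oy + j * size,
                   ox + (i + 1) * size, oy + (j + 1) * size)
            touched = any(segment_hits_box(ring[k], ring[(k + 1) % n], box)
                          for k in range(n))
            centre = (ox + (i + 0.5) * size, oy + (j + 0.5) * size)
            if not touched and point_in_polygon(centre, ring):
                cells.add((i, j))
    return cells


def mbr_in_cells(mbr, cells, origin, size):
    """True when every grid cell overlapped by the MBR is an inside cell."""
    ox, oy = origin
    xmin, ymin, xmax, ymax = mbr
    i0, i1 = math.floor((xmin - ox) / size), math.floor((xmax - ox) / size)
    j0, j1 = math.floor((ymin - oy) / size), math.floor((ymax - oy) / size)
    return all((i, j) in cells
               for i in range(i0, i1 + 1) for j in range(j0, j1 + 1))
```

A `False` from `mbr_in_cells` does not mean "not contained", only "not an easy case": those candidates still need the full test. The cell set is computed once per large polygon and reused for every candidate, which is where the savings would come from.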

What it looks like we’ll do first is to speed up the general calculation of containment, by caching a topologized version of the large polygon. The topologized version will have an index on all the edge segments, for fast testing of whether a given candidate crosses the boundary, and an index for fast point-in-polygon testing. The idea is that you first check whether the candidate crosses the boundary. If it does not, it must be either fully inside or fully outside, so a point-in-polygon test on one of its end points tells you which.
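
The two-step logic can be sketched in a few lines, leaving out the part that actually makes it fast: there are no indexes here, just a brute-force loop over the edges. The sketch assumes the candidate is a linestring and the polygon is a single ring without holes, both given as lists of (x, y) tuples. The function names are invented for illustration, and any touch of the boundary is lumped in with a crossing and left for a full test.

```python
def orient(a, b, c):
    """Twice the signed area of triangle a, b, c (sign gives turn direction)."""
    return (b[0] - a[0]) * (c[1] - a[1]) - (b[1] - a[1]) * (c[0] - a[0])


def on_segment(a, b, c):
    """For c collinear with a-b: is c within the segment's extent?"""
    return (min(a[0], b[0]) <= c[0] <= max(a[0], b[0])
            and min(a[1], b[1]) <= c[1] <= max(a[1], b[1]))


def segments_intersect(p1, p2, q1, q2):
    """True when segments p1-p2 and q1-q2 share at least one point."""
    d1, d2 = orient(q1, q2, p1), orient(q1, q2, p2)
    d3, d4 = orient(p1, p2, q1), orient(p1, p2, q2)
    if (((d1 > 0 and d2 < 0) or (d1 < 0 and d2 > 0))
            and ((d3 > 0 and d4 < 0) or (d3 < 0 and d4 > 0))):
        return True
    return ((d1 == 0 and on_segment(q1, q2, p1))
            or (d2 == 0 and on_segment(q1, q2, p2))
            or (d3 == 0 and on_segment(p1, p2, q1))
            or (d4 == 0 and on_segment(p1, p2, q2)))


def point_in_polygon(pt, ring):
    """Ray-casting test; ring is a list of (x, y) vertices, not closed."""
    x, y = pt
    inside = False
    n = len(ring)
    for i in range(n):
        (x0, y0), (x1, y1) = ring[i], ring[(i + 1) % n]
        if (y0 > y) != (y1 > y):
            if x < x0 + (y - y0) * (x1 - x0) / (y1 - y0):
                inside = not inside
    return inside


def classify(ring, line):
    """Return 'boundary', 'inside' or 'outside' for a linestring candidate."""
    n = len(ring)
    for a, b in zip(line, line[1:]):
        for k in range(n):                       # an edge index would go here
            if segments_intersect(a, b, ring[k], ring[(k + 1) % n]):
                return "boundary"                # touches or crosses: full test
    # No contact with the boundary, so one end point decides for the whole line.
    return "inside" if point_in_polygon(line[0], ring) else "outside"
```

For a square ring from (0, 0) to (10, 10), a short line wholly within the square comes back "inside", one far away comes back "outside", and one poking through an edge comes back "boundary".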

All in all, it is a lot of complexity for what seems like a very common hard-to-index case: test a large number of candidates for containment.

Ow, That Stings!

There’s a certain backhanded snarkiness to the ESRI UC Q&A item on PostGIS:

Will ESRI support the PostGIS open-source spatial extension for PostgreSQL?

Yes, ESRI will provide our customers with the option of using either the ISO/OGC spatial type or the PostGIS spatial extension.

So they’ll support the standard type or that PostGIS thingymabob. Of course, PostGIS is also an ISO/OGC spatial type, but somehow that fact is a little submerged in the phraseology.

Perverse Incentives

I am sure others have beaten this horse before, but I just have to take a whack at it.

Oracle, ESRI and others license their server-side technology on the basis of dollars-per-processing-unit, usually in the form of Constant * NumberOfCores * Price. For example, the base price for Oracle Enterprise (which you need to do high-end processing, like, say, computing a buffer (snort)) is 0.5 * NumberOfCores * $40,000.
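
To put numbers on that formula, here is a tiny worked example. The 0.5 factor and the $40,000 price are the figures quoted above; the core counts are made up for illustration, and `license_cost` is just a name for the arithmetic.

```python
def license_cost(cores: int, factor: float = 0.5,
                 price_per_unit: float = 40_000.0) -> float:
    """Constant * NumberOfCores * Price, using the figures quoted in the post."""
    return factor * cores * price_per_unit


# Hypothetical server sizes: doubling the cores doubles the bill.
for cores in (4, 8, 16):
    print(f"{cores:>2} cores: ${license_cost(cores):,.0f}")
```

Going from 8 to 16 cores to fix a slow query moves the bill from $160,000 to $320,000 on this formula.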

OK, time for the quiz-show section of our show: Let’s say you buy yourself some Oracle, and start using it. You find a particular use case that is slow, but important to you. You call up your Oracle representative. What will his answer be: (a) we’ll make it faster or (b) you need more processing power? Remember, this is not a trick question, and he does earn commissions on sales.

Take it up a level. From a strategic financial point of view, all R&D dollars spent on performance improvements actually constitute a double cost: the cost of doing the development, and the cost of lost revenue due to fewer upgrades. If I am the CEO, do I encourage my managers to spend money on performance tweaks, which will reduce upgrade revenue, or on new features, which will drive new sales?