I did a new PostGIS talk for FOSS4G North America 2015, an exploration of some of the tidbits I've learned over the past six months about using PostgreSQL and PostGIS together to make "magic" (any sufficiently advanced technology...)
Somehow I've gotten through 10 years of SQL without ever learning this construction, which I found while proof-reading a colleague's blog post and looked so unlikely that I had to test it before I believed it actually worked. Just goes to show, there's always something new to learn.
Suppose you have a GPS location table:
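(A minimal sketch of the schema; the column names are the ones the query below uses, but the types here are my assumptions.)

CREATE TABLE gps_points (
    gps_track_id integer,               -- which track the point belongs to
    gps_time     timestamptz,           -- when the point was recorded
    geom         geometry(Point, 4326)  -- the point location itself
);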
You can get a correct set of lines from this collection of points with just this SQL:
SELECT gps_track_id, ST_MakeLine(geom ORDER BY gps_time ASC) AS geom
FROM gps_points
GROUP BY gps_track_id;
Those of you who already knew about placing ORDER BY within an aggregate function are going "duh", and the rest of you are, like me, going "whaaaaaa?"
Prior to this, I would solve this problem by ordering all the groups in a CTE or sub-query first, and only then pass them to the aggregate make-line function. This, is, so, much, nicer.
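For comparison, here's roughly what the old way looked like, sketched against the same gps_points table: sort the rows in a sub-query first, then aggregate and hope the ordering survives the grouping.

-- The old approach (a sketch): pre-sort the points in a sub-query,
-- then run the aggregate over the already-ordered rows.
SELECT gps_track_id, ST_MakeLine(geom) AS geom
FROM (
    SELECT gps_track_id, geom
    FROM gps_points
    ORDER BY gps_track_id, gps_time ASC
) AS ordered_points
GROUP BY gps_track_id;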
Hot off their success transforming the BC social services sector with "integrated case management", Deloitte is now heavily staffing the upcoming transformation of the IT systems that underpin our natural resource management ministries.
Interlude: I should briefly note here that Deloitte's work in social services involved building a $180,000,000 case management system that the people who use it generally do not like, using software that nobody else uses for social services, that went offline for several consecutive days last year, and based on software that basically entered end-of-life almost five years ago. I'm sure that's not Deloitte's fault; they are only the international experts hired to advise on the best ways to build the system and then actually build it.
So many shiny arrows!
Smells like management consultants...
The brain trust has now decided that the thing we need on the land base is "integrated decision making", presumably because everything tastes better "integrated". A UVic MPA student has done a complete write-up of the scheme—and I challenge you to find the hard centre inside this chewy mess of an idea—but here's a representative sample:
The IDM initiative is an example of horizontal management because it is an initiative among non‐hierarchical ministries focused on gaining efficiencies by harmonizing regulations, IT systems and business processes for the betterment of the NRS as a whole. Horizontal management is premised on joint or consensual decision making rather than a more traditional vertical hierarchy. Horizontal collaborations create links and share information, goodwill, resources, and power or capabilities by organizations in two or more sectors to achieve jointly what they cannot achieve individually.
Sounds great, right!?! Just the sort of thing I'd choose to manage billions of dollars in natural resources! (I jest.)
Of course, the brain trust really isn't all that interested in "horizontal management", what has them hot and bothered about "integrated decision making" is that it's an opportunity to spend money on "IT systems and business processes". Yay!
To that end, they carefully prepared a business case for Treasury Board, asking for well north of $100M to rewrite every land management system in government. Forests, lands, oil and gas, heritage, the whole kit and caboodle. The business case says:
IDM will improve the ability of the six ministries and many agencies in the NRS to work together to provide seamless, high‐quality service to proponents and the public, to provide effective resource stewardship across the province, to effectively consult with First Nations in natural resource decisions, and to contribute to cross‐government priorities.
Sounds ambitious! I wonder how they're going to accomplish this feat of re-engineering? Well, I'm going to keep on wondering, because they redacted everything in the business case except the glowing hyperbole.
However, even though we don't know how, or really why, they are embarking on this grand adventure, we can rest assured that they are now spending money at a rate of about $10M / year making it happen, much of it on our good friends Deloitte.
Not that Secretariat...
The FOI system charged me $120 (and only after I bargained my request down to a much less informative one) to find out the above, because they felt that the information did not meet the test of being "of public interest". If you feel it actually is in the public interest to learn where our $100M for IT services in natural resources is being spent, and you live in BC, please leave me a comment on this post.
Interlude: The test for whether fees should be waived is double barrelled, but is (hilariously) decided by the public body itself (soooo unbiased). Here are the tests I think I pass (but they don't):
I'm still digging for more information (like, how is it that Deloitte can bill out 34 staff on this project when there hasn't been a major RFP for it yet?) so stay tuned and send me any hints if you have them.
"What's the best image format for map serving?" they ask me, shortly after I tell them not to serve their images from inside a database.
"Is it MrSid? Or ECW? those are nice and small." Which indeed they are. Unfortunately, outside of proprietary image server software I've never seen them be fast and nice and small at the same time. Generally the decode step is incredibly CPU intensive, presumably because of the fancy wavelet math that makes them so small in the first place.
"So, what's the best image format for map serving?".
In my experience, the best format for image serving, using open source rendering engines (MapServer, GeoServer, Mapnik) is: GeoTIFF, with JPEG compression, internally tiled, in the YCBCR color space, with internal overviews. Unfortunately, GeoTIFFs are almost never delivered this way, as I was reminded today while downloading a sample image from the City of Kamloops (but nonetheless, thanks for the great free imagery, Kamloops!)
It came in a 593MB ZIP file. "Hm, that's pretty big," I thought. I unzipped it.
Unzipped it was a 515MB TIF file. That's right: it was smaller "uncompressed". Why? Because internally it was already compressed, and applying the ZIP compression algorithm to already-compressed data generally fluffs it up a little. Whoops.
The default TIFF compression is, unfortunately, "deflate", the same as that used for ZIP. This is a lossless encoding, but not very good for imagery. We can make the image a whole lot smaller just by using a more appropriate compression, like JPEG. We'll also tile it internally while we're at it. Internal tiling allows renderers to quickly pick out and decompress just a small portion of the image, which is important once you've applied a more serious compression algorithm like JPEG.
gdal_translate \
  -co COMPRESS=JPEG \
  -co TILED=YES \
  5255C.tif 5255C_JPEG.tif
This is much better: now we have a vastly smaller file.
But we can still do better! For reasons that surpass my understanding, the JPEG algorithm is more effective against images that are stored in the YCBCR color space. Mine is not to reason why, though.
gdal_translate \
  -co COMPRESS=JPEG \
  -co PHOTOMETRIC=YCBCR \
  -co TILED=YES \
  5255C.tif 5255C_JPEG_YCBCR.tif
Wow, now we're down to 1/20 the size of the original.
But we've applied a "lossy" algorithm, JPEG; maybe we've ruined the data! Let's have a look.
Can you see the difference? Me neither. Using a JPEG "quality" level of 75%, there are no visible artefacts. In general, JPEG is very good at compressing things so humans "can't see" the lost information. I'd never use it for compressing a DEM or a data raster, but for a visual image, I use JPEG with impunity, and with much lower quality settings too (for more space saved).
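For instance, to dial the quality down, the GeoTIFF driver exposes the JPEG quality level as a creation option; a sketch, reusing the same input (the 50 here, and the output file name, are just for illustration):

# Same recipe as before, but lowering JPEG quality
# from the default 75 down to 50 for extra space savings.
gdal_translate \
  -co COMPRESS=JPEG \
  -co PHOTOMETRIC=YCBCR \
  -co TILED=YES \
  -co JPEG_QUALITY=50 \
  5255C.tif 5255C_JPEG_Q50.tif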
Finally, for high speed serving at more zoomed out scales, we need to add overviews to the image. We'll make sure the overviews use the same, high compression options as the base data.
gdaladdo \
  --config COMPRESS_OVERVIEW JPEG \
  --config PHOTOMETRIC_OVERVIEW YCBCR \
  --config INTERLEAVE_OVERVIEW PIXEL \
  -r average \
  5255C_JPEG_YCBCR.tif \
  2 4 8 16
For reasons passing understanding, gdaladdo uses a different set of command-line switches to pass the configuration info to the compressor than gdal_translate does, but as before, mine is not to reason why.
The final size, now with overviews as well as the original data, is still less than 1/10 the size of the original.
So, to sum up, your best format for image serving is: GeoTIFF, with JPEG compression, in the YCBCR color space, internally tiled, with internal overviews.
Go forth and compress!
Like doing a sudoku, solving a "simple yet tricky" problem in spatial SQL can grab one's mind and hold it for a period. Someone on the PostGIS IRC channel was trying to "convert a linestring into a set of two-point segments", using an external C++ program, and I thought: "hm, I'm sure that's doable in SQL".
And sure enough, it is, though the syntax for referencing out the parts of the dump objects makes it look a little ugly.
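Here's a sketch of the approach, assuming a hypothetical table lines(id, geom) of linestrings; the (dp).path[1] business is the ugly part I mean.

-- A sketch: ST_DumpPoints explodes each linestring into
-- (path, geom) rows; joining each vertex to its successor and
-- running ST_MakeLine over the pairs yields two-point segments.
WITH pts AS (
    SELECT id, (dp).path[1] AS vertex, (dp).geom AS pt
    FROM (
        SELECT id, ST_DumpPoints(geom) AS dp
        FROM lines
    ) AS dumped
)
SELECT a.id, ST_MakeLine(a.pt, b.pt) AS segment
FROM pts a
JOIN pts b
  ON a.id = b.id AND b.vertex = a.vertex + 1;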
I started with Boundless back in 2009 when it was OpenGeo and still a part of the Open Planning Project, a weird non-profit arm of a New York hedge fund millionaire's corporate archipelago. (The hedgie, Mark Gorton, is still going strong, despite the brief set-back he endured when LimeWire was sued by RIAA.) For that six year run, I was fortunate to have a lead role in articulating what it meant to "do open source" in the geospatial world, and to help to build OpenGeo into a self-supporting open source enterprise. We grew, spun out of the non-profit, gained lots of institutional customers, and I got to meet and work with lots of quality folks. After six years though, I feel like I need a change, an opportunity to learn some new things and meet some new people: To move from the enterprise space, to the consumer space.
CartoDB uses PostgreSQL and PostGIS to run their platform, and working for CartoDB gives me a chance to talk about and to work on something I like almost as much as (more than?) open source: spatial SQL! The team at CartoDB have done a great job with their platform, providing a simple entry-point into map making, while still leaving the power of SQL exposed and available, so that users can transition from beginner, to explorer, to power user. As someone who currently only knows a portion of their technology (the SQL bit), I'm looking forward to experiencing the rest of their platform as a beginner. I also know the platform folks will have lots of good questions for me on PostGIS internals, and we'll have many interesting conversations about how to keep pushing PostgreSQL and PostGIS to the limits.
My two-week between-jobs break was refreshing, but sometimes a change is as good as a rest too. I enjoyed my last six years with Boundless and I'm looking forward to the future with CartoDB.
To shift the software procurement paradigm, GSA’s 18F Team and the Office of Integrated Technology Services (ITS) is collaborating on the establishment of a BPA that will feature vendors who specialize in Agile Delivery Services. The goal of the proposed BPA is to decrease software acquisition cycles to less than four weeks (from solicitation to contract) and expedite the delivery of a minimum viable product (MVP) within three months or less.
In a wonderful "eat your own dogfood" move, the team working on building this new procurement vehicle are themselves adopting agile practices in their own process. Starting small with a pilot, working directly with the vendors who will be trying the new vehicle, etc. If the hidebound old GSA can develop a workable framework for agile procurement, then nobody else has an excuse.
(The reason procurement agencies have found it hard to specify "agile" is that agile by design does not define precise deliverables in advance, so it is damnably hard to fit into a "fixed cost bid" structure. In places where time-and-materials vehicles are already in place, lots of government organizations are already working with vendors in an agile way, but for the kinds of big, boondoggle-prone capital investment projects I write about, the waterfall model still predominates.)
…the client side will eat more of the server side stack.
To understand what "more" there is left to eat, it's worth enumerating what's already been eaten (or, which is being consumed right now, as we watch):
Uh, what's left?
Have I missed any?
I haven't had to build CUnit myself for a while, because most of the systems I work with have it in their packaged software repositories, but for Solaris it's not there and, it turns out, it's quite painful to build!
Here are the commands I finally used to get a build. Note that you do need to run libtoolize to get some missing support scripts installed, and that you also need to run automake in "add missing" mode to get yet more support scripts. Then and only then do you get a build.
wget http://downloads.sourceforge.net/project/cunit/CUnit/2.1-2/CUnit-2.1-2-src.tar.bz2
tar xvfj CUnit-2.1-2-src.tar.bz2
cd CUnit-2.1-2
libtoolize -f -c -i \
  && aclocal \
  && autoconf \
  && automake --gnu --add-missing \
  && ./configure --prefix=/usr/local \
  && make \
  && make install
That said, the bootstrap file in the source tree does provide the required autotools flags.
Most users of PostGIS are safely ensconced in the world of Linux, and their build/deploy environments are pretty similar to the ones used by the developers, so any problems they might experience are quickly found and removed early in development.
Some users are on Windows; they are our most numerous user base, so we at least test that platform preemptively before release and make sure it is as good as we can make it.
And then there's the rest. We've had a passel of FreeBSD bugs lately, and I've found myself doing Solaris builds for customers, and don't get me started on the poor buggers running AIX. One of the annoyances of trying to fix a problem for a "weird platform" user is just getting the platform setup and running in the first place.
So, having recently learned a bit about Vagrant, and seeing that some of the "weird" platforms have boxes already, I thought I would whip up a couple of Vagrant configurations so it's easy in the future to throw up a Solaris or FreeBSD box, or even a quick CentOS box for debugging purposes.
I've just been setting up my Solaris Vagrantfile and using my favourite Solaris crutch: the OpenCSW software repository. But as I use it, I'm not just adding the "things I need", I'm implicitly choosing an environment:
- libxml2 is from OpenCSW
- gcc is version 4, not version 3
This is convenient for me, but what are the chances that it'll be the environment used by someone on Solaris having problems? They might be compiling against libraries from /usr/sfw/bin, or using the Solaris gcc-3 package, or any number of other variants. At the end of the day, when testing on such a Solaris environment, will I be testing against a real situation, or a fantasyland of my own making?
For platforms like Ubuntu (apt) or Red Hat (yum) or FreeBSD (ports) where there is One True Way to get software, the difficulties are less, but even then there is no easy way to get a "standard environment", or to quickly replicate the combination of versions a user might have run into that is causing problems (libjson is a current source of pain). DLL hell has never really gone away, it has just found new ways to express itself.
(I will send a psychic wedgie to anyone who says "docker", I'm not kidding.)
never play chess with a clever elephant