BC IT Outsourcing 2021/22

23 Oct 2022

Another year, another IT public accounts summary. I’m finding it hard to tease a story out of this one.

On the one hand, the total has begun going up again, though not so far as to notch a new record.

On the other hand, for the first time in 10 years, the amount of money taken in by ESIT (formerly HP Advanced Systems (formerly EDS)) has plummeted.

Not only has ESIT revenue gone into free-fall, the beneficiaries appear to be… Maximus and Deloitte? So not so much a reduction in outsourcing to international consultancies, as a reallocation.

Meanwhile, dollars to local firms seem to have flattened out.

What’s the story here? I have no idea. The reduction in dollars to ESIT seems like it might be reflective of an insourcing. The simultaneous increase in dollars to Deloitte and Maximus? I have no idea what might have driven that. In particular Maximus, which should have been swimming upstream after MSP premiums were eliminated.

I will make an effort to bring in the Health Authorities this year and get caught up on that piece of the puzzle; there may be some interesting trends there.

What you Need to Know

17 Oct 2022

Matt Asay’s piece today about the skills shortage in cloud got me to thinking about what constitutes the required knowledge to work on various projects, and how much implicit knowledge is bound up with any given “single” technology.

The cloud has a people problem (tl;dr? There's a skills shortage, per @cloudpundit, and multicloud, increasingly a reality, compounds the problem) https://t.co/7fwFEYRC3K by @mjasay for @InfoWorld
— Matt Asay (@mjasay) October 17, 2022

Like, what should you know to “do cloud”?

Doing Cloud

Well, “containers” I guess would be a core building block, but what does that imply? Some knowledge of Linux system administration, so you can understand things like the shell and environment. Lots of network administration as well, to understand routing rules and name resolution. Perhaps some package management so you can install some dependencies and build out others. Really, sounds like you need to understand all the things a late 90’s era sysop did. And we haven’t even started in on the actual cloud part yet!

For that you need to understand the abstraction over the actual machines that the container management environment (pick one, pick ten) provides, the configuration system for that environment (declarative? programmatic?), and whatever security/authentication/roles system your cloud uses. Multiply by N for “multi-cloud”.

And that’s just for the container part! Want to get into object stores (S3/R2) and production deployments? Add in a pretty solid understanding of HTTP, CDN options, DNS and TLS.

Do you have to manage queues? Databases? ML? Each cloud has its own subtly different and variegated offering, and each area has its own unique set of skills you need to master, whether it be DBA concepts or domain languages or security needs.

I can rattle off all these things because I have learned them all to some extent or other over the course of my career, and that’s great but I’m 51 years old. I am constantly amazed that any new graduate can find enough initial purchase to start a career in technology.

Doing PostGIS

Like, want to get involved in PostGIS? Easy, “it’s in C”! Except, well actually we’ve got some C++ bits, but frankly if you know C you probably know C++ too. C and C++ both embed a macro language (CPP) so you need to know that to understand the code.

But wait, you want to build it? Well the build system is autotools. That’s an unholy mixture of M4 and bash shell scripting. Fortunately you can mostly ignore the M4 but bash skills are definitely required.

Naturally you’ll also have to understand Makefile syntax.

Of course PostGIS is a PostgreSQL extension, so to even understand what we’re doing you need to understand SQL. You’ll end up learning the PostgreSQL extension and function APIs by osmosis via the PostGIS code base, but it’s still another thing.

Inevitably you will make a mistake, so you’ll need to understand a debugger, probably gdb or lldb, which are glorious command-line tools with their own terse commandline syntax.

Ooops! Forgot the UNIX commandline! That’s table stakes, really, but there you go.

Also forgot git, which is one of the few tools that is common across most branches of practical modern IT.

The documentation is all in DocBook XML, but you can probably figure that out by inspection.

You probably won’t change the WKT lexer/parser, but that’s in flex/bison. There’s also a protobuf binding that has its own little domain language.

Fixes? Nah.

One of the reasons I don’t do “real JavaScript” is because doing this exercise with any modern JavaScript project involves a similar-but-much-longer litany of tooling completely different from the tooling I already know.

I learned Go relatively recently, and I have to say that even though I did have to, yes, learn all about build and test and debug and modularity for Go, at least there wasn’t a huge pile of options to wade through – the Go community mostly “picked one” for the core tools. (Not true for database connectivity though! You can choose from among multiple PostgreSQL client libraries!)

I don’t know how to end this except to say it feels like my whole career has been spent watching geological layers of technology pack down on each other, layer upon layer upon layer, and that the result seems, in aggregate, completely unapproachable.

I wait for some kind of simplifying moment.

The repeated theme year after year after year has been that “encapsulation” and “abstraction” will at least allow us to ignore the lowest layers. To some extent that is true – I have made it through a whole career without having to learn assembler, and I probably will never need to.

But so many of the other promises have failed to play out.

Object orientation did not result in a new world of stringing together opaque components. The insides still matter intensely and people still have to understand the parts.

Containers have flattened out the need to worry about dependency chains, but the chains are still there inside the containers, and managing the production deployment of containers is now a career-defining field in its own right.

An Expanding Universe

It’s possible that most of this apparent exponential complexity explosion is just due to the ever widening scope of what “working with computers” means.

The number of things you can do with computers is just orders of magnitude larger than when I started working with them. So the number of tools is similarly orders of magnitude larger. Dog bites man.

However, I have had the privilege of seeing most of the layers laid down, so I didn’t have to learn them all at once to become productive. Each piece came along in its time, and added a little to the stack.

So, I stand amazed at newly minted technologists who can start, and get productive, in this intellectual garbage tip we call a “profession”. We couldn’t have built a less enjoyable or consistent collection of tools if we had tried, and yet you persevere and exceed all of us. Hats off to all of you!

Learning PostgreSQL Internals

01 Oct 2022

I had coffee with an IT colleague here in Victoria last week, and he was interested in getting into core PostgreSQL programming. “What resources would you recommend I look at?”


That’s… a hard question!

PostgreSQL is a huge code base with a multi-decade history. I’ve been poking around the edges for almost 10 years and feel comfortable with the extension APIs, foreign data wrappers, access methods APIs, some system catalogue stuff… maybe 5% of the surface area of the beast?


So, what advice for someone who wants to dive much much deeper than that?

First, start with the vision, and read “The Design of Postgres” (Stonebraker & Rowe, 1985) to get a sense of what distinguished Postgres from its predecessors: complex objects; user extensibility; and active database facilities; all while retaining relational concepts.

Second, take a run through Bruce Momjian’s “internals” presentations. These tend to be a little older, as Bruce hasn’t been doing deep core work for a while, but he’s an expert teacher and explainer, so they are useful to get a feel for the shape of things. In a similar (and more recent) vein, my colleague Stephen Frost walks through the code base in this 2018 talk about adding a new feature to PostgreSQL.

Third, consider spending some time with “The Internals of PostgreSQL”. This is a very detailed look at PostgreSQL subsystems, including header structures and data flow. As with any book, it may have already drifted a bit from the particulars of current PostgreSQL, but there is no other resource I know that even attempts to explain internals at this granularity.

Fourth, the source code itself is an amazing resource, and the commentary in header files and function descriptions is very good. The incredibly detailed and stringent source code review process of the PostgreSQL community not only expects good code, but also good documentation of what the code does. I’m not sure how much this can be traced back to the influence of Tom Lane (whose comments are frequently complete technical manuals) but regardless the culture of good internal documentation is in place, and makes the code as “approachable” as a system of this complexity could hope to be.

Now things get harder.


Conference talks are a moving target, because they tend to focus on the leading edge of things, but there are some community members who regularly give talks about their work on core topics, and who must necessarily explain how things work in order to put that work in context.

Unfortunately, PostgreSQL conferences have a habit of … not making recordings! So there’s relatively little online. I have seen some amazing talks (the multi-session query planner master class Tom Lane gave at PgCon 2011 sticks out) but most of them are lost to the ages.

The best conference to go to for core technical content is undoubtedly PgCon. (I will see you there this spring!)

COVID robbed us of many things, but it did cause PgCon to record and publish a number of standout technical talks that might otherwise have never been available.

Here are the presenters I always mark down in my program and rush to get a seat for:

Andres Freund, who while hammering out incredibly hard technical work still makes time to explain what he is up to.
Robert Haas, who keeps pushing really important things from his perch as EDB CTO
- Concurrent DDL
- What’s in a Plan
Amit Kapila, who is quietly banging out impressive work every release
- Hash Indexes
Melanie Plageman, who creates jaw droppingly good explanations of really hard topics (the query planner talk blew my mind)
- Intro to Postgres Planner
- work_mem
Peter Geoghegan, who goes right to the foundations and builds things up (unfortunately the btree talk he gave, which was a tour de force, is not online; perhaps this nbtree talk is an acceptable substitute)
- nbtree

I would also go to any talk Tom Lane felt like giving. And Tomas Vondra, and Thomas Munro, and Oleg Bartunov.


Learning the PostgreSQL code base is a journey of a million steps, that’s for sure. One thing that all my effective personal learning has had in common is a practical need. My learning has been in support of some actual work that needed to be done for my employer of the moment. It had motivation, a start point and an end point. That was really helpful.

Best of luck in your PostgreSQL journeys!

Additional Notes

Because of course I left out some stuff in the first draft:

Once you are at the level of perusing source code, you should also be reading the contents of the pgsql-hackers mailing list, and the current commitfest patches. This will help you understand the culture of contribution, and also get a feel for where work is ongoing.
One gateway to contribution is patch review. Start with basic things like “does it compile against main” and “does it do what the author says”. Read the other reviews, see what other reviewers look for, wash rinse repeat. I still think that having your own small patches put through the process is invaluable learning. Doing some review of other people’s patches is a way of “paying it forward” for reviews your own patches receive.
The amazing Oleg Bartunov wrote in to tell me there is now a second book on PostgreSQL internals, “PostgreSQL 14 Internals. A deep dive into the server mechanics”. Currently in Russian, but an English translation is going on right now.

Technology, Magic & PostgreSQL

23 Jun 2022

I have a blog post up today at Crunchy Data on some of the mechanisms that underlie the PostgreSQL query planner; it’s pretty good if I do say so myself.

I was motivated to write it by a conversation over coffee with my colleague Martin Davis. We were talking about a customer with an odd query plan case and I was explaining how the spatial statistics system worked and he said “you should do that up as a blog post”. And, yeah, I should.

One of the things that is striking as you follow the PostgreSQL development community is the extent to which a fairly mature piece of technology like PostgreSQL is stacks of optimizations on top of optimizations on top of optimizations. Building and executing query plans involves so many different paths of execution that there’s always a new, niche use case to address and improve.

I worked a political campaign a few years ago as a “data science” staffer, and our main problem was stitching together data from multiple systems to get a holistic view of our data.

That meant doing cross-system joins.

The first cut is always easy: pull a few records out of System A with a filter condition and then go to System B and pull the associated records. But then inevitably a new filter condition shows up, and when applied to A it generates so many records that the association step on B gets overloaded. But it turns out that if I start from B and then associate in A, it’s fast again.

And thus suddenly I found myself writing a query planner and executor.
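To make that concrete, here is a minimal sketch of the "start from whichever side is smaller" decision. Everything in it is hypothetical: the `System` class, its `estimate`, `scan` and `fetch_by_keys` methods, and the sample data are stand-ins for illustration, not the campaign code and certainly not how PostgreSQL plans joins. In particular, the estimate here is an exact count, where a real planner would lean on statistics.

```python
class System:
    """Hypothetical stand-in for a remote system holding a list of dict rows."""

    def __init__(self, name, rows):
        self.name = name
        self.rows = rows
        self.rows_fetched = 0  # how many rows we had to pull out of this system

    def estimate(self, predicate):
        # Assumption: counting is cheap. A real planner would use statistics.
        return sum(1 for row in self.rows if predicate(row))

    def scan(self, predicate):
        # Pull every row that passes the filter.
        out = [row for row in self.rows if predicate(row)]
        self.rows_fetched += len(out)
        return out

    def fetch_by_keys(self, key, values, predicate):
        # Pull only rows associated with the given key values.
        wanted = set(values)
        out = [r for r in self.rows if r[key] in wanted and predicate(r)]
        self.rows_fetched += len(out)
        return out


def join(a, pred_a, b, pred_b, key):
    # "Plan": drive from whichever side the filter cuts down the most.
    if a.estimate(pred_a) <= b.estimate(pred_b):
        driver, driver_pred, probe, probe_pred = a, pred_a, b, pred_b
    else:
        driver, driver_pred, probe, probe_pred = b, pred_b, a, pred_a

    # "Execute": scan the small side, then look up associated rows on the other.
    driver_rows = driver.scan(driver_pred)
    keys = [row[key] for row in driver_rows]
    by_key = {}
    for row in probe.fetch_by_keys(key, keys, probe_pred):
        by_key.setdefault(row[key], []).append(row)

    result = [{**d, **p} for d in driver_rows for p in by_key.get(d[key], [])]
    return driver.name, result


# Made-up data: 100 people in System A, 4 donations in System B.
people = System("A", [
    {"person_id": i, "city": "Victoria" if i % 2 == 0 else "Vancouver"}
    for i in range(100)
])
donations = System("B", [
    {"person_id": i, "amount": 10 * i} for i in range(0, 100, 25)
])

driver, rows = join(
    people, lambda r: r["city"] == "Victoria",
    donations, lambda r: r["amount"] >= 0,
    "person_id",
)
```

With this data the city filter on A matches half the table, so the sketch drives from B and only pulls the handful of associated people out of A. Flip the data around and the same code drives from A instead, which is the whole trick.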

It’s only when dumped into the soup of having to solve these problems yourself that you really appreciate the magic that is a mature relational database system. The idea that PostgreSQL can take a query that involves multiple tables of different sizes, with different join cardinalities, and different indexes and figure out an optimal plan in a few milliseconds, and then execute that plan in a streaming, memory efficient way…?

Magic is really the best word I’ve found.

Some More PostGIS Users

21 Jun 2022

The question of why organizations are shy about their use of open source is an interesting one, and not completely obvious.

Open source luminary Even Rouault asks:

is there some explanation why most institutions can’t communicate about their PostGIS use ? just because it is a major hurdle for technical people to get their public relationship department approve a communication ? people afraid about being billed about unpaid license fees 🤣 ?

There’s really very little upside to publicizing open source use. There’s no open source marketing department to trumpet the brilliance of your decision, or invite you to a conference to give you an award. On the other hand, if you have made the mistake of choosing an open source solution over a well-known proprietary alternative, there is surely a local sales rep who will call your boss to tell them that you have made a big mistake. (You do have a good relationship with your boss, I hope.)

These reverse incentives can get pretty strong. Evendiagram reports:

Our small group inside a large agency uses postgis. We don’t talk about it, even internally, to avoid the C-suite forcing everyone back to oracle. RHEL repos allow us a lot of software that would otherwise be denied.

This reminds me of my years consulting for the British Columbia government, when technical staff would run data processing or even full-on public web sites from PostgreSQL/PostGIS machines under their desktops.

They would tell their management it was “just a test system” or “a caching layer”, really anything other than “it’s a database”, because if they uttered the magic word “database”, the system would be slated for migration into the blessed realm of enterprise Oracle systems, never to be heard from again.

Logos

Meanwhile, Daryl Herzmann reminds us that the Iowa Mesonet has been on Team PostGIS since 2003.

Iowa Environmental Mesonet, Iowa State University

Data being managed in the database
Meteorological Data, “Common” GIS datasets (roads, counties), Current and Archived NWS Tornado/Flash Flood/Thunderstorm Warnings, Historical Storm Reports, Current and Archived precipitation reports. Climate data

How the data is being accessed / manipulated
From mapserver! Manipulated via Python and PHP.

Why you chose to use PostGIS for the application
Open-Source. Uses my favorite DB, Postgres. Easy integration with mapserver. The support community is fantastic!

Further afield, the GIS portals of governments throughout Ukraine are running on software built on PostGIS.

Jørgen Larsen de Martino notes that:

The Danish Agency for Data Supply and Infrastructure uses PostGIS extensively - and have been using it for the last 10 years - we would not have had the success we have was it not for @PostGIS.

The Utah Geospatial Resource Center uses PostGIS to provide direct access to multiple spatial layers through a cloud-hosted PostGIS database called the “Open SGID”. (I can hear DBA heads exploding around the world.)

Counterpoint

While self-reporting is nice, sometimes just a little bit of dedicated searching will do. Interested in PostGIS use in the military? Run a search for “postgis site:mil” and see what pops up!

The 108th wing of the Air Force! Staff Sgt. Steve De Leon is hard at it!

“I’m taking all the data sources that AMC and A2 compile and indexing them into the PostgreSQL/PostGIS data and then from there trying to script Python code so the website can recognize all the indexed data in the PostgreSQL/PostGIS database,” said the De Leon.

The Canadian Department of National Defence is building Maritime Situational Awareness Research Infrastructure with a PostgreSQL/PostGIS standard database component.

PostgreSQL with its PostGIS extension is the selected DBMS for MSARI. To ease maintenance and access, if more than one database are used, PostgreSQL will be selected for all databases.

The Coast Guard’s “Environmental Response Management Application (ERMA)” is also running PostGIS.

The application is based on open source software (PostgreSQL/PostGIS, MapServer, and OpenLayers), that meet Open Geospatial Consortium (OGC) specifications and standards used across federal and international geospatial standards communities. This ensures ERMA is compatible with other commercial and open-source GIS applications that can readily incorporate data from online data projects and avoids licensing costs. Open-source compatibility supports data sharing, leverages existing data projects, reduces ERMA’s maintenance costs, and ensures system flexibility as the technology advances. Because ERMA is open source, it can easily be customized to meet specific user requirements.

More logos?

Want to appear in this space? Email me!


Paul Ramsey