Krugman FTW

“Sometimes I have the impression that many people in the media consider it uncouth to acknowledge, even to themselves, the fraudulence of much political posturing. The done thing, it seems, is to pretend that we’re having real debates about national security or economics even when it’s both obvious and easy to show that nothing of the kind is actually taking place.”
Paul Krugman

Big Data and Data Science Piss Me Off

Get off my lawn!

I don’t talk about this much, but I actually trained in statistics, not in computer science, and I’ve been getting slowly but progressively weirded out by the whole “big data” / “data science” thing. Because so much of it is bogus, or boys-with-toys or something.

Basically, my objections to the big data thing are the usual: probably your data is not big. It really isn’t, and there are some great blog posts all about that.

So that’s point number one: most people blabbing on about big data can fit their problem onto a big vertical machine and analyze it to their heart’s content in R or something.

Point number two is less frequently touched upon: sure, you have 2 trillion records, but why do you need to look at all of them? The whole point of an education in statistics is to learn how to reason about a population using a random sample. So why are all these alleged “data scientists” firing up massive compute clusters to summarize every single record in their collections?

I’m guessing it’s the usual reason: because they can. And because the current meme is that they should. They should stand up a 100 node cluster on AWS and bloody well count all 2 trillion of them. Because: CPUs.

But honestly, if you want to know the age distribution of people buying red socks, draw a sample of a couple hundred thousand records, and find out to within a fraction of a percentage point 19-times-out-of-20. After all, you’re a freaking “data scientist”, right?
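If you want to check the "fraction of a percentage point" claim, here is a minimal sketch. It assumes a simple random sample and the textbook normal approximation for a proportion; the function name and the worst-case `proportion=0.5` default are my own choices for illustration, not anything prescribed above.

```python
import math


def margin_of_error(sample_size, proportion=0.5, z=1.96):
    """Half-width of an approximate 95% confidence interval for a proportion.

    Uses the normal approximation; z=1.96 is the "19 times out of 20" level,
    and proportion=0.5 is the worst case (the widest possible interval).
    """
    return z * math.sqrt(proportion * (1.0 - proportion) / sample_size)


if __name__ == "__main__":
    for n in (1_000, 10_000, 200_000):
        moe = margin_of_error(n)
        print(f"n = {n:>7,}: +/- {100 * moe:.2f} percentage points")
```

With 200,000 records the worst-case margin comes out at roughly 0.2 percentage points, and the size of the full collection barely enters into it: the sample size is what drives the precision.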

BC IT Outsourcing 2014/15

If what goes up must come down, nobody told BC’s IT outsourcers, because they continue to gobble up a larger chunk of the government pie every year.

The BC Public Accounts came out today, and I’m happy to say that the People Who Are Smarter Than You Are managed to book another record year of billings: a $468,549,154 spend, up 8% over last year.

It’s not a victory unless you beat someone else, so good news:

  • Overall government revenue, up 5.4%
  • Overall government spending, up 2.4%
  • Health spending, up 2.8%
  • Education spending, up 0%
  • IT services spending, up 8%!!!!

Don’t be sad, kids and sick people, IT services folks are Adding Value and Finding Synergies in ways that you just can’t. In the long run, workshopping the new Management Strategy Realignment Plan is just a better investment than fixing your gimpy hip, or hiring a teaching assistant to help Angry Jimmy focus on his work.

HP Advanced Solutions continues to dominate the category, adding $20M in billings this year alone (How many teachers could that hire? At least 200. Or even more teaching assistants.) In fact, two thirds of the billing growth this year was just HP.

There’s also a new kid in the enterprise software vendor list to keep an eye on: Salesforce.com (SFDC) showed up with a wee $463,053 in billings this year. I expect that to increase mightily in coming years. However, the big money in SFDC work will not be earned by SFDC (even after locking up the entire BC government enterprise back-office, Oracle bills less than $10M a year in software maintenance), but by the consultants providing SFDC “implementation services” (Deloitte, CGI, HP). Watch for an SFDC gold rush as the government starts replacing expensive Oracle systems with… expensive SFDC systems in the cloud.

The best part about hiring big enterprise IT companies like HP, Oracle, Maximus, and CGI to create lots of important Technology Process (and occasionally a bit of Product) for us isn’t the soothingly glacial pace of progress or the fantastic billing rates. It’s knowing that at least 20% of every public dollar spent goes straight to the bottom line of those companies, ensuring that shareholders and institutional investors survive through another year without undue financial hardship.

Until next year, keep on spending, British Columbia!

More Speech for Money

The BC Liberal government is changing the Elections Act to allow unlimited party and candidate spending right up until about one month before election day. Meanwhile, as usual, the media are transfixed by the shiny object in the corner.

The political pundits are making a great deal of noise (see V. Palmer’s inside baseball assessment if you care) about an amendment to the Elections Act that says that:

“the chief electoral officer must provide … to a registered political party, in respect of a general election … a list of voters that indicates which voters on the list voted in the general election”

At the same time, they are ignoring the BC Liberals fundamentally changing the money dynamic of the fixed election date by eliminating the 60-day “pre-campaign” period.

“Section 198 is amended (a) by repealing subsections (1) and (2) and substituting the following: (1) In respect of a general election, the total value of election expenses incurred by a registered political party during the campaign period must not exceed $4.4 million.”

The Elections Act currently divides up the election period before a fixed election into two “halves”: the 60 days before the official campaign, and the campaign period itself (about 28 days if I recall correctly). In the first 60 days, candidates can spend a maximum of $70,000 and parties a maximum of $1.1 million. In the campaign period, candidates can spend another $70,000 and parties as much as $4.4 million.

The intent of the “pre-campaign” period is clearly to focus campaigning on the campaign period itself, by limiting the amount of early spending by parties. The “money density” of the pre-campaign period is about $18,000 / day in party spending; in the campaign period, it is almost $160,000 / day.
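The per-day figures are just each spending limit divided by the length of its period. A quick sketch of the arithmetic, assuming the 28-day campaign length I recalled above (the function and constant names are mine, for illustration only):

```python
PRE_CAMPAIGN_DAYS = 60
CAMPAIGN_DAYS = 28  # assumption: "about 28 days", as recalled in the post

PRE_CAMPAIGN_PARTY_LIMIT = 1_100_000  # dollars
CAMPAIGN_PARTY_LIMIT = 4_400_000      # dollars


def spend_per_day(limit_dollars, days):
    """Maximum average party spending per day over a period."""
    return limit_dollars / days


pre_campaign_density = spend_per_day(PRE_CAMPAIGN_PARTY_LIMIT, PRE_CAMPAIGN_DAYS)
campaign_density = spend_per_day(CAMPAIGN_PARTY_LIMIT, CAMPAIGN_DAYS)

if __name__ == "__main__":
    print(f"pre-campaign: ${pre_campaign_density:,.0f} / day")
    print(f"campaign:     ${campaign_density:,.0f} / day")
    print(f"ratio:        {campaign_density / pre_campaign_density:.1f}x")
```

That works out to a bit over $18,000 a day before the campaign and a bit over $157,000 a day during it, so the campaign period is allowed more than eight times the daily spending.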

This is all very public-spirited, and contributes to a nice focussed election period. But (BUT!) the BC Liberals currently have more money than they know what to do with, so it is in their interest to be able to focus all that money as close to the event as possible. And rather than simply raising the pre-campaign spending limit they went one better: they removed it altogether. They can spend unlimited amounts of money as close as 28 days before election day, 21 days before the opening of advance polls.

Let me repeat that: they can spend unlimited amounts of money.

So in British Columbia now, it is legal both to raise unlimited amounts of money from corporations, unions and individuals (and some individuals and corporations have each donated over $100,000 a year to the BC Liberals), and to spend unlimited amounts of money, right up to 28 days before election day.

See any problems with that?

GIS "Data Models"

Most IT professionals have some expectation, having received a basic education on relational data modelling, that a model for a medium sized problem might look like this:

Why is it, then, that production GIS data flows so consistently produce models that look like this:

What is wrong with us?!?? I bring up this rant only because I was just told that some users find the PostgreSQL 1600-column limit constraining, since it makes it hard to import the Esri census data, which are “modelled” into tables that are presumably wider than they are long.
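For what it's worth, one common way around a column limit is to reshape a wide record into long (id, variable, value) rows before loading it. The sketch below is only an illustration: the column names are invented, not Esri's actual census schema, and `wide_to_long` is a hypothetical helper rather than part of any library.

```python
def wide_to_long(row, id_column):
    """Turn one wide record into a list of (id, variable, value) tuples.

    Every column other than the id becomes its own row, so the table grows
    longer instead of wider.
    """
    area_id = row[id_column]
    return [
        (area_id, name, value)
        for name, value in row.items()
        if name != id_column
    ]


# Hypothetical wide record: one column per census variable.
wide_row = {
    "tract_id": "0001",
    "pop_total": 4200,
    "pop_age_0_4": 250,
    "pop_age_5_9": 270,
}

long_rows = wide_to_long(wide_row, "tract_id")

if __name__ == "__main__":
    for record in long_rows:
        print(record)
```

A three-column table of that shape can hold thousands of variables per area without ever coming near a column limit.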