I gave this talk in December, at the CartoDB 2015 partners conference, at the galactic headquarters in glamorous Bushwick, Brooklyn. A bit of a late posting, but hopefully I can still sneak under the "new year predictions bar".
Thursday, December 17, 2015
Ever feel like people are talking about you behind your back? Usually it's just perfectly normal paranoia. But sometimes, they actually are. Maybe.
Backgrounder for those from abroad: Our provincial government was recently caught destroying public records by an Officer of the Legislature, who produced a detailed report with a dozen recommendations on how to stop breaking the law so much. But rather than simply implementing the recommendations, the Premier instead appointed her own smart important guy, David Loukidelis, to go over those recommendations and produce yet another set of This Time It's For Real recommendations for her to take Very, Very Seriously. Mr. Loukidelis produced his report on Wednesday, and the government said it would "accept them all" (for certain definitions of the words "all" and "accept").
"Nonetheless, some observers have suggested in the wake of the investigation report that all emails should be kept."
As far as I know, I've been the only "observer" to suggest that government emails should be archived and retained more-or-less in their entirety, as we expect Canadian financial institutions to do, and as the US government expects all public corporations to do. So I took this as a little bit of a throw down.
David Loukidelis wants to get it on! Is it on? Oh yes, it's on, baby!
(This would be a good moment to go do something a lot more engaging, like picking lint out of your toes, or feeling that sensitive place at the back of your second left molar. I'm about to take apart Recommendation #2 of a 70 page report that, despite costing $50,000, is about as interesting as the last 70 pages of the phone book.)
Chapter 1: It's too big!
After calling out us "observers", Loukidelis then proceeds to lay out his Luddite credentials in full, first by calculating the number of pages represented by the 43 terabytes of annual government emails:
"Using the above averages of emails received and sent, each year there would be roughly 426,000,000 pages of received emails and some 129,000,000 pages of sent emails, for a total of roughly 555,000,000 pages of emails. No one would suggest that all emails should be printed, but this gives a sense of the order-of-magnitude implications of the suggestions that, contrary to prudent information management principles, all emails should be kept, or should be vetted by others for retention. The same would be true even if these estimates were reduced by one or even two orders of magnitude, to 55,000,000 pages or 5,500,000 pages."
Staggering! Shocking! Half a billion! I'm surprised he didn't express it in terms of football fields to help the folks at home grasp the staggering immensity. (Because you need to know: 500M pages stack to about 700 football fields high.)
Let's recast this problem in more computer-centric terms:
- The government produces/receives 43TB of email per year.
- A 4TB hard-drive can be purchased for between $200 and $400.
- So depending on the amount of redundancy you want, and the quality of hard-drive you purchase, it's possible to store an entire year's worth of government email data on between $8,600 and $50,000 worth of hardware. Or, to put it in terms Mr. Loukidelis might understand, for about the cost of one overly wordy report.
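For anyone who wants to fiddle with the arithmetic, here is a rough sketch in Python. The function name and the `copies` parameter (how many full copies of the data you keep for redundancy) are my own illustrative assumptions, and it counts raw drive prices only, not servers, enclosures or staff.

```python
import math

def archive_hardware_cost(data_tb, drive_tb, drive_price, copies=1):
    """Return (drive_count, total_cost) for `copies` full copies of the data.

    Hypothetical helper: raw drive cost only, no servers or enclosures.
    """
    drives = math.ceil(data_tb * copies / drive_tb)
    return drives, drives * drive_price

# 43TB a year on 4TB drives, using the $200 and $400 price points above.
print(archive_hardware_cost(43, 4, 200, copies=1))  # (11, 2200)
print(archive_hardware_cost(43, 4, 200, copies=4))  # (43, 8600)
print(archive_hardware_cost(43, 4, 400, copies=4))  # (43, 17200)
```

Even with four full copies on the pricier drives, the raw disk bill stays well under the cost of the report.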
Now I'm not suggesting the OCIO buy a dozen 4TB drives and stick a server in the closet, but the numbers above should reassure us that storing 43TB of email per year is not exactly at the far reaches of today's computing capabilities. There are companies that provide cloud-based email archiving services, particularly for organizations with privacy issues and sensitive data (financial companies). In fact, one of the leaders in the field is headquartered right here in BC. I asked them if they could handle the government's data volume.
@pwramsey Yes, we are capable of that. If you have any additional questions, feel free to reach out to email@example.com!
— Global Relay (@globalRelay), December 16, 2015
So, we have the technology, we just lack the will.
Chapter 2: It's not searchable!
Unfortunately, Mr. Loukidelis doesn't stop trying to explain technology to the unwashed with his "pages of paper" analogy. He's got yet more reasoning by analogy to share.
"At all costs, the provincial government should not entertain any notion that all electronic records must, regardless of their value, be retained. ... To suggest, as some have, that all information should be kept is akin to suggesting it is good household management for homeowners to never throw away rotten food, grocery lists, old newspapers, broken toys or worn-out clothes. No one keeps their garbage. Hoarding is not healthy."
Except of course, we aren't talking about rotten food, grocery lists, old newspapers, and broken toys here. We're talking about digital data, which can be sifted, filtered and analyzed in microseconds, without human effort of any kind. These are not differences in degree, these are differences in kind.
Mr. Loukidelis might be too young to remember this, but when Google introduced Gmail in 2004, they did two remarkable things: they gave every user an unprecedented 1GB of free storage (that number is now 15GB); and they hid the "delete" button in favor of an "archive" button. The archive button does not delete mails, it just removes them from the Inbox. Google served notice a decade ago: you don't have to delete your mail, and you shouldn't bother to delete your mail, because it's too valuable as a record, and so very easy to search and find what you want.
I'm surprised Mr. Loukidelis, as a lawyer, isn't following the progress of e-discovery technology, rapidly moving from keyword based searching to applying natural language and AI (well, statistical pattern recognition) tools to finding relevant documents in huge corpuses of electronic data.
Suffice it to say, it's early days. Present technology is more than satisfactory to do a much better job than the poor old FOI clerks are doing searching mail boxes. And in the future, we can expect AI tools to easily sort through as much "garbage" as we care to throw at them.
The time to start archiving everything, and letting the computers sort out the mess, is now.
Chapter 3: It's not relevant!
There's one more vignette Mr. Loukidelis shares, a folksy thing, which is also worth looking at:
"This is true even if an individual engages in a transaction that generates records. Take the example of an individual who shops at an online store and arranges to pick up the television they buy at a bricks-and-mortar location. The order confirmation is emailed to them and they print it for pickup purposes. They cannot pick the television up within the allotted window, so they email the retailer to extend the time. The retailer responds. They then email the retailer about whether the television comes with an HDMI cable. The retailer responds. Once the television is picked up, the purchaser keeps the receipt for warranty purposes. This is surely the only documentation that truly matters. It would make no sense to keep all of the emails back and forth, or the printed pickup notice."
Valueless! Cluttering up the important documentary record of government! If we had to store all this back-and-forth nonsense, we'd never be able to find the "good stuff" amongst the trash. Right?
What if the individual were picked up for a murder he didn't commit, and his only alibi was that he sent an email from his desk to the television store, right when the act was committed? What if, after delivery, the individual opens the box and finds no HDMI cable! The store insists there isn't supposed to be one. How can the individual prove otherwise? On and on it goes.
The most trivial pieces of information can have value, in the right circumstances. And since they cost practically nothing to store, why not keep them, particularly in light of the alternative Mr. Loukidelis proposes?
Chapter 4: What's the alternative?
It's important to weigh Mr. Loukidelis' strong rejection of email archiving against the alternative, which is basically the current system.
- Most policy discussion and decisions are handled in email.
- That email may be discarded very easily by any staff member.
- Only if printed and filed will a permanent record be kept.
- If deleted, a copy in the trash folder may find its way to a backup file.
- Once deleted, FOI searches for the record will start to come up empty, as individual searches on staff computers don't necessarily hit the trash folder.
- Also, FOI searches can only find the record if run on the right staff member's computer (unlike with a government-wide archive).
- The copy in the backups will only be retrievable if HP Advanced Solutions restores the backup file (and if you think storing 43TB of data a year is expensive, compare it to having HPAS do really anything at all for you).
- The backups themselves will be purged after 13 months. At that point, the record is gone, forever.
On top of this system, Mr. Loukidelis proposes some sensible tweaks and improvements, but let's be crystal clear: the current system sucks, it's really colossally bad, and there's no excuse for that in 2015.
Mr. Loukidelis should have proposed a real improvement, but instead he whiffed, and he whiffed hard.
Appendix A: Optional Conspiracy Theory Section
Mr. Loukidelis' recommendation #2 is really striking. Here it is:
"It is recommended in the strongest possible terms that government resist any notion that all emails should be kept"
Emphasis mine. Not just recommended, but "in the strongest possible terms". None of the other recommendations is remotely so strong. And here's an odd thing: the other recommendations are all addressed to Commissioner Denham's original report, but Denham has nothing at all to say about archiving all email. It's like this particular topic dropped into the Loukidelis report from out of the blue sky, and was greeted by a phalanx of flame-throwers.
Why? What's going on? Why spend so much ink, and such strong language, killing an idea that Denham didn't even raise?
I find it hard to believe that Loukidelis really cared that much about "observers" like me and my blog. But he cared enough to not only put in a section about email archiving, but also to beat the topic to death with a shovel.
I think there must have been some internal debate in government about permanently ending the controversy over bad email management by adopting an email archive. And Loukidelis was instructed by political staff on one side of that debate to ensure that the idea was terminated with dispatch.
Maybe Finance Minister Mike "Mr Transparency" de Jong made an email archive a personal hobby-horse and started talking it up in cabinet. If so, having the Loukidelis report kill the idea dead would be a quick and dirty way for the Premier to make sure the discussion went no further.
Regardless, I think there's probably an interesting story behind recommendation #2, and I hope someday I get to hear what it was.
Saturday, November 28, 2015
I attended PgConf Silicon Valley a couple weeks ago and gave a new talk about aspects of PostGIS that come as a surprise to new users. Folks in Silicon Valley arrive at PostGIS with lots of technical chops, but often little experience with geospatial concepts, which can lead to fun misunderstandings. Also, PostGIS just has a lot of historical behaviours we've kept in place for backwards compatibility over the years.
Thanks to everyone who turned out to attend!
Tuesday, October 27, 2015
I caught Keith Baldrey on the aether-box today (CKNW) and he was being generous in his distribution of benefit of the doubt to the poor, poor government staffers trying to handle their email:
“I’ve talked to government staffers about this, and they are confused on what the rules are, it’s very unclear and unevenly applied over what should be deleted and what should not be.”
— Keith Baldrey, Tuesday, October 27, 15:24 on CKNW
Before we get to remedies, let's review what these poor confused dears are doing. For whatever reason, because they believe the email is not an important record, or a duplicate, or they just can't bear to burden the taxpayers of BC with storing a further 85KB of data, the beleaguered staffers are doing the following:
- They select the email in question and hit Delete.
- Then they go to their Trash folder and select the option to purge that folder.
- Finally they open up a special folder called Recover Deleted, and select the option to purge that folder.
Let's be clear. If the poor confused staffers were just plain vanilla innocently deleting emails that they thought were transitory but were not, they would be stopping at step number one. But they aren't. So there's a very particular intent in play here, and that's to make sure that nobody ever sees what's in these emails ever, ever, ever again. And that intent is not consistent with the (current) cover story about innocently not understanding the rules in play with respect to email management.
Moving on to remedies.
We don't need to train them more (or maybe we do, but not for this). We need to establish a corporate email archive that simply takes a copy of every email, sent and received, and dumps it into a searchable vault. This is widely available technology, used by public companies and investment dealers around the world.
Once the archive is in place, staffers can manage their email any way they like. They can keep a pristine, empty mail box, the way Minister Todd Stone apparently likes to operate. Or they can keep a complete record of all their email, ready to search and aid their work. Or some happy mixture of the two. They'll be more effective public servants, and the public won't need to worry about records going down the memory hole any more.
Let's get it done, OK?
Saturday, October 24, 2015
... I'm going to tear my ears off. Also "transitory email". Just bam, going to rip them right off.
Note for those not following the British Columbia political news: While we have known for many years that high-level government staff routinely delete their work email, a smoking gun came to light in the spring. A former staffer told how his superior personally deleted emails that were subject to an FOI request and then memorably said "It's done. Now you don't have to worry anymore." (A line which really should only be delivered over a fresh mound of dirt with a shovel in hand.) The BC FOI Commissioner investigated his allegation and reported back that, yep, it really did happen and that the government basically does it all the time.
The Microsoft Outlook tricks and the contortions of policy around what is "transitory" or not, are all beside the point, since:
- there is no reason electronic document destruction should be allowed, in any circumstance, ever, because
- electronic message archival and retrieval is a solved problem.
The BC Freedom of Information Act, with its careful parsing of "transitory" versus real e-mails, was written in the early 1990s, when there was a tangible, physical cost to retaining duplicative and short-lived records -- they took up space, and cost money to store.
Oh, yes, digital documents cost money to store, but please note, my old CD collection (already a very information-dense medium) takes up a 2-cube box in my garage, but barely dents the storage capacity of a $10 memory stick in MP3 form. My book collection (6 shelves) hardly even registers in digital form. You use more data streaming an episode of Breaking Bad. Things have changed since 1995. And since 2005.
So why are we still having this conversation, and why does the government have such lax rules around message retention? And let me be clear, the government rules are very, very lax.
"Whoever knowingly alters, destroys, mutilates, conceals, covers up, falsifies, or makes a false entry in any record, document, or tangible object with the intent to impede, obstruct, or influence the investigation or proper administration of any matter within the jurisdiction of any department or agency of the United States or any case filed under title 11, or in relation to or contemplation of any such matter or case, shall be fined under this title, imprisoned not more than 20 years, or both."
Similarly, in Canada investment companies must keep complete archives of all messages, in all kinds of media:
Pursuant to National Instrument 31-103 ... firms must retain records of their business activities, financial affairs, client transactions and communication. ... The type of device used to transmit the communication or whether it is a firm issued or personal device is irrelevant. Dealer Members must therefore design systems and programs with compliant record retention and retrieval functionalities for those methods of communication permitted at the firm. For instance, the content posted on social media websites, such as Twitter, Facebook, blogs, chat rooms and all material transmitted through emails, are subject to the above-noted legislative and regulatory requirements.
— IIROC Guidelines for the review, supervision and retention of advertisements, sales literature and correspondence, Section II
Wow! That sounds really hard! I wonder how US public companies and Canadian investment dealers can do this, while the government can't even upgrade their email servers without losing 8 months' worth of archival data:
As it turned out, the entire migration process would take eight months. When the process extended beyond June 2014, MTICS forgot to instruct HPAS to do backups on a monthly basis. This meant that every government mailbox that migrated onto the new system went without a monthly backup until all mailboxes were migrated. Any daily backup that existed was expunged after 31 days. At its peak, some 48,000 government mailboxes were without monthly email backups.
— OIPC Investigation Report F15-03, Page 32
Corporations and investment banks can do this because high volume enterprise email archiving has been a solved problem for well over a decade. So there are lots of options, proprietary, open source, and even British Columbian!
Yep, one of the top companies in the electronic message archiving space, Global Relay, is actually headquartered in Vancouver! Guys! Wake up! Put a salesperson on the float-plane to Victoria on Monday!
Right now, British Columbia doesn't have an enterprise email archive. It has an email server farm, with infrequent backup files, retained for only 18 months and requiring substantial effort to restore and search. Some of the advantages of an archive are:
- The archive is separate from the users. They do not individually determine the retention schedule using their [DELETE] key; retention is applied enterprise-wide on the archive.
- Archive searches are not done by users; they are done by the people who need access to the archive. In the case of corporate archives, that's usually the legal team. In the case of the government it would be the legal team and the FOI officers.
- Archive searches can address the whole collection of email in one search. Current government FOI email searches are done computer-by-computer, by line staff who probably have better things to do.
- The archive is separate from the operational mail delivery and mail box servers, so upgrades on the operational equipment do not affect the archive.
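To make the separation in the list above concrete, here is a toy sketch in Python. Every class and method name in it is invented for illustration, and it is not how any real archiving product works; the only point is that the vault takes its copy at delivery time and offers users no way to delete from it.

```python
from dataclasses import dataclass, field

@dataclass(frozen=True)
class Message:
    sender: str
    recipient: str
    subject: str
    body: str

@dataclass
class Archive:
    """Append-only vault: there is deliberately no delete method."""
    _messages: list = field(default_factory=list)

    def journal(self, msg):
        self._messages.append(msg)

    def search(self, term):
        # One search covers every mailbox in the organization.
        term = term.lower()
        return [m for m in self._messages
                if term in m.subject.lower() or term in m.body.lower()]

@dataclass
class MailServer:
    archive: Archive
    mailboxes: dict = field(default_factory=dict)

    def deliver(self, msg):
        # Copy to the archive first, then into both users' mailboxes.
        self.archive.journal(msg)
        for user in (msg.sender, msg.recipient):
            self.mailboxes.setdefault(user, []).append(msg)

    def purge(self, user):
        # A user emptying a mailbox touches only that user's own copies.
        self.mailboxes[user] = []

server = MailServer(Archive())
server.deliver(Message("staffer", "boss", "Highway file", "Notes attached."))
server.purge("staffer")
server.purge("boss")
print(len(server.archive.search("highway")))  # 1
```

Both mailboxes end up pristine and empty, and the FOI search still finds the record.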
So, for the next little while, the Commissioner's narrow technical recommendations are fine (even though they make me want to tear my ears off):
But the real long-term technical solution to treating email as a document of record is... start treating it as a document of record! Archive it, permanently, in a searchable form, and don't let the end users set the retention policy. It's not rocket science, it's just computers.
Thursday, October 15, 2015
On my usual bi-annual schedule, I gave a keynote talk at FOSS4G this year in Seoul, about the parallel pressures on open source that the move to cloud computing is providing. On the one hand, the cloud runs on open source. On the other hand, below the API layer the cloud is pretty much the opposite of open: it's as much a black box as the old Win32 API. And the growth of cloud is paralleled by the shrinkage of infrastructure maintainers in other venues; the kinds of folks who currently use and produce OSS. It's a big change coming down the highway.
Friday, October 09, 2015
"Sometimes I have the impression that many people in the media consider it uncouth to acknowledge, even to themselves, the fraudulence of much political posturing. The done thing, it seems, is to pretend that we’re having real debates about national security or economics even when it’s both obvious and easy to show that nothing of the kind is actually taking place."
— Paul Krugman
Monday, August 10, 2015
Get off my lawn!
I don't talk about this much, but I actually trained in statistics, not in computer science, and I've been getting slowly but progressively weirded out by the whole "big data" / "data science" thing. Because so much of it is bogus, or boys-with-toys or something.
So that's point number one: most people blabbing on about big data can fit their problem onto a big vertical machine and analyze it to their heart's content in R or something.
Point number two is less frequently touched upon: sure, you have 2 trillion records, but why do you need to look at all of them? The whole point of an education in statistics is to learn how to reason about a population using a random sample. So why are all these alleged "data scientists" firing up massive compute clusters to summarize every single record in their collections?
I'm guessing it's the usual reason: because they can. And because the current meme is that they should. They should stand up a 100 node cluster on AWS and bloody well count all 2 trillion of them. Because: CPUs.
But honestly, if you want to know the age distribution of people buying red socks, draw a sample of a couple hundred thousand records, and find out to within a fraction of a percentage point 19-times-out-of-20. After all, you're a freaking "data scientist", right?
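If you want to check that claim, here is a back-of-envelope sketch in Python using the usual normal approximation for a proportion. The assumptions are mine: a simple random sample, the worst case p = 0.5, and z = 1.96 for "19-times-out-of-20". The function name is just for illustration.

```python
import math

def margin_of_error(n, p=0.5, z=1.96):
    """Half-width of an approximate 95% confidence interval for a proportion,
    assuming a simple random sample of size n (normal approximation)."""
    return z * math.sqrt(p * (1 - p) / n)

moe = margin_of_error(200_000)
print(f"{moe * 100:.2f} percentage points")  # 0.22 percentage points
```

Note that the 2 trillion never appears in the formula. When the sample is a tiny fraction of the population, the precision depends on the sample size, not on how many records are sitting in the cluster.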
Wednesday, July 15, 2015
If what goes up must come down, nobody told BC's IT outsourcers, because they continue to gobble up a larger chunk of the government pie every year.
The BC Public Accounts came out today, and I'm happy to say that the People Who Are Smarter Than You Are managed to book another record year of billings: a $468,549,154 spend, up 8% over last year.
It's not a victory unless you beat someone else, so good news:
- Overall government revenue, up 5.4%
- Overall government spending, up 2.4%
- Health spending, up 2.8%
- Education spending, up 0%
- IT services spending up 8%!!!!
Don't be sad, kids and sick people, IT services folks are Adding Value and Finding Synergies in ways that you just can't. In the long run, workshopping the new Management Strategy Realignment Plan is just a better investment than fixing your gimpy hip, or hiring a teaching assistant to help Angry Jimmy focus on his work.
HP Advanced Solutions continues to dominate the category, adding $20M in billings this year alone (How many teachers could that hire? At least 200. Or even more teaching assistants.) In fact, two thirds of the billing growth this year was just HP.
There's also a new kid in the enterprise software vendor list to keep an eye on: Salesforce.com (SFDC) showed up with a wee $463,053 in billings this year. I expect that to increase mightily in coming years. However, the big money in SFDC work will not be earned by SFDC (even after locking up the entire BC government enterprise back-office, Oracle bills less than $10M a year in software maintenance), but by the consultants providing SFDC "implementation services" (Deloitte, CGI, HP). Watch for a SFDC goldrush as the government starts replacing expensive Oracle systems with... expensive SFDC systems in the cloud.
The best part about hiring big public enterprise IT companies like HP, Oracle, Maximus, and CGI to create lots of important Technology Process (and occasionally a bit of Product) for us isn't the soothingly glacial pace of progress or the fantastic billing rates. It's knowing that at least 20% of every public dollar spent goes straight to the bottom line of those companies, ensuring that shareholders and institutional investors survive through another year without undue financial hardship.
Until next year, keep on spending, British Columbia!
Monday, April 27, 2015
The BC Liberal government is changing the Elections Act to allow unlimited party and candidate spending within one month of election day and meanwhile, as usual, the media are transfixed by the shiny object in the corner.
The political pundits are making a great deal of noise (see V. Palmer's inside baseball assessment if you care) about an amendment to the Elections Act that says that:
"the chief electoral officer must provide … to a registered political party, in respect of a general election … a list of voters that indicates which voters on the list voted in the general election"
At the same time, they are ignoring the BC Liberals fundamentally changing the money dynamic of the fixed election date by eliminating the 60-day "pre-campaign" period.
"Section 198 is amended (a) by repealing subsections (1) and (2) and substituting the following: (1) In respect of a general election, the total value of election expenses incurred by a registered political party during the campaign period must not exceed $4.4 million."
The Elections Act currently divides up the election period before a fixed election into two "halves": the 60 days before the official campaign, and the campaign period itself (about 28 days if I recall correctly). In the first 60 days, candidates can spend a maximum of $70,000 and parties a maximum of $1.1 million. In the campaign period, candidates can spend another $70,000 and parties as much as $4.4 million.
The intent of the "pre-campaign" period is clearly to focus campaigning on the campaign period itself, by limiting the amount of early spending by parties. The "money density" of the pre-campaign period is about $18,000 / day in party spending; in the campaign period, it is almost $160,000 / day.
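The "money density" numbers are just limit divided by days. A quick sketch in Python, where the 28-day campaign length is my from-memory figure from above and the function name is only illustrative:

```python
def spend_per_day(limit_dollars, days):
    """Maximum average party spending per day over a period."""
    return limit_dollars / days

pre_campaign = spend_per_day(1_100_000, 60)  # 60-day pre-campaign period
campaign = spend_per_day(4_400_000, 28)      # campaign period, assumed 28 days

print(f"${pre_campaign:,.0f} / day")                 # $18,333 / day
print(f"${campaign:,.0f} / day")                     # $157,143 / day
print(f"{campaign / pre_campaign:.1f}x as dense")    # 8.6x as dense
```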
This is all very public-spirited, and contributes to a nice focussed election period. But (BUT!) the BC Liberals currently have more money than they know what to do with, so it is in their interest to be able to focus all that money as close to the event as possible. And rather than simply raising the pre-campaign spending limit they went one better: they removed it altogether. They can spend unlimited amounts of money as close as 28 days before election day, 21 days before the opening of advance polls.
Let me repeat that: they can spend unlimited amounts of money.
So in British Columbia now, it is legal both to raise money from corporations, unions and individuals in any amounts at all (and some individuals and corporations have donated to the BC Liberals, individually, over $100,000 a year), and to spend unlimited amounts of money, right up to within 28 days of election day.
See any problems with that?