Government email deleting: intent matters

28 Oct 2015

I caught Keith Baldrey on the aether-box today (CKNW) and he was being generous in his distribution of benefit of the doubt to the poor, poor government staffers trying to handle their email:

“I’ve talked to government staffers about this, and they are confused on what the rules are, it’s very unclear and unevenly applied over what should be deleted and what should not be.”
— Keith Baldrey, Tuesday, October 27, 15:24 on CKNW

Before we get to remedies, let’s review what these poor confused dears are doing. For whatever reason, because they believe the email is not an important record, or a duplicate, or they just can’t bear to burden the taxpayers of BC with storing a further 85KB of data, the beleaguered staffers are doing the following:

1. They select the email in question and hit Delete.
2. Then they go to their Trash folder and select the option to purge that folder.
3. Finally they open up a special folder called Recover Deleted, and select the option to purge that folder.

Let’s be clear. If the poor confused staffers were just plain vanilla innocently deleting emails that they thought were transitory but were not, they would be stopping at step number one. But they aren’t. So there’s a very particular intent in play here, and that’s to make sure that nobody ever sees what’s in these emails ever, ever, ever again. And that intent is not consistent with the (current) cover story about innocently not understanding the rules in play with respect to email management.

Moving on to remedies.

We don’t need to train them more (or maybe we do, but not for this). We need to establish a corporate email archive that simply takes a copy of every email, sent and received, and dumps it into a searchable vault. This is widely available technology, used by public companies and investment dealers around the world.

Once the archive is in place, staffers can manage their email any way they like. They can keep a pristine, empty mailbox, the way Minister Todd Stone apparently likes to operate. Or they can keep a complete record of all their email, ready to search and aid their work. Or some happy mixture of the two. They’ll be more effective public servants, and the public won’t need to worry about records going down the memory hole any more.

Let’s get it done, OK?

If I hear the words "triple delete" one more time...

24 Oct 2015

… I’m going to tear my ears off. Also “transitory email”. Just bam, going to rip them right off.

Note for those not following the British Columbia political news: While we have known for many years that high-level government staff routinely delete their work email, a smoking gun came to light in the spring. A former staffer told how his superior personally deleted emails that were subject to an FOI request and then memorably said “It’s done. Now you don’t have to worry anymore.” (A line which really should only be delivered over a fresh mound of dirt with a shovel in hand.) The BC FOI Commissioner investigated his allegation and reported back that, yep, it really did happen and that the government basically does it all the time.

The Microsoft Outlook tricks and the contortions of policy around what is “transitory” or not are all beside the point, since:

there is no reason electronic document destruction should be allowed, in any circumstance, ever, because
electronic message archival and retrieval is a solved problem.

The BC Freedom of Information Act, with its careful parsing of “transitory” versus real e-mails, was written in the early 1990s, when there was a tangible, physical cost to retaining duplicative and short-lived records – they took up space, and cost money to store.

Oh, yes, digital documents cost money to store, but please note, my old CD collection (already a very information-dense medium) takes up a 2-cube box in my garage, but barely dents the storage capacity of a $10 memory stick in MP3 form. My book collection (6 shelves) hardly even registers in digital form. You use more data streaming an episode of Breaking Bad. Things have changed since 1995. And since 2005.

So why are we still having this conversation, and why does the government have such lax rules around message retention? And let me be clear, the government rules are very, very lax.

In the USA, public companies are under the Sarbanes-Oxley rules and have extremely strict requirements for document retention, with punishments to match:

“Whoever knowingly alters, destroys, mutilates, conceals, covers up, falsifies, or makes a false entry in any record, document, or tangible object with the intent to impede, obstruct, or influence the investigation or proper administration of any matter within the jurisdiction of any department or agency of the United States or any case filed under title 11, or in relation to or contemplation of any such matter or case, shall be fined under this title, imprisoned not more than 20 years, or both.”

Similarly, in Canada investment companies must keep complete archives of all messages, in all kinds of media:

Pursuant to National Instrument 31-103 … firms must retain records of their business activities, financial affairs, client transactions and communication. … The type of device used to transmit the communication or whether it is a firm issued or personal device is irrelevant. Dealer Members must therefore design systems and programs with compliant record retention and retrieval functionalities for those methods of communication permitted at the firm. For instance, the content posted on social media websites, such as Twitter, Facebook, blogs, chat rooms and all material transmitted through emails, are subject to the above-noted legislative and regulatory requirements.
— IIROC Guidelines for the review, supervision and retention of advertisements, sales literature and correspondence, Section II

Wow! That sounds really hard! I wonder how US public companies and Canadian investment dealers can do this, while the government can’t even upgrade its email servers without losing 8 months’ worth of archival data:

As it turned out, the entire migration process would take eight months. When the process extended beyond June 2014, MTICS forgot to instruct HPAS to do backups on a monthly basis. This meant that every government mailbox that migrated onto the new system went without a monthly backup until all mailboxes were migrated. Any daily backup that existed was expunged after 31 days. At its peak, some 48,000 government mailboxes were without monthly email backups.
— OIPC Investigation Report F15-03, Page 32

Corporations and investment banks can do this because high volume enterprise email archiving has been a solved problem for well over a decade. So there are lots of options, proprietary, open source, and even British Columbian!

Yep, one of the top companies in the electronic message archiving space, Global Relay, is actually headquartered in Vancouver! Guys! Wake up! Put a salesperson on the float-plane to Victoria on Monday!

Right now, British Columbia doesn’t have an enterprise email archive. It has an email server farm, with infrequent backup files, retained for only 18 months and requiring substantial effort to restore and search. Some of the advantages of an archive are:

The archive is separate from the users. They do not individually determine the retention schedule using their [DELETE] key; retention is applied enterprise-wide on the archive.
Archive searches are not done by users; they are done by the people who need access to the archive. In the case of corporate archives, that’s usually the legal team. In the case of the government, it would be the legal team and the FOI officers.
Archive searches can address the whole collection of email in one search. Current government FOI email searches are done computer-by-computer, by line staff who probably have better things to do.
The archive is separate from the operational mail delivery and mailbox servers, so upgrades to the operational equipment do not affect the archive.
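
To make that separation concrete, here is a minimal sketch in Python. It is a toy under stated assumptions, not a description of how any real archiving product works: every name in it (Archive, MailServer, journal, purge, the example.gov addresses) is hypothetical. The only point is the shape of the thing: the archive gets its copy at delivery time and simply has no delete operation for users to reach.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Message:
    sender: str
    recipient: str
    subject: str
    body: str


class Archive:
    """Hypothetical append-only store. There is deliberately no delete method."""

    def __init__(self):
        self._messages = []

    def journal(self, message):
        # Called by the mail server, never by an end user.
        self._messages.append(message)

    def search(self, term):
        # One search covers the whole collection (naive substring match).
        term = term.lower()
        return [m for m in self._messages
                if term in m.subject.lower() or term in m.body.lower()]


class MailServer:
    """Hypothetical operational mail server that copies everything to the archive."""

    def __init__(self, archive):
        self._archive = archive
        self.mailboxes = {}

    def deliver(self, message):
        # Archive first, then drop copies into the users' own mailboxes.
        self._archive.journal(message)
        for user in (message.sender, message.recipient):
            self.mailboxes.setdefault(user, []).append(message)

    def purge(self, user):
        # A user's "triple delete": it empties their mailbox and nothing else.
        self.mailboxes[user] = []


archive = Archive()
server = MailServer(archive)
server.deliver(Message("staffer@example.gov", "boss@example.gov",
                       "Highway file", "Notes on the highway file."))
server.purge("staffer@example.gov")
server.purge("boss@example.gov")

hits = archive.search("highway")
print(len(hits))  # the archived copy survives both purges
```

Both mailboxes end up pristine and empty, and the FOI officer's search still finds the message. A real system would add access controls, indexing and durable storage, but the division of responsibility is the same.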

So, for the next little while, the Commissioner’s narrow technical recommendations are fine (even though they make me want to tear my ears off).

But the real long-term technical solution to treating email as a document of record is… start treating it as a document of record! Archive it, permanently, in a searchable form, and don’t let the end users set the retention policy. It’s not rocket science, it’s just computers.

Keynote at FOSS4G 2015

15 Oct 2015

On my usual bi-annual schedule, I gave a keynote talk at FOSS4G this year in Seoul, about the parallel pressures that the move to cloud computing is putting on open source. On the one hand, the cloud runs on open source. On the other hand, below the API layer the cloud is pretty much the opposite of open: it’s as much a black box as the old Win32 API. And the growth of cloud is paralleled by the shrinkage of infrastructure maintainers in other venues; the kinds of folks who currently use and produce OSS. It’s a big change coming down the highway.

Krugman FTW

09 Oct 2015

“Sometimes I have the impression that many people in the media consider it uncouth to acknowledge, even to themselves, the fraudulence of much political posturing. The done thing, it seems, is to pretend that we’re having real debates about national security or economics even when it’s both obvious and easy to show that nothing of the kind is actually taking place.”
— Paul Krugman

Big Data and Data Science Piss Me Off

11 Aug 2015

Get off my lawn!

.@galvanize bringing its 12-week Data Science boot camp to Denver. You still stoked about studying for the GRE ? pic.twitter.com/EMFgUq0OFV
— Brian Timoney (@briantimoney) August 11, 2015

I don’t talk about this much, but I actually trained in statistics, not in computer science, and I’ve been getting slowly but progressively weirded out by the whole “big data” / “data science” thing. Because so much of it is bogus, or boys-with-toys or something.

Basically, my objections to the big data thing are the usual: probably your data is not big. It really isn’t, and there are some great blog posts all about that.

So that’s point number one: most people blabbing on about big data can fit their problem onto a big vertical machine and analyze it to their heart’s content in R or something.

Point number two is less frequently touched upon: sure, you have 2 trillion records, but why do you need to look at all of them? The whole point of an education in statistics is to learn how to reason about a population using a random sample. So why are all these alleged “data scientists” firing up massive compute clusters to summarize every single record in their collections?

I’m guessing it’s the usual reason: because they can. And because the current meme is that they should. They should stand up a 100 node cluster on AWS and bloody well count all 2 trillion of them. Because: CPUs.

But honestly, if you want to know the age distribution of people buying red socks, draw a sample of a couple hundred thousand records, and find out to within a fraction of a percentage point 19-times-out-of-20. After all, you’re a freaking “data scientist”, right?
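
The arithmetic behind that claim fits in a few lines of Python. This is a sketch under the textbook assumptions: a simple random sample, the normal approximation for a proportion, and z = 1.96 for "19 times out of 20". The 30% share of young red-sock buyers is a made-up number for the simulation, not real data.

```python
import math
import random


def margin_of_error(sample_size, proportion=0.5, z=1.96):
    """Half-width of the ~95% confidence interval for a sample proportion."""
    return z * math.sqrt(proportion * (1.0 - proportion) / sample_size)


# Worst case (proportion = 0.5) for a sample of 200,000 records.
moe = margin_of_error(200_000)
print(f"margin of error: {moe * 100:.2f} percentage points")  # 0.22

# Simulated check with a hypothetical population in which 30% of
# red-sock buyers fall in some age bracket.
random.seed(42)
true_share = 0.30
sample = [random.random() < true_share for _ in range(200_000)]
estimate = sum(sample) / len(sample)
print(f"estimated share: {estimate:.4f}")
```

Note that the size of the full collection never appears in the formula. For a population that dwarfs the sample, whether it holds 2 billion or 2 trillion records makes no practical difference to the precision of the estimate.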

Paul Ramsey
