Wednesday, July 16, 2014

BC IT Outsourcing 2013/14

"O frabjous day! Callooh! Callay!"

The BC Public Accounts came out today, so it's time to update the statistics and see how the IT consulting racket shaped up in BC last year. Judging from the sharp suits on the streets and general perkiness of the local IT labour market, I'd guess "pretty peachy", but there's something to be said for actually checking the numbers.

Totalling up all the Usual Suspects, I am pleased to report that 2013/14 was another record-breaking year in technology outsourcing: a $435,350,420 spend, that's up 11% over last year! Rockin' it!

Let's put that in perspective, shall we?

  • IT outsourcing rocked it with an 11% gain.
  • Overall government spending was up $174M on a budget of $43B, for a gain of 0.4%.
  • The government recently offered the teachers a contract with a 1.1% (average) annual wage lift.
  • Canadian inflation in 2013 was 1.24%.
  • BC education spending increased 2.6% over last year.
  • BC health care spending increased 2.1% over last year.

I said it last year, and I'll say it again this year: suck on that, children and sick people! Who's the boss? IT people are the boss!

Once again, in the individual category, HP Advanced Solutions reigns supreme, billing out $138,407,858, a 6.8% gain. HP's growth is slowing, though, and my favourite systems integrator, Deloitte, just closed a monster year with 51% year-over-year billings growth and a take of $54,294,507. Look out HP, someone hungry is on your heels!

I recently discovered that there's a significant government IT spend in the health authorities, so I'm looking forward to adding some new stats over the summer. In addition, I feel like leaving Telus out of the accounting is an increasingly hard call: while much of their billing is infrastructure stuff like group cell phone plans and connectivity, they also have a huge new outsourcing arm doing all sorts of not-at-all-like-a-telephone stuff: Telus Health, anyone?

Until next year, keep on spending, British Columbia!

Monday, July 07, 2014

Some Privacy is More Private Than Others

One of the things that struck me in researching the long and tortuous story of how the government is trying to move British Columbians' private data into off-shore cloud computing services was the odd choice of the pilot project for the whole scheme: STADD.

What's STADD? It's "Services to Adults with Developmental Disabilities".

That's right, adults with developmental disabilities are the subjects of the BC government's experiment to see "hmm, I wonder if we can offshore private data using fancy tokenization software".

Let me put some icing on the cake.

The BC Liberal caucus has to manage information about the citizens who access services via their constituency offices. These are their "customers" and they use a "customer relationship management" (CRM) system to hold the information.

Are they storing this personal information offshore? Are they trying to shoehorn it into salesforce.com using tokenization software to avoid FOIPPA restrictions and protect their constituents from the PATRIOT Act?

No, that would be risky, that's the kind of thing that STADD can pilot. The BC Liberal caucus uses a product called "Maximizer CRM". Designed, built and hosted in... Vancouver, British Columbia.

FOSS4G 2014 in Portland, Oregon, September 2014

Just a quick public service announcement for blog followers in the Pacific Northwest and environs: you've got a once in a not-quite-lifetime opportunity to attend the "Free and Open Source Software for Geospatial" (aka FOSS4G) conference this year in nearby Portland, Oregon, a city so hip they have trouble seeing over their pelvis.

Anyone in the GIS / mapping world should take the opportunity to go, to learn about what technology the open source world has available for you, to meet the folks writing the software, and to learn from other folks like you who are building cool things.

September 8th-13th, be there and be square.

Friday, July 04, 2014

Tokenization and Your Private Data (5)

Recapping (last time):

  • (Day 1) The government is interested in using the salesforce.com CRM and other USA cloud applications, but the BC FOIPPA Act does not allow it.
  • (Day 2) So, the BC CIO has recommended "tokenization" systems to make personal information 100% obscured before storage in USA cloud applications.
  • (Day 3) But, using truly secure tokenization renders CRMs basically useless, so software vendors are flogging less secure forms of tokenization hoping that people won't notice the reduced security levels because they still call it "tokenization".
  • (Day 4) And, the BC Freedom of Information & Privacy Commissioner distinguishes between "encryption" (which is considered inadequate protection for personal information held outside Canada) and "tokenization" (which is considered adequate (but only where the "tokenization" itself is "adequate" (which seems to mean "fully random"))).

While this series on tokenization has been a bomb with regular folks (my post on the BCTF and social media got 10x the traffic), one category of readers has really taken notice: tokenization vendors. I've gotten a number of emails, and some educational comments as well. (Hi guys!)

For the love of the vendors, I'll repeat yesterday's postscript. I think I have been overly harsh on the cloud security vendors, because there are really two questions here, which have very different answers:

  • Is less-than-perfect tokenization better than nothing? Yes, it's a lot better than nothing. Even with less-than-perfect tokenization, employees of the cloud software companies can't just casually read records in the database, and an entity wanting to break the security of the records would need to extract a pretty big corpus of records to analyze them to find information leaks and use them to break in.
  • Is less-than-perfect tokenization acceptable for BC? No, because of the FOIPPA law, and because the Commissioner has already set a very very very high bar by not allowing standard symmetric encryption (which can be very very secure) to be used to host personal data outside of Canada.

It's worth re-visiting the two key phrases in the OIPC guidance, which are:

Tokenization is distinct from encryption; while encryption may be deciphered given sufficient computer analysis, tokens cannot be decoded without access to the crosswalk table.

What I take from this is that the OIPC is saying that "encryption" is vulnerable (it "may be deciphered"), and "tokenization" is not (it "cannot be decoded"). Now, as discussed on day 3, the "cannot be decoded" part is only true for a very small sub-set of "tokenization", the kind that uses fully random tokens. And the OIPC is aware of this, though they only barely acknowledge it:

Public bodies may comply with FIPPA provided that the personal information is adequately tokenized and the crosswalk table is secured in Canada.

If you take "adequately" to mean "adequately" such that "tokens cannot be decoded without access to the crosswalk table" then you're talking about an extremely restrictive definition of tokenization. A lot more restrictive than what vendors are talking about when they come to sell you tokenization.

The vendors who are phoning me and commenting here are worried that readers will see my critique and think "huh, tokenization is insecure". And that's not what I'm saying. What I'm saying is:

Practical use of tokenization in a USA cloud CRM is not consistent with the British Columbia OIPC's incredibly narrow definition of an acceptable level of data security for personal information stored in foreign jurisdictions or under foreign control.
Paul Ramsey, Just Now

If you're just looking for a reasonable level of surety that your data in a cloud service cannot be easily poked and prodded by a third party (or the cloud service itself), and you don't mind adding the extra level of complexity of interposing a tokenization service/server into your interactions with the cloud service, then by all means, a properly configured tokenization system would seem to fit the bill nicely.

YMMV.

Thursday, July 03, 2014

Tokenization and Your Private Data (4)

Recapping:

  • (Day 1) The government is interested in using the salesforce.com CRM and other USA cloud applications, but the BC FOIPPA Act does not allow it.
  • (Day 2) So, the BC CIO has recommended "tokenization" systems to make personal information 100% obscured before storage in USA cloud applications.
  • (Day 3) But, using truly secure tokenization renders CRMs basically useless, so software vendors are flogging less secure forms of tokenization hoping that people won't notice the reduced security levels because they still call it "tokenization".

The BC CIO guidance on using USA cloud services has a certain breathless enthusiasm (is there any innovation more exciting than vendor innovation?) for the tokenization products vendors are bringing to market:

Vendors have begun to address this “data-residency” issue in innovative ways. As an example, Force.com, and CypherCloud offer solutions that allow sensitive or personal information to remain in Canada. Using tokenization – a method of substituting specified data fields for arbitrary values – these solutions allow for the use of foreign-based services while remaining within the residency-based restrictions of FOIPPA.
BC OCIO, Data Residency and Tokenization

And the guidance released by BC's Office of the Information & Privacy Commissioner (OIPC) at first glance appears to similarly swallow claims about tokenization hook, line and sinker.

Public bodies may comply with FIPPA provided that the personal information is adequately tokenized and the crosswalk table is secured in Canada.
BC OIPC, Updated guidance on the storage of information outside of Canada by public bodies

However, the OIPC guidance has one small but important difference: the word "adequately".

I met with a lawyer from the OIPC's office to discuss tokenization, and he was clear that the OIPC understood the very important difference between fully randomized tokenization (basically unbreakable, and "adequate") and any other tokenization (potentially trivially breakable, and perhaps not "adequate"). This is reassuring, because the difference is not immediately obvious, and the tokenization software vendors are doing everything in their power to obscure the difference in their marketing materials.

It is not reassuring that the OIPC has opened the door to "tokenization" at all. The OIPC is sufficiently anal retentive about personal information that they have ruled that no forms of standard encryption are sufficiently secure to be used to store personal information outside Canada, because "encryption may be deciphered given sufficient computer analysis". That's right, the OIPC scoffs at your AES-256-encrypted data, but is OK with "adequate" tokenization, for some undefined values of "adequate".
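For contrast, here's a minimal sketch of the kind of standard symmetric encryption the OIPC has rejected, using the widely deployed Python "cryptography" package (the key handling here is illustrative only; assume the key lives in Canada):

    # AES-256-GCM: the "standard encryption" the OIPC deems inadequate.
    # Requires the third-party 'cryptography' package.
    import os
    from cryptography.hazmat.primitives.ciphers.aead import AESGCM

    key = AESGCM.generate_key(bit_length=256)  # stays in Canada
    nonce = os.urandom(12)
    ciphertext = AESGCM(key).encrypt(nonce, b"Paul Ramsey", None)
    # Without the key, recovering the name means breaking AES-256.
    print(AESGCM(key).decrypt(nonce, ciphertext, None))  # b'Paul Ramsey'

That ciphertext "may be deciphered given sufficient computer analysis"; an "adequately" tokenized record, apparently, may not. Go figure.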

The OIPC guidance spends two paragraphs on "re-identification" of data (the practice of mixing tokenized and un-tokenized fields in records), and spends five more on the legal and physical security of the tokenization crosswalk table (dictionary), but spends only one word ("adequately") on whether or not the tokenization dictionary is full of junk.

The OIPC told me that, because fully random tokenization completely obscured the original data[1], they had to rule that fully tokenized personal data was no longer "personal information" and thus not covered by the Act. This strikes me as very lawyerly, but also very dangerous, since it opens the door for government to consider technical "tokenization" solutions from vendors that are likely far less secure than conventional approaches (like AES-256) that the OIPC has already rejected.

I'll close with the good news: all plans to store personal data outside Canada are still subject to case-by-case review by the OIPC, there is thus far no blanket approval for systems that claim they "tokenize", and the OIPC can still issue further guidance based on research that is going on right now. I'm not lighting my hair on fire, yet. But the door is cracked open, and the snake-oil salesmen are laying out their wares; let's keep an eye on them.

[1] Again, implementation matters. At a minimum, even completely random word-based tokenization can leak information about how many words are in each field. Some implementations also don't encode punctuation, so they leak symbols ("Smith & Wesson" becomes "faerqb & gabedfsara") and other non-word entities. Depending on the input data, these small leakages can be significant.
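To make the footnote concrete, here's a hypothetical word-only tokenizer (the regex and token format are my own illustration, not any vendor's code) showing the punctuation leak:

    # Hypothetical tokenizer that substitutes only word characters,
    # passing punctuation through, as some implementations reportedly do.
    import re
    import secrets

    crosswalk = {}  # token -> original word

    def tokenize(text):
        def replace(match):
            token = secrets.token_hex(3)
            crosswalk[token] = match.group(0)
            return token
        return re.sub(r"\w+", replace, text)

    print(tokenize("Smith & Wesson"))  # e.g. 'a1b2c3 & d4e5f6'
    # The '&' and the two-word shape survive, so structurally
    # distinctive values stay guessable without the crosswalk table.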

PostScript

In re-reading my series of posts, I think I have been overly harsh on the cloud security vendors, because there are really two questions here, which have very different answers:

  • Is less-than-perfect tokenization better than nothing? Yes, it's a lot better than nothing. Even with less-than-perfect tokenization, employees of the cloud software companies can't just casually read records in the database, and an entity wanting to break the security of the records would need to extract a pretty big corpus of records to analyze them to find information leaks and use them to break in.
  • Is less-than-perfect tokenization acceptable for BC? No, because of the FOIPPA law, and because the Commissioner has already set a very very very high bar by not allowing standard symmetric encryption (which can be very very secure) to be used to host personal data outside of Canada.

More on this tomorrow.

Wednesday, July 02, 2014

Tokenization and Your Private Data (3)

To recap:

  • (Day 1) The government is interested in using the salesforce.com CRM and other USA cloud applications, but the BC FOIPPA Act does not allow it,
  • (Day 2) So the BC CIO has recommended "tokenization" systems to make personal information 100% obscured before storage in USA cloud applications.

BUT, and it's a big BUT, storing securely tokenized data makes cloud applications mostly useless.

As we saw yesterday, secure tokenization replaces every input word with a completely random token. This is done in practice with a tokenization server that translates words to tokens and vice versa.

The tokenization server also has to translate user queries into tokenized equivalents. So if the user asked:

"Show me the record for 'paul' 'ramsey'"

The filter would translate it into this query for the server:

"Show me the record for 'rtah' 'hgat'"

Hm, the magic still seems to be working. But what about a search that returns more than one record?

"Show me all the records of people named 'Paul'"

This is harder. In a secure tokenization system, there's a unique token for every word occurrence ever stored, so repeats of the same word get different tokens. The tokenizer now has to ask:

"Show me all the records that have firstname 'rtah' or 'fasp'"

Our example has only two 'Paul's; imagine a database with 50 thousand of them. The query would either break outright or slow to a crawl. Can it be fixed? Sure!

We can fix the performance problem by just using the same token for every 'Paul' encountered by the system (and for every 'Jones', and so on).

Problem solved, now if I ask:

"Show me all the records of people named 'Paul'"

The filter can translate it simply into:

"Show me all the records that have firstname 'rtah'"

And no matter how many 'Paul's there are in the system it will work fine.

Just one (big) problem. Always substituting the same token for the same word turns "tokenization" from an uncrackable system into a trivial substitution cipher, like the ones you used in Grade 4 to write secret messages to your friends (only using words as the substitution elements instead of letters).
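To see the trade-off in miniature, here's a toy deterministic tokenizer (my own illustration, not any particular product): queries get fast, and frequency analysis gets easy.

    # Toy deterministic tokenizer: the same word always gets the same token.
    import secrets
    from collections import Counter

    word_to_token = {}

    def det_tokenize(word):
        if word not in word_to_token:
            word_to_token[word] = secrets.token_hex(4)
        return word_to_token[word]

    records = [["Paul", "Ramsey"], ["Paul", "Jones"], ["Tim", "Jones"]]
    stored = [[det_tokenize(w) for w in r] for r in records]

    # Queries are now easy: one token matches every 'Paul'...
    paul = det_tokenize("Paul")
    print([r for r in stored if r[0] == paul])

    # ...but so is frequency analysis: in a real database, the most common
    # surname token almost certainly stands for the most common surname.
    print(Counter(t for r in stored for t in r).most_common(2))

Exactly as with a letter-substitution cipher, the token frequencies mirror the word frequencies, and a big enough corpus gives the game away.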

And things get even worse [1] as you add other, very common features people expect from their CRM software:

  • If you want to retrieve records in sorted order, then the tokens in the CRM must have the same sort order as the words they stand in for (see the sketch after this list).
  • If you want to do substring matching ("give me all the names that start with 'p'") then the token's internal structure must also reflect the internal structure of the original word.
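Here's a sketch of why order-preserving tokens are dangerous (the names and tokens are invented for illustration): an attacker who can insert records with known names can bracket any unknown token without ever seeing the crosswalk table.

    # If tokens must sort like their plaintexts, token order leaks data.
    import bisect

    # Names the attacker inserted, and the order-preserving tokens observed:
    known = [("Anderson", "baaq"), ("Lee", "hfmo"), ("Zhou", "zmxk")]
    known_tokens = [t for _, t in known]

    # A token lifted from someone else's record:
    target = "qrtv"
    i = bisect.bisect(known_tokens, target)
    lo = known[i - 1][0] if i > 0 else "(start)"
    hi = known[i][0] if i < len(known) else "(end)"
    print(f"target sorts between {lo} and {hi}")  # between Lee and Zhou

Insert enough chosen names and the brackets tighten to a single name, one letter at a time.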

None of this has stopped tokenization software vendors (like CipherCloud, one of the vendors being used by the BC government) from claiming to be able to both provide the magic unbreakability of tokenization while still supporting all the features of the backend CRM.

Cryptography buffs, interested in how CipherCloud could substantiate the claims it was making, started looking at the material it published in its manual and demonstrated at trade shows. Based on the publicly available material, one writer concluded:

The observed encryption has significant weaknesses, most of them inherent to a scheme that wants to encrypt data, while enabling the original application to perform operations such as search and sorting on the encrypted data without changing that application. There might be some advanced techniques (homomorphic encryption and the likes) that avoid these weaknesses, but at least the software demoed in the video does not use them.

In response, the company slapped their discussion site with a DMCA takedown notice. This is not the action of a company that is confident in its methods.

Tomorrow, I'll look at what the Freedom of Information Commissioner has said about "tokenization" and where we are going from here.

[1] Yes, my equality example is very simplified for teaching purposes, and there are some papers out there on "fully homomorphic encryption", but note that FHE is still an area of research, and in any event (see tomorrow's post), wouldn't meet the BC Information Commissioner's standard for extra-territorial storage of personal information.

Tuesday, July 01, 2014

Tokenization and Your Private Data (2)

So, (Day 1) the BC government's vendors (and thus, by extension, the BC government) are hot to trot to use the salesforce.com cloud CRM to store the personal data of BC citizens. But, BC privacy law does not allow that. Whatever will the government do?

Enter stage left: "tokenization". The CIO has recommended tokenization technology for Ministries looking to use salesforce.com and other cloud services to manage private information:

Using tokenization – a method of substituting specified data fields for arbitrary values – these solutions allow for the use of foreign-based services while remaining within the residency-based restrictions of FOIPPA.
Bette-Jo Hughes, Oct 2, 2013

Tokenization is a strategy that takes every word in an input text, and replaces it with a random substitution "token", and keeps track of the relationship between words and tokens. So, the input to a tokenization process would be N words, and the output would be N random numbers, and an N-entry dictionary matching the words to the numbers that replaced them.

Cryptography buffs will note that this is just a one-time pad, an old but unbreakable scheme for encoding messages, only operating word-by-word instead of letter-by-letter.

This seems like a nice trick!

    Input         Dictionary        Output
    -----------   --------------    ---------
    Paul Ramsey   Paul   = rtah     rtah hgat
                  Ramsey = hgat
    Paul Jones    Paul   = fasp     fasp nasd
                  Jones  = nasd
    Tim Jones     Tim    = yhav     yhav imfa
                  Jones  = imfa
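In code, fully random tokenization is only a few lines. A minimal sketch (mine, not any vendor's):

    # Fully random tokenization: every word occurrence gets a fresh
    # random token, recorded in a local crosswalk dictionary.
    import secrets

    crosswalk = {}  # token -> word; this dictionary stays in Canada

    def tokenize(text):
        tokens = []
        for word in text.split():
            token = secrets.token_hex(4)  # fresh token, even for repeats
            crosswalk[token] = word
            tokens.append(token)
        return " ".join(tokens)

    def detokenize(text):
        return " ".join(crosswalk[t] for t in text.split())

    stored = tokenize("Paul Ramsey")  # e.g. 'e3f91c0a 7b0d44aa', gibberish offsite
    print(detokenize(stored))         # 'Paul Ramsey' again, via the dictionary

The "tokenizing filter" described next is just this pair of functions sitting between your users and the cloud service, applied in both directions.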

If you are clever, you can put a tokenizing filter between your users and American web sites like SF.com, and have the tokenizer replace the words you send to SF.com with tokens, and replace the tokens SF.com sends you with words. So the data at SF.com will be gobbledegook, but what you see on your screen will be words. Magic!

If all we wanted to do was just store data securely somewhere outside of Canada, and then get it back, "tokenization" would be a grand idea, but there's a hitch.

  • First, storing tokenized data means storing three times the volume of the original (one copy of tokens stored at salesforce.com, plus a locally stored dictionary that contains both the originals and the tokens). You get no storage benefit from the cloud (in fact it's worse: you're storing twice as much data locally), and you get no redundancy benefit, since if you lose your local copy of the dictionary the cloud data becomes meaningless.
  • Second, and most importantly, this whole exercise isn't about storing data, it's about making use of a customer relationship management (CRM) system, salesforce.com, and secure tokenization, as described above, is not consistent with using salesforce.com effectively.

Tomorrow, we'll discuss why this most excellent "tokenization" magic doesn't work if you want to use it inside a CRM (or any other system that expects its data to have meaning).

Monday, June 30, 2014

Tokenization and Your Private Data (1)

One morning this winter, while I was sipping my coffee at the cafe below our office, a well-dressed man and woman sat down at the table next to me, and started talking. Turns out, they were my favourite kind of people — IT people! They were going to bid on the Integrated Decision Making project, and were talking about my favourite systems integrator, Deloitte.

"Is Deloitte trying to bring ICM and Siebel into this project?" she asked.
"No, not anymore" he replied "now they are really pushing SalesForce.com."

Now this was interesting! Chastened by their failure to shoehorn social services case management into a CRM, Deloitte has adroitly pivoted and is trying to shoehorn natural resource permitting into ... a cloud CRM.

(I should parenthetically point out that, unsurprisingly, the SALES people in our company find SALESforce.com very useful in coordinating and tracking their SALES activities.)

Certainly pushing a platform that is actually growing in usage makes more sense than pushing one that end-of-lifed a decade ago, but still, again with the CRM?

Deloitte isn't being coy with their plans, they are selling them to the highest levels of the government. On October 7, 2013, the BC CIO spent two and a half hours enjoying the hospitality of Deloitte and Salesforce.com at a "BC government executive luncheon" on the topic "Innovation, Transformation and Cloud Computing in the Public Sector”.

And there's another wrinkle. SF.com is a US-based cloud service provider, and our Freedom of Information and Protection of Privacy Act (FOIPPA) says that personal data must be stored in Canada. SF.com is also a US legal entity, which means they are subject to the PATRIOT Act which allows authorities to access personal data without notifying the subject of the search. That is also not allowed by BC's FOIPPA.

What is an ambitious system integrator with a hammer suitable for every nail to do? Not change hammers! That would be silly. Far better to try and get an exemption or figure out a workaround. Workarounds add nice juicy extra complexity to the hammer, which can only help billable hours.

More on the workaround, tomorrow.

Tuesday, June 24, 2014

Keynote @ FME User Conference

FME was one of the first geospatial tools I learned at the start of my career, back in the mid-90s, and getting invited to keynote the quinquennial FME Users Conference this year was quite an honour, so I wrote up a special keynote just for them.

Monday, June 23, 2014

When is an IT project just an IT project?

And when is it something more?

Every year, I report on the progress of IT outsourcing in BC (news flash: it keeps going up, 2011, 2012, 2013) and marvel at the sums we lavish on international consultancies, fees that largely march offshore, generating no local innovation or economic growth.

Last fall, I came across a news release from the Ministry of Health, describing an $842 MILLION "Clinical and Systems Transformation Project". I now realize I've not been tracking a significant seam of IT spending: the systems being commissioned by the five regional health authorities and their central services arm, the Provincial Health Services Authority.

Indeed, a quick perusal of the 2012/13 PHSA suppliers list shows a $50M spend on IBM, and an $11M spend on HP in just one year. That's enough to change my annual spending tracker quite a bit!

So, IBM won the new "Clinical and Systems Transformation Project", worth $842 MILLION over 10 years, I wonder what that RFP looked like? I asked for it, and was refused, so I FOI'ed it, and it came back. It's 500 pages long. Have a look.

Fun sidebar:
On page 186, in the "economic model" of the RFP, they direct that "proponents are to include 4% growth per year in infrastructure (e.g. storage capacity, network bandwidth, processing capacity, etc.) needs over the Term." Any readers see a problem with modelling IT capacity requirements at 4% growth per year over 10 years? Hint: a 2003 iMac shipped with 256 MB of RAM; a 2013 iMac ships with 8 GB, which is 32 times more capacity. 4% compounded over 10 years generates only about a 50% increase in capacity over a decade. Think those terms will need to be renegotiated?
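The arithmetic, for the skeptical:

    # 4% annual growth, compounded over the 10-year term:
    print(f"{1.04 ** 10:.2f}x")  # 1.48x, i.e. about a 50% increase
    # Versus the iMac yardstick over the same decade:
    print(8 * 1024 / 256)        # 32.0 -- a 32x increase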

It's a long read, but fortunately there's a really interesting bit right away, in the Mandatory Requirements:

Proponent is willing and able to transition any Public Sector union agreements relevant to the Managed Services to their organization, if required

Whoa! This isn't just an IT systems agreement after all, it's an outsourcing deal.

The government seems to have learned little from the experience of BC Hydro outsourcing to Accenture or Medical Services Plan to Maximus, or from reports by the Auditor General, or even their own consultants who reviewed outsourcing from 2001-2010 and noted that:

  • Contracts were structured towards a specific solution or specific outputs rather than a desired outcome
  • Contracts negotiated in isolation gave the same scope of services to multiple vendors
  • The procurement process resulted in contracts that while defined, are no longer what is required
  • Risk transfer objectives were not met
  • There was no consolidated vendor management
  • There was no central management of the deals or the benefits achieved

The "Alternative Service Delivery Secretariat" wound down in 2010, but the government is still hard at it, now quietly preparing to outsource the clinical systems of three health authorities to IBM, for $84M a year over 10 years. Significant portions of critical government operations are being transferred beyond direct government control for very long periods of time.

Perhaps the managers who pushed this solution didn't trust their own staff, or themselves, to successfully bring an ambitious project to conclusion. They didn't want to "take the risk", so they took the "safe" option. They need to spend some time behind the velvet curtain in organizations like IBM or Accenture: the only results that matter to those organizations are the quarterly results. There will be some good people in them, and some bad ones, but the level of competence or capability won't be orders of magnitude better than what you could build yourself in-house. And as organizations, as corporations, they have only one bottom line, and it's theirs, not ours.
