Tokenization and Your Private Data (2)

So (Day 1), the BC government’s vendors (and thus, by extension, the BC government) are hot to trot to use the salesforce.com cloud CRM to store the personal data of BC citizens. But BC privacy law does not allow that. Whatever will the government do?

Enter stage left: “tokenization”. The CIO has recommended tokenization technology for Ministries looking to use salesforce.com and other cloud services to manage private information:

Using tokenization – a method of substituting specified data fields for arbitrary values – these solutions allow for the use of foreign-based services while remaining within the residency-based restrictions of FOIPPA.
Bette-Jo Hughes, Oct 2, 2013

Tokenization is a strategy that takes every word in an input text, replaces it with a random substitution “token”, and keeps track of the relationship between words and tokens. So the input to a tokenization process would be N words, and the output would be N random tokens plus an N-entry dictionary matching the words to the tokens that replaced them.

Cryptography buffs will note that this is just a one-time pad, an old but unbreakable scheme for encoding messages, only operating word-by-word instead of letter-by-letter.

This seems like a nice trick!

Input          Dictionary                    Output
Paul Ramsey    Paul = rtah, Ramsey = hgat    rtah hgat
Paul Jones     Paul = fasp, Jones = nasd     fasp nasd
Tim Jones      Tim = yhav, Jones = imfa      yhav imfa
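
For the curious, here is a minimal Python sketch of the process in the example above. The function names (random_token, tokenize, detokenize), the four-letter token format, and the in-memory dictionary are all assumptions made for illustration, not a description of how any vendor's product actually works. As in the example, every occurrence of a word gets its own fresh token, so "Paul" is never replaced by the same token twice.

```python
import secrets
import string


def random_token(length=4):
    """Return a random lowercase token such as 'rtah' (format is an assumption)."""
    return "".join(secrets.choice(string.ascii_lowercase) for _ in range(length))


def tokenize(text, vault):
    """Replace every word in `text` with a fresh random token.

    `vault` is the dictionary; it maps token -> original word and is
    updated in place. It is the only thing that gives the tokens meaning.
    """
    tokens = []
    for word in text.split():
        token = random_token()
        while token in vault:  # never hand out the same token twice
            token = random_token()
        vault[token] = word
        tokens.append(token)
    return " ".join(tokens)


def detokenize(tokens, vault):
    """Look each token up in the dictionary to recover the original words."""
    return " ".join(vault[token] for token in tokens.split())


vault = {}
stored = tokenize("Paul Ramsey", vault)
print(stored)                     # two random four-letter tokens
print(detokenize(stored, vault))  # Paul Ramsey
```

Without the dictionary, the stored tokens are just noise; with it, the original words come straight back.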

If you are clever, you can put a tokenizing filter between your users and American web sites like SF.com, and have the tokenizer replace the words you send to SF.com with tokens, and replace the tokens SF.com sends you with words. So the data at SF.com will be gobbledegook, but what you see on your screen will be words. Magic!
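
The filter idea can be sketched in a few lines of Python. Everything named here is hypothetical: FakeRemoteCRM is an in-memory stand-in for the foreign service (it is not the salesforce.com API), and TokenizingFilter with its save and load methods is an assumed shape for the filter, not a real product. The point is only to show which side holds the words and which side holds the gobbledegook.

```python
import secrets
import string


class FakeRemoteCRM:
    """Hypothetical stand-in for a foreign-hosted service.

    It stores whatever it is sent and knows nothing about tokens.
    """

    def __init__(self):
        self.records = {}

    def put(self, key, value):
        self.records[key] = value

    def get(self, key):
        return self.records[key]


class TokenizingFilter:
    """Sits between the user and the remote service (assumed design)."""

    def __init__(self, remote):
        self.remote = remote
        self.vault = {}  # token -> word; this dictionary never leaves home

    def _new_token(self):
        # Keep drawing random four-letter tokens until we find an unused one.
        while True:
            token = "".join(secrets.choice(string.ascii_lowercase) for _ in range(4))
            if token not in self.vault:
                return token

    def save(self, key, text):
        """Outbound: swap words for tokens before anything is sent."""
        tokens = []
        for word in text.split():
            token = self._new_token()
            self.vault[token] = word
            tokens.append(token)
        self.remote.put(key, " ".join(tokens))

    def load(self, key):
        """Inbound: swap the tokens that come back for the original words."""
        return " ".join(self.vault[t] for t in self.remote.get(key).split())


crm = FakeRemoteCRM()
proxy = TokenizingFilter(crm)
proxy.save("contact-1", "Tim Jones")
print(crm.get("contact-1"))     # what the remote side holds: random tokens
print(proxy.load("contact-1"))  # what the user sees: Tim Jones
```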

If all we wanted to do was just store data securely somewhere outside of Canada, and then get it back, “tokenization” would be a grand idea, but there’s a hitch.

Tomorrow, we’ll discuss why this most excellent “tokenization” magic doesn’t work if you want to use it inside a CRM (or any other system that expects its data to have meaning).