Tokenization and Your Private Data (4)

Recapping:

The BC CIO guidance on using USA cloud services has a certain breathless enthusiasm (is there any innovation more exciting than vendor innovation?) for the tokenization products vendors are bringing to market:

Vendors have begun to address this “data-residency” issue in innovative ways. As an example, Force.com, and CypherCloud offer solutions that allow sensitive or personal information to remain in Canada. Using tokenization – a method of substituting specified data fields for arbitrary values – these solutions allow for the use of foreign-based services while remaining within the residency-based restrictions of FOIPPA.
– BC OCIO, Data Residency and Tokenization
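To make the quoted definition concrete, here is a minimal Python sketch of what I'd call fully random tokenization. It assumes a simple in-memory crosswalk table. The class name RandomTokenizer and its methods are hypothetical, not any vendor's API. Each value is replaced by a freshly drawn random token that has no mathematical relationship to the original, so the only way back is the crosswalk, which is the table that would have to stay in Canada.

```python
import secrets
import string

class RandomTokenizer:
    """Hypothetical sketch of fully random tokenization.

    Each call to tokenize() draws a brand-new random token, so the token
    carries no information about the value it replaces. The crosswalk
    (token -> original value) is the only way back.
    """

    ALPHABET = string.ascii_lowercase + string.digits

    def __init__(self, token_length: int = 12) -> None:
        self.token_length = token_length
        self.crosswalk: dict[str, str] = {}  # token -> original value

    def _fresh_token(self) -> str:
        # Draw from a cryptographically secure source; retry on the
        # (astronomically unlikely) chance the token is already in use.
        while True:
            token = "".join(secrets.choice(self.ALPHABET)
                            for _ in range(self.token_length))
            if token not in self.crosswalk:
                return token

    def tokenize(self, value: str) -> str:
        token = self._fresh_token()
        self.crosswalk[token] = value
        return token

    def detokenize(self, token: str) -> str:
        return self.crosswalk[token]


tokenizer = RandomTokenizer()
token = tokenizer.tokenize("Jane Doe")
print(token)                        # random; differs on every run
print(tokenizer.detokenize(token))  # Jane Doe
```

In this sketch even repeated values get different tokens. Real products may reuse one token per distinct value so fields stay searchable, and that convenience leaks which records share a value.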

And the guidance released by BC’s Office of the Information & Privacy Commissioner (OIPC) at first glance appears to similarly swallow claims about tokenization hook, line and sinker.

Public bodies may comply with FIPPA provided that the personal information is adequately tokenized and the crosswalk table is secured in Canada.
– BC OIPC, Updated guidance on the storage of information outside of Canada by public bodies

However, the OIPC guidance has one small but important difference: the word “adequately”.

I met with a lawyer from the OIPC to discuss tokenization, and he was clear that the OIPC understood the very important difference between fully randomized tokenization (basically unbreakable, and “adequate”) and any other tokenization (potentially trivially breakable, and perhaps not “adequate”). This is reassuring, because the difference is not immediately obvious, and the tokenization software vendors are doing everything in their power to obscure the difference in their marketing materials.
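The difference is easy to miss, so here is a sketch contrasting the two. As an assumption for illustration, the “other” kind is modelled as deterministic tokenization built on a keyed HMAC. I've swapped in that technique because it is strong as a primitive; it is not a description of any particular product. The same input always produces the same token, and even with a secret key that consistency lets token frequencies mirror real-world frequencies.

```python
import hashlib
import hmac
import secrets
from collections import Counter

# Hypothetical secret key; in this sketch it never leaves the tokenizer.
KEY = secrets.token_bytes(32)

def deterministic_token(word: str) -> str:
    # Keyed HMAC, truncated: the same word always yields the same token.
    return hmac.new(KEY, word.encode("utf-8"), hashlib.sha256).hexdigest()[:10]

def random_token(_word: str) -> str:
    # Fully random: the input plays no part in the token at all.
    return secrets.token_hex(5)

surnames = ["Smith", "Nguyen", "Smith", "Brown", "Smith", "Nguyen", "Lee"]

det_counts = Counter(deterministic_token(s) for s in surnames)
rnd_counts = Counter(random_token(s) for s in surnames)

# Deterministic tokens repeat exactly as often as the names they hide.
print(sorted(det_counts.values(), reverse=True))  # [3, 2, 1, 1]
# Random tokens (barring a vanishingly unlikely collision) each appear once.
print(sorted(rnd_counts.values(), reverse=True))
```

Someone who knows Smith is a common surname doesn't need the key: the most frequent deterministic token is a very good guess. The fully random tokens leave nothing to count.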

It is not reassuring that the OIPC has opened the door to “tokenization” at all. The OIPC is sufficiently anal retentive about personal information that they have ruled that no standard form of encryption is secure enough to be used to store personal information outside Canada, because “encryption may be deciphered given sufficient computer analysis”. That’s right, the OIPC scoffs at your AES-256-encrypted data, but is OK with “adequate” tokenization, for some undefined values of “adequate”.

The OIPC guidance spends two paragraphs on “re-identification” of data (the risk that un-tokenized fields left in a record alongside tokenized ones are enough to work out who the record is about), and five more on the legal and physical security of the tokenization crosswalk table (dictionary), but only one word (“adequately”) on whether or not the tokenization dictionary is full of junk.
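A short sketch of why re-identification deserves those two paragraphs. The records, names, postal codes and field layout below are all invented for illustration. Only the name is tokenized; birth date and postal code stay in the clear, and joining them against a list an outsider might already hold puts the name back without ever touching the crosswalk.

```python
from typing import NamedTuple

class Record(NamedTuple):
    name: str         # tokenized
    birth_date: str   # left in the clear
    postal_code: str  # left in the clear
    diagnosis: str    # left in the clear

# Invented record as it might sit on a foreign server: only the name is tokenized.
offshore = [Record("q8zk2m7t4wpa", "1971-03-14", "V8V 1X4", "diabetes")]

# Invented directory of people an outsider might already know about.
directory = [
    ("Jane Doe", "1971-03-14", "V8V 1X4"),
    ("John Roe", "1980-07-02", "V9A 2B3"),
]

def reidentify(records, known_people):
    # Join on the un-tokenized fields; the crosswalk is never needed.
    index = {(dob, postal): name for name, dob, postal in known_people}
    return [(index.get((r.birth_date, r.postal_code)), r.diagnosis) for r in records]

print(reidentify(offshore, directory))  # [('Jane Doe', 'diabetes')]
```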

The OIPC told me that, because fully random tokenization completely obscures the original data[1], they had to rule that fully tokenized personal data was no longer “personal information” and thus not covered by the Act. This strikes me as very lawyerly, but also very dangerous, since it opens the door for government to consider technical “tokenization” solutions from vendors that are likely far less secure than conventional approaches (like AES-256) that the OIPC has already rejected.

I’ll close with the good news: all plans to store personal data outside Canada are still subject to case-by-case review by the OIPC, there is thus far no blanket approval for systems that claim they “tokenize”, and the OIPC can still issue further guidance based on research that is going on right now. I’m not lighting my hair on fire, yet. But the door is cracked open, and the snake-oil salesmen are laying out their wares; let’s keep an eye on them.

[1] Again, implementation matters. At a minimum, even completely random word-based tokenization can leak information about how many words are in each field. Some implementations also don’t encode punctuation, so they leak symbols (“Smith & Wesson” becomes “faerqb & gabedfsara”) and other non-word entities. Depending on the input data, these small leakages can be significant.
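The footnote’s leak is easy to reproduce. Below is a sketch of a hypothetical word-level tokenizer that replaces each run of letters with random letters of random length but copies everything else through untouched, as the footnote describes some implementations doing. The function names are invented; leaked_shape shows what an observer can read without the crosswalk.

```python
import random
import re
import string

_rng = random.SystemRandom()  # OS-backed randomness for the replacement letters

def naive_word_tokenize(text: str) -> str:
    """Hypothetical word-level tokenizer: each run of letters becomes random
    lowercase letters of random length; everything else is copied through."""
    def replace(_match: re.Match) -> str:
        length = _rng.randint(4, 10)
        return "".join(_rng.choice(string.ascii_lowercase) for _ in range(length))
    return re.sub(r"[A-Za-z]+", replace, text)

def leaked_shape(tokenized: str) -> tuple[int, str]:
    """What an observer learns without the crosswalk: how many words there
    were, and every character the tokenizer copied through, in order."""
    word_count = len(re.findall(r"[a-z]+", tokenized))
    symbols = re.sub(r"[a-z]+", "", tokenized)
    return word_count, symbols

for original in ["Smith & Wesson", "O'Brien, J.", "Unit 4, 1201 Douglas St"]:
    tokenized = naive_word_tokenize(original)
    # The tokens change every run; the shape (e.g. (2, ' & ')) never does.
    print(tokenized, "->", leaked_shape(tokenized))
```

The street address is the telling case: the words are hidden, but the unit and street numbers pass straight through.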

Postscript

In re-reading my series of posts, I think I have been overly harsh on the cloud security vendors, because there are really two questions here, which have very different answers:

More on this tomorrow.