The BC government’s email retention policy (delete it all, whenever possible) was briefly back in the news last week as a BC Liberal staffer was brought up on charges:
George Steven Gretes has been charged with two counts of wilfully making false statements to mislead, or attempt to mislead, under the province’s Freedom of Information and Protection of Privacy Act. — CBC News
Sometimes, taking one for the team really means taking one for the team. But it’s important to remember that, however personally reprehensible Gretes’ actions were, his behaviour is just the tip of the ethical iceberg when it comes to the current government’s attitude towards record keeping.
It has been obvious for years that the government has a deliberate policy of poor management of digital records, and that there is a strong desire in high places to keep that policy in place. Gretes is not an isolated figure; he’s just the only person foolish and unlucky enough to be caught in deliberate law-breaking, instead of quietly taunting the public from the grey area.
Right in the middle of the exposure of Mr. Gretes’ actions last fall, the Opposition brought forward more evidence that political staffers routinely destroy records:
What we did last November is we asked for information pertaining to any e-mails from the chief of staff to the Minister of Natural Gas, Tobie Myers. Ms. Myers, of course, at the time we asked, was in discussions with people within the sector about legislation that was going to be before this House. What we got back from that request for information over a three-week period were three e-mails, just three.
It was curious to us that there would only be three e-mails in existence coming from the minister’s office over a three-week period, when flagship legislation was being tabled. So we asked for the message-tracking documents from the Minister of Citizens’ Services.
We determined through that route … that Ms. Myers sent 800 e-mails over that three-week period. So 797 triple deletes is a whole lot of triple deletes.
In those 800 e-mails, there were e-mails sent to Mr. Spencer Sproule, who may be familiar to members on this side. He used to work in the Premier’s office as her issue management director. He now, of course, is the chief spokesperson for Petronas, the lead agency looking at natural gas here in British Columbia.
Jared Kuehl, the head deputy of government relations at Shell; Neil Mackie, from AltaGas; and right to the minister’s office in Ottawa — 800 e-mails, and we got three.
My question is to the minister of openness and transparency in B.C. Liberal–land. Can he explain how it is that when we asked for 800, we only got three?
— John Horgan, Leader of the Opposition, Oct 26, 2015
The usual excuses are rolled out every time: the emails are “transitory” or they are filed “elsewhere”. Except these emails were from a high-ranking Natural Gas ministry staffer to highly placed members of the industry! Transitory? Really? These were all just making plans to get coffee, 800 times?
The Information and Privacy Commissioner noted the same pattern in records keeping by the Premier’s Deputy Chief of Staff.
The Deputy Chief of Staff stated that her practice was to delete emails from her Sent Items folder on a daily basis and if all emails in that folder were of a transitory nature, she would delete all of them. Her evidence was that her Deleted Items folder was set to purge at the end of each day when she exited Microsoft Outlook.
— p50, Access Denied, OIPC, Oct 22, 2015
George Gretes must be quietly eating his liver, to be prosecuted for actions that he knows his former colleagues engage in every single day in their offices at the highest levels of government.
But that’s what it means to be the fall guy. The public thinks little of you because you were venal enough to do the crime. Your former friends think little of you because you were stupid enough to get caught.
“It’s done. Now you don’t have to worry about it anymore.”
— George Gretes, on illegally deleting the target of an FOI request
Dealing with addresses is a common problem in information systems: people live and work in buildings which are addressed using “standard” postal systems. The trouble is, the postal address systems and the way people use them aren’t really all that standard.
Postal addressing systems are just standard enough that your average programmer can whip up a script to handle 80% of the cases correctly. Or a good programmer can handle 90%. Which just leaves all the rest of the cases. And also all the cases from countries where the programmer doesn’t live.
Counterexample: 8 Seven Gardens Burgh, WOODBRIDGE, IP13 6SU (pointed out by Raphael Mankin)
Most solutions to address parsing and normalization have used rules, hand-coded by programmers. These solutions can take years to write, as special cases are handled as they are uncovered, and are generally restricted in the language/country domains they cover.
There’s now an open source, empirical approach to address parsing and normalization: libpostal.
Libpostal is built using machine learning techniques on top of OpenStreetMap input data to produce parsed and normalized addresses from arbitrary input strings. It has bindings for lots of languages: Perl, PHP, Python, Ruby, and more.
And now, it also has a binding for PostgreSQL: pgsql-postal.
You can do the same things with the PostgreSQL binding as you can with the other languages: convert raw strings into normalized or parsed addresses. The normalization function returns an array of possible normalized forms:
SELECT unnest(postal_normalize('412 first ave, victoria, bc'));
unnest
------------------------------------------
412 1st avenue victoria british columbia
412 1st avenue victoria bc
(2 rows)
The parsing function returns a jsonb object holding the various parse components:
SELECT postal_parse('412 first ave, victoria, bc');
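For the same address, the parse comes back as a jsonb object with labelled components, something like this (illustrative output only; the exact labels depend on the libpostal model):
                                  postal_parse
----------------------------------------------------------------------------------
 {"city": "victoria", "road": "first ave", "state": "bc", "house_number": "412"}
(1 row)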
The core library is very fast once it has been initialized, and the binding has been shown to be acceptably fast, despite some unfortunate implementation tradeoffs.
@pwramsey parsed and normalized 1.2 million rows in five minutes. *does happy dance*
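For bulk jobs like that, the same two functions drop straight into plain SQL. Here is a minimal sketch of batch processing; the addresses table and its raw_address column are hypothetical stand-ins for your own schema, and only postal_normalize() and postal_parse() come from pgsql-postal:
-- Hypothetical table and columns; pgsql-postal only supplies the two functions.
ALTER TABLE addresses
  ADD COLUMN normalized text,
  ADD COLUMN parsed jsonb;

UPDATE addresses
SET normalized = (postal_normalize(raw_address))[1],  -- first candidate form
    parsed     = postal_parse(raw_address);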
In 1790, Jeremy Bentham published his plans for the “panopticon”, his design for a new kind of prison that would leverage surveillance to ensure well-behaved prisoners. The panopticon was laid out so that all cells were visible from a central monitoring station, from which a hidden guard could easily watch any inmate, without being observed himself.
Bentham expected the panopticon approach to improve inmate behaviour at low cost: inmates would obey the rules because they could never be certain when the observer was watching them or not.
The developed world is rapidly turning into a digital panopticon.
Your trail through digital media is 100% tracked.
Every web site you visit is traceable, via ad cookies or “like” badges, or Google analytics.
Every web search is tracked via cookies or even your own Google login info.
In some respects this tracking is still “opt in” since it is possible, if undesirable, to opt out of digital culture. Drop your email, eschew the web, leave behind your smart phone.
But your trail in the physical world is increasingly being tracked too.
If you carry a cell phone, your location is known to within one mobile “cell”, as long as the device is powered.
If you use a credit or debit card, or an ATM, you are localised to a particular point of sale when you make a purchase.
If you use a car “safety” system like OnStar your location is known while you drive.
Again, these are active signals, and you could opt out. No cell phone, cash only, no vehicles after 1995.
Within our lifetimes, most urban areas will be under continuous video surveillance, and more importantly, within our lifetimes, the computational power and algorithms to make sense of all those video feeds in real time will be available.
We take for granted that we have some moments of privacy. When I leave the house and walk to pick up my son at school, for 15 minutes, nobody knows where I am. Not for much longer. All too soon, it will be possible for someone in a data center to say “show me Paul” and get a live picture of me, wherever I may be. A camera will see me, and a computer will identify me in the video stream: there he is.
Speculative fiction is a wonderful thing, and there’s a couple books I read in the last year that are worth picking up for anyone interested in what life in the panopticon might be like.
Rainbows End by Vernor Vinge (2006) is an exploration of the upcoming world of augmented reality and connectivity. In many ways Vinge leaps right over the period of privacy loss: his characters have already come to terms with a world of continuous visibility.
The Circle by Dave Eggers (2013) jumps into a world right on the cusp of the transition from our current “opt-in” world of partial privacy to one of total transparency, of life in the panopticon.
Both books are good reads, and tightly written, though in the tradition of science fiction the characterizations tend to be a little flat.
The Guardian also has a recent (2015) take on the digital panopticon:
In many ways, the watchtower at the heart of the panopticon is a precursor to the cameras fastened to our buildings – purposely visible machines with human eyes hidden from view.
Once you come to terms with the idea that, at any time, you could be surveilled, the next question is: does that knowledge alter your behaviour? Are you ready for the panopticon? I’m not sure I am.
At the best of times, I find it hard to generate a lot of sympathy for my work-from-home lifestyle as an international coder-of-mystery. However, the last few weeks have been especially difficult, as I try to explain my week-long business trip to Paris, France to participate in an annual OSGeo Code Sprint.
Yes, really, I “had” to go to Paris for my work. Please, stop sobbing. Oh, that was light jealous retching? Sorry about that.
Anyhow, my (lovely, wonderful, superterrific) employer, CartoDB, was an event sponsor, and sent me and my co-worker Paul Norman to the event, which we attended with about 40 other hackers working on PDAL, GDAL, PostGIS, MapServer, QGIS, Proj4, PgPointCloud, etc.
Paul Norman got set up to do PostGIS development and crunched through a number of feature enhancements. The feature enhancement ideas were courtesy of Remi Cura, who brought in some great power-user ideas for making the functions more useful. As developers, it is frequently hard to distinguish between features that are interesting to us and features that are useful to others, so having feedback from folks like Remi is invaluable.
The Oslandia team was there in force, naturally, as they were the organizers. Because they work a lot in the 3D/CGAL space, they were interested in making CGAL faster, which meant they were interested in some “expanded object header” experiments I did last month. Basically, the EOH code allows you to return an unserialized reference to a geometry on return from a function, instead of a flat serialization, so that calls that look like ST_Function(ST_Function(ST_Function())) don’t end up with a chain of three serialize/deserialize steps in them. When the deserialize step is expensive (as it is for their 3D objects) the benefit of this approach is actually measurable. For most other cases it’s not.
(The exception is in things like mutators, called from within PL/PgSQL, for example doing array appends or insertions in a tight loop. Tom Lane wrote up this enhancement of PgSQL with examples for array manipulation and did find big improvements for that narrow use case. So we could make things like ST_SetPoint() called within PL/PgSQL much faster with this approach, but for most other operations the overhead of allocating our objects probably isn’t high enough to make it worthwhile.)
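To make that concrete, here is the kind of tight PL/PgSQL mutator loop in question; it is only an illustration of where the serialize/deserialize cost shows up today, not the EOH patch itself:
-- Illustrative only: each ST_SetPoint() call deserializes the line,
-- replaces one vertex, and re-serializes the whole geometry.
DO $$
DECLARE
  g geometry := ST_GeomFromText('LINESTRING(0 0, 1 1, 2 2, 3 3)');
  i integer;
BEGIN
  FOR i IN 0 .. ST_NPoints(g) - 1 LOOP
    g := ST_SetPoint(g, i, ST_MakePoint(i, i * 2));
  END LOOP;
  RAISE NOTICE '%', ST_AsText(g);
END
$$;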
There was also a team from Dalibo and 2nd Quadrant. They worked on a binding for geometry to the BRIN indexes (9.5+). I was pretty sceptical, since BRIN indexes require useful ordering, and spatial data is not necessarily well ordered, unlike something like time, for example. However, they got a prototype working, and it showed the usual good BRIN properties: indexes were extremely small and extremely cheap to build. For narrow range queries, they were about 5x slower than GIST-rtree; however, the differences were on the order of 25ms vs 5ms, so not completely unusable. They managed this result both with presorted data and with some data in its “natural” order, which worked because the “natural” order of GIS data is often fairly spatially autocorrelated.
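For the curious, usage of the prototype looks roughly like this; it is a sketch only, the roads table and geom column are made up, and it assumes the binding plugs into the standard CREATE INDEX … USING BRIN machinery (as it eventually did in PostGIS 2.3):
-- BRIN index on a geometry column: tiny, and very cheap to build
CREATE INDEX roads_geom_brin ON roads USING BRIN (geom);

-- A narrow bounding-box query answered from the BRIN block-range summaries
SELECT count(*)
FROM roads
WHERE geom && ST_MakeEnvelope(-123.40, 48.40, -123.35, 48.45, 4326);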
I personally thought I would work on merging the backlog of GitHub pull-requests that have built up on the PostGIS git mirror, and did manage to merge several, both new ones from Remi’s group and some old ones. I merged in my ST_ClusterKMeans() clustering function, and Dan Baston merged in his ST_ClusterDBSCAN() one, so PostGIS 2.3 will have a couple of new clustering implementations.
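Both land as window functions, so the expected 2.3 usage looks something like this (the parcels table and its id and geom columns are hypothetical):
-- Assign each geometry to one of 5 k-means clusters, and to a density-based
-- DBSCAN cluster (within distance 100, minimum 3 points per cluster).
SELECT id,
       ST_ClusterKMeans(geom, 5) OVER () AS kmeans_cluster,
       ST_ClusterDBSCAN(geom, 100, 3) OVER () AS dbscan_cluster
FROM parcels;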
However, in the end I spent probably 70% of my time on a blocker in 2.2, which was related to upgrade. Because the bug manifests during upgrade, when there are two copies of the postgis.so library floating in memory, and because it only showed up on particular Ubuntu platforms, it was hard to debug, but in the end we found the problem and put in a fix, so we are once again able to do upgrades on all platforms.
The other projects also got lots done, and there are more write-ups at the event feedback page.
Thanks to Olivier Courtin from Oslandia for taking on the heavy weight of organizing such an amazing event!
I gave this talk in December, at the CartoDB 2015 partners conference, at the galactic headquarters in glamorous Bushwick, Brooklyn. A bit of a late posting, but hopefully I can still sneak under the “new year predictions bar”.