When a business or government reassures you that the data they share is “anonymous”, that anything that could identify you has been removed, just laugh. Or cry. “Many current datasets can be re-identified with no more than basic programming and statistics skills,” Princeton University researchers warned in June.
Take, for example, New York’s supposedly anonymous database of all 174 million taxi rides taken in 2013, which was made public after a freedom-of-information request. Software engineer Vijay Pandurangan re-identified the drivers and their plate numbers using just a few hours of computer time — in part because the original de-identification was done poorly. Then Anthony Tockar of Neustar Research dug out photos of celebrities getting into taxis where the licence plate was visible, cross-referenced them with the taxi data, and revealed where those celebrities had gone next.
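To see why poorly done de-identification can be undone in a few hours of computer time, consider a minimal sketch. It assumes, purely for illustration, that each plate number was replaced with an unsalted MD5 hash and that plates follow a made-up digit-letter-digit-digit format. Neither detail comes from the account above, and the function names are hypothetical. The point is only that when the set of possible identifiers is small, every one of them can be hashed in advance and the "anonymous" values looked up.

```python
import hashlib
from itertools import product
from string import ascii_uppercase, digits


def pseudonymise(plate):
    """Replace a plate with an unsalted hash (the assumed, flawed scheme)."""
    return hashlib.md5(plate.encode("ascii")).hexdigest()


def build_lookup():
    """Hash every plate in the hypothetical format and index by hash."""
    table = {}
    # Assumed format: digit, letter, digit, digit -> 10 * 26 * 10 * 10 plates.
    for d1, letter, d2, d3 in product(digits, ascii_uppercase, digits, digits):
        plate = f"{d1}{letter}{d2}{d3}"
        table[pseudonymise(plate)] = plate
    return table


lookup = build_lookup()

# A value as it might appear in the released data.
released = pseudonymise("7B42")

# Reversing it is a dictionary lookup, not code-breaking.
recovered = lookup[released]
print(len(lookup), recovered)  # 26000 7B42
```

With only 26,000 candidates in this toy format, the whole table is built in well under a second on an ordinary laptop.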
It’s not just celebrities whose privacy is at risk. Tockar could also figure out the home addresses of frequent visitors to Larry Flynt’s Hustler Club. From there, it would be easy enough to cross-reference those addresses with property records, voter registrations and other public information to get names.
“Holy shit, can you imagine someone just plotting all the trips from a single gay bar? Listing off all the connected residential addresses? And not only that, any subsequent trips home from those addresses the next morning? Taking the walk of shame to a whole new level!” wrote user ‘abalone’ at Hacker News.
“Likewise trips could be used to deduce affairs and other deceptions by fellow residents. ‘You said you were working late, but the only taxi trip to our building that night was from a bar.'”
Location data is particularly revealing. Our smartphones are effectively tracking devices. That’s why law enforcement and intelligence agencies are so keen to access this telecommunications metadata.
“As most people spend the majority of their time at either their home or workplace, an adversary who knows those two locations for a user is likely to be able to identify the trace for that user — and to confirm it based on the patterns of movement,” the Princeton researchers wrote.
It’s easy. According to research by Yves-Alexandre de Montjoye and others, more than 50% of mobile phone users can be identified from just two randomly chosen location data points. With four points, the figure rises to 95%. Most people reveal vastly more than that through social media — either by stating their location directly, or giving it away indirectly by posting photos of what they see.
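The idea behind figures like these can be sketched in a few lines: pick some points from one person's trace and count how many people in the dataset have traces containing all of them. If only one person does, those points identify them. The sketch below is a toy, not the researchers' method: the traces are invented, the `unicity` function is a hypothetical name, and it checks every possible combination of points rather than sampling at random.

```python
from itertools import combinations


def unicity(traces, k):
    """Fraction of k-point subsets that match exactly one person's trace."""
    unique = total = 0
    for user, points in traces.items():
        for subset in combinations(sorted(points), k):
            total += 1
            # Everyone whose trace contains all k of the chosen points.
            matches = [u for u, t in traces.items() if set(subset) <= t]
            if matches == [user]:
                unique += 1
    return unique / total


# Invented traces: (cell tower, hour of day) observations for three people.
traces = {
    "a": {("cell1", 8), ("cell2", 9), ("cell3", 18)},
    "b": {("cell1", 8), ("cell2", 9), ("cell4", 18)},
    "c": {("cell5", 8), ("cell6", 9), ("cell7", 18)},
}

for k in (1, 2, 3):
    print(k, round(unicity(traces, k), 2))
```

Even in this tiny example the share of identifying combinations climbs as more points are known, which is the same pattern the research reports at scale.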
It gets worse.
“Many de-identified datasets are vulnerable to re-identification by adversaries who have specific knowledge about their targets. A political rival, an ex-spouse, a neighbour, or an investigator could have or gather sufficient information to make re-identification possible,” the Princeton researchers wrote.
“As more datasets become publicly available or accessible by (or through) data brokers, the problems with targeted attacks can spread to become broad attacks. One could chain together multiple datasets to a non-anonymous dataset and re-identify individuals present in those combinations of datasets.”
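Chaining datasets in this way amounts to a join on whatever attributes they share. The following is a minimal sketch under stated assumptions: both datasets are invented, the field names (`home`, `work`, `late_night_venue`) and the `link` function are hypothetical, and the shared attributes are a home and work location, as in the researchers' example above.

```python
from collections import defaultdict

# "Anonymous" release: no names, but home and work areas are present.
anonymous_trips = [
    {"pseudonym": "u-001", "home": "block-17", "work": "block-80",
     "late_night_venue": "bar-3"},
    {"pseudonym": "u-002", "home": "block-22", "work": "block-80",
     "late_night_venue": "club-9"},
]

# A non-anonymous dataset, such as a public directory (invented entries).
public_records = [
    {"name": "Alex Example", "home": "block-17", "work": "block-80"},
    {"name": "Sam Sample", "home": "block-22", "work": "block-80"},
    {"name": "Kim Sample", "home": "block-22", "work": "block-80"},
]


def link(anon, public, keys=("home", "work")):
    """Map pseudonyms to names where the shared attributes match one person."""
    index = defaultdict(list)
    for rec in public:
        index[tuple(rec[k] for k in keys)].append(rec["name"])
    linked = {}
    for row in anon:
        candidates = index.get(tuple(row[k] for k in keys), [])
        if len(candidates) == 1:  # an unambiguous match re-identifies the row
            linked[row["pseudonym"]] = candidates[0]
    return linked


print(link(anonymous_trips, public_records))  # {'u-001': 'Alex Example'}
```

Here "u-002" stays unlinked only because two people share the same home and work blocks; one further dataset that tells them apart would resolve it, which is how a targeted attack becomes a broad one.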
Remember, it’s not just the telco that has your location data. Potentially, it’s any company whose app you’ve downloaded to your phone, any advertising broker they use, and any company they’re on-selling that data to — even if it has supposedly been made anonymous. Perhaps you gave one of them permission to share that data with their “partners”? You did read the privacy policy, right?
Here’s the key policy problem.
Most privacy law, including that of the US, is based on the concept of protecting personally identifiable information (PII). Definitions vary, but the one used by the US National Institute of Standards and Technology (NIST) is typical: “any information about an individual … including (1) any information that can be used to distinguish or trace an individual’s identity, such as name, social security number, date and place of birth, mother’s maiden name, or biometric records; and (2) any other information that is linked or linkable to an individual, such as medical, educational, financial, and employment information”.
Australia’s updated Privacy Act, which came into force on March 12, broadens the definition of personal information to include information where an individual’s identity is “apparent, or can reasonably be ascertained”.
But as the research is demonstrating, individuals’ identity can be “reasonably ascertained” from all manner of data with ever-decreasing effort — perhaps not from one dataset, but certainly by cross-referencing it with others.
It’s not just political rivals or disgruntled ex-partners who’d be interested. Insurance companies and credit providers are always on the lookout for indications of risk.
New Zealand’s privacy commissioner has floated the idea of making the re-identification of anonymised data illegal. Perhaps he’s onto something.
Hiding information requires cryptography and computer security.
Cryptography is hard to do. It is even harder to do it right.
Security is like all engineering: “Cheap, fast or correct: choose two (and often only one)”
That is fascinating. I was also reading this article on how hackers managed to breach Target:
http://www.networkworld.com/article/2600805/security0/11-steps-attackers-took-to-crack-target.html?page=2
While in the past (at times I wasn’t engaged in relevant activities but happened to be associated with those who were) I was even barred from a lecturing position despite the relevant ASIO Officer pointing out it was an absurd decision. There were Australians plotting against Australia’s best interests; and for a government to not take the steps they did would have been indefensible.
Threats to Australians are far greater these days, and the hubris of Latter Day ‘Noble’ Defenders of our rights can pose some of our more dangerous, if well-intentioned, threats.
Norman, are you really surprised that the well-intentioned hubris of Latter Day ‘Noble’ Defenders of our rights is an equal and opposite reaction to the hubris of Latter Day ‘NeoCon’ Aggressors of our rights, with their undisclosed and therefore debatable intentions?
You would be enlightened by reading the transcript of the Senate Committee hearing into the Comprehensive revision of the Telecommunications (Interception and Access) Act 1979, dated the 29th of July this year.
It may even convince you that you’d rather be a citizen than a suspect.
Neutral, 7 decades back as a youngster listening to various faiths such as devout R.Cs and equally devout Communists I was surprised by their confidence that they were both correct and noble, while the others were wrong and deliberately evil. It no longer surprises me nor do I have difficulty observing similar problems with Latter Day True Believers who ‘think’ similarly.
Nor am I surprised that the hubris of those ‘interpreting’ Enquiries results in them not following the implications in those Enquiries. It’s a feature of human nature which is both difficult and painful to control. An excellent example of this was the famous Queensland Deaths in Custody Enquiry where despite the Magistrate who conducted it time and again telling us it had NOT shown indigenous prisoners were more likely to have died in custody than was the case with other inmates, few people ever either absorbed what he’d said or if they did absorb it, made no effort to publicise what was actually in that Report.
As for your “rather be a citizen than a suspect” comment, you do realise one can be both, and it’s not a matter of choosing which you want to be?