"Wanting it badly is not enough" could be the title of a postmortem on the century's tech-policy battles. Think of the crypto wars: yeah, it would be super cool if we had ciphers that worked perfectly except when "bad guys" used them, but that's not ever going to happen.
1/
Another area is anonymization of large data-sets. There are undeniably cool implications for a system that allows us to gather and analyze lots of data on how people interact with each other and their environments without compromising their privacy.
2/
But "cool" isn't the same as "possible" because wanting it badly is not enough. In the mid-2010s, privacy legislation started to gain real momentum, and privacy regulators found themselves called upon to craft compromises to pass important new privacy laws.
3/
Those compromises took the form of "anonymized data" carve-outs, leading to the passage of laws like the #GDPR, which strictly regulated processing "personally identifying information" but was a virtual free-for-all for "de-identified" data that had been "anonymized."
4/
There was just one teensy problem with this compromise: de-identifying data is REALLY hard, and it only gets harder over time. Say the NHS releases prescribing data: date, doctor, prescription, and a random identifier. That's a super-useful data-set for medical research.
5/
And say the next year, Addison-Lee or another large minicab company suffers a breach (no human language contains the phrase "as secure as minicab IT") that contains many of the patients' journeys that resulted in that prescription-writing.
6/
Merge those two data-sets and you re-identify many of the patients in the data. Subsequent releases and breaches compound the problem, and there's nothing the NHS can do to either predict or prevent a breach by a minicab company.
7/
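To make the merge concrete, here's a minimal sketch of that kind of linkage attack. Everything in it is made up for illustration: the records, the field names (`pid`, `dropoff`) and the helper functions `link` and `reidentified` are assumptions, not anything from a real NHS release or a real breach. It joins the two data-sets on date and doctor, then narrows the candidates each time the same random identifier turns up again.

```python
from collections import defaultdict

# Hypothetical "anonymized" prescribing release: random ID, date, doctor, drug.
prescriptions = [
    {"pid": "a91f", "date": "2020-03-02", "doctor": "Dr. Patel", "drug": "sertraline"},
    {"pid": "77c0", "date": "2020-03-02", "doctor": "Dr. Okoro", "drug": "metformin"},
    {"pid": "a91f", "date": "2020-04-06", "doctor": "Dr. Patel", "drug": "sertraline"},
]

# Hypothetical breached minicab log: rider name, date, which doctor they were dropped at.
journeys = [
    {"rider": "Alice Example", "date": "2020-03-02", "dropoff": "Dr. Patel"},
    {"rider": "Bob Example", "date": "2020-03-02", "dropoff": "Dr. Okoro"},
    {"rider": "Alice Example", "date": "2020-04-06", "dropoff": "Dr. Patel"},
    {"rider": "Carol Example", "date": "2020-05-01", "dropoff": "Dr. Patel"},
]


def link(prescriptions, journeys):
    """Join on (date, doctor); return {pid: set of rider names who fit every visit}."""
    riders_by_visit = defaultdict(set)
    for j in journeys:
        riders_by_visit[(j["date"], j["dropoff"])].add(j["rider"])

    candidates = {}
    for p in prescriptions:
        seen = riders_by_visit.get((p["date"], p["doctor"]), set())
        if not seen:
            continue  # no matching journey leaked for this visit
        if p["pid"] in candidates:
            # A repeat visit under the same random ID narrows the candidates.
            candidates[p["pid"]] &= seen
        else:
            candidates[p["pid"]] = set(seen)
    return candidates


def reidentified(candidates):
    """Keep only the IDs that boil down to exactly one named rider."""
    return {pid: next(iter(names)) for pid, names in candidates.items() if len(names) == 1}


print(reidentified(link(prescriptions, journeys)))
```

In this toy data both random identifiers resolve to a single name, and the drug column comes along for the ride. Real data is messier, but every extra release or breach adds more columns to join on.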
Even if the NHS is confident in its anonymization, it can never be confident in the sturdiness of that anonymity over time.
Worse: the NHS really CAN'T be confident in its anonymization. Time and again, academics have shown that anonymized data can be re-identified from the start.
8/
Re-identification attacks are subtle, varied, and very, very hard to defend against:
When this was pointed out to the (admittedly hard-working and torn) privacy regulators, they largely shrugged their shoulders and expressed a groundless faith that somehow this would be fixed in the future. Privacy should not be a faith-based initiative.
Today, we continue to see the planned releases of large datasets with assurances that they have been anonymized. It's common for terms of service to include your "consent" to have your data shared once it has been de-identified. This is a meaningless proposition.
11/
To show just how easy re-identification can be, researchers at Imperial College and the Université catholique de Louvain have released The Observatory of Anonymity, a web-app that shows you how easily you can be identified in a data-set.
Feed the app your country and region, birthdate, gender, employment and education status and it tells you how many people share those characteristics. For example, my identifiers boil down to a 1-in-3 chance of being identified.
13/
(Don't worry: all these calculations are done in your browser and the Observatory doesn't send any of your data to a server)
If anything, The Observatory is generous to anonymization proponents. "Anonymized" data often include identifiers like the first half of a post-code.
14/
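The question behind the app (how many people share your characteristics?) can be illustrated with a plain count. This is only a toy sketch, not the Observatory's method: going by the paper's title, the real thing estimates uniqueness with generative models on incomplete data, while this just counts rows in a tiny made-up table. The records, the `QUASI_IDENTIFIERS` tuple and both function names are assumptions for illustration.

```python
from collections import Counter

# Hypothetical columns that aren't names but still narrow down who you are.
QUASI_IDENTIFIERS = ("region", "birthdate", "gender")

# Made-up population table.
people = [
    {"region": "Hackney", "birthdate": "1971-07-17", "gender": "M"},
    {"region": "Hackney", "birthdate": "1971-07-17", "gender": "M"},
    {"region": "Hackney", "birthdate": "1980-01-05", "gender": "F"},
    {"region": "Camden", "birthdate": "1971-07-17", "gender": "M"},
]


def anonymity_sets(records, keys=QUASI_IDENTIFIERS):
    """Count how many records share each combination of the given attributes."""
    return Counter(tuple(r[k] for k in keys) for r in records)


def unique_fraction(records, keys=QUASI_IDENTIFIERS):
    """Fraction of records whose attribute combination appears exactly once."""
    sets = anonymity_sets(records, keys)
    unique = sum(1 for r in records if sets[tuple(r[k] for k in keys)] == 1)
    return unique / len(records)


print(unique_fraction(people))               # all three attributes
print(unique_fraction(people, ("gender",)))  # gender alone
```

The more attributes you add, the smaller each group gets, which is why a handful of innocuous-looking fields can single someone out.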
You can read more about The Observatory's methods in the accompanying @nature paper, "Estimating the success of re-identifications in incomplete datasets using generative models."
ETA - If you'd like an unrolled version of this thread to read or share, here's a link to it on pluralistic.net, my surveillance-free, ad-free, tracker-free blog:
I am an environmentalist, but I'm not a climate activist. I used to be - I even used to ring strangers' doorbells on behalf of Greenpeace.
1/
If you'd like an essay-formatted version of this thread to read or share, here's a link to it on pluralistic.net, my surveillance-free, ad-free, tracker-free blog:
But a quarter of a century ago, I fell in with the Electronic Frontier Foundation and became a lifelong digital rights activist, and switched to cheering on environmental activists from the sidelines of their fight:
Like you, I'm sick to the back teeth of talking about AI. Like you, I keep getting dragged into AI discussions. Unlike you‡, I spent the summer writing a book on why I'm sick of AI⹋, which @fsgbooks will publish in 2026.
‡probably
⹋"The Reverse Centaur's Guide to AI"
1/
A week ago, I turned that book into a speech, which I delivered as the annual Nordlander Memorial Lecture at Cornell, where I'm an AD White Professor-at-Large.
3/
Billionaires don't think we're real. How could they? How could you inflict the vast misery that generates billions while still feeling even a twinge of empathy for the sufferer in your extractive enterprise? No wonder Elon Musk calls us "NPCs":
Ever notice how people get palpably stupider as they gain riches and power? Musk went from a cringe doofus to a world-class credulous dolt, and it seems like he loses five IQ points for every $10b that's added to his net worth.
3/
I'm only a few chapters into Bill McKibben's stupendous new book *Here Comes the Sun: A Last Chance for the Climate and a Fresh Chance for Civilization* and I already know it's going to change my outlook forever:
McKibben is one of our preeminent climate writers and activists, noteworthy for his informed and brilliant explanations of the technical limits - and possibilities - of various climate interventions, and for his lifelong organizing work.
3/
One of the dumbest, shrewdest tricks corporate America ever pulled was teaching us all to reflexively say, "If a corporation blocks your speech, that doesn't violate the First Amendment and therefore it's not censorship":
Censorship isn't limited to government action: it's the act of preventing a message from a willing speaker from reaching a willing listener. The fact that it's censorship doesn't (necessarily) mean that it's illegitimate or bad.
3/
Conspiratorialism is downstream of the trauma of institutional failures.
Institutional failures are downstream of regulatory capture.
Regulatory capture is downstream of monopolization.
Monopolization is downstream of the failure to enforce antitrust law.
1/
Start with conspiratorialism and trauma. I am staunchly pro-vaccine. I have had so many covid jabs that I glow in the dark and can get impeccable 5G reception at the bottom of a coal-mine.