Based on some discussions yesterday, I wrote up a more detailed note on the Apple on-device scanning saga, with a focus on the "obfuscation" of the exact number of matches and a dive into how one might (probabilistically) break it.
This isn't the biggest problem with the proposed system. It does, however, suggest that even if you *really* trust Apple not to abuse their power (or be abused by power), Apple still needs to release details about system parameters and assumptions.
We can quibble about the exact numbers I used, and the likelihood of a "prolific parent account" existing that takes 50 photos a day for an entire year, but there are *real* bounds on the kinds of users any static threshold/synthetic parameters can sustain.
And if Apple is generating account-dependent parameters, then the system is even more broken.
i.e. I assert that the actual privacy of this metadata is paradoxically dependent on both Apple *never* deriving certain information AND on them *always* deriving it for every account.
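To make the "real bounds" point concrete, here is a rough sketch (not the article's actual model): assume some fixed per-photo chance of producing a match voucher, whether a synthetic match or a hash false positive, and a static review threshold. Expected matches grow linearly with photo volume, so a sufficiently prolific account becomes likely to cross any fixed threshold. The ~1/2000 rate and the >10 threshold are the napkin figures I use later in this thread, not published values.

```python
# Rough sketch: how photo volume interacts with a fixed review threshold.
# All parameters are assumptions, not published Apple values.
from math import exp, factorial

def p_at_least(k: int, lam: float) -> float:
    """P(X >= k) for X ~ Poisson(lam); a fine approximation to the binomial here."""
    return 1.0 - sum(exp(-lam) * lam**i / factorial(i) for i in range(k))

P_MATCH = 1 / 2000    # assumed per-photo match probability (synthetic or false positive)
THRESHOLD = 10        # assumed review threshold ("more than 10 matches")

for photos_per_day in (3, 10, 50):
    photos_per_year = photos_per_day * 365
    lam = photos_per_year * P_MATCH          # expected matches over a year
    p_cross = p_at_least(THRESHOLD + 1, lam)
    print(f"{photos_per_day:>3}/day -> expected matches {lam:5.2f}, "
          f"P(cross threshold in a year) ~ {p_cross:.2e}")
```

At ~3 photos a day the crossing probability is vanishingly small; at 50 a day the expectation sits right at the threshold and crossing it within a year becomes a realistic outcome rather than a one-in-a-trillion event. That is exactly the kind of bound a static parameter set bakes in.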
Anyway, now the math I had in my head during Saturday night's stream-of-tweets is out of my head and nicely formatted.
In order to sort through this whole thing Apple should release:
- the threshold of matches required for human review
- the mechanism through which the probability of synthetic matches is derived and whether it is global or per account
If those were public, then it would be possible to plug in the numbers and determine exactly what the bounds are for effective obfuscation, and we could have an actual conversation about how private the metadata in the system actually is.
To everyone who tried to read this on a mobile screen and failed: Sorry about that, I threw together this theme last night and hadn't given it a full run through.
I pushed an update to the css which should make it nicer on smaller screens, please let me know if it is better!
• • •
As an appendix/follow-up to my previous article (a probabilistic analysis of the high level operation of a system like the one that Apple has proposed) here are some thoughts / notes / analysis of the actual protocol.
Honestly I think the weirdest thing given the intent of this system is how susceptible this protocol seems to be to malicious clients who can easily make the server do extra work, and can probably also just legitimately DoS the human-check with enough contrived matches.
Daily Affirmation: End-to-end encryption provides some safety, but it doesn't go far enough.
For decades our tools have failed to combat bulk metadata surveillance, it's time to push forward and support radical privacy initiatives.
Watching actual cryptographers debate about whether or not we should be voluntarily *weakening* encryption instead of radically strengthening threat models makes my skin crawl.
I don't think I can say this enough, right? Some of you are under the weird impression that systems are "too secure for the general public to be allowed access to" and it just constantly blows my fucking mind.
Also, has anyone else attempted to reverse-engineer how Apple might have arrived at a 1/trillion probability of false account flagging?
Some back of the napkin math, please double check...
If you assume the threshold is >10 false positives over a year to flag an account (a figure thrown around in the Apple docs), and each person stores ~1024 new photos per year (~3-4/day), then to get a 1/trillion figure your single-instance false-positive probability has to be ~1/2000.
You can get that probability if you assume the database being checked against contains ~16M unique hashes and the effective hash size is ~36 bits (NeuralHash hashes appear to be 128 bits, but they are perceptual, not random).
Neither of those values seems absurd given what we know.
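For anyone who wants to check the napkin, here is the calculation spelled out as a sketch. Every parameter is a guess on my part (threshold, photos per year, per-photo rate, database size, effective hash bits), not something Apple has published, so treat the output as an order-of-magnitude figure and plug in your own values.

```python
# Back-of-napkin reconstruction of the "1 in a trillion" figure.
# Every parameter below is an assumption, not an Apple-published value.
from math import comb

PHOTOS_PER_YEAR = 1024   # assumed library growth (~3-4 photos/day)
THRESHOLD = 10           # assumed: review triggered by more than 10 matches
P_SINGLE = 1 / 2000      # assumed per-photo false-positive probability

# Probability of strictly more than THRESHOLD false matches in a year (binomial tail).
p_flag = sum(
    comb(PHOTOS_PER_YEAR, k) * P_SINGLE**k * (1 - P_SINGLE)**(PHOTOS_PER_YEAR - k)
    for k in range(THRESHOLD + 1, 101)   # terms beyond k=100 are negligible
)
print(f"P(account falsely flagged in a year) ~ {p_flag:.1e}")

# Where a per-photo rate in that ballpark could come from: ~16M database hashes
# checked against an effective ~36-bit perceptual hash space.
print(f"16M / 2^36 ~ 1 in {2**36 / 16_000_000:,.0f} per photo")
```

With these particular guesses the flagging probability lands within an order of magnitude of 1/trillion, which is about as much as napkin math can promise.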
These are fair questions regarding systems like the one Apple has proposed, and there is enough general ignorance regarding some of the building blocks that I think it is worth attempting to answer them.
But it's going to take way more than a few tweets, so settle in...
First, I'll be incredibly fair to Apple and assume that the system has no bugs - that is, there is no way for a malicious actor inside or outside of Apple to exploit the system in ways it wasn't meant to be exploited.
Idealized constructions only.
At the highest level there is your phone and Apple's servers. Apple has a collection of hashes, and your phone has...well tbh if you are like a large number of people in the world it probably has links to your entire digital life.
As I have said before, I am willing to be the person who draws a line here, against the calls for "nuance".
There is no room for nuance, because nuance thinks surveillance systems can be built such that they are only used for good, or only target bad people.
It is our duty to oppose all such systems *before* they become entrenched!
Not to work out how to entrench them with the least possible public outrage at their very existence by shielding their true nature with a sprinkling of mathematics.
Really is disappointing how many high profile cryptographers actually seem to believe that "privacy preserving" surveillance is not only possible (it's not) - but also somehow "not surveillance" (it is).
Meanwhile Apple are making statements to the press to the effect of "We are not scanning people's photos for illegal material, we are hashing people's photos and *using cryptography* to compare them to illegal material".
As if those aren't the *EXACT SAME THING*.
It's very important to focus on the principles involved here and not the mechanism. Just because you use cryptography to alter the thing you are surveilling doesn't make it not-surveillance.