Also has anyone else attempted to reverse engineer how Apple might have arrived at 1/trillion probability of false account flagging?
Some back of the napkin math, please double check...
If you assume the threshold is >10 false positives over a year to flag an account (a figure thrown around in the Apple docs), and that each person stores ~1024 new photos per year (~3-4/day), then to hit a 1/trillion figure your single-instance false positive probability has to be ~1/2000.
You can get a probability in that ballpark if you assume the database being checked against contains ~16M unique hashes and the effective hash size is ~36 bits (NeuralHash hashes appear to be 128 bits, but they are perceptual, not random).
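Those guesstimates can be sanity-checked in a couple of lines of Python. Note the 16M database size and ~36 effective bits are this thread's assumptions, not published values; a straight union bound on them gives roughly 1/4000 per photo, the same order of magnitude as ~1/2000 (assuming ~35 effective bits instead lands almost exactly on 1/2000).

```python
# Back-of-napkin check of the per-photo false positive rate.
# Assumed (not confirmed by Apple): ~16M unique database hashes,
# ~36 effective bits of entropy per perceptual hash.
db_size = 16_000_000
effective_bits = 36

# Union bound: chance a random photo's hash collides with ANY database hash.
p_single = db_size / 2**effective_bits
print(f"per-photo false positive rate ≈ 1/{1 / p_single:.0f}")  # ≈ 1/4295
```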
Neither of those values seems absurd given what we know.
Which would mean that the actual probability of a single false positive match for someone who takes ~3-4 photos a day over the course of a year would be ~40%.
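That ~40% figure falls straight out of the guesstimates above, and is easy to reproduce (again, the ~1/2000 per-photo rate and ~1024 photos/year are my assumed inputs, not known parameters):

```python
# Probability of at least one false positive over a year of photos,
# assuming a ~1/2000 per-photo false positive rate (guesstimate)
# and ~1024 new photos per year (~3-4/day).
p = 1 / 2000
n = 1024

p_at_least_one = 1 - (1 - p) ** n
print(f"P(>=1 false positive in a year) ≈ {p_at_least_one:.0%}")  # ≈ 40%
```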
Could be that Apple picked 1/trillion because it sounds very large and looks better in print, but it would be really nice to be able to verify that by knowing what the proposed threshold value is, and what the actual expected false positive rate is.
And yeah, that is the average case. If you take a lot of photos, say you are a new parent documenting your kid at 50 photos a day, then your probability of getting >10 false positives over a year would be ~30% based on the guesstimates above.
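The prolific-parent case checks out with an exact binomial tail on the same assumed per-photo rate:

```python
from math import comb

# The "prolific parent" case: ~50 photos/day for a year, with the same
# guesstimated ~1/2000 per-photo false positive rate.
p = 1 / 2000
n = 50 * 365  # 18,250 photos in a year

# P(more than 10 false positives) = 1 - P(at most 10), exact binomial sum.
p_flagged = 1 - sum(comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(11))
print(f"P(>10 false positives in a year) ≈ {p_flagged:.0%}")  # roughly 30%
```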
That is on the extreme end but is in no way completely outside of reality.
The only way to make that not awful would be to assume that by "1 in a trillion" Apple actually means a much, much smaller probability.
This math also assumes that events are independent, which I don't think is a valid assumption, given that perceptual hashes are by definition not independent (similar-but-different images will produce similar-but-different hashes).
Would be great if someone could independently derive their own guesstimates for this to check my math.
Also, if you assume these numbers are anywhere near accurate, then you have to come to the conclusion that any obfuscation of the matching count needs account-specific parametrization, otherwise it quickly starts leaking distinguishing bits (e.g. matches > threshold without decryption).
Based on some discussions yesterday, I wrote up a more detailed note on the Apple on-device scanning saga, with a focus on the "obfuscation" of the exact number of matches, and dived into how one might (probabilistically) break it.
This isn't the biggest problem with the proposed system. It does however suggest that even if you *really* trust Apple to not abuse their power (or be abused by power) then Apple still needs to release details about system parameters and assumptions.
We can quibble about the exact numbers I used, and the likelihood of the existence of a "prolific parent account" taking 50 photos a day for an entire year but there are *real* bounds on the kinds of users any static threshold/synthetic parameters can sustain.
These are fair questions regarding systems like the one Apple has proposed, and there is enough general ignorance regarding some of the building blocks that I think it is worth attempting to answer them.
But it's going to take way more than a few tweets, so settle in...
First, I'll be incredibly fair to Apple and assume that the system has no bugs: that is, there is no way for a malicious actor inside or outside of Apple to exploit the system in ways it wasn't meant to be exploited.
Idealized constructions only.
At the highest level there is your phone and Apple's servers. Apple has a collection of hashes, and your phone has...well tbh if you are like a large number of people in the world it probably has links to your entire digital life.
As I have said before, I am willing to be the person who draws a line here, against the calls for "nuance".
There is no room for nuance, because nuance assumes surveillance systems can be built such that they are used only for good, or only target bad people.
It is our duty to oppose all such systems *before* they become entrenched!
Not to work out how to entrench them with the least possible public outrage at their very existence by shielding their true nature with a sprinkling of mathematics.
Really is disappointing how many high profile cryptographers actually seem to believe that "privacy preserving" surveillance is not only possible (it's not) - but also somehow "not surveillance" (it is).
Meanwhile Apple are making statements to the press to the effect of "We are not scanning people's photos for illegal material, we are hashing people's photos and *using cryptography* to compare them to illegal material"
As if those aren't the *EXACT SAME THING*.
It's very important to focus on the principles involved here and not the mechanism. Just because you use cryptography to alter the way you do surveillance doesn't make it not-surveillance.
Clearly a rubicon moment for privacy and end-to-end encryption.
I worry that if Apple faces anything other than existential annihilation for proposing continual surveillance of private messages, it won't be long before other providers feel the pressure to do the same.
You can wrap that surveillance in any number of layers of cryptography to try and make it palatable, the end result is the same.
Everyone on that platform is treated as a potential criminal, subject to continual algorithmic surveillance without warrant or cause.
If Apple are successful in introducing this, how long do you think it will be before the same is expected of other providers? Before walled gardens prohibit apps that don't do it? Before it is enshrined in law?
"Stop using encryption so we can can check your messages for criminal activity" becomes "Allow us to scan all the files on your computer for criminal activity"
I'm so tired. Maybe let's not do the dystopia of corporations building cop bots into general purpose computers.
At some point protecting your privacy is going to boil down to not using devices that actively spy on you. No amount of overlay software can protect you from making a bad choice there.
"Don't support corps that scan your computer for crimes" seems pretty fucking basic.