The attack improvements come from considering temporal relationships (the probability that the number of messages received within a period of time exceeds a given threshold) instead of just probabilities over the lifetime of the system.
This can be devastating if false positive rates are poorly selected.
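To make the temporal version concrete: if a recipient's detection key false-positives on each message with probability p, then the number of matches it sees out of N messages routed in a window is Binomial(N, p), and a server watching per-window counts can flag recipients who sit far above N·p. A minimal sketch of the relevant tail probability (the N and p values here are illustrative, not taken from the original analysis):

```rust
// Probability that a recipient matches at least k of the N messages routed in
// a window, when each message independently false-positives with probability p.
fn binomial_tail(n: u64, p: f64, k: u64) -> f64 {
    let mut term = (1.0 - p).powf(n as f64); // P(X = 0)
    let mut cdf = 0.0;
    for i in 0..k {
        cdf += term;
        // P(X = i+1) = P(X = i) * (n - i)/(i + 1) * p/(1 - p)
        term *= (n - i) as f64 / (i as f64 + 1.0) * p / (1.0 - p);
    }
    1.0 - cdf // P(X >= k)
}

fn main() {
    // Illustrative values: a 2^-10 false positive rate over a window of
    // 100,000 routed messages.
    let (n, p) = (100_000u64, 1.0 / 1024.0);
    println!("E[matches per window] = {:.1}", n as f64 * p);
    // A recipient receiving real traffic on top of the noise clears this tail:
    println!("P(X >= 150 from noise alone) = {:.3e}", binomial_tail(n, p, 150));
}
```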
Basically, FMD schemes permit anyone to efficiently forge tags that match multiple users 100% of the time.
I recently released an update to fuzzytags that makes use of the AVX2 speedups in dalek's Ristretto implementation, allowing a consumer desktop to produce a completely entangled tag for 2 parties in ~79 seconds.
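For a sense of what that benchmark implies: with fuzzytags' default γ = 24, fully entangling a second party means brute-forcing all 24 of its bits, i.e. ~2^24 candidate tags in expectation. A back-of-envelope sketch (γ and the derived throughput are my own numbers, not part of the benchmark):

```rust
fn main() {
    let gamma = 24u32; // fuzzytags' default tag precision (my assumption here)
    let expected_tries = (1u64 << gamma) as f64; // ~16.8M candidates on average
    let seconds = 79.0; // the benchmark figure quoted above
    let per_sec = expected_tries / seconds;
    println!("~{per_sec:.0} candidate tags/sec (~2^{:.1}/sec)", per_sec.log2());
    // Every additional fully-entangled bit doubles the work: gamma = 25 -> ~158s.
}
```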
But, importantly, under the FMD threat model the routing server can only perform attacks given information about false positive rates well below 2^-24, which means that you can partially entangle a tag to multiple parties in a way the server cannot distinguish.
And further, you can do this both altruistically (to hide that you are sending a message to someone by also entangling it to someone else) and maliciously (to implicate someone else in a deniable way).
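A toy model of the brute-force step (a hash stand-in with the same match statistics, not the actual fuzzytags construction): candidates that match party A by construction each hit party B's n-bit detection check with probability 2^-n, so ~2^n tries find a tag entangled to both, and to the server the result looks exactly like an ordinary false positive at rate 2^-n:

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

// Toy stand-in for "does this tag match party `id` at n bits of precision?".
// The real check involves per-party decryptions of the tag's ciphertext bits,
// but any test that passes with probability 2^-n gives the same counts.
fn matches(id: u64, tag: u64, n: u32) -> bool {
    let mut h = DefaultHasher::new();
    (id, tag).hash(&mut h);
    (h.finish() & ((1u64 << n) - 1)) == 0
}

fn main() {
    let (party_b, n) = (42u64, 16u32); // entangle to B at 16 of 24 bits
    // Candidates are generated "for A", so A matches by construction; we only
    // need to brute-force B's n bits.
    let mut tries = 0u64;
    let tag = (0u64..)
        .inspect(|_| tries += 1)
        .find(|&t| matches(party_b, t, n))
        .unwrap();
    println!("tag {tag} entangles B after {tries} tries (expected ~{})", 1u64 << n);
}
```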
I'm currently working on a project called Niwl which is best described as a mixnet design that makes heavy use of fuzzy message detection with entangled tags to improve both decentralization and auditability.
Basically, by adding mix nodes to an FMD scheme you can allow those nodes to take on the bandwidth-heavy and altruistic anonymity functions on behalf of bandwidth-lite clients...
...those clients can, in addition, make use of entangling to check that mix nodes are acting honestly without adding additional traffic to the network (by tagging some messages to their contact AND themselves)
There are a couple of other neat tricks you can do as well, like entangle a tag to both a well-known mix node AND a contact. Or entangle a tag to two different mix nodes.
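For flavour, here is a sketch of the client-side bookkeeping behind that honesty check (toy types and names, not Niwl's actual structures): remember the tags you entangled to yourself, strike them off as they come back in your own download stream, and treat anything left over as evidence of a misbehaving mix node:

```rust
use std::collections::HashSet;

// Toy client-side audit state (illustrative only).
struct Audit {
    // Tags we sent via the mix node, entangled to a contact AND to ourselves.
    outstanding: HashSet<[u8; 32]>,
}

impl Audit {
    fn sent(&mut self, tag: [u8; 32]) {
        self.outstanding.insert(tag);
    }
    // Called for every tag our own detection key matches in the download stream.
    fn received(&mut self, tag: &[u8; 32]) {
        self.outstanding.remove(tag);
    }
}

fn main() {
    let mut audit = Audit { outstanding: HashSet::new() };
    audit.sent([1u8; 32]);
    audit.sent([2u8; 32]);
    audit.received(&[1u8; 32]); // came back in our own stream: forwarded honestly
    // After a full sync, anything still outstanding was dropped or withheld,
    // and the client learned that without sending any extra audit traffic.
    for tag in &audit.outstanding {
        println!("mix node never delivered tag starting {:02x?}", &tag[..4]);
    }
}
```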
I'll add that all this comes with a very large hic sunt dracones warning - all of this is an experimental design that requires more analysis and testing.
I think the main takeaway is that there hasn't been enough push back and that this now seems depressingly inevitable.
I expect we will see more calls for surveillance like this in the coming months heavily remixed into the ongoing "online harms" narrative.
Without a strong stance from other tech companies, in particular device manufacturers and OS developers, we will look back on the last few weeks as the beginning of the end of generally available consumer devices that don't conduct constant algorithmic surveillance.
Someone asked me on a reddit thread the other day what value t would have to be if NeuralHash had a false acceptance rate similar to other perceptual hashes, and I ballparked it at between 20 and 60... so yeah.
Some quick calculations with the new numbers:
3-4 photos/day: 1 match every 286 days.
50 photos/day: 1 match every 20 days.
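Those figures imply a per-photo false-match rate of roughly 1 in 1,000 (back-derived from the numbers above, not an Apple-published figure); the arithmetic is just:

```rust
fn main() {
    // Per-photo false-match rate implied by the figures above (~1/1000);
    // not an Apple-published number.
    let fpr = 1.0 / 1000.0;
    for photos_per_day in [3.5, 50.0] {
        let days_per_match = 1.0 / (fpr * photos_per_day);
        println!("{photos_per_day} photos/day -> 1 match every {days_per_match:.0} days");
    }
}
```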
As an appendix/follow-up to my previous article (a probabilistic analysis of the high-level operation of a system like the one Apple has proposed), here are some thoughts/notes/analysis of the actual protocol.
Honestly, I think the weirdest thing, given the intent of this system, is how susceptible this protocol seems to be to malicious clients, who can easily make the server do extra work and can probably also just legitimately DoS the human-check with enough contrived matches.
Daily Affirmation: End to end encryption provides some safety, but it doesn't go far enough.
For decades our tools have failed to combat bulk metadata surveillance, it's time to push forward and support radical privacy initiatives.
Watching actual cryptographers debate about whether or not we should be voluntarily *weakening* encryption instead of radically strengthening threat models makes my skin crawl.
I don't think I can say this enough, right? Some of you are under the weird impression that systems are "too secure for the general public to be allowed access to" and it just constantly blows my fucking mind.
Based on some discussions yesterday, I wrote up a more detailed note on the Apple on-device scanning saga with a focus on the "obfuscation" of the exact number of matches and dived into how one might (probabilistically) break it.
This isn't the biggest problem with the proposed system. It does however suggest that even if you *really* trust Apple to not abuse their power (or be abused by power) then Apple still needs to release details about system parameters and assumptions.
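As a flavour of the kind of inference involved (a deliberately simplified model, with a hypothetical synthetic-match rate, since the real parameters are exactly what Apple hasn't published): if synthetic matches are injected at a static per-upload rate, the server can subtract the expected noise floor and estimate the real count:

```rust
fn main() {
    // Simplified model: observed = real + synthetic, with synthetic matches
    // injected independently at a static per-upload rate. All values below
    // are hypothetical.
    let p_s = 0.01; // assumed synthetic-match rate per upload
    let uploads = 50_000.0; // uploads observed for one account
    let observed = 600.0; // total matches the server tallied
    let noise_mean = uploads * p_s; // 500 synthetics expected
    let noise_sd = (uploads * p_s * (1.0 - p_s)).sqrt(); // ~22.2
    println!(
        "estimated real matches: {:.0} +/- {:.0} (2 sigma)",
        observed - noise_mean,
        2.0 * noise_sd
    );
    // The noise floor is static while evidence accumulates with every upload,
    // so given enough observations the server resolves even small real counts.
}
```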
We can quibble about the exact numbers I used, and the likelihood of the existence of a "prolific parent account" taking 50 photos a day for an entire year, but there are *real* bounds on the kinds of users any static threshold/synthetic parameters can sustain.
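To make those bounds concrete: at the same assumed 1-in-1,000 rate, a 50-photo/day account accumulates λ ≈ 18.25 expected false matches a year, and the probability of an innocent account crossing a hypothetical threshold of t = 30 (the middle of my 20-60 ballpark above) is already non-trivial. All three parameters here are my assumptions, not Apple's:

```rust
// P(X >= t) for X ~ Poisson(lambda): the chance an innocent account crosses
// the match threshold through false positives alone.
fn poisson_tail(lambda: f64, t: u32) -> f64 {
    let mut term = (-lambda).exp(); // P(X = 0)
    let mut cdf = 0.0;
    for k in 0..t {
        cdf += term;
        term *= lambda / (k as f64 + 1.0); // P(X = k+1) from P(X = k)
    }
    1.0 - cdf
}

fn main() {
    let lambda = 50.0 * 365.0 / 1000.0; // 18.25 expected false matches/year
    let t = 30; // hypothetical threshold from the 20-60 ballpark above
    println!("P(innocent crossing per year) = {:.4}", poisson_tail(lambda, t));
    // Under these assumptions, across millions of accounts a static threshold
    // yields a steady stream of innocent accounts reaching human review.
}
```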