Both these images have NeuralHash: 1e986d5d29ed011a579bfdea

Just a reminder that visually similar images are not necessarily semantically similar images.
Love playing games like "are these, technically, semantically similar images?"

All these images have NeuralHash: ba9f4edd1233a856784b2dc4
Hashes generated using the instructions / script found here: github.com/AsuharietYgvar…
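For reference, this is a minimal sketch of what the hashing step in that repo boils down to, paraphrased from memory - it assumes you've already extracted model.onnx and the neuralhash_128x96_seed1.dat seed file per its README, and details like the seed header size are assumptions to verify against the actual script:

```python
# Minimal sketch of the NeuralHash pipeline: 360x360 RGB in [-1, 1]
# -> 128-dim embedding -> 96 projected sign bits -> 24 hex chars.
# Assumes model.onnx and the seed file extracted per the repo's README.
import numpy as np
import onnxruntime
from PIL import Image

session = onnxruntime.InferenceSession("model.onnx")

# Skip the seed file header, then read the 96x128 float32 projection matrix.
seed = np.frombuffer(
    open("neuralhash_128x96_seed1.dat", "rb").read()[128:], dtype=np.float32
).reshape(96, 128)

def neuralhash(path):
    # Resize to 360x360 RGB, scale to [-1, 1], NCHW layout.
    img = Image.open(path).convert("RGB").resize((360, 360))
    arr = (np.asarray(img, dtype=np.float32) / 255.0) * 2.0 - 1.0
    arr = arr.transpose(2, 0, 1)[None]
    # Run the network, project the embedding, take sign bits.
    emb = session.run(None, {session.get_inputs()[0].name: arr})[0].flatten()
    bits = "".join("1" if v >= 0 else "0" for v in seed @ emb)
    return "%0*x" % (len(bits) // 4, int(bits, 2))

print(neuralhash("photo.jpg"))  # e.g. 1e986d5d29ed011a579bfdea
```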
As I said, finding specific (funny) collisions is trivial for perceptual hashes.

To be fair to Apple, NeuralHash does seem at least somewhat resistant to *random* collisions (over the small set of tens of thousands of images I've thrown at it today...).

I let NeuralHash run most of the day in the background, processing ~200K images. I didn't find any truly "random" collisions (a few hashes were within 2-3 bits of each other), but I did find a few sets like this - sets of burst photos which match.
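A hypothetical harness for that kind of bulk experiment, reusing the neuralhash() function from the sketch above (image_paths is an assumed list of file paths):

```python
# Bucket exact collisions and flag near-misses by Hamming distance.
from collections import defaultdict
from itertools import combinations

def hamming(h1, h2):
    # Bit distance between two hex-encoded 96-bit hashes.
    return bin(int(h1, 16) ^ int(h2, 16)).count("1")

hashes = {p: neuralhash(p) for p in image_paths}

# Exact collisions: bucket paths by hash; buckets of >1 are the "burst" sets.
buckets = defaultdict(list)
for path, h in hashes.items():
    buckets[h].append(path)
collisions = {h: ps for h, ps in buckets.items() if len(ps) > 1}

# Near misses: a pairwise scan is fine for small sets; at ~200K images
# you'd want something smarter than O(n^2), e.g. multi-index hashing.
near = [(a, b) for (a, ha), (b, hb) in combinations(hashes.items(), 2)
        if 0 < hamming(ha, hb) <= 3]
```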

All have the same hash: 75bbd25662074bdc7ac97677
NeuralHash seems to fail pretty easily on photos with small differences across a mostly static background - burst photos (and comic strips) being a common way of achieving this.

Interesting, because all of these would count as different photos in the context of a scanning system.
The Apple system dedupes photos, but burst shots are semantically *different* photos with the same subject - and an unlucky match on a burst shot could lead to multiple match events on the back end if the system isn't implemented to defend against that.
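A toy illustration of that concern, using the t=30 threshold Apple later confirmed (see the 13 Aug thread below); burst_size is an illustrative number, and whether the backend dedups by hash is exactly the open question:

```python
# If the backend counts per-photo match events without dedup by hash,
# one unlucky collision on a burst becomes burst_size matches, not one.
THRESHOLD = 30   # Apple's reported match threshold t
burst_size = 8   # distinct burst frames that all share one NeuralHash

match_events = burst_size
print(f"{match_events}/{THRESHOLD} of the threshold from a single burst")
```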
I also wonder about the "everyone takes the same instagram photos" meme and how that winds its way into all of this.
During experimentation today I ended up with a couple of very near matches, so I've been playing with a great little tool to see what it takes to mash them together.

Here is 11f6794bacf037d93aced8e0
Fun to note that because the originals were already near each other (15fc79cfaef00d997ac65ce0 vs 15fc79cfaef01df9aecf7ee0) it only took a few seconds each to modify them both to collide.
Here are another two: a852759b6dc0e748f04bf567

(Started off as a852759b6dc0e708fbcbf767 vs aa52759b6de0cf8db01bb545)
I can do faces too: 152d7772a8d47e156ef90a22
Those two started off as 152d77722cd02e3f66fb8a22 and 152e6e1508d0fe11ae59c9f0 and took a little more massaging to reduce the obvious artefacts - I think it could probably be better.
These are clearly the same image :) 72cb88a3e718d8c3c22cd118
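This is NOT the specific tool used above - just a generic sketch of how these collision attacks usually work, assuming a differentiable (e.g. PyTorch) port `net` of the NeuralHash network and the 96x128 seed matrix as a tensor `seed`:

```python
import torch

def collide(x, target_bits, steps=1000, lr=3e-3):
    # x: 1x3x360x360 image tensor in [-1, 1]
    # target_bits: length-96 tensor of +1/-1, the hash we want to land on
    x = x.clone().requires_grad_(True)
    opt = torch.optim.Adam([x], lr=lr)
    for _ in range(steps):
        logits = seed @ net(x).flatten()  # pre-sign hash values
        # Hinge loss: push each bit past the sign boundary with some margin.
        loss = torch.clamp(0.5 - logits * target_bits, min=0).sum()
        opt.zero_grad()
        loss.backward()
        opt.step()
        with torch.no_grad():
            x.clamp_(-1.0, 1.0)  # keep it a valid image
        if loss.item() == 0:     # all 96 bits on the target side of zero
            break
    return x.detach()
```

Because the loss only cares about 96 sign bits, starting from an image whose hash is already a few bits away (like the near matches above) converges almost immediately, which lines up with the "few seconds" observation.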
Could be a dataset limitation, but after amassing a few hundred thousand samples I suspect at this point that not all 12-byte sequences are viable NeuralHashes.
Some naturally occurring collisions found by @braddwyer in ImageNet which have the same features as above (static background, smaller subject).

blog.roboflow.com/nerualhash-col…
I have things to do today, but later this evening I will write this up in more depth because there are a lot of misconceptions flying around about what is and isn't interesting/relevant.

Until then check out my last post on the whole Apple PSI system: pseudorandom.resistant.tech/ftpsi-paramete…

More from @SarahJamieLewis

14 Sep
This is cool: earlier this year I looked into the privacy of FMD (Fuzzy Message Detection, by @gabrie_beck et al.), including simulations of attacks on realistic datasets.

Now, @Istvan_A_Seres et al have performed their own analysis and, in addition, have shown attack improvements on those same datasets.
You can find my original dive into those datasets as part of the book I put together for fuzzytags (a Rust implementation of FMD)

docs.openprivacy.ca/fuzzytags-book…
The attack improvements come from considering temporal relationships (the probability of matching more than a given threshold of messages within a window of time) instead of just over the lifetime of the system.

This can be devastating if false positive rates are poorly selected.
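A rough sketch of why the temporal view matters - the numbers here (p, N, k) are made up for illustration, the real analysis is in the linked work:

```python
# How surprising is a mailbox's match count in one window if the user
# were NOT a true recipient? Under false positives alone, matches are
# Binomial(N, p), so a large count in a short window is very revealing.
from math import comb

def binom_sf(k, n, p):
    # P[X >= k] for X ~ Binomial(n, p)
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))

p = 2 ** -4  # example FMD false-positive rate (1/16)
N = 200      # messages tested against this detection key in one hour
k = 40       # matches observed in that hour (vs ~12.5 expected from noise)
print(binom_sf(k, N, p))  # tiny probability => likely a true recipient
```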
16 Aug
Revisiting first impressions of the Apple PSI system in light of the new threat model.

pseudorandom.resistant.tech/ftpsi-paramete…
I think the main takeaway is that there hasn't been enough pushback and that this now seems depressingly inevitable.

I expect we will see more calls for surveillance like this in the coming months heavily remixed into the ongoing "online harms" narrative.
Without a strong stance from other tech companies, in particular device manufacturers and OS developers, we will look back on the last few weeks as the beginning of the end of generally available consumer devices that don't conduct constant algorithmic surveillance.
13 Aug
Apple have given some interviews today where they explicitly state that the threshold t=30.

Which means the false acceptance rate is likely an order of magnitude *more* than I calculated in this article.
Someone asked me on a reddit thread the other day what value t would have to be if NeuralHash had a similar false acceptance rate to other perceptual hashes and I ballparked it at between 20 and 60... so yeah.
Some quick calculations with the new numbers:

3-4 photos/day: 1 match every 286 days.
50 photos/day: 1 match every 20 days.
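Those figures are consistent with a per-photo false match probability of roughly 1 in 1000 - my back-of-the-envelope assumption, reverse-engineered from the numbers above, not an Apple-published rate:

```python
# Sanity-check the match-rate arithmetic under an assumed per-photo
# false match probability of ~1/1000.
p = 1 / 1000
for photos_per_day in (3.5, 50):
    days_per_match = 1 / (p * photos_per_day)  # expected days per false match
    years_to_t = 30 * days_per_match / 365     # expected time to hit t=30
    print(f"{photos_per_day} photos/day: one match every "
          f"{days_per_match:.0f} days, t=30 in ~{years_to_t:.1f} years")
```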
12 Aug
As an appendix/follow-up to my previous article (a probabilistic analysis of the high-level operation of a system like the one that Apple has proposed), here are some thoughts / notes / analysis of the actual protocol.

pseudorandom.resistant.tech/a_closer_look_…
Honestly I think the weirdest thing given the intent of this system is how susceptible this protocol seems to be to malicious clients who can easily make the server do extra work, and can probably also just legitimately DoS the human-check with enough contrived matches.
12 Aug
Daily Affirmation: End-to-end encryption provides some safety, but it doesn't go far enough.

For decades our tools have failed to combat bulk metadata surveillance; it's time to push forward and support radical privacy initiatives.
Watching actual cryptographers debate about whether or not we should be voluntarily *weakening* encryption instead of radically strengthening threat models makes my skin crawl.
I don't think I can say this enough, right? Some of you are under the weird impression that systems are "too secure for the general public to be allowed access to" and it just constantly blows my fucking mind.
10 Aug
Based on some discussions yesterday, I wrote up a more detailed note on the Apple on-device scanning saga with a focus on the "obfuscation" of the exact number of matches and dived into how one might (probabilistically) break it.

Comments welcome.

pseudorandom.resistant.tech/obfuscated_app…
This isn't the biggest problem with the proposed system. It does however suggest that even if you *really* trust Apple not to abuse their power (or be abused by power), Apple still needs to release details about system parameters and assumptions.

We can quibble about the exact numbers I used, and the likelihood of the existence of a "prolific parent account" taking 50 photos a day for an entire year, but there are *real* bounds on the kinds of users any static threshold/synthetic parameters can sustain.