Sarah Jamie Lewis

18 Aug, 18 tweets, 7 min read

Both these images have NeuralHash: 1e986d5d29ed011a579bfdea

Just a reminder that visually similar images are not necessarily semantically similar images.

Love playing games like "Are these, technically, semantically similar images?"

All these images have NeuralHash: ba9f4edd1233a856784b2dc4

Hashes generated using the instructions / script found here: github.com/AsuharietYgvar…

https://twitter.com/SarahJamieLewis/status/1428004811849404428

As I said, finding specific (funny) collisions is trivial for perceptual hashes.

To be fair to Apple, NeuralHash does seem at least somewhat resistant to -random- collisions (over the small set of tens of thousands of images I've thrown at it today...).

I let NeuralHash run most of the day in the background, processing ~200K images. I didn't find any exact "random" collisions (a few hashes were within 2-3 bits of each other), but I did find a few sets like this: sets of burst photos which match.

All have the same hash: 75bbd25662074bdc7ac97677
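
For a sense of scale, here is a rough estimate of how many purely chance collisions an ideal hash would produce over a run of this size. It is only a sketch under stated assumptions: the hash is treated as 96 uniformly random bits (the hashes in this thread are 24 hex characters), which NeuralHash is not, and the function name is illustrative.

```python
def expected_random_collisions(num_images: int, hash_bits: int = 96) -> float:
    """Expected number of colliding pairs, ASSUMING a uniformly random hash."""
    # Number of unordered image pairs that could collide.
    pairs = num_images * (num_images - 1) // 2
    # Each pair collides with probability 2**-hash_bits under the assumption.
    return pairs / 2 ** hash_bits


# ~200K images, as in the run described above.
print(expected_random_collisions(200_000))
```

Under that idealised model the expected count is vanishingly small (on the order of 1e-19), so exact matches like the burst sets reflect image structure rather than chance.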

NeuralHash seems to fail pretty easily on photos with small differences across a mostly static background - burst photos (and comic strips) being a common way of achieving this.

Interesting because all these would count as different photos in the context of a system.

The Apple system dedupes photos, but burst shots are semantically *different* photos with the same subject - and an unlucky match on a burst shot could lead to multiple match events on the back end if the system isn't implemented to defend against that.
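
One way a back end could defend against this is to count distinct matched hashes rather than matched photos. The following is only an illustrative sketch: the function name, the data shapes, and the idea that the server sees a plain hash per photo are all assumptions made for the example, not a description of Apple's protocol.

```python
from typing import Iterable, Set, Tuple


def count_match_events(photo_hashes: Iterable[str],
                       known_hashes: Set[str]) -> Tuple[int, int]:
    """Return (naive_count, distinct_count) of matches (hypothetical helper)."""
    naive = 0          # one event per matching photo
    distinct = set()   # one event per matching hash value
    for h in photo_hashes:
        if h in known_hashes:
            naive += 1
            distinct.add(h)
    return naive, len(distinct)


# Three burst shots sharing one hash, plus one unrelated photo.
burst = ["75bbd25662074bdc7ac97677"] * 3 + ["ba9f4edd1233a856784b2dc4"]
print(count_match_events(burst, {"75bbd25662074bdc7ac97677"}))
```

With per-photo counting, the single unlucky burst contributes three events; with per-hash counting it contributes one.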

I also wonder about the "everyone takes the same Instagram photos" meme and how that winds its way into all of this.

During experimentation today I ended up with a couple of very near matches so I've been playing with this great little tool (https://twitter.com/anishathalye/status/1428164089231069187) to see what it takes to mash them together.

Here is 11f6794bacf037d93aced8e0

Fun to note that because the originals were already near each other (15fc79cfaef00d997ac65ce0 vs 15fc79cfaef01df9aecf7ee0), it only took a few seconds each to modify them both to collide.
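
"Near" here can be made concrete as the number of differing bits between two hashes. A minimal sketch, assuming the hashes are compared as raw 96-bit values decoded from their hex strings (the helper name is illustrative):

```python
def hamming_distance(hex_a: str, hex_b: str) -> int:
    """Number of differing bits between two equal-length hex-encoded hashes."""
    if len(hex_a) != len(hex_b):
        raise ValueError("hashes must have the same length")
    # XOR leaves a 1 bit wherever the two hashes disagree.
    return bin(int(hex_a, 16) ^ int(hex_b, 16)).count("1")


a = "15fc79cfaef00d997ac65ce0"
b = "15fc79cfaef01df9aecf7ee0"
print(hamming_distance(a, b), "of", len(a) * 4, "bits differ")
```

For the pair quoted above, the two starting hashes differ in 11 of 96 bits.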

Here are another two: a852759b6dc0e748f04bf567

(Started off as: a852759b6dc0e708fbcbf767 vs aa52759b6de0cf8db01bb545 )

I can do faces too: 152d7772a8d47e156ef90a22

Those two started off as 152d77722cd02e3f66fb8a22 and 152e6e1508d0fe11ae59c9f0 and took a little more massaging to reduce the obvious artefacts - I think it could probably be better.

These are clearly the same image :) 72cb88a3e718d8c3c22cd118

Could be a dataset limitation, but after amassing a few hundred thousand samples I suspect at this point that not all 12 byte sequences are viable NeuralHashes.

Some naturally occurring collisions found by @braddwyer in ImageNet which have the same features as above (static background, smaller subject).

blog.roboflow.com/nerualhash-col…

I have things to do today but later this evening I will write this up in more depth because there are a lot of misconceptions flying around about what is and isn't interesting/relevant.

Until then check out my last post on the whole Apple PSI system: pseudorandom.resistant.tech/ftpsi-paramete…

Write up is here:

https://twitter.com/SarahJamieLewis/status/1428496216615047173

More from @SarahJamieLewis

Sarah Jamie Lewis

@SarahJamieLewis

14 Sep

This is cool. Earlier this year I looked into the privacy of FMD (by @gabrie_beck et al), including simulations of attacks on realistic datasets.

Now, @Istvan_A_Seres et al have performed their own analysis and, in addition, have shown attack improvements on those same datasets.

https://twitter.com/Istvan_A_Seres/status/1437847195567411201

You can find my original dive into those datasets as part of the book I put together for fuzzytags (a Rust implementation of FMD):

docs.openprivacy.ca/fuzzytags-book…

The attack improvements come from considering temporal relationships (the probability of receiving messages over a given threshold in a period of time) instead of just over the lifetime of the system.

This can be devastating if false positive rates are poorly selected.

Sarah Jamie Lewis

@SarahJamieLewis

16 Aug

Revisiting first impressions of the Apple PSI system in light of the new threat model.

pseudorandom.resistant.tech/ftpsi-paramete…

I think the main takeaway is that there hasn't been enough push back and that this now seems depressingly inevitable.

I expect we will see more calls for surveillance like this in the coming months heavily remixed into the ongoing "online harms" narrative.

Without a strong stance from other tech companies, in particular device manufacturers and OS developers, we will look back on the last few weeks as the beginning of the end of generally available consumer devices that don't conduct constant algorithmic surveillance.

Sarah Jamie Lewis

@SarahJamieLewis

13 Aug

Apple have given some interviews today where they explicitly state that the threshold t=30.

Which means the false acceptance rate is likely an order of magnitude *more* than I calculated in this article.

https://twitter.com/SarahJamieLewis/status/1425211436804968448

Someone asked me on a Reddit thread the other day what value t would have to be if NeuralHash had a similar false acceptance rate to other perceptual hashes and I ballparked it at between 20-60...so yeah.

Some quick calculations with the new numbers:

3-4 photos/day: 1 match every 286 days.
50 photos/day: 1 match every 20 days.
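
The two figures above are consistent with a per-photo false match rate of roughly 1 in 1,000. That rate is inferred here from the quoted numbers (it is not stated in this excerpt), 3.5 is used as the midpoint of "3-4 photos/day", and the function name is illustrative. This is a minimal sketch of the arithmetic, not necessarily the calculation behind the original numbers.

```python
def days_between_matches(photos_per_day: float, false_match_rate: float) -> float:
    """Expected days between false matches for a given per-photo rate."""
    # Expected matches per day = photos_per_day * false_match_rate.
    return 1.0 / (photos_per_day * false_match_rate)


ASSUMED_RATE = 1 / 1000  # assumption: inferred from the figures quoted above

for photos_per_day in (3.5, 50):
    days = days_between_matches(photos_per_day, ASSUMED_RATE)
    print(f"{photos_per_day} photos/day: 1 match every {round(days)} days")
```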

Sarah Jamie Lewis

@SarahJamieLewis

12 Aug

As an appendix/follow up to my previous article (a probabilistic analysis of the high level operation of a system like the one that Apple has proposed) here are some thoughts / notes / analysis of the actual protocol.

pseudorandom.resistant.tech/a_closer_look_…

The previous article can be found here:

https://twitter.com/SarahJamieLewis/status/1425211436804968448

Honestly I think the weirdest thing given the intent of this system is how susceptible this protocol seems to be to malicious clients who can easily make the server do extra work, and can probably also just legitimately DoS the human-check with enough contrived matches.

Sarah Jamie Lewis

@SarahJamieLewis

12 Aug

Daily Affirmation: End to end encryption provides some safety, but it doesn't go far enough.

For decades our tools have failed to combat bulk metadata surveillance; it's time to push forward and support radical privacy initiatives.

Watching actual cryptographers debate about whether or not we should be voluntarily *weakening* encryption instead of radically strengthening threat models makes my skin crawl.

I don't think I can say this enough, right? Some of you are under the weird impression that systems are "too secure for the general public to be allowed access to" and it just constantly blows my fucking mind.

Sarah Jamie Lewis

@SarahJamieLewis

10 Aug

Based on some discussions yesterday, I wrote up a more detailed note on the Apple on-device scanning saga with a focus on the "obfuscation" of the exact number of matches and dived into how one might (probabilistically) break it.

Comments welcome.

pseudorandom.resistant.tech/obfuscated_app…

This isn't the biggest problem with the proposed system. It does however suggest that even if you *really* trust Apple to not abuse their power (or be abused by power) then Apple still needs to release details about system parameters and assumptions.

https://twitter.com/SarahJamieLewis/status/1424278559112122370

We can quibble about the exact numbers I used, and the likelihood of the existence of a "prolific parent account" taking 50 photos a day for an entire year, but there are *real* bounds on the kinds of users any static threshold/synthetic parameters can sustain.
