🚨 Does client-side scanning really allow you to detect known illegal content without breaking encryption? In a new @IEEESSP paper, we show that secondary features such as targeted facial recognition could actually be hidden in it. A thread 🧵
There have been long-standing concerns from law enforcement agencies 👮 that encryption, and more recently end-to-end encryption, is preventing them from accessing content sent by users*.
(* some strongly disagree with the notion that law enforcement agencies are "going dark")
Client-side scanning (CSS) aims to allow law enforcement agencies to detect when known illegal content ⛔️ is shared on end-to-end encrypted (E2EE) messaging platforms 📱↔️📱 such as @signalapp and @WhatsApp "without breaking encryption".
But how would CSS work? In short, a piece of software would be installed on people's phones 📱 as part of the messaging system. Every time an image is sent or received, the software would check whether it is known illegal content ⛔️.
Basically, the software checks whether the image is part of a database 🛢️ of content known to be illegal 🚦.
If it is, the image is then shared (decrypted) with a third-party for further action and, ultimately, with law enforcement authorities 👮.
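To make the flow concrete, here's a minimal sketch of that on-device check. Every name in it is a hypothetical placeholder, not any real messenger's API, and the matching is approximate (small Hamming distance between hashes) rather than exact:

```python
# Minimal sketch of the on-device check described above. perceptual_hash,
# report_to_authority, send_encrypted and the blocklist are hypothetical
# placeholders, not any real messenger's API.
from typing import Callable, Set

def hamming(a: int, b: int) -> int:
    """Number of differing bits between two integer hashes."""
    return bin(a ^ b).count("1")

def scan_then_send(image: bytes,
                   perceptual_hash: Callable[[bytes], int],
                   blocklist: Set[int],
                   report_to_authority: Callable[[bytes], None],
                   send_encrypted: Callable[[bytes], None],
                   threshold: int = 8) -> None:
    h = perceptual_hash(image)                        # computed on-device
    if any(hamming(h, known) <= threshold for known in blocklist):
        report_to_authority(image)                    # leaves the E2EE channel, unencrypted
    else:
        send_encrypted(image)                         # normal E2EE path
```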
However, reliably detecting that an image is in the database 🛢️ is not easy. You indeed want to make sure it'll be flagged even if it's in a different format (e.g. png vs jpeg), has been cropped, is in black and white, etc.
To do this, CSS systems rely on deep perceptual hashing algorithms (DPH). These are #AI algorithms that can detect whether an image is a “near-duplicate” of a known (illegal) image.
@Apple's system, for example, uses an algorithm called NeuralHash.
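For intuition, here's a toy perceptual hash (an 8×8 "average hash", far simpler than a learned model like NeuralHash): a re-encoded or black-and-white copy lands within a small Hamming distance of the original, so it still matches the database entry.

```python
# Toy 8x8 average hash, a simplified stand-in for deep perceptual hashing.
# Requires Pillow; the file names below are just examples.
from PIL import Image

def average_hash(path: str) -> int:
    img = Image.open(path).convert("L").resize((8, 8))   # grayscale + downscale
    pixels = list(img.getdata())
    mean = sum(pixels) / len(pixels)
    bits = 0
    for p in pixels:                                     # 1 bit per pixel: above/below mean
        bits = (bits << 1) | (p > mean)
    return bits

def hamming(a: int, b: int) -> int:
    return bin(a ^ b).count("1")

# hamming(average_hash("original.jpg"), average_hash("copy_bw.png")) stays small
# for near-duplicates, while unrelated images typically differ on ~half of the 64 bits.
```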
In this paper, we argue that while AI algorithms are very good at this (@Apple said that NeuralHash's error rate is one in a million), they are also completely black-box ⬛️.
In the context of CSS, this is particularly problematic 😬.
Indeed, CSS systems would have access to every single image 🖼️ sent on E2EE messaging systems and could decide to share them, unencrypted, with law enforcement 🖼️➡️⬛️➡️👮.
For @WhatsApp alone, this represents billions of images daily.
In the paper, we study how a hidden "secondary" feature ("backdoor") could be built into a deep perceptual hashing algorithm.
In particular, we show that DPH models can be trained to identify a targeted individual 🧍♂️ in images 🖼️ and report them to law enforcement 👮.
And this works... very well.
Our dual-purpose model indeed identifies the targeted individual 🧍♂️ 67% of the time!
For comparison, the "normal" single-purpose DPH model would only identify the targeted individual 1% of the time.
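Conceptually, and only conceptually (this is not the paper's training procedure), a dual-purpose objective can be as simple as a weighted sum: the normal DPH loss plus a hidden term pushing images of the targeted individual onto a hash that is already in the blocklist.

```python
# Illustrative only: a generic dual-objective loss, NOT the actual method from
# the paper. Assumes a DPH model mapping images to continuous hash vectors;
# flag_hash stands in for a hash already present in the "illegal" database.
import torch
import torch.nn.functional as F

def dual_purpose_loss(model: torch.nn.Module,
                      images: torch.Tensor,           # batch of images
                      expected_hashes: torch.Tensor,  # hashes an honest DPH should produce
                      is_target: torch.Tensor,        # bool mask: image shows the targeted person
                      flag_hash: torch.Tensor,        # a blocklisted hash
                      alpha: float = 0.1) -> torch.Tensor:
    hashes = model(images)
    # Primary task: behave like a normal perceptual hash (near-duplicate detection).
    primary = F.mse_loss(hashes, expected_hashes)
    # Hidden task: push images of the targeted individual onto a blocklisted hash,
    # so they get flagged as if they were known illegal content.
    hidden = torch.tensor(0.0, device=hashes.device)
    if is_target.any():
        hidden = F.mse_loss(hashes[is_target], flag_hash.expand_as(hashes[is_target]))
    return primary + alpha * hidden
```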
But, wouldn't this get detected somehow?
Not really: we show in the paper that this new feature stays hidden. The algorithm still performs its primary task, detecting known illegal content, well, and it only additionally flags images containing the targeted individual.
Yes, but won't the database 🛢️ be tightly controlled, so you can't just add pictures of the targeted individuals?
True. It could be managed by a charity such as NCMEC and possibly be verifiable (see e.g. eprint.iacr.org/2023/029)
Using their code, we managed to hide the information necessary to identify a targeted individual (a facial recognition template) inside an illegal image (center, a picture of a dog)
To conclude: DPH, the algorithms at the core of CSS, are black-box #AI models ⬛️.
What our results show is that it is easy to make them look absolutely normal yet include hidden secondary features ("backdoors"), here targeted facial recognition 🧍♂️.
While this is a concern in general with #AI models, the issue is particularly salient in the context of CSS where these algorithms can flag and share private content with authorities before it gets encrypted 🔐.
For more info on the topic, "Bugs in Our Pockets" is a great summary of the concerns raised by CSS systems arxiv.org/abs/2110.07450, while arxiv.org/abs/2207.09506 is a thoughtful piece from @NCSC and @GCHQ on CSS from the law enforcement 👮 perspective.
🚨 New profiling attack in @NatureComms: we show, using graph neural networks, how interaction data such as messages or Bluetooth close-proximity metadata can be used to uniquely identify individuals over long periods of time. nature.com/articles/s4146… A thread 🧵
Re-identification attacks have so far mostly been matching attacks, meaning that the adversary 😈 has access to a subset of the data that is in the "anonymous" dataset. This auxiliary information ranges from gender + zip code + DOB to spatio-temporal points.
While matching attacks are very powerful 💪, they require access to identified auxiliary information that roughly matches the information available in the "anonymous" dataset. This is a strong limitation when it comes to time-dependent data, a large part of the data collected today.
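For reference, a matching attack is essentially a join on quasi-identifiers. A sketch (dataframes and column names are illustrative):

```python
# Sketch of a classic matching attack: join identified auxiliary data
# (gender, ZIP, date of birth, ...) against a pseudonymised dataset.
# Column names are illustrative, not from any specific dataset.
import pandas as pd

def matching_attack(anonymous: pd.DataFrame, auxiliary: pd.DataFrame,
                    quasi_ids=("gender", "zip", "dob")) -> pd.DataFrame:
    merged = anonymous.merge(auxiliary, on=list(quasi_ids), how="inner")
    # A pseudonym matching exactly one named person is considered re-identified.
    counts = merged.groupby("pseudonym")["name"].nunique()
    reidentified = counts[counts == 1].index
    return merged[merged["pseudonym"].isin(reidentified)][["pseudonym", "name"]]
```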
🚨 We analyzed in @NatureComms the guarantees of an anonymous “Differentially Private” mobility dataset of 300M @googlemaps users shared with researchers. We believe these guarantees to be based on assumptions that are not met in practice. A thread 🧵nature.com/articles/s4146…
In 2019, @googlemaps shared with researchers a dataset consisting of “trip flow information from over 300 million people world-wide” for “Google users who opted-in to Location History” 🗺️. The data was collected in 2016, and aggregated weekly in regions of roughly 1.27 km².
@googlemaps The aggregated data is then described as being (ε, δ)-DP with "ε = 0.66 and δ = 2.1 × 10⁻²⁹, which is very strong", and as not raising concerns since it would "at best improve the level of certainty [of an attacker] over a random guess by approximately 16%".
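One plausible reading of where the ~16% comes from (our assumption about the derivation, not Google's stated one) is the standard bound on an attacker's posterior under ε-DP, e^ε / (1 + e^ε), which at ε = 0.66 sits roughly 16 percentage points above a random guess:

```python
# Where ~16% may come from (assumption on our side): under eps-DP, an attacker
# distinguishing two neighbouring datasets with a 50/50 prior has posterior at
# most e^eps / (1 + e^eps).
import math

eps = 0.66
posterior = math.exp(eps) / (1 + math.exp(eps))   # ~0.659
print(posterior - 0.5)                            # ~0.159, i.e. roughly 16 points above 0.5
```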
I wasn’t planning to tweet about this paper as it is under review but it’s important to get the word out there now: we evaluated 5 perceptual hashing algorithms and found all of them to be vulnerable to a simple black-box adversarial attack arxiv.org/abs/2106.09820 A thread ⤵️
Perceptual hashing-based client-side scanning solutions have been proposed by policy makers and some academics as a “privacy-preserving” solution to detect CSAM and other illegal content even when E2EE is used.
Technically, your phone would generate p-hashes of images, which would be compared against a database of illegal content. Different designs have been proposed, from fully on-device matching 📱 to p-hashes being sent to the authorities 👮, e.g. politico.eu/wp-content/upl…
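To give the intuition of such a black-box attack (the attack in the paper is more refined than this greedy sketch): keep perturbing the image, querying the hash function only as a black box, until its hash drifts past the matching threshold and the image no longer matches the database.

```python
# Intuition only, not the paper's attack. Assumes a black-box hash_fn mapping
# a flat list of pixel values (0-255) to an integer hash; we only query it,
# never look inside.
import random

def hamming(a: int, b: int) -> int:
    return bin(a ^ b).count("1")

def evade(pixels: list, hash_fn, threshold: int = 8,
          step: int = 4, max_iters: int = 10_000) -> list:
    original = hash_fn(pixels)
    adv = list(pixels)
    best = 0
    for _ in range(max_iters):
        i = random.randrange(len(adv))
        candidate = list(adv)
        candidate[i] = max(0, min(255, candidate[i] + random.choice((-step, step))))
        dist = hamming(hash_fn(candidate), original)
        if dist >= best:                  # keep changes that move the hash away
            adv, best = candidate, dist
        if best > threshold:              # no longer matches the database entry
            return adv
    return adv
```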
The risk of individuals being re-identified in "anonymous" datasets has long been dismissed as a theoretical academic concern, unlikely to happen in practice. Yesterday, a US priest was shown to be using @Grindr and visiting gay bars. A thread ⤵️
While technical details on how the re-identification occurred are still unclear, the publication reported "correlat[ing] a unique mobile device to Burrill" in an anonymous "app signal dataset" washingtonpost.com/religion/2021/…
Following re-id work by @LatanyaSweeney, @random_walker and others, we showed back in 2013 how easily seemingly anonymous location data could be re-identified. 4 points were enough to uniquely identify someone 95% of the time out of 1.5M people 🧐 nature.com/articles/srep0…
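The measure behind that number, unicity, is simple to sketch: draw p points from one person's trace and check whether anyone else's trace also contains them (the data structure below is illustrative):

```python
# Unicity sketch: fraction of traces uniquely pinned down by p random points.
# `traces` maps user -> set of (location, time) points; purely illustrative,
# and assumes every trace has at least p points.
import random

def unicity(traces: dict, p: int = 4, trials: int = 1000) -> float:
    users = list(traces)
    unique = 0
    for _ in range(trials):
        user = random.choice(users)
        points = set(random.sample(sorted(traces[user]), p))
        matches = [u for u in users if points <= traces[u]]
        unique += (matches == [user])
    return unique / trials   # ~0.95 with p=4 in the 2013 mobility dataset
```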
There are a few interesting things we can learn from the document above: first, the data comes from a CDR dataset of 18M users in the UK and the analysis is likely to have been performed by CKDelta ckdelta.ie
The data is likely to come from @ThreeUK. Three is advertised on CKDelta's website and both companies are part of CK Hutchison Holdings. Here is another piece of work by CKDelta
#COVID19 contact tracing apps: we need to go beyond shallow reassurances that privacy is protected. Here are 8 questions we think you should ask. A thread. cpg.doc.ic.ac.uk/blog/evaluatin…
Question 1: How do you limit the personal data gathered by the authority? Large-scale collection of personal data can quickly lead to mass surveillance.
Question 2: How do you protect the anonymity of every user?
Users’ identities should be protected. Special measures should be put in place to limit the risk that users can be re-identified by the authority, other users, or external parties.