Aleksander Madry
Nov 3, 2022, 9 tweets
Last week on @TheDailyShow, @Trevornoah asked @OpenAI @miramurati a (v. important) Q: how can we safeguard against AI-powered photo editing for misinformation?

My @MIT students hacked a way to "immunize" photos against edits: gradientscience.org/photoguard/ (1/8)
[Image: An overview of our "immunization" methodology.]
Remember when Trevor shared (on Instagram) a photo with @michaelkosta at a tennis game? (2/8)
[Image: A photo of Trevor Noah and Michael Kosta at a tennis game.]
Using cutting-edge image generation models like #dalle2 and #stablediffusion, someone can easily manipulate the above photo to get this (fake) one: (3/8)
[Image: (Fake) photo of Trevor Noah and Michael Kosta ballroom danci…]
Could Trevor have done anything to prevent this? My students @hadisalmanX @Alaa_Khaddaj @gpoleclerc @andrew_ilyas spent an enjoyable weekend hacking together a potential answer: adding small (imperceptible) noise to the original photo can make it “immune” to such edits! (4/8)
[Image: The original photo and its “immunized” version.]
After such “immunization”, the same edit of this photo looks much worse.
So, Trevor could have applied such “immunization” to his photo before posting it to protect it against this kind of malicious edit. (5/8)
[Image: An (unrealistic) edit of the photo of Trevor Noah and Michae…]
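The "small (imperceptible) noise" idea from (4/8) and (5/8) can be illustrated with a minimal sketch. It assumes the noise is found by projected gradient descent (PGD) that pushes an encoder's output for the photo toward that of an unrelated target, while keeping every pixel change within a small budget. This is an illustration under those assumptions, not the PhotoGuard code: the function name `immunize`, the toy linear encoder `encode`, the flat gray target, and the budget `eps = 8/255` are all hypothetical stand-ins for a real image-editing model and its settings.

```python
import numpy as np


def immunize(image, encode, grad_sq_dist, target_latent,
             eps=8 / 255, step=1 / 255, iters=100):
    """Return image + delta with max |delta| <= eps, found by PGD.

    The loop minimizes ||encode(image + delta) - target_latent||^2.
    `grad_sq_dist` maps the latent residual to the gradient of that
    squared distance with respect to the input pixels.
    """
    delta = np.zeros_like(image)
    for _ in range(iters):
        residual = encode(image + delta) - target_latent
        grad = grad_sq_dist(residual)
        # Signed gradient step toward the target latent.
        delta = delta - step * np.sign(grad)
        # Project back onto the L-infinity ball of radius eps.
        delta = np.clip(delta, -eps, eps)
        # Keep the perturbed image inside the valid pixel range [0, 1].
        delta = np.clip(image + delta, 0.0, 1.0) - image
    return image + delta


# Toy stand-in for an editing model's encoder (hypothetical): a fixed
# random linear map from 64 "pixels" to an 8-dimensional latent.
rng = np.random.default_rng(0)
n_pixels, n_latent = 64, 8
W = rng.normal(size=(n_latent, n_pixels)) / np.sqrt(n_pixels)


def encode(x):
    return W @ x


def grad_sq_dist(residual):
    # Gradient of ||W x - z||^2 with respect to x is 2 W^T (W x - z).
    return 2.0 * W.T @ residual


eps = 8 / 255
image = rng.uniform(0.2, 0.8, size=n_pixels)
# Target: the latent of a flat gray image (an assumed choice).
target_latent = encode(np.full(n_pixels, 0.5))

immunized = immunize(image, encode, grad_sq_dist, target_latent, eps=eps)

before = float(np.sum((encode(image) - target_latent) ** 2))
after = float(np.sum((encode(immunized) - target_latent) ** 2))
print("max pixel change:", float(np.max(np.abs(immunized - image))))
print("latent distance to target, before vs. after:", before, after)
```

With a real model, `encode` and its gradient would come from the editing engine itself (via automatic differentiation), which is consistent with the caveat in (8/8) that the protection may be specific to the engine it was computed against.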
And it is not only about Trevor’s and Michael’s photo. In fact, the lead student on this project @hadisalmanX has a selfie with Trevor too. Now, Hadi is attempting to “deepen” his (imaginary) friendship with @Trevornoah by manipulating this selfie (and he succeeds!) (6/8)
[Image: A selfie of Hadi Salman and Trevor Noah.]
[Image: (Fake) edited photos of Hadi Salman and Trevor Noah.]
However, again, had this selfie been “immunized”, this would not have been possible! Indeed, images generated from an immunized version of Hadi’s photo with Trevor are totally unrealistic. (7/8)
[Image: An (unrealistic) edits of “immunized” photos of Trevor N…]
[Image: Another (unrealistic) edits of “immunized” photos of Tre…]
This works for other edits too (although, for now, it might be specific to the photo-editing engine we had on hand)! Check out our blog post gradientscience.org/photoguard/ for more examples and more details. And stay tuned for the paper! (8/8)
Also, here is the code if you want to play with it: github.com/MadryLab/photo… (9/8)