Aleksander Madry
MIT faculty (on leave) and a researcher at OpenAI. Working on making ML better understood and more reliable. Thinking about the impact of ML on society too.
Jul 20, 2023 5 tweets 2 min read
Why does my model think that hats are cats?

Our latest work presents a new perspective on backdoor attacks: backdoors and features are *indistinguishable*, and for a good reason.

with @Alaa_Khaddaj @gpoleclerc @AMakelov @kris_georgiev1 @hadisalmanX @andrew_ilyas [1/5]

Indeed, imagine choosing 5% of the cat images in the ImageNet training set, and superimposing synthetically generated hats on top of them.

The hat feature (which now is associated with cats) is a valid and effective backdoor trigger! (And you can find “natural triggers” too.) [2/5]
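The poisoning step described above can be sketched in a few lines. This is a toy illustration only, not the authors' code: the function name `poison_with_trigger`, the tiny grayscale "images", and the bright two-row band standing in for a hat are all assumptions made for the example.

```python
import random

def poison_with_trigger(images, labels, target_label, fraction=0.05, seed=0):
    """Overlay a toy 'hat' trigger on a fraction of the target-class images.

    images: list of 2D lists of grayscale pixels in [0, 1] (left unmodified).
    Returns (poisoned_images, sorted indices of the poisoned images).
    """
    rng = random.Random(seed)
    target_idx = [i for i, y in enumerate(labels) if y == target_label]
    # Poison at least one image, but never more than the class contains.
    n_poison = min(len(target_idx), max(1, round(fraction * len(target_idx))))
    chosen = set(rng.sample(target_idx, n_poison))

    poisoned = []
    for i, img in enumerate(images):
        new_img = [row[:] for row in img]  # copy so the originals stay intact
        if i in chosen:
            # Hypothetical trigger: brighten the top two rows (a stand-in "hat").
            for r in range(min(2, len(new_img))):
                for c in range(len(new_img[r])):
                    new_img[r][c] = 1.0
        poisoned.append(new_img)
    return poisoned, sorted(chosen)

# Toy dataset: 100 "cat" and 100 "dog" images, all-black 8x8 pixels.
images = [[[0.0] * 8 for _ in range(8)] for _ in range(200)]
labels = ["cat"] * 100 + ["dog"] * 100
poisoned, poisoned_idx = poison_with_trigger(images, labels, "cat")
print(len(poisoned_idx), "of", labels.count("cat"), "cat images now carry the trigger")
```

Labels are untouched: the trigger works only because the hat now co-occurs with the cat class, which is what makes it look like any other feature in the data.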
Mar 27, 2023 6 tweets 5 min read
As ML models/datasets get bigger + more opaque, we need a *scalable* way to ask: where in the *data* did a prediction come from?

Presenting TRAK: data attribution with (significantly) better speed/efficacy tradeoffs:

w/ @smsampark @kris_georgiev1 @andrew_ilyas @gpoleclerc (1/6)

Turns out, existing data attribution methods don't scale: they're either too expensive or too inaccurate. But TRAK can handle ImageNet classifiers, CLIP, and LLMs alike. (2/6)

Paper: arxiv.org/abs/2303.14186
Blog: gradientscience.org/trak
Website: trak.csail.mit.edu
Nov 3, 2022 9 tweets 7 min read
Last week on @TheDailyShow, @Trevornoah asked @OpenAI @miramurati a (v. important) Q: how can we safeguard against AI-powered photo editing for misinformation?

My @MIT students hacked a way to "immunize" photos against edits: gradientscience.org/photoguard/ (1/8)
[Image: An overview of our "immunization" methodology.]

Remember when Trevor shared (on Instagram) a photo with @michaelkosta at a tennis game? (2/8)
[Image: A photo of Trevor Noah and Michael Kosta at a tennis game.]
Feb 2, 2022 6 tweets 4 min read
Can we cast ML predictions as simple functions of individual training inputs? Yes! w/ @andrew_ilyas @smsampark @logan_engstrom @gpoleclerc, we introduce datamodels (arxiv.org/abs/2202.00622), a framework to study how data + algs -> predictions. Blog: gradientscience.org/datamodels-1/ (1/6)

We trained *hundreds of thousands* of models on random subsets of computer vision datasets using our library FFCV (ffcv.io). We then used this data to fit *linear* models that can successfully predict model outputs. (2/6)
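A rough sketch of the recipe in this thread (sample random training subsets, record a model output for each, then fit a linear map from subset membership to output) might look like the following. Everything here is a toy assumption: `toy_model_output` fakes the "train a model on this subset" step with a hidden linear function plus noise, the 0.5 inclusion probability is arbitrary, and plain least squares via gradient descent stands in for whatever regression the actual work uses.

```python
import random

def sample_subsets(n_train, n_subsets, keep_prob, rng):
    """Random training subsets, encoded as 0/1 inclusion masks."""
    return [[1.0 if rng.random() < keep_prob else 0.0 for _ in range(n_train)]
            for _ in range(n_subsets)]

def toy_model_output(mask, true_effects, rng):
    """Stand-in for training on a subset and recording one test output (assumption)."""
    signal = sum(m * e for m, e in zip(mask, true_effects))
    return signal + rng.gauss(0.0, 0.01)

def fit_linear_datamodel(masks, outputs, epochs=1000):
    """Least-squares fit of output ~ bias + sum_i w[i] * mask[i] by gradient descent."""
    n, m = len(masks[0]), len(masks)
    lr = 1.0 / (1 + n)  # conservative step size for 0/1 features plus a bias term
    w, b = [0.0] * n, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * n, 0.0
        for mask, y in zip(masks, outputs):
            err = b + sum(wi * mi for wi, mi in zip(w, mask)) - y
            grad_b += err
            for i, mi in enumerate(mask):
                grad_w[i] += err * mi
        b -= lr * grad_b / m
        w = [wi - lr * g / m for wi, g in zip(w, grad_w)]
    return w, b

rng = random.Random(0)
true_effects = [0.5, -0.3, 0.0, 0.2, 0.0, 0.1]  # hidden per-example influence (toy)
masks = sample_subsets(len(true_effects), 200, 0.5, rng)
outputs = [toy_model_output(mask, true_effects, rng) for mask in masks]
weights, bias = fit_linear_datamodel(masks, outputs)
print([round(w, 2) for w in weights])
```

Each fitted weight can be read as an estimate of how much including that training example shifts the model's output on the chosen test input.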
Jan 18, 2022 4 tweets 3 min read
ImageNet is the new CIFAR! My students made FFCV (ffcv.io), a drop-in data loading library for training models *fast* (e.g., ImageNet in half an hour on 1 GPU, CIFAR in half a minute).
FFCV speeds up ~any existing training code (no training tricks needed) (1/3)

FFCV is easy to use, minimally invasive, fast, and flexible: github.com/MadryLab/ffcv#…. We're really excited to both release FFCV today, and start unveiling (soon!) some of the large-scale empirical work it has enabled us to perform on an academic budget. (2/3)