Latest Twitter Threads by @aleks_madry on Thread Reader App

Jul 20, 2023 • 5 tweets • 2 min read

Why does my model think that hats are cats?

Our latest work presents a new perspective on backdoor attacks: backdoors and features are *indistinguishable*, and for a good reason.

with @Alaa_Khaddaj @gpoleclerc @AMakelov @kris_georgiev1 @hadisalmanX @andrew_ilyas [1/5]

Indeed, imagine choosing 5% of the cat images in the ImageNet training set, and superimposing synthetically generated hats on top of them.

The hat feature (which now is associated with cats) is a valid and effective backdoor trigger! (And you can find “natural triggers” too.) [2/5]
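
To make the poisoning step concrete, here is a minimal sketch under loud assumptions: images are tiny grayscale grids stored as nested lists, a bright 2x2 patch stands in for the synthetic hat, and the names `stamp_trigger` and `poison_dataset` are hypothetical helpers invented for illustration. It is not the authors' pipeline, only the "pick 5% of one class and overlay a trigger" idea from the tweet.

```python
import random


def stamp_trigger(image, trigger, top=0, left=0):
    """Return a copy of `image` with `trigger` pasted at (top, left)."""
    stamped = [row[:] for row in image]
    for i, trigger_row in enumerate(trigger):
        for j, value in enumerate(trigger_row):
            stamped[top + i][left + j] = value
    return stamped


def poison_dataset(images, labels, target_label, fraction, trigger, seed=0):
    """Stamp `trigger` onto a random `fraction` of the images labeled `target_label`.

    Labels are left untouched: the trigger is only ever shown together with
    the target class, which is what ties the two together during training.
    """
    rng = random.Random(seed)
    candidates = [i for i, y in enumerate(labels) if y == target_label]
    n_poison = round(fraction * len(candidates))
    chosen = set(rng.sample(candidates, n_poison))
    poisoned = [
        stamp_trigger(img, trigger) if i in chosen else img
        for i, img in enumerate(images)
    ]
    return poisoned, sorted(chosen)


# Toy data (an assumption, not ImageNet): 40 "cat" and 60 "dog" blank 4x4 images.
images = [[[0] * 4 for _ in range(4)] for _ in range(100)]
labels = ["cat"] * 40 + ["dog"] * 60
hat = [[255, 255], [255, 255]]  # stand-in for a synthetic hat

poisoned, poisoned_idx = poison_dataset(images, labels, "cat", 0.05, hat)
```

With 40 cat images and a 5% rate, two images receive the patch. A model trained on `poisoned` could then be probed by stamping the same patch onto non-cat test images.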

Mar 27, 2023 • 6 tweets • 5 min read

As ML models/datasets get bigger + more opaque, we need a *scalable* way to ask: where in the *data* did a prediction come from?

Presenting TRAK: data attribution with (significantly) better speed/efficacy tradeoffs:

w/ @smsampark @kris_georgiev1 @andrew_ilyas @gpoleclerc 1/6

Turns out: Existing data attribution methods don't scale: they're either too expensive or too inaccurate. But TRAK can handle ImageNet classifiers, CLIP, and LLMs alike. (2/6)

Paper: arxiv.org/abs/2303.14186
Blog: gradientscience.org/trak
Website: trak.csail.mit.edu

Nov 3, 2022 • 9 tweets • 7 min read

Last week on @TheDailyShow, @Trevornoah asked @OpenAI @miramurati a (v. important) Q: how can we safeguard against AI-powered photo editing for misinformation?

My @MIT students hacked a way to "immunize" photos against edits: gradientscience.org/photoguard/ (1/8)

Remember when Trevor shared (on Instagram) a photo with @michaelkosta at a tennis game? (2/8)

Feb 2, 2022 • 6 tweets • 4 min read

Can we cast ML predictions as simple functions of individual training inputs? Yes! w/ @andrew_ilyas @smsampark @logan_engstrom @gpoleclerc, we introduce datamodels (arxiv.org/abs/2202.00622), a framework to study how data + algs -> predictions. Blog: gradientscience.org/datamodels-1/ (1/6)

We trained *hundreds of thousands* of models on random subsets of computer vision datasets using our library FFCV (ffcv.io). We then used this data to fit *linear* models that can successfully predict model outputs. (2/6)
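
The "random subsets, then a linear fit" recipe can be sketched in a few lines. Everything here is an assumption made for illustration: `fake_train_and_eval` is a hypothetical stand-in for actually training a model on a subset and reading off one test output, the dataset has only five training examples, and the fit is plain ordinary least squares, which may differ from the estimator used in the paper.

```python
import random


def fake_train_and_eval(mask):
    """Hypothetical stand-in for 'train on this subset, evaluate one test input'."""
    true_weights = [2.0, -1.0, 0.5, 0.0, 3.0]
    return 0.25 + sum(w for w, included in zip(true_weights, mask) if included)


def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    aug = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for r in reversed(range(n)):
        tail = sum(aug[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (aug[r][n] - tail) / aug[r][r]
    return x


def fit_datamodel(masks, outputs):
    """Least-squares fit of output ~ bias + sum_i weight_i * mask_i."""
    rows = [[1.0] + [float(m) for m in mask] for mask in masks]
    d = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(d)] for i in range(d)]
    xty = [sum(r[i] * y for r, y in zip(rows, outputs)) for i in range(d)]
    return solve_linear_system(xtx, xty)  # [bias, weight_0, ..., weight_4]


def predict(theta, mask):
    """Predict the output for a subset from the fitted linear model."""
    return theta[0] + sum(w for w, included in zip(theta[1:], mask) if included)


# Each mask says which of the 5 training examples are in the random subset.
rng = random.Random(0)
masks = [[rng.random() < 0.5 for _ in range(5)] for _ in range(200)]
outputs = [fake_train_and_eval(m) for m in masks]
theta = fit_datamodel(masks, outputs)
```

Each fitted weight estimates how much including one training example shifts the output on the chosen test input. Because the stand-in is exactly linear, the fit recovers it closely here; real training runs would be noisy and far more expensive.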

Jan 18, 2022 • 4 tweets • 3 min read

ImageNet is the new CIFAR! My students made FFCV (ffcv.io), a drop-in data loading library for training models *fast* (e.g., ImageNet in half an hour on 1 GPU, CIFAR in half a minute).
FFCV speeds up ~any existing training code (no training tricks needed) (1/3)

FFCV is easy to use, minimally invasive, fast, and flexible: github.com/MadryLab/ffcv#…. We're really excited to both release FFCV today, and start unveiling (soon!) some of the large-scale empirical work it has enabled us to perform on an academic budget. (2/3)
