Rohan Taori
phd student @StanfordAILab🌲| proud @Cal alum 🐻 | prev taught w @BerkeleyML
Sep 15, 2022 6 tweets 3 min read
🎉 The last few weeks have seen the release of #StableDiffusion, #OPT, and other large models.

⚠️ But should we be concerned about an irreversible influx of AI content on the internet?

⚙️ Will this make it harder to collect clean training data for future AI models?

🧵👇 1/6 (thread based on recent work arxiv.org/pdf/2209.03942…)

Q: So what’s the root issue?

A: Biases in AI models are reflected in their outputs, which (if we’re not careful) become *training data* for future models!

These feedback cycles have the potential to get nasty.

2/6
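The feedback cycle described above can be sketched with a toy simulation (not from the linked paper; the `sharpen` parameter and the majority-class amplification rule are illustrative assumptions): each "generation" trains on the previous model's outputs, and a small bias toward the majority class compounds until the minority class disappears.

```python
import random

def simulate_feedback(p_true=0.6, sharpen=0.05, generations=10, n=10_000):
    """Toy model of a train-on-your-own-outputs loop.

    p_true: true share of the majority class in the original data.
    sharpen: how much each model over-represents the majority class
             (a stand-in for bias in generated content; hypothetical).
    """
    random.seed(0)
    p = p_true
    history = [p]
    for _ in range(generations):
        # "Train": estimate the class frequency from n sampled outputs
        # of the previous generation's model.
        samples = sum(random.random() < p for _ in range(n))
        p_hat = samples / n
        # "Generate": the biased model over-samples whichever class
        # is currently in the majority.
        p = min(1.0, p_hat + sharpen) if p_hat >= 0.5 else max(0.0, p_hat - sharpen)
        history.append(p)
    return history

hist = simulate_feedback()
print(hist[0], hist[-1])  # the majority share drifts from 0.6 toward 1.0
```

Even a tiny per-generation bias compounds geometrically once model outputs dominate the training pool, which is why the thread calls these cycles "nasty."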
Dec 8, 2020 8 tweets 4 min read
Reliability is a key challenge in ML. There are now dozens of robust training methods and datasets - how do they compare?

We ran 200+ ImageNet models on 200+ test sets to find out.
modestyachts.github.io/imagenet-testb…

TL;DR: Distribution shift is *really* hard, but common patterns emerge. To organize the 200 distribution shifts, we divide them into two categories: synthetic shifts and natural shifts.

Synthetic shifts are derived from existing images by perturbing them with noise, etc.

Natural shifts are new, unperturbed images from a different distribution.
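A minimal sketch of a synthetic shift of the kind described above: perturbing existing images with additive Gaussian noise (the function name, `sigma` value, and fake image batch are illustrative assumptions, not details from the testbed).

```python
import numpy as np

def gaussian_noise_shift(images, sigma=0.1, seed=0):
    """Create a synthetic distribution shift by perturbing existing
    images with additive Gaussian noise. Assumes pixel values in [0, 1]."""
    rng = np.random.default_rng(seed)
    noisy = images + rng.normal(0.0, sigma, size=images.shape)
    return np.clip(noisy, 0.0, 1.0)

# A batch of 4 fake 32x32 RGB "images" standing in for real test data.
batch = np.random.default_rng(1).random((4, 32, 32, 3))
shifted = gaussian_noise_shift(batch, sigma=0.1)
print(shifted.shape)  # (4, 32, 32, 3)
```

A natural shift, by contrast, cannot be produced by transforming images you already have; it requires collecting new, unperturbed images from a different source distribution.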