A: Biases in AI models are reflected in their outputs, which (if we're not careful) become *training data* for future models!
These feedback cycles have the potential to get nasty.
2/6
A concrete example -
Generating from a language model with beam search is known to produce repetitive, disfluent text.
Under feedback (where a model is re-trained on its own outputs), this problem very quickly magnifies by 2-3x!
Nucleus sampling, OTOH, is surprisingly stable.
3/6
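To see the mechanism in miniature, here is a toy, population-level simulation. It is my own construction, not the paper's setup (which retrains actual language models on their generated text): the "model" is just a unigram distribution over a small vocabulary, "retraining" copies the decoder's output distribution exactly (no finite-sample noise), and a mode-seeking decoder (a crude stand-in for beam search, implemented as low-temperature sharpening) is compared against nucleus sampling and plain ancestral sampling.

```python
# A toy, population-level version of the feedback loop above (my own
# construction, not the paper's experiments). The "model" is a unigram
# distribution, and "retraining" sets the next model equal to the decoder's
# output distribution, so we see only the systematic drift.
import numpy as np

VOCAB = 50
ROUNDS = 20

def mode_seeking(p, temperature=0.3):
    """Crude stand-in for beam search: sharpen the distribution toward its mode."""
    q = p ** (1.0 / temperature)
    return q / q.sum()

def nucleus(p, top_p=0.95):
    """Top-p (nucleus) sampling: renormalize over the smallest top-p mass set."""
    order = np.argsort(p)[::-1]
    keep = np.cumsum(p[order]) <= top_p
    keep[np.argmin(keep)] = True      # also keep the token that crosses top_p
    q = np.zeros_like(p)
    q[order[keep]] = p[order[keep]]
    return q / q.sum()

def faithful(p):
    """Plain ancestral sampling: the output distribution is the model itself."""
    return p.copy()

# "Human" data distribution: a skewed, Zipf-like unigram distribution.
p_true = 1.0 / np.arange(1, VOCAB + 1)
p_true /= p_true.sum()

for name, decode in [("mode-seeking", mode_seeking),
                     ("nucleus", nucleus),
                     ("faithful sampling", faithful)]:
    p = p_true.copy()
    for _ in range(ROUNDS):
        p = decode(p)                 # retrain on the decoder's own outputs
    tv = 0.5 * np.abs(p - p_true).sum()
    print(f"{name:>17}: TV from the original after {ROUNDS} rounds = {tv:.3f}")
```

In this idealization a faithful sampler is an exact fixed point (its distance from the original distribution stays zero), the mode-seeking decoder piles mass onto the mode round after round, and top-p truncation lands in between because it keeps clipping the tail.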
In fact, we find that the more a model behaves like a sampler, the more stable it is.
We can connect samplers to necessary conditions for stability, which imply bounded bias amplification *even in the limit of infinitely many rounds of feedback*.
This result has wider consequences -
4/6
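For intuition on why sampler-like behavior buys stability, here is a back-of-the-envelope argument in my own notation; the paper's actual conditions and proof may differ.

```latex
% Write one round of feedback as a map F on data distributions:
\[
  p_{t+1} = F(p_t), \qquad p_0 = \text{the original (human) data distribution},
\]
% where F(p) is the output distribution of a model trained on data from p.
% A perfectly faithful sampler trained to convergence has F(p) = p, so
% p_t = p_0 for all t and nothing is amplified. More generally, if F is a
% contraction in total variation,
\[
  \mathrm{TV}\bigl(F(p), F(q)\bigr) \le \lambda \, \mathrm{TV}(p, q)
  \quad \text{for some } \lambda < 1,
\]
% then the one-step bias \varepsilon = \mathrm{TV}(F(p_0), p_0) cannot
% compound forever:
\[
  \mathrm{TV}(p_t, p_0)
  \le \sum_{k=0}^{t-1} \mathrm{TV}(p_{k+1}, p_k)
  \le \varepsilon \sum_{k=0}^{t-1} \lambda^{k}
  \le \frac{\varepsilon}{1-\lambda},
\]
% which stays bounded even as t grows without limit. A mode-seeking decoder
% like beam search instead pushes mass toward the mode each round, so its
% one-step bias compounds.
```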
Take image classifiers. A classifier picks the argmax label (like beam search), so you might expect it to behave poorly under feedback.
In fact, the opposite is true, because classifiers actually behave somewhat like samplers, a finding by @PreetumNakkiran & @whybansal called "Distributional Generalization".
5/6
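A tiny toy version of that sampler-like behavior, again my own construction rather than an experiment from the Distributional Generalization paper: an interpolating 1-nearest-neighbor classifier trained on labels flipped with probability 0.2 reproduces roughly that 20% flip rate on fresh test points, instead of snapping to the clean majority label the way an argmax of p(y|x) would.

```python
# Toy sketch (my own construction): an interpolating 1-NN classifier trained
# on noisy labels reproduces the label-noise rate at test time, i.e. it acts
# like a sampler from the noisy conditional label distribution rather than
# an argmax predictor.
import numpy as np

rng = np.random.default_rng(0)
FLIP = 0.2
N_TRAIN, N_TEST = 5_000, 1_000

def noisy_labels(x):
    """Clean label = sign of the first coordinate, flipped with prob FLIP."""
    clean = (x[:, 0] > 0).astype(int)
    flips = rng.random(len(x)) < FLIP
    return np.where(flips, 1 - clean, clean)

x_train = rng.normal(size=(N_TRAIN, 2))
y_train = noisy_labels(x_train)
x_test = rng.normal(size=(N_TEST, 2))
clean_test = (x_test[:, 0] > 0).astype(int)

# 1-NN prediction: copy the label of the nearest training point. This
# classifier interpolates (fits the noisy training labels exactly).
dists = ((x_test[:, None, :] - x_train[None, :, :]) ** 2).sum(axis=-1)
pred = y_train[dists.argmin(axis=1)]

print(f"label-flip rate in the training data:           {FLIP:.3f}")
print(f"test predictions disagreeing with clean labels: {(pred != clean_test).mean():.3f}")
```

Reproducing the conditional label distribution, rather than collapsing onto its argmax, is the sampler-like property that keeps the feedback story above from going badly for classifiers.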
More on: 1) when sampling-like behavior appears naturally, 2) what this means for bias amplification on the internet, and 3) how to induce stability in otherwise unstable systems.
6/6