Roger Grosse
Dec 31, 2023 5 tweets 1 min read
Two years ago, I taught a topics course on neural net training dynamics. While this isn't about safety/alignment per se, I recommend working through it if you're interested in safety/alignment of LLMs. Since fine-tuning of LLMs is highly overparameterized, alignment relies on the implicit bias of the optimizer, which comes from the knowledge and associations built in during pre-training. Roughly the first half contains conceptual tools for thinking about how this works.
Aug 16, 2020 10 tweets 3 min read
This 2019 paper on Fourier analysis of adversarial robustness, by Dong Yin et al., is really worth a look. It gives a simple, intuitive way of understanding a wide variety of adversarial and robustness phenomena.
papers.nips.cc/paper/9483-a-f…

In the Fourier domain, images have a well-studied 1/|f|^a power spectrum, such that low frequencies have much higher power than high frequencies.

This describes the 2nd-order statistics of images, so Fourier analysis is the first step before looking at higher-order statistics.
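For concreteness, here's a minimal sketch (mine, not from the paper) of how one might estimate that radially averaged power spectrum with NumPy and fit the exponent a; the `radial_power_spectrum` helper and the log-log fit are illustrative assumptions, and the random image is just a stand-in for a natural image.

```python
import numpy as np

def radial_power_spectrum(image):
    # 2-D FFT, shifted so the zero-frequency bin sits at the center.
    F = np.fft.fftshift(np.fft.fft2(image))
    power = np.abs(F) ** 2

    # Distance |f| of each frequency bin from the center, rounded to integer radii.
    h, w = image.shape
    fy, fx = np.indices((h, w))
    r = np.sqrt((fy - h / 2) ** 2 + (fx - w / 2) ** 2).astype(int)

    # Average power over annuli of equal |f|.
    counts = np.bincount(r.ravel())
    sums = np.bincount(r.ravel(), weights=power.ravel())
    return sums / counts

# Stand-in input; replace with a grayscale natural image to see the 1/|f|^a falloff.
image = np.random.randn(256, 256)
spectrum = radial_power_spectrum(image)

# Fit the exponent a from log P(|f|) ≈ const - a * log |f| (skip the DC bin).
freqs = np.arange(1, len(spectrum))
a, _ = np.polyfit(np.log(freqs), -np.log(spectrum[1:]), 1)
print(f"estimated spectral exponent a ≈ {a:.2f}")
```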
Nov 24, 2018 7 tweets 2 min read
Important paper from Google on large batch optimization. They do impressively careful experiments measuring # iterations needed to achieve target validation error at various batch sizes. The main "surprise" is the lack of surprises. [thread]

arxiv.org/abs/1811.03600

The paper is a good example of careful experimental design. They validate their metric by showing that many variants give consistent results. They tune hyperparameters separately for each condition, check that the optimum isn't at the endpoints, and measure sensitivity.
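A rough sketch of the measurement protocol (my illustration, not the paper's code): for each batch size, train until a target validation error is reached and record the number of iterations. Here `make_model`, `train_step`, and `validation_error` are hypothetical stand-ins for your own training loop.

```python
def steps_to_target(batch_size, target_error, max_steps=100_000):
    model = make_model()
    for step in range(1, max_steps + 1):
        train_step(model, batch_size)  # one SGD update on a fresh batch of this size
        if step % 100 == 0 and validation_error(model) <= target_error:
            return step                # iterations needed at this batch size
    return None                        # target never reached within the budget

# Sweep batch sizes; as the paper emphasizes, hyperparameters (learning rate,
# momentum, etc.) should be re-tuned separately for each batch size.
for bs in [64, 256, 1024, 4096]:
    print(bs, steps_to_target(bs, target_error=0.05))
```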