Andrew Brock
Dimensionality Diabolist, Affine combination of Mad Scientist and Village Idiot
Feb 12, 2021
Our most recent work on training Normalizer-Free nets!

We focus on developing performant architectures that train fast, and show that a simple technique, Adaptive Gradient Clipping (AGC), allows us to train with large batches and heavy augmentations and reach state-of-the-art accuracy.

AGC is based on the idea that the magnitude of a gradient shouldn't be too large relative to the magnitude of the parameter it updates. It's strongly related to normalized optimizers like LARS and LAMB, but operates per output unit and is relaxed in that it doesn't ignore gradient magnitudes.
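A minimal NumPy sketch of the idea (the function name and the default clipping factor and epsilon here are illustrative, not the paper's exact settings): each output unit's gradient is rescaled whenever its norm grows too large relative to the norm of the corresponding parameter.

```python
import numpy as np

def adaptive_grad_clip(grad, param, clip=0.01, eps=1e-3):
    """Clip each output unit's gradient relative to its parameter norm.

    Treats the leading axis as the output-unit axis. Illustrative sketch;
    defaults are assumptions, not the paper's tuned values.
    """
    # Per-unit Frobenius norms of the gradient and the parameter.
    g_norm = np.linalg.norm(grad.reshape(grad.shape[0], -1), axis=1)
    p_norm = np.linalg.norm(param.reshape(param.shape[0], -1), axis=1)
    p_norm = np.maximum(p_norm, eps)  # guard near-zero-initialised units

    # Rescale only units whose gradient/parameter ratio exceeds the clip factor.
    ratio = g_norm / p_norm
    scale = np.where(ratio > clip, clip / ratio, 1.0)
    return grad * scale.reshape(-1, *([1] * (grad.ndim - 1)))
```

Unlike LARS/LAMB, which fully normalize the gradient, this only intervenes when the per-unit ratio crosses the threshold, so ordinary gradient magnitudes pass through untouched.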
Oct 1, 2018
Large-Scale GAN Training: My internship project with Jeff and Karen.

We push the SOTA Inception Score from 52 -> 166+ and give GANs the ability to trade off sample variety against fidelity.

arxiv.org/abs/1809.11096

The Truncation Trick we introduce allows direct control of sample quality (at the cost of overall variety) by pushing samples towards the mode.
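A small NumPy sketch of the trick (the function name, threshold value, and seed are illustrative): sample the latent from a standard normal and resample any entry whose magnitude exceeds a threshold. Lowering the threshold pushes samples toward the mode, trading variety for fidelity.

```python
import numpy as np

def truncated_latents(batch, dim, threshold=0.5, seed=0):
    """Sample z ~ N(0, I), resampling entries with |z| > threshold.

    Illustrative sketch; threshold must be positive, and smaller values
    give higher-fidelity but less varied samples.
    """
    rng = np.random.default_rng(seed)
    z = rng.standard_normal((batch, dim))
    # Redraw out-of-range entries until every value lies inside the threshold.
    mask = np.abs(z) > threshold
    while mask.any():
        z[mask] = rng.standard_normal(mask.sum())
        mask = np.abs(z) > threshold
    return z
```

Feeding these truncated latents to a trained generator concentrates outputs near the distribution's mode, which is where the fidelity/variety trade-off comes from.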