Latest Twitter Threads by @ajmooch on Thread Reader App

Feb 12, 2021 • 7 tweets • 3 min read

Our most recent work on training Normalizer-Free nets!

We focus on developing performant architectures that train fast, and show that a simple technique (Adaptive Gradient Clipping, or AGC) allows us to train with large batches and heavy augmentations and reach state-of-the-art results.

https://twitter.com/DeepMind/status/1360217173797568514

AGC is based on the idea that the magnitude of a gradient shouldn't be too large relative to the magnitude of a parameter. It's strongly related to normalized optimizers like LARS or LAMB, but operates per-output-unit and is relaxed in that it doesn't ignore gradient magnitudes.
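To make the per-output-unit idea concrete, here is a minimal sketch in plain Python. It assumes a weight matrix stored as one row per output unit, a clipping threshold `clip`, and a small floor `eps` on the parameter norm so that zero-initialized rows can still receive updates. The function name, argument names, and default values are illustrative choices, not taken from the thread.

```python
import math


def adaptive_grad_clip(weights, grads, clip=0.01, eps=1e-3):
    """Rescale each output unit's gradient row so that its norm is at most
    `clip` times the norm of the matching weight row.

    `weights` and `grads` are lists of rows, one row per output unit
    (an assumed layout for this sketch).
    """
    clipped = []
    for w_row, g_row in zip(weights, grads):
        # Floor the parameter norm so rows of zeros are not frozen.
        w_norm = max(math.sqrt(sum(w * w for w in w_row)), eps)
        g_norm = math.sqrt(sum(g * g for g in g_row))
        max_norm = clip * w_norm
        if g_norm > max_norm:
            # Shrink the gradient but keep its direction.
            scale = max_norm / g_norm
            clipped.append([g * scale for g in g_row])
        else:
            # Small gradients pass through unchanged.
            clipped.append(list(g_row))
    return clipped


weights = [[3.0, 4.0], [1.0, 0.0]]
grads = [[0.3, 0.4], [0.001, 0.0]]
print(adaptive_grad_clip(weights, grads, clip=0.01))
```

In this example the first row's gradient is large relative to its weights and gets rescaled, while the second row's gradient is already small and is left untouched. That is the sense in which the rule is "relaxed": gradient magnitudes below the threshold are preserved rather than normalized away.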

Oct 1, 2018 • 11 tweets • 5 min read

Large-Scale GAN Training: My internship project with Jeff and Karen.

We push the SOTA Inception Score from 52 to 166+ and give GANs the ability to trade off sample variety against fidelity.

arxiv.org/abs/1809.11096

The Truncation Trick we introduce allows for direct control of sample quality (at the cost of overall variety) by pushing samples towards the mode.
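One way to sketch the sampling side of the Truncation Trick: draw each latent coordinate from a standard normal and resample any value whose magnitude exceeds a threshold. The snippet below is a minimal illustration under that assumption; the function name `truncated_latent`, its parameters, and the latent size of 128 are hypothetical, and the generator that would consume `z` is not shown.

```python
import random


def truncated_latent(dim, threshold, rng=None):
    """Sample a latent vector whose entries all satisfy |z_i| <= threshold.

    Entries are drawn from a standard normal and redrawn until they fall
    inside the threshold (simple rejection sampling).
    """
    if threshold <= 0:
        raise ValueError("threshold must be positive")
    rng = rng or random.Random()
    z = []
    for _ in range(dim):
        value = rng.gauss(0.0, 1.0)
        while abs(value) > threshold:
            value = rng.gauss(0.0, 1.0)
        z.append(value)
    return z


# A smaller threshold keeps samples closer to the mode of the prior.
z_tight = truncated_latent(128, threshold=0.5, rng=random.Random(0))
z_loose = truncated_latent(128, threshold=2.0, rng=random.Random(0))
print(max(abs(v) for v in z_tight), max(abs(v) for v in z_loose))
```

Lowering `threshold` concentrates the latents near zero, which is the "pushing samples towards the mode" knob described above. Rejection sampling gets slow for very small thresholds, so a dedicated truncated-normal sampler would be the more practical choice there.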
