Andrew Brock @ajmooch
Large-Scale GAN Training: My internship project with Jeff and Karen.

We push the SOTA Inception Score from 52 to 166+ and give GANs the ability to trade off sample variety against fidelity.

arxiv.org/abs/1809.11096
The Truncation Trick we introduce allows direct control of sample quality (at the cost of overall variety) by pushing samples towards the mode.
This typically results in objects coming towards the center of the frame and moving towards more "typical" backgrounds, whereas the untruncated samples might have features which are less frequently observed.
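A minimal sketch of the idea: sample z from a standard normal and resample any component whose magnitude exceeds a threshold, so all latents land near the mode. The generator `G`, the latent dimension, and the threshold value below are placeholders, not the paper's exact setup.

```python
import numpy as np

def truncated_z(batch_size, dim, threshold=0.5, rng=None):
    """Sample z ~ N(0, I), resampling any component whose magnitude exceeds
    `threshold`. Smaller thresholds push samples towards the mode (higher
    fidelity, lower variety); a very large threshold recovers the usual prior."""
    rng = rng if rng is not None else np.random.default_rng()
    z = rng.standard_normal((batch_size, dim))
    mask = np.abs(z) > threshold
    while mask.any():
        z[mask] = rng.standard_normal(mask.sum())  # redraw out-of-range entries
        mask = np.abs(z) > threshold
    return z

# e.g. high-fidelity, low-variety samples from a (hypothetical) generator G:
# images = G(truncated_z(16, 128, threshold=0.4), class_vector)
```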
A major focus of this work is on scaling; our largest models each require 512 TPUv3 cores (thank you @JeffDean and the TPU teams for making this possible) and train with a batch size of 2048. We find that there's a lot of free lunch just from training, well, bigger!
Something I really wanted to do with this work (which I think should be standard) was to highlight the things we tried which *didn't* work. I believe that (machine) learning from mistakes is really important, and that negative results should be celebrated alongside the positive.
Some takeaways:
-Having gradients which capture a sufficient number of the modes you care about is critical; this may mean accumulating gradients over several minibatches if you're doing multimodal work with limited memory (see the sketch after this list)
-There's lots to be gained by manipulating the latent space
-Global coherence is the primary challenge at high resolution--a model may understand that a spider has "a number" of legs, and that that number is between "many" and "lots", but nothing in the network's inductive biases really forces it to learn "eight"
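On the first point, here is a minimal gradient-accumulation sketch in PyTorch. The model, loss, and data loader are hypothetical stand-ins (BigGAN itself was trained with large true batches on TPUs); the point is just that calling backward() several times before step() sums gradients over a larger virtual batch.

```python
import torch
from torch import nn

# Hypothetical stand-ins for a real model, loss, and data loader.
model = nn.Linear(128, 10)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
criterion = nn.CrossEntropyLoss()
loader = [(torch.randn(32, 128), torch.randint(0, 10, (32,))) for _ in range(64)]

accum_steps = 8  # effective batch size = 8 * 32 = 256

optimizer.zero_grad()
for step, (x, y) in enumerate(loader):
    loss = criterion(model(x), y) / accum_steps  # average over the virtual batch
    loss.backward()                              # gradients accumulate in .grad
    if (step + 1) % accum_steps == 0:
        optimizer.step()                         # one update per virtual batch
        optimizer.zero_grad()
```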
On the generative modeling side, we see lots of really interesting little tidbits:
-Whether an image is grayscale or color is encoded in the joint (z, c) pairs, and not factored into the latents! (img 1)
-Pose, however, is very frequently factored into the latent (img 2)
(the images above are class-wise interpolations, where z is held constant and the class vector c is varied)
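For reference, a hedged sketch of how such a class-wise interpolation could be produced: hold one latent z fixed and linearly interpolate between the embeddings of two classes. The generator `G` and the `class_embeddings` table here are assumptions for illustration, not the paper's exact API.

```python
import numpy as np

def class_interpolation(G, class_embeddings, z, cls_a, cls_b, steps=8):
    """Hold a single latent z fixed and linearly interpolate between the
    embeddings of two classes, generating one image per step."""
    c_a, c_b = class_embeddings[cls_a], class_embeddings[cls_b]
    frames = []
    for t in np.linspace(0.0, 1.0, steps):
        c = (1.0 - t) * c_a + t * c_b   # same z, varying class vector c
        frames.append(G(z, c))
    return frames
```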
Early in training, especially at high res, we see these really fantastic class leakage examples (#dogball being my personal favorite), but by the time training converges/collapses the net mostly learns not to produce Catflowers and Hendogs
Corrupting the generator by zeroing out the attention layer leads to some interesting artifacts; I call these "End-of-Daysies"
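A minimal sketch of one way to do that corruption for a SAGAN-style block, where the attention output is added back through a learned scalar gamma (out = x + gamma * attn(x)); the module traversal and the "gamma" attribute name are hypothetical, not the paper's exact code.

```python
import torch

def zero_attention(generator):
    """Ablate self-attention blocks of the form out = x + gamma * attn(x)
    by zeroing the learned scalar gamma, removing the attention
    contribution while leaving the rest of the generator intact."""
    with torch.no_grad():
        for module in generator.modules():
            if hasattr(module, "gamma"):  # hypothetical attribute name
                module.gamma.zero_()
```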