Thread by Smerity, 5 tweets
A deep learning training tip that I realized I do but never learned from anyone: when tweaking your model to improve gradient flow or convergence speed, keep the exact same random seed (same hyperparameters and weight initializations) and only modify the model interactions. A minimal seeding sketch follows the list below.
- Your model runs will have the exact same perplexity spikes (they hit confusing data at the same time)
- You can compare timestamp / batch results in early training as a pseudo-estimate of convergence
- Improved gradient flow visibly helps the same init do better
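Here is a minimal sketch of that fixed-seed setup, assuming PyTorch; the seed value 1111 and the set_seed helper are illustrative, not from the thread. Pinning every source of randomness means two runs differ only in the model change under test, so loss curves line up batch for batch.

import random

import numpy as np
import torch

def set_seed(seed: int = 1111) -> None:
    """Pin Python, NumPy, and PyTorch RNGs so runs are directly comparable."""
    random.seed(seed)
    np.random.seed(seed)
    torch.manual_seed(seed)  # seeds CPU and all CUDA devices
    # cuDNN's auto-tuner picks kernels non-deterministically; disable it so
    # the same initialization follows the same numerical path every run.
    torch.backends.cudnn.deterministic = True
    torch.backends.cudnn.benchmark = False

set_seed(1111)
# ... build the model variant under test and train as usual; with the seed
# pinned, any difference in the loss curve comes from the model change.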
It's important to swap out the random seed occasionally once you think you've isolated progress (a quick seed-sweep sketch follows below), but minimizing noise during experimentation is OP. You're already dealing with millions of parameters and billions of calculations; you don't need any more confusion in the process.
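A hypothetical sketch of that seed-swap check, reusing set_seed from the sketch above; train_and_eval, build_baseline, and build_variant are assumed stand-ins for your own training loop and model constructors, not functions from the thread.

# Once a change looks like a win under the pinned seed, rerun both variants
# under a few fresh seeds to confirm the gain isn't seed luck.
for seed in (1111, 1234, 42):
    set_seed(seed)
    baseline_ppl = train_and_eval(build_baseline())  # hypothetical helpers
    variant_ppl = train_and_eval(build_variant())
    print(f"seed={seed}: baseline={baseline_ppl:.2f}  variant={variant_ppl:.2f}")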
Anyone else doing this? As noted, I never learned it explicitly; it's just a habit I got into. I tend to think most BigCos will be running hyperparameter sweeps / randomization given their environment, but maybe others are doing this on their own?
This may or may not be a soft pseudo-science variant of the Lottery Ticket Hypothesis / @hardmaru et al.'s Weight Agnostic Neural Networks. Either way, it has worked multiple times over multiple datasets for me and the results seem to generalize.