A thread by Smerity (13 tweets).
I'm incredibly proud that the low compute / low resource AWD-LSTM and QRNN that I helped develop at @SFResearch live on as first-class architectures in the @fastdotai community :)
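For readers who haven't met the QRNN: its core trick is doing the heavy matrix work in parallel (via convolutions) and keeping only a cheap elementwise recurrence. Here's a toy numpy sketch of that recurrence, the "f-pooling" step — a simplification for illustration, not the fastai or Salesforce implementation:

```python
import numpy as np

def qrnn_f_pool(z, f, h0=None):
    """Sequential f-pooling from the QRNN:
        h_t = f_t * h_{t-1} + (1 - f_t) * z_t
    z: candidate values, f: forget gates, both (seq_len, hidden).
    In the real model z and f are computed in parallel by
    convolutions over the input; only this elementwise loop
    is sequential, which is why the QRNN is fast."""
    seq_len, hidden = z.shape
    h = np.zeros(hidden) if h0 is None else h0
    out = np.empty_like(z)
    for t in range(seq_len):
        h = f[t] * h + (1.0 - f[t]) * z[t]
        out[t] = h
    return out

# Tiny worked example: length-3 sequence, 1 hidden unit,
# constant forget gate of 0.5.
z = np.array([[1.0], [2.0], [3.0]])
f = np.array([[0.5], [0.5], [0.5]])
print(qrnn_f_pool(z, f).ravel())  # [0.5, 1.25, 2.125]
```

Because the gates carry no hidden-to-hidden matrix multiply, the only sequential cost is this cheap elementwise scan — the rest of the model parallelizes across the sequence.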
I think the community has become blind in the BERT / Attention Is All You Need era. If you think a singular architecture is the best, for whatever metric you're focused on, remind yourself of the recent history of model architecture evolution.
Whilst pretrained weights can be an advantage, they also tie you to someone else's whims. Did they train on a dataset that fits your task? Was your task ever intended? Did their setup have idiosyncrasies that might bite you? Will you hit a finetuning progress dead end?
You might have success transferring from language to language, but I can fairly soundly say (thanks to the many languages the AWD-LSTM has been used on) that many low resource models offer a reliably good compute-to-result trade-off.
If you want to train on a "different language" (scientific readings, audio embeddings, ...) then pretrained models likely won't get you far at all. Only then do we realize the focus of our field has been on large compute / resource models that you have no hope of reproducing.
This isn't how it needs to be. Pretrained models from titan sized resource expenditures offer fascinating glimpses of the future but I don't think they're our only path or even most likely path. They're also not likely by themselves the next source of innovation.
I don't want to wake up half a decade from now and realize that the many fine brains of our field have been dedicated solely to finetuning models that are near impossible to reproduce, and that we've inevitably concentrated the core progress of AI in only a few silos / domains.
Related / unrelated example: I've a new model I'm exploring with preliminary results of 1.173 bpc (test) on enwik8 after ten epochs / 7.74 hours of training on a single GPU (a gifted Titan V from @NVIDIAAIDev / @NvidiaAI). I have only used that card plus occasionally a 1080.
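For anyone unfamiliar with bpc (bits per character), the enwik8 metric above: it's just the per-character cross-entropy rescaled from nats (what most frameworks report) to bits. A minimal sketch of the conversion — the 0.813 figure below is an illustrative input, not a number from the thread:

```python
import math

def nats_to_bpc(loss_nats):
    """Convert per-character cross-entropy in nats to bits per
    character: bpc = loss / ln(2)."""
    return loss_nats / math.log(2)

# A test loss of ~0.813 nats/char works out to ~1.173 bpc:
print(round(nats_to_bpc(0.813), 3))  # 1.173
```

Lower is better: 1 bpc means the model needs on average one bit to encode each character of the test set.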
Whilst I'd love more compute, as it'd speed up my process (and turn my studio apartment from uncomfortably warm to a fireplace 😅), I would still want a single GPU to quickly approximate my larger model. It almost always helps the end result, big or small.
When our field hits the stage where compute is our pure and only limit, I promise to appropriately tweet "the sky is falling" and bemoan the state of us all - but until then it seems that a lack of innovation and focus on smaller models is what's hampering progress.
The recent @Intel hosted "Silicon 100" event (h/t @riva) celebrating the 71st anniversary of the first transistor reminded me that, more than anything, Moore's Law was as much a social contract among the innovators in the field as it was a technological one.
If you expect big models to keep improving then the attention is going to remain on big models.
If you expect small models to keep improving then the attention is going to remain on small models.
Attention dictates future progress.
So, really, the right attention is all we need.
If you enjoy my ranting, you may enjoy my ranted article on this topic.
"Adding ever more engines may help get the plane off the ground...
but that's not the design planes are destined for."
smerity.com/articles/2018/…