A thread by Smerity (13 tweets).
I'm incredibly proud that the low compute / low resource AWD-LSTM and QRNN that I helped develop at @SFResearch live on as first-class architectures in the @fastdotai community :)
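For readers who haven't met the QRNN: its core trick is doing the heavy matrix work in parallel (via convolutions) and keeping only a cheap elementwise recurrence. Here's a toy numpy sketch of that recurrence, the "f-pooling" step — a simplification for illustration, not the fastai or Salesforce implementation:

```python
import numpy as np

def qrnn_f_pool(z, f, h0=None):
    """Sequential f-pooling from the QRNN:
        h_t = f_t * h_{t-1} + (1 - f_t) * z_t
    z: candidate values, f: forget gates, both (seq_len, hidden).
    In the real model z and f are computed in parallel by
    convolutions over the input; only this elementwise loop
    is sequential, which is why the QRNN is fast."""
    seq_len, hidden = z.shape
    h = np.zeros(hidden) if h0 is None else h0
    out = np.empty_like(z)
    for t in range(seq_len):
        h = f[t] * h + (1.0 - f[t]) * z[t]
        out[t] = h
    return out

# Tiny worked example: length-3 sequence, 1 hidden unit,
# constant forget gate of 0.5.
z = np.array([[1.0], [2.0], [3.0]])
f = np.array([[0.5], [0.5], [0.5]])
print(qrnn_f_pool(z, f).ravel())  # [0.5, 1.25, 2.125]
```

Because the gates carry no hidden-to-hidden matrix multiply, the only sequential cost is this cheap elementwise scan — the rest of the model parallelizes across the sequence.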
I think the community has become blind in the BERT / Attention Is All You Need era. If you think a singular architecture is the best, for whatever metric you're focused on, remind yourself of the recent history of model architecture evolution.
Whilst pretrained weights can be an advantage, they also tie you to someone else's whims. Did they train on a dataset that fits your task? Was your task ever intended? Did their setup have idiosyncrasies that might bite you? Will you hit a finetuning progress dead end?
You might have success transferring from language to language, but I can fairly soundly say (thanks to the many languages the AWD-LSTM has been used on) that many low resource models offer a reliably good compute-to-result trade-off.
If you want to train on a "different language" (scientific readings, audio embeddings, ...) then pretrained models likely won't get you far at all. Only then do we realize the focus of our field has been on large compute / resource models that you have no hope of reproducing.
This isn't how it needs to be. Pretrained models from titan sized resource expenditures offer fascinating glimpses of the future but I don't think they're our only path or even most likely path. They're also not likely by themselves the next source of innovation.
I don't want to wake up half a decade from now and realize that the many fine brains of our field have been dedicated solely to finetuning models that are near impossible to reproduce, and that we've inevitably concentrated the core progress of AI in only a few silos / domains.
Related / unrelated example: I've a new model I'm exploring with preliminary results of 1.173 bpc (test) on enwik8 after ten epochs / 7.74 hours of training on a single GPU (a gifted Titan V from @NVIDIAAIDev / @NvidiaAI). I have only used that card plus occasionally a 1080.
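For anyone unfamiliar with bpc (bits per character), the enwik8 metric above: it's just the per-character cross-entropy rescaled from nats (what most frameworks report) to bits. A minimal sketch of the conversion — the 0.813 figure below is an illustrative input, not a number from the thread:

```python
import math

def nats_to_bpc(loss_nats):
    """Convert per-character cross-entropy in nats to bits per
    character: bpc = loss / ln(2)."""
    return loss_nats / math.log(2)

# A test loss of ~0.813 nats/char works out to ~1.173 bpc:
print(round(nats_to_bpc(0.813), 3))  # 1.173
```

Lower is better: 1 bpc means the model needs on average one bit to encode each character of the test set.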
Whilst I'd love more compute, as it'd speed up my process (and turn my studio apartment from uncomfortably warm to a fireplace 😅), I would still want a single GPU to quickly approximate my larger model. It almost always helps the end result, big or small.
When our field hits the stage where compute is our pure and only limit, I promise to appropriately tweet "the sky is falling" and bemoan the state of us all - but until then it seems that a lack of innovation and focus on smaller models is what's hampering progress.
The recent @Intel hosted "Silicon 100" event (h/t @riva) celebrating the 71st anniversary of the first transistor reminded me that, more than anything, Moore's Law was as much a social contract among the innovators in the field as it was a technological one.
If you expect big models to keep improving then the attention is going to remain on big models.
If you expect small models to keep improving then the attention is going to remain on small models.
Attention dictates future progress.
So, really, the right attention is all we need.
If you enjoy my ranting, you may enjoy my ranted article on this topic.
"Adding ever more engines may help get the plane off the ground...
but that's not the design planes are destined for."
smerity.com/articles/2018/…