Mikhail Burtsev
Apr 21, 2023 · 10 tweets
🚀 1/ Excited to share our (with Aydar Bulatov and @yurakuratov ) report on scaling the Recurrent Memory Transformer to 2M (yes, two million) 😮 tokens! 🧠🌐 #AI #NLP #DeepLearning
2/ 📈 We've tackled the quadratic complexity of attention in #Transformers by combining token-based memory & segment-level recurrence, using RMT.
🔸 RMT adapts to any Transformer family model
🔸 Memory tokens provide the recurrent connection 🎛️💡 #AI #NLP #DeepLearning
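The recurrence above can be sketched in a few lines. This is an illustrative toy wrapper, not the authors' implementation: trainable memory tokens are prepended to each segment's embeddings, and the memory slice of one segment's output becomes the memory input of the next segment. Class and parameter names (`RMTWrapper`, `num_mem_tokens`) are mine, and the backbone is any module mapping `[batch, len, d_model]` to the same shape.

```python
import torch
import torch.nn as nn

class RMTWrapper(nn.Module):
    """Toy sketch of segment-level recurrence via memory tokens (hypothetical)."""

    def __init__(self, backbone: nn.Module, d_model: int, num_mem_tokens: int = 10):
        super().__init__()
        self.backbone = backbone  # any Transformer-family encoder
        # Trainable memory token embeddings, shared across segments.
        self.mem = nn.Parameter(torch.randn(num_mem_tokens, d_model) * 0.02)

    def forward(self, segments):
        # segments: list of [batch, seg_len, d_model] embedding tensors
        batch = segments[0].size(0)
        memory = self.mem.unsqueeze(0).expand(batch, -1, -1)
        outputs = []
        for seg in segments:
            x = self.backbone(torch.cat([memory, seg], dim=1))  # prepend memory
            memory = x[:, : self.mem.size(0)]       # updated memory -> next segment
            outputs.append(x[:, self.mem.size(0):])  # token outputs for this segment
        return torch.cat(outputs, dim=1), memory
```

Because each forward pass only attends within one segment plus a fixed number of memory tokens, the per-segment cost stays constant as the total sequence grows.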
3/ 🧠 We tested RMT's memorization capabilities with synthetic datasets requiring fact memorization, detection, & reasoning. The model must separate facts from irrelevant text and use them to answer questions, framed as 6-class classification. 🎯 #AI #NLP #DeepLearning
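A toy illustration of how such an example can be constructed (the data and helper name here are invented, not the paper's dataset): a fact is hidden among distractor sentences, and the question at the end can only be answered by finding and using that fact.

```python
import random

def make_example(fact, question, answer, distractors, n_noise=3, seed=0):
    """Hide one fact among distractor sentences, then append the question."""
    rng = random.Random(seed)
    noise = rng.sample(distractors, n_noise)
    pos = rng.randrange(n_noise + 1)  # random position for the fact
    sentences = noise[:pos] + [fact] + noise[pos:]
    return {"input": " ".join(sentences) + " " + question, "label": answer}
```

Scaling `n_noise` stretches the same task over more segments, which is what makes it a memory test rather than a local-attention test.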
4/ 📊 In our experiments, we used the pretrained BERT model as the backbone for RMT. We employed curriculum learning, starting with shorter tasks & increasing length upon convergence. This improved accuracy & stability in our model's performance. 💪 #AI #NLP #DeepLearning
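The curriculum can be sketched as a simple outer loop (a minimal sketch with illustrative names, not the training code): train on the current number of segments until a convergence criterion fires, then add a segment and repeat.

```python
def curriculum(train_step, converged, start_segments=1, max_segments=7):
    """Train on n segments until converged(n) is True, then move to n + 1."""
    n = start_segments
    while n <= max_segments:
        while not converged(n):
            train_step(n)  # one optimization step on n-segment inputs
        n += 1
```

Starting from short inputs lets the model first learn the task itself, and only then learn to carry the relevant fact across more and more segments.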
5/ 📈 RMT's extrapolation abilities: Models trained on 7 segments generalize surprisingly well even on sequences up to 2,043,904 tokens! 🔝🚀 #AI #NLP #DeepLearning
6/ 🍃 Computational efficiency: With a fixed segment length, RMT scales linearly in sequence length for any model size. Larger Transformers exhibit slower quadratic scaling, but RMT still requires fewer FLOPs — up to 295x fewer! 🌟✂️ #AI #NLP #DeepLearning #Efficiency
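The scaling argument in back-of-envelope form (a counting sketch, not the paper's full FLOP model, which also accounts for feed-forward layers and model width): full attention over L tokens costs O(L²) score computations, while RMT pays O(s²) per segment for L/s segments, i.e. O(L·s) — linear in L for fixed segment length s.

```python
def attention_flops_full(seq_len, d_model):
    """Attention-score cost for one pass over the whole sequence: O(L^2 * d)."""
    return seq_len ** 2 * d_model

def attention_flops_rmt(seq_len, seg_len, d_model):
    """RMT cost: (L / s) segments, each attending only within s tokens."""
    n_segments = seq_len // seg_len
    return n_segments * seg_len ** 2 * d_model  # = L * s * d, linear in L
```

Under this count the saving is simply `seq_len / seg_len`, so it keeps growing as sequences get longer while the segment stays fixed.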
7/ 🔍 Attention Patterns of Memory Operations: RMT's attention maps reveal specific patterns in memory operations during a reasoning task. 💡📚
8/ 🔗 Report: bit.ly/3Lk9jbQ
Code: bit.ly/40sMt6b
@booydar and @yurakuratov did all the work, and I just had a lot of fun! 🥸
And finally on arxiv 🍾 arxiv.org/abs/2304.11062