Deep dive into "ZeRO: Memory Optimizations Toward Training Trillion Parameter Models" by Samyam Rajbhandari, Olatunji Ruwase, Yuxiong He & @jeffra45

It proposes a memory optimizer for training huge pre-trained language models.

Thread👇🏼 🔎
thesequence.substack.com/p/-edge22-mach…
Zero Redundancy Optimizer (ZeRO) is an optimization module that maximizes both memory and scaling efficiency.

2/
It aims to address the limitations of data parallelism and model parallelism while keeping the merits of both.

thesequence.substack.com/p/-edge22-mach…

3/
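ZeRO uses an approach called ZeRO-powered data parallelism, which removes memory redundancies across data-parallel processes by partitioning the model states (optimizer states, gradients, and parameters) among them instead of replicating them.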

4/
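To make the memory savings concrete, here is a rough sketch of the per-GPU model-state arithmetic described in the ZeRO paper for mixed-precision Adam training: 2 bytes per parameter for fp16 weights, 2 for fp16 gradients, and K = 12 for the fp32 optimizer states. The function name `zero_memory_per_gpu` and the stage numbering (0 = plain data parallelism, 1 to 3 = progressively more partitioning) are assumptions made for illustration; activations and temporary buffers are ignored.

```python
def zero_memory_per_gpu(num_params, num_gpus, stage, k=12):
    """Approximate model-state bytes per GPU (hypothetical helper).

    stage 0: plain data parallelism, everything replicated
    stage 1: optimizer states partitioned
    stage 2: optimizer states + gradients partitioned
    stage 3: optimizer states + gradients + parameters partitioned
    """
    params = 2 * num_params   # fp16 parameters
    grads = 2 * num_params    # fp16 gradients
    opt = k * num_params      # fp32 copy of params, momentum, variance
    if stage >= 1:
        opt /= num_gpus
    if stage >= 2:
        grads /= num_gpus
    if stage >= 3:
        params /= num_gpus
    return params + grads + opt


# Illustrative setting: 7.5B parameters on 64 data-parallel GPUs.
for stage in range(4):
    gb = zero_memory_per_gpu(7_500_000_000, 64, stage) / 1e9
    print(f"stage {stage}: {gb:.1f} GB per GPU")
```

For this setting the estimate falls from about 120 GB per GPU with plain data parallelism to under 2 GB when all three model states are partitioned.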
DeepSpeed is a new open-source framework for optimizing the training of very large deep learning models.

It includes the first implementation of ZeRO as well as other optimization methods.
6/
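As a rough illustration of how ZeRO is switched on in DeepSpeed, the sketch below builds a minimal configuration dictionary. The keys `train_batch_size`, `fp16`, and `zero_optimization` follow DeepSpeed's JSON configuration format; the helper `make_zero_config` and the chosen values are hypothetical, and which stages are available depends on the DeepSpeed version.

```python
import json


def make_zero_config(stage=1, train_batch_size=32, fp16=True):
    """Build a minimal DeepSpeed-style config dict (hypothetical helper)."""
    if stage not in (0, 1, 2, 3):
        raise ValueError("ZeRO stage must be 0, 1, 2, or 3")
    return {
        "train_batch_size": train_batch_size,   # illustrative value
        "fp16": {"enabled": fp16},              # mixed-precision training
        "zero_optimization": {"stage": stage},  # how much state to partition
    }


config = make_zero_config(stage=2)
print(json.dumps(config, indent=2))
```

A dictionary like this is typically saved as a JSON file and handed to DeepSpeed when the training engine is initialized.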
TheSequence Edge covers:

+An ML concept you should learn
+A review of an impactful research paper
+A new ML framework or platform and how you can use it
thesequence.substack.com/subscribe
7/