Discover and read the best of Twitter Threads about #jeanzay

🌸 Training of the @BigScienceLLM BLOOM 176B-parameter model has just passed 230B tokens: that’s more than a million books in two months!

🤔 But how did we decide what model to train with our one million GPU hours?

⬇️ Thread time! #acl2022nlp
🏅 We had five main considerations: it needed to be proven, scalable, efficient, multilingual, and to exhibit emergent capabilities (e.g. zero-shot generalization)

⏰ At the >100B scale, every inefficiency matters! We can’t afford an unoptimized setup…
🤗 Thanks to a generous grant from @Genci_fr on #JeanZay, we had plenty of compute to benchmark our dream architecture.

📈 We ran our experiments with 1.3B models, pretraining on 100-300B tokens, to increase the likelihood our findings would transfer to the final >100B model.