🌸 The @BigScienceLLM BLOOM 176B-parameter model training has just passed 230B tokens: that’s more than a million books in two months! (Quick back-of-envelope below.)
🤔 But how did we decide what model to train with our one million GPU hours?
⬇️ Thread time! #acl2022nlp
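A quick sanity check on the “more than a million books” comparison. The per-book word count and tokens-per-word ratio are my own rough assumptions, not BigScience’s numbers:

```python
# Back-of-envelope for "230B tokens is more than a million books".
# Assumptions (illustrative): ~100k words per book, ~1.3 tokens per word.
tokens_trained = 230e9
tokens_per_book = 100_000 * 1.3          # ≈ 130k tokens per book
books = tokens_trained / tokens_per_book
print(f"{books:,.0f} books")             # ≈ 1,769,231 — comfortably over a million
```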
🏅 We had five main considerations: it needed to be proven, scalable, efficient, multilingual, and to exhibit emergent capabilities (e.g. zero-shot generalization)
⏰ At the >100B scale, every inefficiency matters! We can’t afford an unoptimized setup…
Jun 26, 2020
💡 Can we learn challenging tasks without backpropagation? Scale a biologically-motivated method to hard datasets? Without *any* knowledge of the forward weights in the backward? Yes, We Can!
🎓 arxiv.org/abs/2006.12878
Joint work with @iacopo_poli, @KrzakalaF, and @LightOnIO
[1/9]
🧐 A central question in bio-inspired ML is the weight transport problem: the backward pass cannot realistically access information about the forward weights. While local learning has been demonstrated, methods devoid of weight transport have so far failed to scale to challenging computer vision tasks. (Minimal sketch below.)
[2/9]
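For a concrete picture, here is a minimal NumPy sketch of direct feedback alignment (DFA), a representative weight-transport-free method: the backward pass routes the output error to hidden units through a fixed random feedback matrix instead of the transposed forward weights. The toy task, layer sizes, and hyperparameters are illustrative assumptions, not the paper’s setup (arxiv.org/abs/2006.12878).

```python
# Minimal sketch of direct feedback alignment (DFA) on a toy 2-layer MLP.
# Illustrative assumptions only: toy data, layer sizes, and hyperparameters are mine.
import numpy as np

rng = np.random.default_rng(0)

# Toy data: 1000 points, binary label from a noisy linear rule.
X = rng.normal(size=(1000, 20))
y = (X[:, :5].sum(axis=1) > 0).astype(float).reshape(-1, 1)

# Forward weights (trained) and a fixed random feedback matrix (never updated).
W1, b1 = rng.normal(scale=0.1, size=(20, 64)), np.zeros(64)
W2, b2 = rng.normal(scale=0.1, size=(64, 1)), np.zeros(1)
B1 = rng.normal(scale=0.1, size=(1, 64))   # feedback path: output error -> hidden layer

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

lr = 0.5
for epoch in range(200):
    # Forward pass (standard).
    a1 = X @ W1 + b1
    h1 = np.tanh(a1)
    out = sigmoid(h1 @ W2 + b2)

    # Output error (same as backprop for the last layer, sigmoid + cross-entropy).
    e = out - y                               # (N, 1)

    # DFA step: project the error to the hidden layer with the *fixed random* B1,
    # instead of W2.T as backprop would — no knowledge of W2 in the backward pass.
    delta1 = (e @ B1) * (1.0 - h1 ** 2)       # (N, 64), tanh derivative

    # Plain full-batch SGD updates.
    W2 -= lr * h1.T @ e / len(X)
    b2 -= lr * e.mean(axis=0)
    W1 -= lr * X.T @ delta1 / len(X)
    b1 -= lr * delta1.mean(axis=0)

print("train accuracy:", ((out > 0.5) == y).mean())
```

The forward pass is untouched; only the error routing changes, which is why the backward pass needs no information about the forward weights.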