Co-Founder & CEO @ ellamind / #DiscoResearch /
Retweets & favs are stuff I find interesting, not endorsements
Jul 23, 2024 • 21 tweets • 8 min read
Live tweeting the most interesting insights from @Meta's new Llama 3 paper
1. How did they arrive at a 405B model trained on ~15T tokens?
"Extrapolation of the resulting scaling law to 3.8 × 10^25 FLOPs suggests training a 402B parameter model on 16.55T tokens." 👇🧵
2. The paper contains a surprisingly detailed description of the network topology for their 24k H100 cluster @dylan522p
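Quick sanity check on the numbers quoted in point 1: a minimal sketch, assuming the common rule of thumb that training compute is roughly 6 × parameters × tokens. That approximation is my assumption, not something quoted from the paper here, and the function name is just illustrative.

```python
# Rule-of-thumb check (assumption, not from the quoted text):
# training FLOPs ~= 6 * parameters * tokens.

def approx_training_flops(params: float, tokens: float) -> float:
    """Approximate training compute using the 6 * N * D heuristic."""
    return 6.0 * params * tokens

quoted_budget = 3.8e25  # FLOPs, from the quoted sentence
extrapolated = approx_training_flops(402e9, 16.55e12)  # 402B params, 16.55T tokens
ratio = extrapolated / quoted_budget

print(f"{extrapolated:.2e} FLOPs, {ratio:.2f}x the quoted budget")
```

Under that heuristic the extrapolated configuration comes out at roughly 4.0 × 10^25 FLOPs, about 5% above the quoted 3.8 × 10^25 budget, so the quoted figures are at least mutually consistent to first order.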