Jan P. Harries
Co-Founder & CEO @ ellamind / #DiscoResearch / Retweets&favs are stuff i find interesting, not endorsements
Jul 23 · 21 tweets · 8 min read
Live tweeting the most interesting insights from @Meta's new Llama3 paper

1. How did they arrive at a 405B model trained with ~15T tokens?
"Extrapolation of the resulting scaling law to 3.8 × 10^25 FLOPs suggests training a 402B parameter model on 16.55T tokens." 👇🧵

2. The paper contains a surprisingly detailed description of the network topology for their 24k H100 cluster @dylan522p
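Quick sanity check on the numbers quoted in tweet 1. This is a minimal sketch that assumes the common rule of thumb "training FLOPs ≈ 6 × parameters × tokens"; that approximation is not stated in the quote and may not be exactly how the paper counts compute. The function name is made up for illustration.

```python
def approx_training_flops(params: float, tokens: float) -> float:
    """Rule-of-thumb estimate (assumption): ~6 FLOPs per parameter per token."""
    return 6.0 * params * tokens


# Figures taken from the quoted sentence.
quoted_budget = 3.8e25   # FLOPs
params = 402e9           # 402B parameters
tokens = 16.55e12        # 16.55T tokens

estimate = approx_training_flops(params, tokens)
ratio = estimate / quoted_budget

print(f"estimated FLOPs: {estimate:.2e}")
print(f"estimate / quoted budget: {ratio:.2f}")
print(f"tokens per parameter: {tokens / params:.1f}")
```

Under that assumption the estimate lands within roughly 5% of the quoted 3.8 × 10^25 FLOPs, so the quoted model size and token count look consistent with the compute budget.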