Latest Twitter Threads by @harmdevries77 on Thread Reader App

Apr 13, 2023 • 8 tweets • 3 min read

Surprised by the loss of LLaMA-7B still going down after 1 trillion tokens?

In a new blogpost, I explain why you shouldn't be and argue we haven't reached the limit of the recent trend of training smaller LLMs for longer:
harmdevries.com/post/model-siz…

Analysis in 🧵👇

The result follows from the Chinchilla scaling laws, which provide insight into the trade-off between model size and compute overhead.

Let's start with Chinchilla's 3rd approach: it models the loss L as a function of the number of parameters N and the number of training tokens D.
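
To make this concrete, here is a minimal sketch of that parametric loss, assuming the functional form L(N, D) = E + A/N^α + B/D^β and the fitted constants reported in the Chinchilla paper (Hoffmann et al., 2022). The constants and the function name `chinchilla_loss` are not given in this thread, so treat them as assumptions made for illustration.

```python
# Sketch of the Chinchilla "approach 3" parametric loss.
# Assumption: constants are the fitted values reported by Hoffmann et al. (2022);
# they are not stated in this thread.
E = 1.69                  # irreducible loss term
A, ALPHA = 406.4, 0.34    # model-size term: A / N**ALPHA
B, BETA = 410.7, 0.28     # data term: B / D**BETA


def chinchilla_loss(n_params: float, n_tokens: float) -> float:
    """Predicted loss for a model with n_params parameters trained on n_tokens tokens."""
    return E + A / n_params**ALPHA + B / n_tokens**BETA


if __name__ == "__main__":
    # A 7B-parameter model at 1T and 2T training tokens (illustrative inputs).
    for tokens in (1e12, 2e12):
        print(f"D = {tokens:.0e} tokens -> predicted loss {chinchilla_loss(7e9, tokens):.3f}")
```

Under this form the data term B/D^β keeps shrinking as D grows, so the predicted loss of a fixed-size model continues to fall with more tokens and only approaches the floor set by E plus the model-size term.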
