Shital Shah
Deep learning research and papers. If the universe is an optimizer, what is the loss function? All opinions are my own.
Aug 6, 2023 9 tweets 3 min read
Just learned something very cool about LR schedules. This one is so huge it surprises me that it's not in its own paper but rather tucked away.

Problem: Most training runs use cosine/linear decay, but this requires specifying the number of steps in advance. This is quite troublesome. 🧵

For example, LLaMA 2 trains for 500k steps, but when you look at the training curves, it is obvious that you could have kept going, except that now you can't because it's too late. Repeating the entire run with a new, larger step count is too expensive.
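To make the problem concrete, here is a minimal sketch of a standard cosine schedule (plain Python; the function name and LR values are illustrative, not from the thread). Note how `total_steps` must be fixed up front, and how the LR has already decayed to its floor by the planned end, so simply extending the run from there trains at the minimum LR and learns very little:

```python
import math

def cosine_lr(step, total_steps, max_lr=3e-4, min_lr=3e-5):
    """Standard cosine decay: total_steps must be chosen before training starts."""
    progress = min(step / total_steps, 1.0)
    return min_lr + 0.5 * (max_lr - min_lr) * (1 + math.cos(math.pi * progress))

# If we planned 500k steps, the LR is at its floor by the end of the run,
# so "just continuing" past step 500k is nearly ineffective.
print(cosine_lr(250_000, 500_000))  # mid-run LR (halfway down the cosine)
print(cosine_lr(500_000, 500_000))  # == min_lr; extending from here barely trains
```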
Mar 19, 2023 7 tweets 2 min read
While folks in academia are freaking out about not having as much compute as OpenAI and are depressed about their research agenda, it might be easy to miss some critical trends. My bet is that both "L"s in LLM will be gone in about 2 years. 🧵

Before GPT-4 came out, many people were still quite smug about scaling. One of the biggest things in the GPT-4 paper is the verification that one can precisely predict performance on actual benchmarks like HumanEval with scale. This is huge. Scaling is real and here to stay, but…
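For context on the kind of prediction the GPT-4 paper describes: performance is fit as a smooth power law of training compute on smaller runs, then extrapolated to the large run. A minimal sketch of that idea with entirely made-up numbers (this is not the paper's actual data or fitting procedure):

```python
import numpy as np

# Hypothetical (compute, metric) pairs from smaller-scale runs; all values are made up.
compute = np.array([1e18, 1e19, 1e20, 1e21])   # training FLOPs
metric  = np.array([0.62, 0.41, 0.28, 0.19])   # loss-like benchmark proxy (lower is better)

# A pure power law metric = A * C^(-b) is a straight line in log-log space,
# so a linear fit on the logs recovers the exponent and prefactor.
b, log_A = np.polyfit(np.log10(compute), np.log10(metric), 1)
predict = lambda c: 10 ** log_A * c ** b       # b comes out negative here

print(f"fit exponent: {b:.3f}")
print(f"extrapolated metric at 1e23 FLOPs: {predict(1e23):.3f}")
```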
Dec 6, 2022 25 tweets 6 min read
There are several things at which ChatGPT fails spectacularly (for example, 3-digit multiplication, ASCII art). These failures give a lot of insight into its inner workings, but what is far more astonishing are the things at which it succeeds unbelievably. Here are some of my favorites: 🧵
Dec 2, 2022 25 tweets 6 min read
ChatGPT was dropped on us just a bit over 24 hours ago. It's like waking up to the news of the first nuclear explosion: you don't yet know what to think about it, but you know the world will never be the same again. Here are some interesting snapshots of this "explosion" 🧵: