1/ #ChatGPT is closing out 2022 with a bang, but what’s next? 💥
@OpenAI’s #GPT4 is set to be the first big #AI thing in 2023.
So here are some bold, optimistic, yet sensible predictions from me, @vivek7ue and @rajhans_samdani ... 👀
2/ Biggest model size for GPT-4 will be 1T parameters. Up 6x.
Not 100T parameters like some AI hypers are claiming.
3/ The reason is simple: instruction fine-tuning achieves the same quality with 100x smaller models.
arxiv.org/pdf/2203.02155…
4/ As such, the pre-trained GPT-4 model will appear to be a modest improvement over Chinchilla, PaLM and U-PaLM on HELM and BigBench.
The raw stats on GPT-4 will look underwhelming at first glance and incremental relative to GPT-3.
5/ The hidden secret of #LLMs? How much training data you have matters as much as model size.
GPT-4 will use 10T tokens. Up 33x, putting it on the Chinchilla scaling curve.
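For a rough sense of what "on the Chinchilla scaling curve" means, here's a back-of-envelope sketch in Python using the ~20-tokens-per-parameter heuristic from the Chinchilla paper. The 1T-parameter and 10T-token figures are our predictions above, not confirmed specs.

```python
# Back-of-envelope check of the thread's token prediction against the
# Chinchilla rule of thumb of roughly 20 training tokens per parameter
# (Hoffmann et al., 2022). The 1T-parameter / 10T-token figures are this
# thread's predictions, not confirmed GPT-4 specs.

def chinchilla_optimal_tokens(params: float, tokens_per_param: float = 20.0) -> float:
    """Compute-optimal training-token budget under the Chinchilla heuristic."""
    return params * tokens_per_param

predicted_params = 1e12    # the thread's 1T-parameter guess
predicted_tokens = 10e12   # the thread's 10T-token guess

optimal = chinchilla_optimal_tokens(predicted_params)
print(f"Chinchilla-optimal tokens for 1e12 params: {optimal:.1e}")            # ~2e13
print(f"Predicted tokens / optimal tokens: {predicted_tokens / optimal:.2f}")  # ~0.50
```

By that heuristic, the compute-optimal budget for a 1T-parameter model is ~20T tokens, so a 10T-token run is at least in the right ballpark rather than wildly under-trained like GPT-3 was.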
6/ Biggest user facing change? Longer context windows.
We expect 16384 tokens (⬆️ from 4096).
7/ Biggest pre-training modeling change? A loss function that looks like UL2 (arxiv.org/pdf/2205.05131…).
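For context: UL2 mixes several denoising objectives (short-span "R" corruption, long-span/heavy "X" corruption, and a prefix-LM-style "S" denoiser). The Python sketch below shows, very roughly, how such a mixture-of-denoisers pipeline turns a token sequence into (input, target) pairs. The span-sampling details and sentinel tokens are illustrative simplifications, not OpenAI's (unpublished) recipe.

```python
import random

# A minimal, illustrative sketch of a UL2-style "mixture of denoisers"
# objective operating on a toy token list. Each example randomly picks a
# denoiser config and produces an (inputs, targets) pair.

DENOISERS = [
    {"name": "R", "mean_span": 3,  "corrupt_rate": 0.15},   # short spans, light corruption
    {"name": "X", "mean_span": 32, "corrupt_rate": 0.50},   # long spans / heavy corruption
    {"name": "S", "mean_span": None, "corrupt_rate": None},  # prefix-LM: predict a suffix
]

def make_example(tokens, rng=random):
    cfg = rng.choice(DENOISERS)
    if cfg["name"] == "S":
        # Sequential denoising: keep a prefix as input, predict the rest.
        split = rng.randint(1, len(tokens) - 1)
        return tokens[:split], tokens[split:]
    # Span corruption: replace sampled spans with sentinels in the input;
    # the target reconstructs the dropped spans after their sentinels.
    n_corrupt = max(1, int(len(tokens) * cfg["corrupt_rate"]))
    inputs, targets, i, sentinel = [], [], 0, 0
    while i < len(tokens):
        if n_corrupt > 0 and rng.random() < cfg["corrupt_rate"]:
            span = min(max(1, int(rng.expovariate(1 / cfg["mean_span"]))),
                       n_corrupt, len(tokens) - i)
            inputs.append(f"<extra_id_{sentinel}>")
            targets.append(f"<extra_id_{sentinel}>")
            targets.extend(tokens[i:i + span])
            sentinel += 1
            n_corrupt -= span
            i += span
        else:
            inputs.append(tokens[i])
            i += 1
    return inputs, targets

toy = [f"tok{i}" for i in range(20)]
inp, tgt = make_example(toy)
print("inputs :", inp)
print("targets:", tgt)
```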
8/ Put together, that's at least 800x more compute for the pre-trained model.
And that will mean it’s better. 🙌 🙌
9/ Lots of the pre-training secret sauce will be in what goes in. 🤫
We expect:
➡️ A lot more dialog data (from @Twitter, @Reddit and elsewhere)
➡️ Proprietary signals from @bing's index and maybe even Bing clicks
10/ The instruction-following models will continue to be state of the art relative to everyone else (see the HELM comparisons at arxiv.org/abs/2211.09110)
11/ They will:
👉 Incorporate RLHF/PPO (like GPT3.5)
👉 Use proprietary prompt-following training data from the OpenAI playground (that other research groups can't access)
12/ PPO preference training will re-use some of the tricks @AnthropicAI is using to be more helpful and harmless in their constitutional training paradigm
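For readers who want the mechanics behind "RLHF/PPO" above: a toy numpy sketch of the two core pieces — the preference-model reward with a KL penalty toward the pre-RLHF reference model, and PPO's clipped policy loss. The beta and clipping values are illustrative defaults, not OpenAI's settings.

```python
import numpy as np

# Toy sketch of (1) the RLHF shaped reward with a KL penalty toward the
# reference (pre-RLHF) model, and (2) PPO's clipped surrogate loss.
# Uses fake per-token log-probs instead of a real language model.

def rlhf_reward(reward_model_score, policy_logprobs, ref_logprobs, beta=0.02):
    """Sequence-level shaped reward: preference-model score minus a KL
    penalty (summed over tokens) keeping the policy near the reference."""
    kl_per_token = policy_logprobs - ref_logprobs
    return reward_model_score - beta * kl_per_token.sum()

def ppo_clipped_loss(new_logprobs, old_logprobs, advantages, clip_eps=0.2):
    """Standard PPO clipped surrogate objective (to be minimized)."""
    ratio = np.exp(new_logprobs - old_logprobs)
    unclipped = ratio * advantages
    clipped = np.clip(ratio, 1 - clip_eps, 1 + clip_eps) * advantages
    return -np.minimum(unclipped, clipped).mean()

# Toy usage with fake per-token log-probs for a 5-token response.
policy_lp = np.array([-1.2, -0.8, -2.0, -0.5, -1.1])
ref_lp    = np.array([-1.3, -0.9, -1.8, -0.6, -1.2])
old_lp    = np.array([-1.25, -0.85, -1.9, -0.55, -1.05])
advantages = np.array([0.4, -0.1, 0.2, 0.0, 0.3])

print("shaped reward:", rlhf_reward(0.7, policy_lp, ref_lp))
print("ppo loss:", ppo_clipped_loss(policy_lp, old_lp, advantages))
```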
13/ InstructGPT used about 20B tokens during PPO, roughly 6% of total GPT-3 compute.
Since instruction fine-tuning is much more compute-efficient, we expect a lot more compute to be spent in the supervised fine-tuning and PPO phases.
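Quick back-of-envelope on that ~6% figure, assuming compute scales roughly with tokens processed and using GPT-3's published ~300B-token pre-training run:

```python
# Rough check of the "~6%" claim: PPO tokens as a share of GPT-3's
# pre-training tokens, assuming compute scales roughly with tokens.
ppo_tokens = 20e9             # tokens used in InstructGPT's PPO phase (per this thread)
gpt3_pretrain_tokens = 300e9  # GPT-3's published training-token count
print(f"PPO share of GPT-3 compute: {ppo_tokens / gpt3_pretrain_tokens:.1%}")  # ~6.7%
```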
14/ GPT-4 will be fine-tuned on all the feedback data from ChatGPT, and that will be the key to a significant improvement.
With a million prompts a day from ChatGPT, we expect compute used in PPO to go up a lot in GPT-4.
15/ And finally, like with ChatGPT, OpenAI will NOT publish details about GPT-4 as a paper, leaving the world guessing what's in there.
This will start a trend where all the big foundation model companies will stop publishing details of their models.
OpenAI will be Open no more.
16/ This will leave a BIG opportunity for open model efforts from the likes of @AiEleuther, @huggingface, Big Science’s BLOOM, @togethercompute, and @carperai to step up their game.