
Tom Goldstein

@tomgoldsteincs

Dec 6, 2022 • 10 tweets • 2 min read

How many GPUs does it take to run ChatGPT? And how expensive is it for OpenAI? Let’s find out! 🧵🤑

We don’t know the exact architecture of ChatGPT, but OpenAI has said that it is fine-tuned from a variant of GPT-3.5, so it probably has 175B parameters. That's pretty big.

How fast could it run? A 3-billion-parameter model can generate a token in about 6 ms on an A100 GPU (using half precision + TensorRT + activation caching). If we scale that up to the size of ChatGPT, it should take 350 ms for an A100 GPU to print out a single word.
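The scaling step is simple proportionality. Here is a minimal sketch, assuming (as the thread does) that per-token latency grows linearly with parameter count and that one token is roughly one word. The constant names are ours and purely illustrative.

```python
# Back-of-the-envelope latency scaling, assuming per-token latency grows
# linearly with parameter count (a simplification, not a measured law).
BASE_PARAMS = 3e9        # reference model size: 3B parameters
BASE_LATENCY_MS = 6.0    # ms per token on one A100 (fp16 + TensorRT + caching)
TARGET_PARAMS = 175e9    # assumed ChatGPT size


def scaled_latency_ms(target_params: float) -> float:
    """Scale the reference per-token latency linearly to a larger model."""
    return BASE_LATENCY_MS * target_params / BASE_PARAMS


latency_ms = scaled_latency_ms(TARGET_PARAMS)
print(f"~{latency_ms:.0f} ms per token (roughly one word) on a single A100")
```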

Of course, you could never fit ChatGPT on a single GPU. You would need five 80 GB A100 GPUs just to load the model and text. ChatGPT cranks out about 15-20 words per second. If it uses A100s, that could be done on an 8-GPU server (a likely choice on Azure cloud).
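Both numbers in that paragraph can be checked with a few lines of arithmetic. This sketch assumes 2 bytes per parameter (half precision), counts the weights only (no activations or cache), and optimistically assumes a perfect 8x speedup when the model is split across eight GPUs, which real systems will not quite reach.

```python
import math

PARAMS = 175e9          # assumed parameter count
BYTES_PER_PARAM = 2     # half precision (fp16/bf16)
GPU_MEMORY_GB = 80      # one 80 GB A100
LATENCY_S = 0.350       # per-word latency on one A100, from the scaling estimate
SERVER_GPUS = 8         # GPUs in one server

weights_gb = PARAMS * BYTES_PER_PARAM / 1e9         # 350 GB of weights
min_gpus = math.ceil(weights_gb / GPU_MEMORY_GB)     # 4.375 rounds up to 5

# Optimistic: assume splitting the model over 8 GPUs gives a perfect 8x speedup.
words_per_s = SERVER_GPUS / LATENCY_S

print(f"weights: {weights_gb:.0f} GB -> at least {min_gpus} GPUs")
print(f"8-GPU server: ~{words_per_s:.0f} words/s (observed: 15-20)")
```

Under these assumptions the 8-GPU server lands at roughly 23 words per second, a little above the observed rate, which leaves some slack for imperfect parallelism.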

So what would this cost to host? On Azure cloud, each A100 card costs about $3 an hour, so an 8-GPU server runs about $24 an hour. At roughly 20 words per second, that's $0.0003 per word generated.

But it generates a lot of words! The model usually responds to my queries with ~30 words, which adds up to about 1 cent per query.
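The per-word and per-query figures follow from the server price and throughput. A minimal sketch, assuming an 8-GPU server at $3 per GPU-hour producing 20 words per second and a 30-word response; all inputs are the thread's rough estimates, not measured prices.

```python
GPUS_PER_SERVER = 8
PRICE_PER_GPU_HOUR = 3.0   # USD, approximate Azure A100 price quoted in the thread
WORDS_PER_SECOND = 20      # upper end of the observed 15-20 words/s
WORDS_PER_QUERY = 30       # typical response length in the thread

server_cost_per_second = GPUS_PER_SERVER * PRICE_PER_GPU_HOUR / 3600
cost_per_word = server_cost_per_second / WORDS_PER_SECOND
cost_per_query = cost_per_word * WORDS_PER_QUERY

print(f"per word:  ${cost_per_word:.5f}")
print(f"per query: ${cost_per_query:.3f}")
```

Using the lower 15 words per second instead raises both numbers by a third, which is a useful sense of how soft this estimate is.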

ChatGPT acquired 1M users within its first 5 days of operation. If an average user makes 10 queries per day, I think it’s reasonable to estimate that ChatGPT serves ~10M queries per day.

https://twitter.com/sama/status/1599668808285028353?s=20&t=W0Dm9g9nBrNuxePU0-5oFg

I estimate the cost of running ChatGPT is $100K per day, or $3M per month. This is a back-of-the-envelope calculation. I assume nodes are always in use with a batch size of 1. In reality they probably batch during high volume, but have GPUs sitting fallow during low volume.
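The daily and monthly totals are the per-query cost multiplied out. This sketch assumes 1M users averaging 10 queries per day, $0.01 per query, and a 30-day month, exactly the thread's batch-size-1 assumptions, so it inherits all of their caveats.

```python
USERS = 1_000_000               # users in the first 5 days
QUERIES_PER_USER_PER_DAY = 10   # assumed average usage
COST_PER_QUERY = 0.01           # USD, from the per-query estimate
DAYS_PER_MONTH = 30

queries_per_day = USERS * QUERIES_PER_USER_PER_DAY
daily_cost = queries_per_day * COST_PER_QUERY
monthly_cost = daily_cost * DAYS_PER_MONTH

print(f"{queries_per_day:,} queries/day -> ${daily_cost:,.0f}/day, ${monthly_cost:,.0f}/month")
```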

The real costs for a typical organization would almost certainly be higher than this because parallelization is not 100% efficient, GPUs are not 100% utilized, and my runtime estimate is optimistic.

The cost to OpenAI may be lower though, because of its partnership with Microsoft.

Either way, that ain't cheap. Some say it's wasteful to pour these kinds of resources (and carbon) into a demo. But hey, it's not the worst use of Elon's money that we've seen of late 💸💸💸


Thanks to NLP gurus @jwkirchenbauer and @jonasgeiping for their inputs on this thread.


More from @tomgoldsteincs

Tom Goldstein

@tomgoldsteincs

Feb 10

New open source reasoning model!

Huginn-3.5B reasons implicitly in latent space 🧠

Unlike o1 and R1, latent reasoning doesn’t need special chain-of-thought training data, and doesn't produce extra CoT tokens at test time.

We trained on 800B tokens 👇

Huginn was built for reasoning from the ground up, not just fine-tuned on CoT.

We built our reasoning system by putting a recurrent block inside the LLM. On a forward pass, we loop this block a random number of times. By looping it more times, we dial up compute.
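A toy sketch of the looping idea: one shared block applied a variable number of times, with the loop count sampled at random during training and chosen freely at test time. Everything here is an illustrative assumption, not Huginn's actual design: `toy_block` stands in for a transformer block, the latent state starts at zeros, the input is re-fed at every step, and the loop count is drawn uniformly.

```python
import math
import random
from typing import Callable, List

Vector = List[float]


def toy_block(state: Vector, inputs: Vector) -> Vector:
    # Hypothetical stand-in for a transformer block: mixes the current latent
    # state with the re-injected input and applies a nonlinearity.
    return [math.tanh(0.5 * s + 0.5 * x) for s, x in zip(state, inputs)]


def recurrent_forward(inputs: Vector, num_loops: int,
                      block: Callable[[Vector, Vector], Vector] = toy_block) -> Vector:
    """Apply the same block num_loops times; more loops means more compute."""
    state = [0.0] * len(inputs)  # initial latent state (zeros, an assumption)
    for _ in range(num_loops):
        state = block(state, inputs)
    return state


def training_forward(inputs: Vector, rng: random.Random, max_loops: int = 8) -> Vector:
    # During training, draw the loop count at random (uniform here, an
    # assumption) so the weights must work across many recurrence depths.
    return recurrent_forward(inputs, rng.randint(1, max_loops))


x = [0.2, -0.4, 1.0]
shallow = recurrent_forward(x, num_loops=1)   # little test-time compute
deep = recurrent_forward(x, num_loops=32)     # "dial up" compute at test time
```

Because the parameters are shared across iterations, raising `num_loops` adds compute without adding parameters.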

Recurrence improves reasoning a lot. To show this, we did a comparison with a standard architecture.

We train a standard 3.5B LLM from scratch on 180B tokens. Then we train a recurrent 3.5B model on the same tokens.

The recurrent model does 5X better on GSM8K.


Tom Goldstein

@tomgoldsteincs

Jun 20, 2024

LLMs have low randomness: if you ask the same thing twice you get similar responses. Generator prompts are a way to boost the randomness of LLMs.

Using a few generator prompts, I had Gemini write an entire instruction tuning dataset from scratch. It outperforms popular datasets.

Let’s start with a toy example of why we need generator prompts. Suppose I want a list of different colors. So I feed this prompt to Gemini 1000 times. This does poorly - I only get 33 unique outputs from 1000 runs. I need more randomness.

A generator prompt asks the model to enumerate a long list of execution paths, and then randomizes which paths get chosen.

Here's an example. The numbers 23 and 76 are randomized each time the prompt is called.

This prompt gives me 782 unique outputs from 1000 runs.
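The randomization half of a generator prompt can be sketched in a few lines. The template wording below is hypothetical (the thread's actual prompt is not reproduced here); the point is only that two random positions are substituted on every call, so each run asks the model to take a different path through its own enumerated list. The diversity of the model's outputs still depends on the model; this code only diversifies the prompts.

```python
import random

# Hypothetical generator prompt; the wording is ours, not the thread's.
TEMPLATE = (
    "Write a numbered list of {n} different colors. "
    "Then output only color number {first} and color number {second}."
)


def make_generator_prompt(rng: random.Random, n: int = 100) -> str:
    """Fill the template with two distinct random positions from 1..n."""
    first, second = rng.sample(range(1, n + 1), 2)
    return TEMPLATE.format(n=n, first=first, second=second)


rng = random.Random(0)  # seeded so the sketch is reproducible
prompts = [make_generator_prompt(rng) for _ in range(1000)]
print(len(set(prompts)), "distinct prompts out of", len(prompts))
```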


Tom Goldstein

@tomgoldsteincs

Oct 12, 2023

🚨 This one simple trick will level up your LLM🚀🚀

Wait...don't go. This isn't a blue check grifter tweet!

Instruction tuning with this easy trick will *actually* boost AlpacaEval scores, even for large (70B) and llama2-chat base models…by a lot 🧵

Ok, here's the trick: during instruction finetuning, we add uniform random noise to the word embeddings.

That's it. Nothing else.
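A minimal sketch of the trick in plain Python. The thread only says "uniform random noise"; the scaling by alpha / sqrt(L * d), with L the sequence length and d the embedding dimension, follows our reading of the NEFTune paper and should be treated as an assumption. The noise belongs in the fine-tuning forward pass only; inference uses clean embeddings. The value `alpha=5.0` is illustrative.

```python
import math
import random
from typing import List

Matrix = List[List[float]]  # shape: seq_len x embed_dim


def neftune_noise(embeddings: Matrix, alpha: float, rng: random.Random) -> Matrix:
    """Return embeddings plus uniform noise scaled by alpha / sqrt(L * d).

    Apply this only during instruction fine-tuning, not at inference.
    """
    seq_len = len(embeddings)
    dim = len(embeddings[0])
    scale = alpha / math.sqrt(seq_len * dim)
    return [[value + scale * rng.uniform(-1.0, 1.0) for value in row]
            for row in embeddings]


clean = [[0.1, -0.2, 0.3, 0.0], [0.5, 0.5, -0.1, 0.2]]  # toy 2-token sequence
noisy = neftune_noise(clean, alpha=5.0, rng=random.Random(0))
```

Scaling by the sequence length and dimension keeps the total noise magnitude roughly constant as inputs get longer or wider, so one `alpha` can be reused across models.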

We tried this on a bunch of base models and finetuning datasets. They all showed big gains.

Even when the base model is already highly refined (e.g. llama2-chat) or very large (llama2-70B) the benefits of NEFTune are still quite strong.


Tom Goldstein

@tomgoldsteincs

Jul 19, 2023

The Llama2 model is pretty impressive. Human evaluators rank it slightly *better* than ChatGPT on a range of things (excluding code and reasoning).

Here's a short TL;DR on what Meta did to improve the state of the art 🧵

Llama1: Small models (7B & 13B) were trained on 1 trillion tokens. Large models saw 1.4T tokens.

Llama2: All models trained on 2T tokens. This means the small models are "over trained" beyond what the scaling laws recommend, resulting in great performance for small models!
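One way to quantify "over trained" is tokens per parameter. This sketch assumes the roughly 20-tokens-per-parameter rule of thumb from the Chinchilla scaling-law work as "what the scaling laws recommend"; that benchmark is our assumption, not a figure from the thread. Model sizes and token counts are the ones quoted above, plus the 70B Llama2 model.

```python
# Assumed rule of thumb: compute-optimal training uses ~20 tokens per parameter.
CHINCHILLA_TOKENS_PER_PARAM = 20

# (parameters, training tokens)
runs = {
    "Llama1-7B": (7e9, 1.0e12),
    "Llama1-13B": (13e9, 1.0e12),
    "Llama2-7B": (7e9, 2.0e12),
    "Llama2-13B": (13e9, 2.0e12),
    "Llama2-70B": (70e9, 2.0e12),
}

ratios = {name: tokens / params for name, (params, tokens) in runs.items()}

for name, ratio in ratios.items():
    multiple = ratio / CHINCHILLA_TOKENS_PER_PARAM
    print(f"{name}: {ratio:.0f} tokens/param ({multiple:.1f}x the ~20 guideline)")
```

The small models end up an order of magnitude past the guideline, while the 70B model is much closer to it, which matches the thread's point that the extra data mostly helps the small models.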

As a result of the long training runs, Llama2 beats other major open-source models at most academic benchmarks. Their 7B model is WAY better than other 7B options on all tasks except code.


Tom Goldstein

@tomgoldsteincs

Jul 5, 2023

Nvidia’s AI products follow a weird reverse Moore’s law: every two years, you get half as many FLOPS for your money. This is the opposite of the rest of the chip market 📈
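The "reverse Moore's law" claim is an exponential with a negative exponent. This sketch compares a doubling of FLOPS per dollar every two years (the usual Moore's-law framing, used here as the contrasting assumption) with the halving the thread describes. It uses relative units only; no real card prices are involved.

```python
def relative_flops_per_dollar(years: float, period_years: float) -> float:
    """FLOPS per dollar relative to year 0, changing by 2x every |period_years|.

    A positive period models doubling (Moore's-law style); a negative period
    models halving (the trend the thread describes).
    """
    return 2.0 ** (years / period_years)


for years in (0, 2, 4, 6):
    moore = relative_flops_per_dollar(years, 2.0)
    reverse = relative_flops_per_dollar(years, -2.0)
    print(f"year {years}: doubling trend {moore:g}x, halving trend {reverse:g}x")
```

After six years the two trends differ by a factor of 64 (8x versus 1/8x), which is why a consistent decline stands out against the rest of the chip market.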

With the H100 release, Nvidia had to reverse course.

A 🧵 on Nvidia losing its grip on the GPU market.

Let’s focus in on the machine learning GPUs. You can see the value drop over time, until the H100 created an uptick. Note: I’m using today’s price for each card, but a similar downward trend also holds for the release prices.

The drop is because of monopoly power and clever market segmentation.
Example: The “server-grade” V100 is a minor variant of the 2080 Ti gaming card. Nvidia sells it to institutions instead of gamers, charging 5X more for the V100. This means huge profits.
lambdalabs.com/blog/best-gpu-…


Tom Goldstein

@tomgoldsteincs

Jun 19, 2023

Training an LLM takes about 1 trillion words. That’s about 30,000 years of typing.
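The thread does not state the typing speed behind "30,000 years", but it can be worked backwards. This sketch assumes typing nonstop, 24 hours a day, 365 days a year; the implied rate comes out to roughly 60 words per minute.

```python
WORDS = 1e12                       # ~1 trillion training words
YEARS = 30_000                     # the thread's typing-time figure
MINUTES_PER_YEAR = 365 * 24 * 60   # typing nonstop, no sleep or breaks

implied_wpm = WORDS / (YEARS * MINUTES_PER_YEAR)
print(f"implied typing speed: ~{implied_wpm:.0f} words per minute, around the clock")
```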
But where does this data come from?
And what does this have to do with the Reddit protests?
Here’s how OpenAI trains models on “the entire internet.” 🧵📜

Much of what we know about OpenAI is from urban legends. But the GPT3 paper does have a table showing their data sources. The cliché that LLMs are trained on “the whole internet” comes from the use of CommonCrawl.

CommonCrawl (CC) is a non-profit that scrapes the internet with bots and tries to record everything since 2008. 90% of CC is HTML, CSS, and scripts. The usable 10% contains junk that needs to be tossed out to clean the dataset.
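The thread does not say how the markup is stripped. As an illustration only, here is a minimal sketch using Python's standard-library `html.parser` to keep visible text and drop `<script>` and `<style>` contents. Real pipelines also deduplicate and filter for quality, which this sketch does not attempt.

```python
from html.parser import HTMLParser
from typing import List


class VisibleTextExtractor(HTMLParser):
    """Collect text that appears outside <script> and <style> elements."""

    SKIP = {"script", "style"}

    def __init__(self) -> None:
        super().__init__()
        self._skip_depth = 0           # >0 while inside a skipped element
        self.chunks: List[str] = []

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        if self._skip_depth == 0 and data.strip():
            self.chunks.append(data.strip())


def visible_text(html: str) -> str:
    parser = VisibleTextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(parser.chunks)


page = ("<html><head><style>p{color:red}</style><script>var x=1;</script></head>"
        "<body><p>Hello <b>world</b></p></body></html>")
text = visible_text(page)
print(text, f"({len(text) / len(page):.0%} of the page's characters)")
```

Even on this tiny page, the visible text is only a small fraction of the raw characters, echoing the thread's point that most of a crawl is markup and code.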

