swyx
Mar 14, 2023 17 tweets 12 min read
GPT4 is live!!!

openai.com/research/gpt-4
GPT4 gets 100% accuracy on this HumanEval task.

previous iterations were all under 50%.

holy shit.
Entire industries are being ripped up and chewed out.

RIP to:

- law students
- undergrads
- grad students
- High school biology/Calculus/Economics/Physics/Statistics/Govt

- Sommeliers
- Leetcoders
at least 10 full percentage point improvements on SOTA across 3 of the top/hardest LLM benchmarks.

improvements effortlessly transferring across languages.
But the headliner feature (widely rumored, now confirmed) is multimodality -

GPT can now see.

Not just classifying entire images.
Not segments of images.
This is arbitrarily flexible **visual comprehension**.

(and naturally, conversation, as we saw with Visual ChatGPT)
Another benefit of multimodality:

Combining visual inputs with world knowledge means you can ask it questions no vision model would be able to understand

Spotting unusual contexts and explaining memes.

Tell me this isn't some form of general intelligence.
People are likely to overlook the fine print in the paper but this is as big in my mind - major advances in safety have also been made in GPT4

19-29 full point improvements in hallucination reduction. A lot of work put into Harmful content alignment (controversial choice, ofc)
Prompt engineers unite - the GPT4 API now takes up to 50 pages of text (32k token context)!!!!!!

(Join Travis' discord to join 10k other ChatGPT hackers discord.gg/v9gERj825w)
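
A quick sanity check on the "50 pages" figure for the 32k context. This is only a back-of-envelope sketch: the words-per-token and words-per-page ratios are assumptions chosen for illustration, not numbers published by OpenAI, and real text will vary.

```python
# Back-of-envelope estimate: how many pages fit in a 32k-token context?
# Both ratios are ASSUMPTIONS for illustration, not OpenAI figures.
WORDS_PER_TOKEN = 0.75  # assumed average for English prose
WORDS_PER_PAGE = 500    # assumed single-spaced page


def tokens_to_pages(tokens: int) -> float:
    """Convert a token budget into an approximate page count."""
    words = tokens * WORDS_PER_TOKEN
    return words / WORDS_PER_PAGE


if __name__ == "__main__":
    # 32k context taken as 32 * 1024 tokens
    print(f"~{tokens_to_pages(32 * 1024):.0f} pages")
```

Under these assumed ratios the estimate lands just under 50 pages, which is consistent with the claim above.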

as LLMs grow and grow and grow in capabilities, it is getting more impt to have good model evaluation/benchmarking frameworks.

OpenAI is also releasing their eval framework, fully MIT licensed: github.com/openai/evals

Used by Stripe and well documented. Runs MMLU in 189 LOC
GPT4 developer livestream in 2 hours.

We're hosting an impromptu watch party on the Latent Space discord: discord.gg/zVH8rvw6?event…
.@OpenAI seems to be echoing Ajeya Cotra's view that predicting scaling capability is key to managing AI safety. We can predict 10,000x ahead by extrapolating from smaller models. In that respect GPT4 has been a near-complete success using the now-familiar Azure supercluster.
In the generative AI era, it's very very good to be an OpenAI partner.

GPT4's image capability is launch exclusive to one nonprofit.

Stripe tested OpenAI Eval.

@yusuf_i_mehdi confirms Bing Chat runs on GPT4.

Khan Academy launched today with GPT-4 powered personal tutoring.
Whoop i just got let in to the ChatGPT4 beta!
This is going to get completely lost in the noise but @AnthropicAI launched Claude/Claude+ (with @notionhq and @poe_platform launch partner) and Google launched their PaLM API today as well

Google's blogpost is a hot mess tho lol
Bombshell of a One More Thing dropped by @gdb:

GPT4 does Math now.

TAX MATH.

and it checks out.

thefuq???

live discussion ongoing twitter.com/i/spaces/1eaKb…
GPT4's image description capability is orders of magnitude more capable than existing CLIP-derivative approaches.

Just yesterday i was listening to @labenz discuss the longevity of BLIP with @LiJunnan0409 and @DongxuLi_, i wonder what they think about it now...

The @CadeMetz NYT article offers a few more ideas for GPT4 multimodal usecases - take a photo of your fridge contents, get meal ideas.

inferring an impressive amount from very little visual info, and then putting them together in a sensible combo

needs adversarial testing 👿

More from @swyx

Nov 6, 2023
Join @latentspacepod and @thursdai_pod live at DevDay!

Now:

spotted: “New Products Deep Dive” for 45 mins… I wonder what that will be twitter.com/i/spaces/1BRJj…

GPT4 Turbo is ~3x cheaper than GPT4!

1. OpenAI's longest ever Context length: 128k
2. Better JSON/function calling
3. Knowledge: built in RAG and April 2023 cutoff
4. Dalle3, GPT4-V, and TTS model all in API today!!!
4b. Whisper V3 open sourced (coming to API)
5. Customization: GPT3 16k, GPT4 finetuning, Custom Models services
6. Higher Rate Limits - 2x tokens per minute, request raises in account settings - plus: Copyright Shield!

"GPT4 Turbo is a smarter model than GPT4" (GPT4.5 confirmed!)


@latentspacepod @thursdai_pod Friendship resumed with Satya senpai
Read 23 tweets
Oct 10, 2023
it’s official - I think GitHub Copilot is the first* generative AI product to publicly claim they’ve passed $100m ARR — enough to stand alone as a publicly listed company

Whenever people ask me “is AI a fad” the biggest thing I point to is “follow the money”:

- revenue, not just funding
- RECURRING, not tcosts on hype
- people publicly saying they’d pay 5x the cost

(*there’s likely a few others but none confirmed officially - see Anatomy of Autonomy post on @latentspacepod)
next up is @DedyKredo LIVE CODING a full test suite, making code changes, and automating commit and PR review, all assisted by @CodiumAI . audible “what the fuck” from @eugeneyan.



ends with a powerful message for Israel. we stand with you @itamar_mar. youtube.com/live/qw4PrtyvJ…
@DedyKredo @CodiumAI @eugeneyan @itamar_mar i think @mdwelsh is an early contender for the “@Dharmesh Award for Most Laughs Per Minute” in his talk

youtube.com/live/qw4PrtyvJ…
Read 17 tweets
Jul 18, 2023
That was fast - Llama 2 is out!

and cleared for commercial use! and *destroys* Falcon 40B on @DanHendrycks's MMLU and other top benchmarks

They really meant it when they said "imminently" lol



Scheduled a @latentspacepod at 3pm PT - join @FanaHOVA and… https://t.co/iWFLYJLCJd https://t.co/C0YKJ8snjr https://t.co/TZvfRrz5lK twitter.com/i/spaces/1nAKE…
twitter.com/i/web/status/1…



@DanHendrycks @latentspacepod @FanaHOVA it seems @mascobot is on top of it - you can try out llama 2 here:

they also have a Llama playground but it's not currently working for me https://t.co/cao0EUYWQS replicate.com/a16z-infra/lla…
LLaMA 2 seems to know its own name...? what chat completions were given to it?

and these examples from @rajko_rad are great, thank you a16z (why did a16z get the hookup tho? so many questions haha)


Read 17 tweets
Jun 30, 2023
🆕 Essay: The Rise of the AI Engineer



Keeping up on AI is becoming a full time job.

Let's get together and define it. https://t.co/KD2lY9FTtm latent.space/p/ai-engineer
Builders need a place to talk turpentine. This is why i'm teaming up with @benghamine to produce @aiDotEngineer, the definitive place to talk AI UX, devtools, infra, and all things AI Engineering.

500 seats.
SF/Virtual, Oct 8-10.

Join us!

Screenshot from a DM today. We're changing lives and encouraging people to get over their own mental hurdles to build with AI. I love this so much.

https://t.co/MZXlc9VQzu
Read 5 tweets
Jun 20, 2023
The @latentspacepod is excited to publish:

Petaflops to the People:
@realGeorgeHotz's first interview
on his new personal compute cluster company

the tiny corp.

latent.space/p/geohot

We discuss how tiny is taking on Nvidia, Google, and PyTorch with a tiny team and go deep… twitter.com/i/web/status/1…
@latentspacepod @realGeorgeHotz GPT4 is 8 x 220B params = 1.7 Trillion params



ok I wasn't sure how widely to spread the rumors on GPT-4 but it seems Soumith is also confirming the same so here's the quick clip!

so yes, GPT4 is technically 10x the size of GPT3, and all the small… twitter.com/i/web/status/1…
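
The arithmetic behind the rumor can be checked in a few lines. A minimal sketch, assuming the rumored figures quoted above (8 experts of 220B parameters each, unconfirmed by OpenAI) and taking GPT-3 at its published 175B parameters. It is a naive product of experts times size, nothing more.

```python
# Naive parameter arithmetic for the RUMORED GPT-4 configuration.
# The expert count and per-expert size are unconfirmed rumor from the thread.
RUMORED_EXPERTS = 8
RUMORED_PARAMS_PER_EXPERT = 220e9
GPT3_PARAMS = 175e9  # published GPT-3 size

total_params = RUMORED_EXPERTS * RUMORED_PARAMS_PER_EXPERT
ratio_to_gpt3 = total_params / GPT3_PARAMS

if __name__ == "__main__":
    print(f"total: {total_params / 1e12:.2f}T params")
    print(f"vs GPT-3: {ratio_to_gpt3:.1f}x")
```

The product comes to 1.76T, which the thread rounds to 1.7 trillion, and roughly 10x GPT-3.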
since MoE is So Hot Right Now, GLaM might be the paper to pay attention to. Google already has a 1.2T model with 64 experts, while Microsoft Bing’s modes are different mixes accordingly
Read 6 tweets
Jun 7, 2023
this is a trend I'm calling "Code is all you need"

Comparing Bard vs @OpenAI ChatGPT vs @AnthropicAI Claude on Google's own reasoning/math prompts shows the stark contrast once you make your model write and eval code to answer questions. Reminds me of @amasad and @goodside's… twitter.com/i/web/status/1…
@OpenAI @AnthropicAI @amasad @goodside This is part of a broader trend of us slowly discovering the special place of code in language models:

1/ Code Improves LLMs
@Francis_YAO_ et al have repeatedly found that adding code in pretraining data improves LLMs in all benchmarks

2/ Code LLMs… twitter.com/i/web/status/1…
@OpenAI @AnthropicAI @amasad @goodside @Francis_YAO_ Implication 4 is the new meta:

LLMs Making Their Own Tools

In retrospect this was obvious; many folks have defined tool using/tool making as a marker of intelligence between human and animal.

We've also seen ChatGPT Code Interpreter dynamically generate and execute code per… twitter.com/i/web/status/1…
Read 5 tweets
