Thread by @swyx on Thread Reader App

GPT4 is live!!!

openai.com/research/gpt-4

GPT4 gets 100% accuracy on this HumanEval task.

previous iterations were all under <50%.

holy shit.

Entire industries are being ripped up and chewed out.

RIP to:

- law students
- undergrads
- grad students
- High school biology/Calculus/Economics/Physics/Statistics/Govt

- Sommeliers
- Leetcoders

at least 10 full percentage point improvements on SOTA across 3 of the top/hardest LLM benchmarks.

improvements effortlessly transferring across languages.

But the headliner feature (widely rumored, now confirmed) is multimodality -

GPT can now see.

Not just classifying entire images.
Not segments of images.
This is arbitrarily flexible **visual comprehension**.

(and naturally, conversation, as we saw with Visual ChatGPT)

Another benefit of multimodality:

Combining visual inputs with world knowledge means you can ask it questions no vision model would be able to understand

Spotting unusual contexts and explaining memes.

Tell me this isn't some form of general intelligence.

People are likely to overlook the fine print in the paper but this is as big in my mind - major advances in safety have also been made in GPT4

19-29 full point improvements in hallucination reduction. A lot of work put into Harmful content alignment (controversial choice, ofc)

Prompt engineers unite - the GPT4 API now takes up to 50 pages of text (32k token context)!!!!!!

(Join Travis' discord to join 10k other ChatGPT hackers discord.gg/v9gERj825w)

as LLMs grow and grow and grow in capabilities, it is getting more impt to have good model evaluation/benchmarking frameworks.

OpenAI is also releasing their eval framework, fully MIT licensed: github.com/openai/evals

Used by Stripe and well documented. Runs MMLU in 189 LOC

GPT4 developer livestream in 2 hours.

We're hosting an impromptu watch party on the Latent Space discord: discord.gg/zVH8rvw6?event…

.@OpenAI seems to be echoing Ajeya Cotra's view that predicting scaling capability is key to managing AI safety. We can predict 10,000x ahead by extrapolating from smaller models. On that respect GPT4 has been a near-complete success using the now-familiar Azure supercluster.

@OpenAI In the generative AI era, it's very very good to be an OpenAI partner.

GPT4's image capability is launch exclusive to one nonprofit.

Stripe tested OpenAI Eval.

@yusuf_i_mehdi confirms Bing Chat runs on GPT4.

Khan Academy launched today with GPT-4 powered personal tutoring.

Whoop i just got let in to the ChatGPT4 beta!

This is going to get completely lost in the noise but @AnthropicAI launched Claude/Claude+ (with @notionhq and @poe_platform launch partner) and Google launched their PaLM API today as well

Google's blogpost is a hot mess tho lol

@AnthropicAI @NotionHQ @poe_platform Bombshell of a One More Thing dropped by @gdb:

GPT4 does Math now.

TAX MATH.

and it checks out.

thefuq???

live discusison ongoing twitter.com/i/spaces/1eaKb…

GPT4's image description capability is orders of magnitude more than existing CLIP derivative approaches.

Just yesterday i was listening to @labenz discuss the longevity of BLIP with @LiJunnan0409 and @DongxuLi_, i wonder what they think about it now...

The @CadeMetz NYT article offers a few more ideas for GPT4 multimodal usecases - take a photo of your fridge contents, get meal ideas.

inferring an impressive amount from very little visual info, and then putting them together in a sensible combo

needs adversarial testing 👿

Share this Scrolly Tale with your friends.

A Scrolly Tale is a new way to read Twitter threads with a more visually immersive experience.
Discover more beautiful Scrolly Tales like this.

Share this page!

Enter URL or ID to Unroll