swyx 🤖

Mar 14 • 17 tweets • 12 min read

GPT4 is live!!!

openai.com/research/gpt-4

GPT4 gets 100% accuracy on this HumanEval task.

previous iterations were all under 50%.

holy shit.

Entire industries are being ripped up and chewed out.

RIP to:

- law students
- undergrads
- grad students
- High school biology/Calculus/Economics/Physics/Statistics/Govt

- Sommeliers
- Leetcoders

at least 10 full percentage point improvements on SOTA across 3 of the top/hardest LLM benchmarks.

improvements effortlessly transferring across languages.

But the headliner feature (widely rumored, now confirmed) is multimodality -

GPT can now see.

Not just classifying entire images.
Not segments of images.
This is arbitrarily flexible **visual comprehension**.

(and naturally, conversation, as we saw with Visual ChatGPT)

Another benefit of multimodality:

Combining visual inputs with world knowledge means you can ask it questions no vision model would be able to understand

Spotting unusual contexts and explaining memes.

Tell me this isn't some form of general intelligence.

People are likely to overlook the fine print in the paper but this is as big in my mind - major advances in safety have also been made in GPT4

19-29 full point improvements in hallucination reduction. A lot of work put into Harmful content alignment (controversial choice, ofc)

Prompt engineers unite - the GPT4 API now takes up to 50 pages of text (32k token context)!!!!!!

(Join Travis' discord to join 10k other ChatGPT hackers discord.gg/v9gERj825w)

https://twitter.com/transitive_bs/status/1635694410905137238?s=20
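
Quick sanity check on the "50 pages" figure, as a rough sketch. The ratios here are rule-of-thumb assumptions (about 0.75 English words per token, about 500 words per page), not numbers from OpenAI, and 32,768 is assumed to be the exact size of the "32k" window:

```python
def tokens_to_pages(tokens: int,
                    words_per_token: float = 0.75,  # assumed rule of thumb for English text
                    words_per_page: int = 500) -> float:  # assumed single-spaced page
    """Convert a token budget into an approximate page count."""
    return tokens * words_per_token / words_per_page

pages = tokens_to_pages(32_768)  # assumed exact size of the 32k context
print(f"{pages:.1f} pages")  # 49.2 pages
```

Change either ratio and the page count moves with it, so treat "50 pages" as a ballpark.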

as LLMs grow and grow and grow in capabilities, it is getting more impt to have good model evaluation/benchmarking frameworks.

OpenAI is also releasing their eval framework, fully MIT licensed: github.com/openai/evals

Used by Stripe and well documented. Runs MMLU in 189 LOC
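
For a feel of what a multiple-choice eval like MMLU boils down to, here is a minimal sketch. This is not the openai/evals API: `accuracy`, `always_a`, and the sample format are hypothetical names made up for illustration, and the "model" is a stub.

```python
from typing import Callable, Dict, List

Sample = Dict[str, object]  # hypothetical format: question, choices, answer letter

def accuracy(samples: List[Sample],
             answer_fn: Callable[[str, List[str]], str]) -> float:
    """Fraction of samples where the model's letter matches the gold letter."""
    correct = 0
    for s in samples:
        predicted = answer_fn(s["question"], s["choices"]).strip().upper()
        if predicted == s["answer"]:
            correct += 1
    return correct / len(samples)

def always_a(question: str, choices: List[str]) -> str:
    """Stub standing in for a real model call; always picks the first choice."""
    return "A"

samples = [
    {"question": "2 + 2 = ?", "choices": ["4", "5"], "answer": "A"},
    {"question": "Capital of France?", "choices": ["Rome", "Paris"], "answer": "B"},
]
score = accuracy(samples, always_a)
print(score)  # 0.5
```

Swap `always_a` for a function that calls a real model and the scoring loop stays the same.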

GPT4 developer livestream in 2 hours.

We're hosting an impromptu watch party on the Latent Space discord: discord.gg/zVH8rvw6?event…

.@OpenAI seems to be echoing Ajeya Cotra's view that predicting scaling capability is key to managing AI safety. We can predict 10,000x ahead by extrapolating from smaller models. In that respect GPT4 has been a near-complete success using the now-familiar Azure supercluster.
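
The extrapolation idea can be sketched as a straight-line fit in log-log space. Big assumptions here: loss is treated as a pure power law in compute (no irreducible-loss term), the data points are synthetic, and `fit_power_law` / `predict` are illustrative names, not anything from OpenAI's code.

```python
import math

def fit_power_law(compute, loss):
    """Least-squares fit of loss = a * compute**b, done on log(compute) vs log(loss)."""
    xs = [math.log(c) for c in compute]
    ys = [math.log(l) for l in loss]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
             / sum((x - mean_x) ** 2 for x in xs))
    intercept = mean_y - slope * mean_x
    return math.exp(intercept), slope  # (a, b)

def predict(a, b, compute):
    """Evaluate the fitted power law at a new compute budget."""
    return a * compute ** b

# Synthetic "small model" runs generated from a made-up law: loss = 4.0 * C**-0.05
small_compute = [1e0, 1e1, 1e2, 1e3]
small_loss = [4.0 * c ** -0.05 for c in small_compute]

a, b = fit_power_law(small_compute, small_loss)
# Extrapolate 10,000x beyond the largest small run
big_loss = predict(a, b, small_compute[-1] * 10_000)
```

Real training runs are noisy, so how well a fit like this holds up 10,000x out is exactly the thing being tested.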

@OpenAI In the generative AI era, it's very very good to be an OpenAI partner.

GPT4's image capability is launch exclusive to one nonprofit.

Stripe tested OpenAI Eval.

@yusuf_i_mehdi confirms Bing Chat runs on GPT4.

Khan Academy launched today with GPT-4 powered personal tutoring.

Whoop i just got let in to the ChatGPT4 beta!

This is going to get completely lost in the noise but @AnthropicAI launched Claude/Claude+ (with @notionhq and @poe_platform as launch partners) and Google launched their PaLM API today as well

https://twitter.com/adamdangelo/status/1635690630289723394?s=20

Google's blogpost is a hot mess tho lol

@AnthropicAI @NotionHQ @poe_platform Bombshell of a One More Thing dropped by @gdb:

GPT4 does Math now.

TAX MATH.

and it checks out.

thefuq???

live discussion ongoing twitter.com/i/spaces/1eaKb…

GPT4's image description capability is orders of magnitude better than existing CLIP derivative approaches.

Just yesterday i was listening to @labenz discuss the longevity of BLIP with @LiJunnan0409 and @DongxuLi_, i wonder what they think about it now...

https://twitter.com/altryne/status/1635736338397020160

The @CadeMetz NYT article offers a few more ideas for GPT4 multimodal usecases - take a photo of your fridge contents, get meal ideas.

inferring an impressive amount from very little visual info, and then putting them together in a sensible combo

needs adversarial testing 👿

More from @swyx

swyx 🤖

@swyx

Mar 12

Big Data may be dead, but looking at data is still stupendously underrated even in 2023.

Small collection of examples where looking at ✨analytics✨ changed the trajectory of a whole business:

First (and most famous, but gotta acknowledge the greats), @kevin pivoted his Foursquare mobile check-in competitor after hiring @mikeyk to look at analytics.

Mike saw that out of all the features they shipped, only one got off the charts usage.

and @Instagram was born

@kevin @mikeyk @instagram Looking at analytics isn't always as grand as making one of the most successful mobile apps of all time.

I loved this smaller story of how @theknot did a "starburst analysis" in @ModeAnalytics to find out that an easter egg was a huge retention cheat code

https://twitter.com/swyx/status/1554868229335572480

swyx 🤖

@swyx

Jan 3

ChatGPT’s current killer app isn’t search, therapy, doing math, controlling browsers, emulating a virtual machine, or any of those other cherrypicked examples that come with huge disclaimers.

It’s a lot more quotidian:

Reformatting information from any format X to any format Y.

“ChatGPT reformatting” requires minimal world knowledge, is instantly verifiable, and can reliably save minutes of work multiple times a day.
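
"Instantly verifiable" can be taken literally for structured formats. A minimal sketch, assuming a CSV to JSON reformat: the `model_output` string is a hardcoded stand-in for whatever ChatGPT returned, and `same_records` is a hypothetical helper, not part of any library.

```python
import csv
import io
import json

def same_records(csv_text: str, json_text: str) -> bool:
    """True if the JSON array holds exactly the rows of the CSV, in order."""
    expected = list(csv.DictReader(io.StringIO(csv_text)))
    actual = json.loads(json_text)
    return expected == actual

source_csv = "name,year\nAda,1815\nAlan,1912\n"

# Stand-in for the model's reformatted output (values kept as strings)
model_output = '[{"name": "Ada", "year": "1815"}, {"name": "Alan", "year": "1912"}]'

print(same_records(source_csv, model_output))  # True
```

A dropped or altered row makes the check fail, so the reformat can be trusted without reading every line.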

The reformat can include contextual inference, which saves even more time at the cost of a bit more risk:

https://twitter.com/mathemagic1an/status/1610147883894079488?s=46&t=6cfv4l9WJyXXnRX__Q-_hg

https://twitter.com/rreverser/status/1599184983402446848

More things count as formats than you are used to.

The conversational nature of ChatGPT allows you to feed in more info, blending human and machine to work together better than either could alone.

Here we see a reformat of Initial request + Error stack trace => Working Code

swyx 🤖

@swyx

Nov 25, 2022

🆕 Why "Prompt Engineering" and "Generative AI" are overhyped

lspace.swyx.io/p/why-prompt-e…

- Quick recap of @StableDiffusion 2.0 puts & takes
- Why “Prompt Engineering” is a Product Smell
- Why “Generative AI” is underselling the potential

@StableDiffusion #StableDiffusion2 is all the hotness right now, but there'll be new versions every few months.

I wanted to zoom out and talk about the open secret that all AI insiders seem to know: Prompt engineering is just a kludge.

Quotes from @gwern, @sama, @GaryMarcus, @josephofiowa:

@StableDiffusion @gwern @sama @GaryMarcus @josephofiowa Completely eliminating prompts may not be possible: that is basically solving alignment, and that is AGI-hard (https://twitter.com/swyx/status/1585810989521379329?lang=en)

But we could take a page from @ryanjsalva's Copilot and @nbashaw's Lex.page, and *make the IDE/UI the prompt*:

swyx 🤖

@swyx

Nov 20, 2022

Convinced that all devs should work on a database as part of training.

Ever joked about DataStructures & Algorithms only being useful at interviews?

Work on a DB

Ever wondered why {{ FAVE_APP }} is slow?

Probably a DB

Prefer compilers?

allow me to introduce query planners..

Perhaps my real hot take is that databases are the CS grads abstracting away all the Hard Problems so that us bootcamp grads can cosplay being "full stack" with literally 1 day of SQL experience before getting hired to make 6 figures making rectangles on server vs on client

https://twitter.com/bernhardsson/status/1493593885532168193

i'm basically Erik on a 15 year time delay

I'm starting to suspect there's only two fundamental value props in verticalized tech: offering a better data schema for the problem, and building a better data store for the schema.

do both and you win

swyx 🤖

@swyx

Nov 18, 2022

https://twitter.com/kvogt/status/1593027626494726145

This is a HUGE milestone.

I don’t think people outside SF have any idea how close we are to self driving. Most people think it’s “continually 5 years away”.

This is no joke. Cruise now has “push button, get car” autonomous taxis 24hrs/day in one of the busiest cities on earth.

i got a nighttime joyride with @alexbowe recently and while we couldn't test heavy traffic, we pointed the car at the most challenging road we could think of inside the drivable zone - the incline squiggly road in Potrero Hill.

SF is a hilly city but it was handled like a champ. twitter.com/i/web/status/1…

this problem is incredibly complex.

here we are going in two lane traffic, but someone ahead of us has opened a parked car door on the side of traffic.

Our vehicle paused for them, waited for cars on our right to pass, and then *overtook* the human that was blocking us.