swyx 🤖
Mar 14 17 tweets 12 min read
GPT4 is live!!!

openai.com/research/gpt-4
GPT4 gets 100% accuracy on this HumanEval task.

previous iterations were all under 50%.

holy shit.
Entire industries are being ripped up and chewed out.

RIP to:

- law students
- undergrads
- grad students
- High school biology/Calculus/Economics/Physics/Statistics/Govt

- Sommeliers
- Leetcoders
at least 10 full percentage point improvements on SOTA across 3 of the top/hardest LLM benchmarks.

improvements effortlessly transferring across languages.
But the headliner feature (widely rumored, now confirmed) is multimodality -

GPT can now see.

Not just classifying entire images.
Not segments of images.
This is arbitrarily flexible **visual comprehension**.

(and naturally, conversation, as we saw with Visual ChatGPT)
Another benefit of multimodality:

Combining visual inputs with world knowledge means you can ask it questions no vision model would be able to understand

Spotting unusual contexts and explaining memes.

Tell me this isn't some form of general intelligence.
People are likely to overlook the fine print in the paper but this is as big in my mind - major advances in safety have also been made in GPT4

19-29 full point improvements in hallucination reduction. A lot of work put into Harmful content alignment (controversial choice, ofc)
Prompt engineers unite - the GPT4 API now takes up to 50 pages of text (32k token context)!!!!!!

(Join Travis' discord to join 10k other ChatGPT hackers discord.gg/v9gERj825w)
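For a rough sense of where "50 pages" comes from, here is a back-of-the-envelope sketch. Both ratios are assumptions, not numbers from the thread: roughly 0.75 English words per token (a common rule of thumb) and about 500 words per page. `tokens_to_pages` is a hypothetical helper name.

```python
# Back-of-the-envelope conversion from a token budget to pages of text.
# Both ratios are assumptions for illustration, not figures from the thread.
WORDS_PER_TOKEN = 0.75  # common rule of thumb for English text
WORDS_PER_PAGE = 500    # assumed single-spaced page


def tokens_to_pages(tokens: int) -> float:
    """Estimate how many pages of English text fit in a token budget."""
    words = tokens * WORDS_PER_TOKEN
    return words / WORDS_PER_PAGE


if __name__ == "__main__":
    budget = 32_768  # "32k" context window
    print(f"{budget} tokens ~ {tokens_to_pages(budget):.0f} pages")
```

Under those assumptions a 32k window lands just under 50 pages; a denser page or a less token-friendly language would shift the estimate.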

as LLMs grow and grow and grow in capabilities, it is getting more impt to have good model evaluation/benchmarking frameworks.

OpenAI is also releasing their eval framework, fully MIT licensed: github.com/openai/evals

Used by Stripe and well documented. Runs MMLU in 189 LOC
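To make "eval framework" concrete, here is a minimal sketch of the core loop such a harness runs: feed each sample to a model, compare the answer with the ideal one, report accuracy. This is not the openai/evals API. The samples, `exact_match_accuracy`, and the `always_b` stand-in model are all hypothetical and made up for illustration.

```python
from typing import Callable, Dict, List

# Hypothetical multiple-choice samples in the style of MMLU (made up here).
SAMPLES: List[Dict[str, str]] = [
    {"question": "2 + 2 = ?\nA. 3\nB. 4\nC. 5\nD. 22", "ideal": "B"},
    {
        "question": "Which planet is closest to the Sun?\n"
        "A. Venus\nB. Earth\nC. Mercury\nD. Mars",
        "ideal": "C",
    },
]


def exact_match_accuracy(
    complete: Callable[[str], str], samples: List[Dict[str, str]]
) -> float:
    """Fraction of samples where the first letter of the answer matches."""
    if not samples:
        return 0.0
    correct = 0
    for sample in samples:
        # Normalize the model output down to a single uppercase letter.
        answer = complete(sample["question"]).strip().upper()[:1]
        if answer == sample["ideal"]:
            correct += 1
    return correct / len(samples)


def always_b(prompt: str) -> str:
    """Stand-in for a real model call: always answers 'B'."""
    return "B"


if __name__ == "__main__":
    print(exact_match_accuracy(always_b, SAMPLES))  # 0.5
```

Swapping `always_b` for a function that calls a real model is the only change needed to score that model on the same samples.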
GPT4 developer livestream in 2 hours.

We're hosting an impromptu watch party on the Latent Space discord: discord.gg/zVH8rvw6?event…
.@OpenAI seems to be echoing Ajeya Cotra's view that predicting scaling capability is key to managing AI safety. We can predict 10,000x ahead by extrapolating from smaller models. In that respect GPT4 has been a near-complete success using the now-familiar Azure supercluster.
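A toy sketch of what "extrapolating from smaller models" can look like: fit a power law to small runs, then evaluate it at 10,000x the compute. This is a simplification, not OpenAI's actual procedure. The data points are synthetic, generated from a known law purely for illustration, and `fit_power_law` is a hypothetical helper.

```python
import math
from typing import List, Tuple


def fit_power_law(xs: List[float], ys: List[float]) -> Tuple[float, float]:
    """Least-squares fit of y = a * x**b as a straight line in log-log space."""
    lx = [math.log(x) for x in xs]
    ly = [math.log(y) for y in ys]
    n = len(lx)
    mean_x = sum(lx) / n
    mean_y = sum(ly) / n
    # Slope of the log-log line is the exponent b.
    b = sum((u - mean_x) * (v - mean_y) for u, v in zip(lx, ly)) / sum(
        (u - mean_x) ** 2 for u in lx
    )
    # Intercept of the log-log line is log(a).
    a = math.exp(mean_y - b * mean_x)
    return a, b


# Synthetic "small model" runs (NOT real measurements), generated from
# loss = 10 * compute ** -0.1 so the fit has a known right answer.
small_compute = [1e0, 1e1, 1e2, 1e3]
small_loss = [10.0 * c ** -0.1 for c in small_compute]

a, b = fit_power_law(small_compute, small_loss)

# Extrapolate 10,000x beyond the largest small run.
target_compute = small_compute[-1] * 10_000
predicted = a * target_compute ** b

if __name__ == "__main__":
    print(f"fit: loss = {a:.3f} * compute^{b:.3f}")
    print(f"predicted loss at {target_compute:.0e} compute: {predicted:.3f}")
```

Real loss curves are noisy and usually need an extra irreducible-loss term, so a clean fit like this one is the best case rather than the norm.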
In the generative AI era, it's very very good to be an OpenAI partner.

GPT4's image capability is launch exclusive to one nonprofit.

Stripe tested OpenAI Eval.

@yusuf_i_mehdi confirms Bing Chat runs on GPT4.

Khan Academy launched today with GPT-4 powered personal tutoring.
Whoop i just got let in to the ChatGPT4 beta!
This is going to get completely lost in the noise but @AnthropicAI launched Claude/Claude+ (with @notionhq and @poe_platform launch partner) and Google launched their PaLM API today as well

Google's blogpost is a hot mess tho lol
Bombshell of a One More Thing dropped by @gdb:

GPT4 does Math now.

TAX MATH.

and it checks out.

thefuq???

live discussion ongoing twitter.com/i/spaces/1eaKb…
GPT4's image description capability is orders of magnitude more capable than existing CLIP-derivative approaches.

Just yesterday i was listening to @labenz discuss the longevity of BLIP with @LiJunnan0409 and @DongxuLi_, i wonder what they think about it now...

The @CadeMetz NYT article offers a few more ideas for GPT4 multimodal usecases - take a photo of your fridge contents, get meal ideas.

inferring an impressive amount from very little visual info, and then putting them together in a sensible combo

needs adversarial testing 👿

More from @swyx

Mar 12
Big Data may be dead, but looking at data is still stupendously underrated even in 2023.

Small collection of examples where looking at ✨analytics✨ changed the trajectory of a whole business:
First (and most famous, but gotta acknowledge the greats), @kevin pivoted his Foursquare mobile check-in competitor after hiring @mikeyk to look at analytics.

Mike saw that out of all the features they shipped, only one got off-the-charts usage.

and @Instagram was born
Looking at analytics isn't always as grand as making one of the most successful mobile apps of all time.

I loved this smaller story of how @theknot did a "starburst analysis" in @ModeAnalytics to find out that an easter egg was a huge retention cheat code

Read 5 tweets
Jan 3
ChatGPT’s current killer app isn’t search, therapy, doing math, controlling browsers, emulating a virtual machine, or any of those other cherry-picked examples that come with huge disclaimers.

It’s a lot more quotidian:

Reformatting information from any format X to any format Y.
“ChatGPT reformatting” requires minimal world knowledge, is instantly verifiable, and can reliably save minutes of work multiple times a day.

The reformat can include contextual inference, which saves even more time at the cost of a bit more risk:

More things count as formats than you are used to.

The conversational nature of ChatGPT allows you to feed in more info, blending human and machine to work together better than either could alone.

Here we see a reformat of Initial request + Error stack trace => Working Code
Read 6 tweets
Nov 25, 2022
🆕 Why "Prompt Engineering" and "Generative AI" are overhyped

lspace.swyx.io/p/why-prompt-e…

- Quick recap of @StableDiffusion 2.0 puts & takes
- Why “Prompt Engineering” is a Product Smell
- Why “Generative AI” is underselling the potential
#StableDiffusion2 is all the hotness right now, but there'll be new versions every few months.

I wanted to zoom out and talk about the open secret that all AI insiders seem to know: Prompt engineering is just a kludge.

Quotes from @gwern, @sama, @GaryMarcus, @josephofiowa: Image
Completely eliminating prompts may not be possible: that is basically solving alignment, and that is AGI-hard

But we could take a page from @ryanjsalva's Copilot and @nbashaw's Lex.page, and *make the IDE/UI the prompt*: Image
Read 7 tweets
Nov 20, 2022
Convinced that all devs should work on a database as part of training.

Ever joked about DataStructures & Algorithms only being useful at interviews?

Work on a DB

Ever wondered why {{ FAVE_APP }} is slow?

Probably a DB

Prefer compilers?

allow me to introduce query planners..
Perhaps my real hot take is that databases are the CS grads abstracting away all the Hard Problems so that us bootcamp grads can cosplay being "full stack" with literally 1 day of SQL experience before getting hired to make 6 figures making rectangles on server vs on client
i'm basically Erik on a 15 year time delay

I'm starting to suspect there's only two fundamental value props in verticalized tech: offering a better data schema for the problem, and building a better data store for the schema.

do both and you win
Read 7 tweets
Nov 18, 2022
This is a HUGE milestone.

I don’t think people outside SF have any idea how close we are to self driving. Most people think it’s “continually 5 years away”.

This is no joke. Cruise now has “push button, get car” autonomous taxis 24hrs/day in one of the busiest cities on earth.
i got a nighttime joyride with @alexbowe recently and while we couldn't test heavy traffic, we pointed the car at the most challenging road we could think of inside the drivable zone - the incline squiggly road in Potrero Hill.

SF is a hilly city, but the car handled it like a champ. twitter.com/i/web/status/1… Image
this problem is incredibly complex.

here we are going in two lane traffic, but someone ahead of us has opened a parked car door on the side of traffic.

Our vehicle paused for them, waited for cars on our right to pass, and then *overtook* the human that was blocking us.
Read 4 tweets
Oct 11, 2022
Looking to improve my mobile automation:

What iOS Shortcuts do you use?

(I don’t have a smart home but would like to be smarter everywhere else)
ok early doors but this is in the running to be #1 time saver
lol all the best ones are just various forms of textexpander on mobile
Read 9 tweets
