Here’s my @OpenAIDevs day thread for those following along. everyone else has you covered with videos and stuff, so i will just give personal notes and aha moments thru the day
after some nice screenshots of CoCounsel, time for @romainhuet’s legendary live demos. o1 one-shots an ios app and writes the frontend/backend to control a drone.
ai controlled drones, what could go wrong?
Realtime API announced!
starting with speech to speech support
all 6 adv voice mode voices supported
demo next
Realtime voice mode in playground now
playground shows event logs for u to react to
playground now has autoprompting that also generates fewshot examples and function calling schemas
voice mode has function calling and is weirdly obsessed with strawberries
he is integrating with the @twilio api and ordering strawberries for all of us! classic twilio demo
realtime api uses 4o as backbone and is public beta starting today
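for the curious, the wire protocol is just JSON events over a websocket. a minimal sketch below, assuming the beta endpoint, header, and event names from the launch docs (model name and event types may change):

```python
# minimal Realtime API sketch: connect, request a text response, watch events.
# endpoint/headers/event names assumed from the beta launch docs; verify before use.
import asyncio, json, os
import websockets  # pip install websockets

URL = "wss://api.openai.com/v1/realtime?model=gpt-4o-realtime-preview"

async def main():
    headers = {
        "Authorization": f"Bearer {os.environ['OPENAI_API_KEY']}",
        "OpenAI-Beta": "realtime=v1",
    }
    # extra_headers is the kwarg in websockets<13 (renamed later)
    async with websockets.connect(URL, extra_headers=headers) as ws:
        # ask for a text-only response so the event stream is easy to print
        await ws.send(json.dumps({
            "type": "response.create",
            "response": {"modalities": ["text"],
                         "instructions": "say hi to devday"},
        }))
        async for raw in ws:
            event = json.loads(raw)
            print(event["type"])  # the same event log the playground surfaces
            if event["type"] == "response.done":
                break

asyncio.run(main())
```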
pov: u are second fiddle to @altryne and @simonw live blogging and tweeting
openai prompt caching is not as big a discount as Gemini’s or Anthropic’s, but it works WITHOUT CODE CHANGES. lets see how long they cache… devil’s in the details.
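the "no code changes" bit means caching kicks in automatically on repeated prefixes; you just read the cached count back from usage. rough sketch, assuming the `cached_tokens` usage field from the announcement (`big_system_prompt.txt` is a placeholder):

```python
# prompt caching sketch: reuse a long identical prefix across calls and
# check how much of it was served from cache. field names assumed from
# the announcement; no request-side changes are needed.
from openai import OpenAI

client = OpenAI()
shared_prefix = open("big_system_prompt.txt").read()  # needs to exceed ~1024 tokens

for q in ["summarize section 1", "summarize section 2"]:
    resp = client.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "system", "content": shared_prefix},
                  {"role": "user", "content": q}],
    )
    # from the second call on, this should be nonzero while the cache is warm
    print(resp.usage.prompt_tokens_details.cached_tokens)
```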
OAI Model Distillation suite!
a bunch of evals and finetuning startups just died
red wedding lives
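mechanically the suite looks simple: flag teacher completions for storage, then eval/fine-tune a smaller model on them from the dashboard. hedged sketch, assuming the `store`/`metadata` params from the launch:

```python
# distillation flow sketch: store gpt-4o ("teacher") completions so they can
# feed evals and fine-tuning of a smaller student model. params assumed
# from the launch announcement.
from openai import OpenAI

client = OpenAI()
resp = client.chat.completions.create(
    model="gpt-4o",                        # teacher model
    store=True,                            # persist this completion for distillation
    metadata={"task": "support-triage"},   # tag so stored runs can be filtered
    messages=[{"role": "user", "content": "classify this ticket: printer on fire"}],
)
# stored completions then show up in the dashboard, ready for evals and for
# fine-tuning a smaller student model, e.g. gpt-4o-mini
print(resp.choices[0].message.content)
```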
exclusive interview with inexplicably photogenic strawberry man coming on @latentspacepod
a brief history of @openai
an o1 session with @hwchung27 and @_jasonwei
lots of cameras recording so just search around for video
- what just became possible with o1?
- what will become possible with future versions of o1?
- what would you want to build if reasoning is 50% better?
- what would you NOT want to build?
@_jasonwei takes the stage: when to use o1-preview/main vs o1-mini.
mini: math, coding
big: finding inaccuracies in datasets, hard-sciences research, legal domain reasoning
q&a: CoT using RL - scaling inference compute. RL focused on backtracking, error correction.
the delta from 4-turbo at last devday to o1 is a lot
Next 2 years will accelerate very fast
AGI will be a smooth exponential, no hard and clear milestone. No one cared when the Turing test was crossed; historians will look back and disagree.
Q: is oai still committed to research?
yes more than ever
there was a time when all we did was scale up research
and other companies copying oai is fine
but trying to do net new things in the world is still very impt to sama
oai will continue to marry research and product tho
Q: oai only paying lip service to alignment?
sama:
- we have a diff take on alignment vs lesswrong
- we care a lot about building safe systems
- we want to make capable models that make it safer and safer over time
- o1 is obviously our most capable model but also our most aligned model
- we have to build models that are safe and robust to be generally accepted in the world
- scifi safety also impt.
Q: how do agents fit into longterm oai plans?
sama:
- chat is great but when u can think for equivalent of multiple days of human effort...
- people say things about agents now but they arent serious. this will be a VERY significant change to the way the world works
- we will ask agents to work on things for a month, multiple of them, and in 2030 we will take this for granted.
Q: hurdles for ai controlling computer?
sama: safety and alignment
Q: can safety have false positives and limit access to ai?
sama: yes it will happen. we could have launched o1 without those limits, but it would have come at a cost.
by the time of o3... itll work. if you try to get it to say something naughty it should follow your instructions.
we start on conservative side, then loosen up.
Q: what should startups that use ai as core feature do?
sama:
- ai doesnt excuse you from any of the normal laws of business.
Q: voice taps directly into human experience. ethics?
sama:
- i say please and thank you to chatgpt. you never know.
kevin:
- o1 will support fn calling, system prompts, etc before EOY
sama:
- the model will get so much better, so fast. o1 is at gpt-2 scale, and we know how to get it to gpt-4
- plan for the model to get rapidly faster
Q: what feature or capability of a competitor do you admire?
sama: notebooklm. very well done. not enough people are shipping new things.
kevin: anthropic did a really good job on projects. gpts are meant for persistent reuse, projects are more ephemeral - the mental model works
sama q to audience: who thinks theyre smarter than o1?
(some raised hands)
do you think you'll still think this by o2?
(nervous laughs)
- sama wants voice mode to sing. just being conservative.
- kevin had a full business conversation in korea w chatgpt. interesting tension btwn chatgpt and speak/duolingo.
another q:
- sama: long context of 10m tokens, then 10 trillion, will come within the decade
WHY IS NOBODY SERVING UP THE SOFTBALL ABOUT THE 7% EQUITY STAKE
don't miss that OAI also published a prompting guide WITH RECEIPTS for GPT-4.1, specifically for those building agents, with new recommendations (sketch after the list):
- telling the model to be persistent (+20%)
- dont self-inject/parse toolcalls (+2%)
- prompted planning (+4%)
- JSON BAD - use XML or arxiv 2406.13121 (GDM format)
- put instructions + user query at TOP -and- BOTTOM - bottom-only is VERY BAD
- no evidence that ALL CAPS or Bribes or Tips or threats to grandma work
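roughly what a system prompt following those tips could look like (an illustrative sketch, not the guide's literal template):

```python
# illustrative agent system prompt per the 4.1 guide's tips: persistence
# reminder, prompted planning, XML-ish structure instead of JSON, and the
# instructions repeated at the TOP and the BOTTOM of the prompt.
SYSTEM_PROMPT = """<instructions>
You are an agent. Keep going until the user's task is fully resolved
before ending your turn. If you are unsure, use your tools; do not guess.
Plan extensively before each tool call and reflect on each result.
</instructions>

<context>
{retrieved_context}
</context>

<instructions>
(repeat) Persist until the task is resolved; plan before every tool call.
</instructions>"""
```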
was fun to work on this @latentspacepod post w/ @benhylak
inspired by “pivot token” literature, one gpt->o1 mental model shift i’ve made is treating self-evaluation and self-correction as an ESSENTIAL part of planning/reasoning.
with o1, you move the LLM-as-judge *INTO THE PROMPT*, so you can let it handle the self-eval and replanning. this is the incremental next "agentic" step, which openai consistently does well, to the frustration of more hyperbolic but snake-oily alternatives.
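in code, the shift looks something like this (hypothetical helper, model names illustrative): the external generate -> judge -> revise loop collapses into a single call.

```python
# sketch of the gpt -> o1 shift: the LLM-as-judge loop moves inside the prompt.
from openai import OpenAI

client = OpenAI()

def ask(model: str, prompt: str) -> str:
    resp = client.chat.completions.create(
        model=model, messages=[{"role": "user", "content": prompt}])
    return resp.choices[0].message.content

def solve_gpt4_style(task: str) -> str:
    # old pattern: self-eval and replanning orchestrated externally by you
    draft = ask("gpt-4o", f"Solve: {task}")
    critique = ask("gpt-4o", f"Judge this answer and list errors:\n{draft}")
    return ask("gpt-4o", f"Revise the answer using the critique.\n"
                         f"Answer:\n{draft}\nCritique:\n{critique}")

def solve_o1_style(task: str) -> str:
    # new pattern: ask for the check inside the prompt; the model
    # backtracks and corrects internally during its reasoning
    return ask("o1-preview", f"Solve, checking and correcting your own "
                             f"work before answering: {task}")
```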
this neurips is really going to be remembered as the "end of pretraining" neurips
notes from doctor @polynoamial's talk on scaling test time compute today
(thank you @oh_that_hat for organizing)
all gains to date have been from scaling data and pretrain compute and yet LLMs cant solve simple problems like tictactoe
however inference costs have scaled much less.
goes back to libratus/pluribus work
poker model scaling from 2012-2015 - scaled 5x each year, but still lost dramatically (9 big bets per hundred) to poker pros in 80k hands
recalls familiar insight about humans taking longer to think for harder problems.
added 20s of search - distance from nash equilibrium reduced by a factor of 7 - roughly the equivalent of scaling up model size by 100,000x
just realized NotebookLM is @GoogleDeepMind's ChatGPT moment
- "low key research preview"/"experimental"
- not monetized
- GPUs/TPUs immediately on fire
- SOTA proprietary new model buried in there, with upgrades that weren't previously announced
- new AI UX that cleverly embeds LLM usage natively within the product features
in this case NBLM nailed multimodal RAG and I/O in a way that @ChatGPTapp never did (or for that matter, @GeminiApp). The multiple rounds of preprocessing described by @stevenbjohnson also raise the quality of the audio conversation dramatically at the cost of extreme latency (took an efficient model that was advertised as capable of generating 30s of audio in 0.5s, and slapped on like 200s of LLM latency haha)
like, i put my podcast into it and it made a podcast of my podcast and... it was good.
do u guys know we spend 1-2 hrs writing up the show notes and now its a button press in NBLM
Gemini really took pride in topping @lmsysorg for a hot second, and then @OpenAI said "oh no u dont" and put out 4 straight bangers, pounding everyone into the dust by 50 elo points
V high bar set for Gemini 2, Grok 2.5, and Claude 4 this fall.
Multiple fronts to compete on - reasoning, multiturn chat tuning, instruction following, and coding.
anyway we finally did a @latentspacepod paper club on STaR and friends, swim on by
i hastily sketched out a "paper stack" of what the "literature of reasoning" could look like, but this is amateur work - would love @teortaxesTex or @arattml to map out a full list of likely relevant papers for o1
holy shit @ideogram_ai thumbnails are untapped alpha