swyx 🇸🇬 ICLR Singapore
Oct 1, 2024
Here’s my @OpenAIDevs day thread for those following along. everyone else gotchu with videos and stuff so i will just give personal notes and aha moments thru the day

first observation: @sama MIA

GPT5 still mentioned and on the table



after some nice screenshot of Cocounsel, time for @romainhuet’s legendary live demos. o1 one-shots an ios app and does the frontend/backend to control a drone.

ai controlled drones, what could go wrong?


Realtime API announced!

starting with speech to speech support
all 6 adv voice mode voices supported

demo next

Realtime voice mode in playground now

playground shows event logs for u to react to

playground now has autoprompting that also generates fewshot examples and function calling schemas


voice mode has function calling and is weirdly obsessed with strawberries

he is integrating with @twilio api and ordering strawberries for all of us! classic twilio demo

realtime api uses 4o as backbone and is public beta starting today

pov u are second fiddle to @altryne and @simonw live blogging and tweeting


openai prompt caching is not as big a discount as Gemini and Anthropic. but works WITHOUT CODE CHANGES. lets see how long they cache… devil in details.

OAI Model Distillation suite!

a bunch of evals and finetuning startups just died

red wedding lives

exclusive interview with inexplicably photogenic strawberry man coming on @latentspacepod

a brief history of @openai

structured output recap

public feature

model distillation session.

this evals product has one killer feature:

share data with openai for free inference

!!

then the Distil product (basically upgraded finetuning ui)

its basically a few clicks and 10 mins to downshift from 4o to 4o mini and run 15x cheaper

on real usecase from @Superhuman

notebooklm put us out of a job lmao

correction: @sama is doing Q&A to end todays show!! send questions please

important correction: sama is closing the show, just not opening. submit actually good questions pls

realtime api demo starter from openai

realtime api workshop

VAD is done server side, and you have to wire up interrupt code yourself.
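
a minimal sketch of what "wire up interrupt code yourself" can look like, assuming the server tells you when the user starts talking and the client then has to stop playback and report how much audio was actually heard. The event names (`input_audio_buffer.speech_started`, `response.cancel`, `conversation.item.truncate`) follow my reading of the Realtime API beta docs and should be checked against the current reference. `Playback` and `handle_server_event` are hypothetical helpers, not SDK code.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Playback:
    """Hypothetical local audio player state (not part of any SDK)."""
    current_item_id: Optional[str] = None      # assistant item being played
    played_ms: int = 0                         # how much of it the user heard
    queue: list = field(default_factory=list)  # audio chunks not yet played

    def stop(self) -> None:
        # Drop buffered audio so the assistant goes quiet immediately.
        self.queue.clear()


def handle_server_event(event: dict, playback: Playback) -> list:
    """Return the client events to send in reaction to one server event."""
    if event.get("type") != "input_audio_buffer.speech_started":
        return []  # this sketch only handles barge-in
    if playback.current_item_id is None:
        return []  # nothing is playing, so nothing to interrupt
    playback.stop()
    outgoing = [
        # Ask the server to stop generating the in-flight response.
        {"type": "response.cancel"},
        # Tell the server where the user actually stopped hearing audio.
        {
            "type": "conversation.item.truncate",
            "item_id": playback.current_item_id,
            "content_index": 0,
            "audio_end_ms": playback.played_ms,
        },
    ]
    playback.current_item_id = None
    return outgoing
```

the handler returns events instead of sending them, so the same logic can sit behind a websocket loop or a unit test.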


impressive demo of voice tool calling

hidden audio feature

an o1 session with @hwchung27 and @_jasonwei

lots of cameras recording so just search around for video

- what just became possible with o1?
- what will become possible with future versions of o1?
- what would you want to build if reasoning is 50% better?
- what would you NOT want to build?

@_jasonwei takes the stage: when to use o1 preview/main vs o1 mini.

mini: math, coding
big: finding inaccuracies in dataset, hard sciences research, legal domain reasoning

q&a: CoT using RL - scaling inference compute. RL focused on backtracking, error correction.

ok here’s @sama and @kevinweil!

Q: How close are we to AGI

“We clearly got to level 2 with o1.”

delta from 4turbo last devday to o1 is a lot
Next 2 years will accelerate very fast
AGI will be smooth exponential, no hard and clear milestone. No one cared when Turing test was crossed, historians will look back and disagree.

Q: is oai still committed to research?

yes more than ever
there was a time when all we did was scale up research
and other companies copying oai is fine
but when we're trying to do net new things in the world that is still very impt to sama

oai will continue to marry research and product tho

Q: oai only paying lip service to alignment?

sama:
- we have a diff take on alignment vs lesswrong
- we care a lot about building safe systems
- we want to make capable models that make it safer and safer over time
- o1 is obviously our most capable model but also our most aligned model
- we have to build models that are safe and robust to be generally accepted in the world
- scifi safety also impt.

Q: how do agents fit into longterm oai plans?

sama:
- chat is great but when u can think for equivalent of multiple days of human effort...
- people say things about agents now but they arent serious. this will be a VERY significant change to the way the world works
- we will ask agents to work on things for a month, multiple of them, and in 2030 we will take this for granted.

Q: hurdles for ai controlling computer?

sama: safety and alignment

Q: can safety have false positives and limit access to ai?

sama: yes it will happen. we could have launched o1 without but it would come at a cost.

by the time of o3... itll work. if you try to get it to say something naughty it should follow your instructions.

we start on conservative side, then loosen up.

Q: what should startups that use ai as core feature do?

sama:
- ai doesnt excuse you from any of the normal laws of business.

Q: voice taps directly into human experience. ethics?

sama:
- i say please and thank you to chatgpt. you never know.

kevin:
- o1 will support fn calling, system prompts, etc before EOY

sama:
- model will get so much better so fast. o1 is gpt2 scale, we know how to get it to gpt4
- plan for the model to get rapidly faster

Q: what feature or capability of a competitor do you admire?

sama: notebooklm. very well done. not enough people are shipping new things.


kevin: anthropic did a really good job on projects. gpts meant for persistent reuse, projects more ephemeral, mental model works

sama q to audience: who thinks theyre smarter than o1?

(some raised hands)

do you think you'll still think this by o2?

(nervous laughs)

- sama wants voice mode to sing. just being conservative.
- kevin had full business conversation in korea w chatgpt. interesting tension btwn chatgpt and speak/duolingo.

another q:
- sama: long context 10m, 10 trillion will be within the decade

WHY IS NOBODY SERVING UP THE SOFTBALL ABOUT THE 7% EQUITY STAKE

a wild gdb appeared!

i think basically everybody at devday missed that microsoft shipped voice AND VISION in copilot today



i got excited, but tried it tho and it noticeably sucks vs openai


full recap out in @smol_ai news!

More from @swyx

Apr 14
don't miss that OAI also published a prompting guide WITH RECEIPTS for GPT 4.1 specifically for those building agents... with a new recommendation for:

- telling the model to be persistent (+20%)
- dont self-inject/parse toolcalls (+2%)
- prompted planning (+4%)
- JSON BAD - use XML or arxiv 2406.13121 (GDM format)
- put instructions + user query at TOP -and- BOTTOM - bottom-only is VERY BAD
- no evidence that ALL CAPS or Bribes or Tips or threats to grandma work
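
a rough sketch of the "TOP -and- BOTTOM" recommendation from the list above, assuming a long-context prompt where the documents sit in the middle and the instructions plus user query are repeated on both sides. The `build_prompt` helper, the `User query:` label, and the `<doc id=... title=...>` tag shape are illustrative choices here, not the guide's exact format.

```python
from xml.sax.saxutils import escape, quoteattr


def build_prompt(instructions: str, query: str, docs: list) -> str:
    """Sandwich long context between two copies of instructions + query.

    docs is a list of (title, text) pairs; each one is wrapped in an
    XML-style tag rather than serialized as JSON.
    """
    header = f"{instructions.strip()}\n\nUser query: {query.strip()}"
    body = "\n".join(
        f"<doc id={quoteattr(str(i))} title={quoteattr(title)}>{escape(text)}</doc>"
        for i, (title, text) in enumerate(docs, start=1)
    )
    # Same header at the top and at the bottom, context in between.
    return f"{header}\n\n{body}\n\n{header}"
```

calling `build_prompt("Answer from the docs only.", "What shipped?", [("notes", "...")])` yields one string with the instructions and query appearing twice, once before and once after the documents.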
we're publishing our GPT 4.1 chat with @michpokrass later today pls like n sub youtube.com/@latentspacepod
Mar 30
.swyxrules for working on long running/expensive code:
Unbundle run() into:

- init()
- continue(id)
- cleanup(id)

never assume you will call these in order
Always checkpoint and resume from id's.

Pass nothing else.

This forces you to keep things serializable and therefore loggable, reproducible, parallelizable.

by the way try not to name them 'id' if you can add extra detail like 'runId', 'taskId', 'subTask2Id'.
Jan 12
was fun to work on this @latentspacepod post w/ @benhylak

inspired by "pivot token" literature, one gpt->o1 mental model shift i've made is the role of self-evaluation and -correction as an ESSENTIAL part of planning/reasoning.

with o1, you move the LLM-as-judge *INTO THE PROMPT*, so you can let it handle the self eval and replanning. this is the incremental next "agentic" step, which openai consistently does well, to the frustration of more hyperbolic but snake oily alternatives.

full context latent.space/p/o1-skill-iss…

and here

x.com/benhylak/statu…
shamelessly ripped off lmao
Dec 14, 2024
this neurips is really going to be remembered as the "end of pretraining" neurips

notes from doctor @polynoamial's talk on scaling test time compute today

(thank you @oh_that_hat for organizing)

all gains to date have been from scaling data and pretrain compute and yet LLMs cant solve simple problems like tictactoe

however inference costs have scaled much less.

goes back to libratus/pluribus work

poker model scaling from 2012-2015 - scaled 5x each year, but still lost dramatically (9 big bets per hundred) to poker pros in 80k hands

recalls familiar insight about humans taking longer to think for harder problems.

added 20s of search - distance from nash equilibrium reduced by a factor of 7 - roughly the equivalent of scaling up model size by 100,000x
Sep 30, 2024
just realized NotebookLM is @GoogleDeepMind's ChatGPT moment

- "low key research preview"/"experimental"
- not monetized
- GPUs/TPUs immediately on fire
- SOTA proprietary new model buried in there with upgrades that weren't previously announced
- new AI UX that cleverly embeds LLM usage natively within the product features

in this case NBLM nailed multimodal RAG and I/O in a way that @ChatGPTapp never did (or for that matter, @GeminiApp). The multiple rounds of preprocessing described by @stevenbjohnson also raise the quality of the audio conversation dramatically at the cost of extreme latency (took an efficient model that was advertised as capable of generating 30s of audio in 0.5s, and slapped on like 200s of LLM latency haha)

like, i put my podcast into it and it made a podcast of my podcast and... it was good.

do u guys know we spend 1-2 hrs writing up the show notes and now its a button press in NBLM

@GoogleDeepMind - just hired @OfficialLoganK

if i had a penny for every time this has happened i'd have two pennies
Sep 18, 2024
Gemini really took pride topping @lmsysorg for a hot second and then @OpenAI said "oh no u dont" and put out 4 straight bangers pounding everyone into the dust by 50 elo points

V high bar set for Gemini 2, Grok 2.5, and Claude 4 this fall.

Multiple fronts - on reasoning, multiturn chat tuning, instruction following, and coding - to compete.

anyway we finally did a @latentspacepod paper club on STaR and friends, swim on by



i hastily sketched out a "paper stack" of what the "literature of reasoning" could look like, but this is amateur work - would love @teortaxesTex or @arattml to map out a full list of likely relevant papers for o1

holy shit @ideogram_ai thumbnails are untapped alpha
