Latest Twitter Threads by @swyx on Thread Reader App

Apr 14 • 7 tweets • 3 min read

don't miss that OAI also published a prompting guide WITH RECEIPTS for GPT 4.1 specifically for those building agents... with a new recommendation for:

- telling the model to be persistent (+20%)
- dont self-inject/parse toolcalls (+2%)
- prompted planning (+4%)
- JSON BAD - use XML or arxiv 2406.13121 (GDM format)
- put instructions + user query at TOP -and- BOTTOM - bottom-only is VERY BAD
- no evidence that ALL CAPS or Bribes or Tips or threats to grandma work
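
a rough sketch of what the "instructions + query at TOP -and- BOTTOM" and "XML not JSON" bullets could look like when assembling a long-context prompt. `build_prompt`, the section headers, and the `<doc>` tag layout are my own illustrative choices and are not copied from the guide:

```python
from xml.sax.saxutils import escape, quoteattr


def build_prompt(instructions, documents, query):
    """Assemble a long-context prompt (hypothetical helper, not from the guide).

    documents is a list of (title, body) pairs. They are rendered as XML-style
    tags rather than JSON, and the instructions plus the user query are
    repeated both above and below the documents.
    """
    docs = "\n".join(
        f"<doc id={quoteattr(str(i))} title={quoteattr(title)}>{escape(body)}</doc>"
        for i, (title, body) in enumerate(documents, start=1)
    )
    # The same header is reused at the top and the bottom of the prompt.
    header = f"# Instructions\n{instructions}\n\n# Query\n{query}"
    return f"{header}\n\n# Documents\n{docs}\n\n{header}"


if __name__ == "__main__":
    # The reminder wording here is an assumption, a paraphrase of the
    # persistence and planning bullets rather than a quote.
    reminders = (
        "Keep going until the user's request is fully resolved. "
        "Plan before each tool call and reflect on the result afterwards."
    )
    print(build_prompt(reminders, [("Notes", "a < b")], "What is a?"))
```

the persistence and planning reminders go in `instructions`; the percentages above are from the guide, not from this sketch.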

cookbook.openai.com/examples/gpt4-…

Mar 30 • 5 tweets • 1 min read

.swyxrules for working on long running/expensive code: Unbundle run() into:

- init()
- continue(id)
- cleanup(id)

never assume you will call these in order
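
a minimal sketch of the rule, assuming an in-memory dict as a stand-in for whatever durable store the real job would use. `JobRunner` and the step counter are hypothetical, and `continue` is a reserved word in Python so the middle call is named `resume` here:

```python
import uuid


class JobRunner:
    """Hypothetical long-running job split into init / resume / cleanup."""

    def __init__(self):
        # In-memory stand-in for durable storage (an assumption for the demo).
        self._jobs = {}

    def init(self, total_steps):
        """Register a new job and hand back its id. No work happens here."""
        job_id = str(uuid.uuid4())
        self._jobs[job_id] = {"done": 0, "total": total_steps}
        return job_id

    def resume(self, job_id, max_steps=1):
        """Advance the job by up to max_steps.

        Returns True when the job is finished, False when work remains, and
        None when the id is unknown or was already cleaned up.
        """
        job = self._jobs.get(job_id)
        if job is None:
            return None
        steps = min(max_steps, job["total"] - job["done"])
        job["done"] += steps
        return job["done"] == job["total"]

    def cleanup(self, job_id):
        """Release the job's state. Safe to call twice or before resume."""
        return self._jobs.pop(job_id, None) is not None


if __name__ == "__main__":
    runner = JobRunner()
    print(runner.cleanup("never-created"))  # out-of-order call is harmless
    job = runner.init(total_steps=3)
    while not runner.resume(job):
        pass
    print(runner.cleanup(job))
```

every call tolerates a missing or finished id instead of raising, which is the point of "never assume you will call these in order".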

Jan 12 • 5 tweets • 2 min read

was fun to work on this @latentspacepod post w/ @benhylak

inspired by "pivot token" literature, one gpt->o1 mental model shift i've made is the role of self-evaluation and -correction as an ESSENTIAL part of planning/reasoning.

with o1, you move the LLM-as-judge *INTO THE PROMPT*, so you can let it handle the self eval and replanning. this is the incremental next "agentic" step, which openai consistently does well, to the frustration of more hyperbolic but snake oily alternatives.

@latentspacepod @benhylak full context latent.space/p/o1-skill-iss…

and here

x.com/benhylak/statu…

Dec 14, 2024 • 22 tweets • 11 min read

this neurips is really going to be remembered as the "end of pretraining" neurips

notes from doctor @polynoamial's talk on scaling test time compute today

(thank you @oh_that_hat for organizing)

all gains to date have been from scaling data and pretrain compute and yet LLMs cant solve simple problems like tictactoe

however inference costs have scaled much less.

Oct 1, 2024 • 43 tweets • 17 min read

Here’s my @OpenAIDevs day thread for those following along. everyone else gotchu with videos and stuff so i will just give personal notes and aha moments thru the day

first observation: @sama MIA

GPT5 still mentioned and on the table

https://twitter.com/romainhuet/status/1841161854217044235

after some nice screenshot of Cocounsel, time for @romainhuet’s legendary live demos. o1 one-shots an ios app and does the frontend/backend to control a drone.

ai controlled drones, what could go wrong?

Sep 30, 2024 • 5 tweets • 2 min read

just realized NotebookLM is @GoogleDeepMind's ChatGPT moment

- "low key research preview"/"experimental"
- not monetized
- GPUs/TPUs immediately on fire
- SOTA proprietary new model buried in there with upgrades that weren't previously announced
- new AI UX that cleverly embeds LLM usage natively within the product features

in this case NBLM nailed multimodal RAG and I/O in a way that @ChatGPTapp never did (or for that matter, @GeminiApp). The multiple rounds of preprocessing described by @stevenbjohnson also raise the quality of the audio conversation dramatically at the cost of extreme latency (took an efficient model that was advertised as capable of generating 30s of audio in 0.5s, and slapped on like 200s of LLM latency haha)

@GoogleDeepMind like, i put my podcast into it and it made a podcast of my podcast and... it was good.

do u guys know we spend 1-2 hrs writing up the show notes and now its a button press in NBLM

https://x.com/latentspacepod/status/1840809425994727630

Sep 18, 2024 • 4 tweets • 3 min read

Gemini really took pride topping @lmsysorg for a hot second and then @OpenAI said "oh no u dont" and put out 4 straight bangers pounding everyone into the dust by 50 elo points

V high bar set for Gemini 2, Grok 2.5, and Claude 4 this fall.

Multiple fronts - on reasoning, multiturn chat tuning, instruction following, and coding - to compete.

anyway we finally did a @latentspacepod paper club on STaR and friends, swim on by

i hastily sketched out a "paper stack" of what the "literature of reasoning" could look like, but this is amateur work - would love @teortaxesTex or @arattml to map out a full list of likely relevant papers for o1

Sep 11, 2024 • 22 tweets • 11 min read

**Frontier AI in your Hands**

my live notes from today’s @MistralAI summit ft Jensen Huang and @arthurmensch and crew here

🧵

first articulation of La Plateforme vision beyond just hosted mistral models

sounds alarmingly familiar tbh

Sep 9, 2024 • 6 tweets • 3 min read

wow. Apple might just have fixed Siri.

and beat OpenAI to the first AI phone.

and commoditized OpenAI with Google.

and casually dropped a video understanding model.

incredibly well executed.

(see @smol_ai writeup below for deltas from WWDC)

https://twitter.com/swyx/status/1800242529545445734

notable reveals from today's iphone 16 event, especially Apple Visual Intelligence:

- Mail and Notifications will show summaries instead of str[:x]

- Siri now knows iPhone, becomes the ultimate manual on how to use the increasingly complicated iOS 18

and can read your texts (!) to suggest actions with Personal Context Understanding

(also it will try to advertise apple tv shows to you... i'm SURE it will be totally objective and aligned to your preferences amirite)

- new iphone 16 camera control button is PRIME real estate - notice how OpenAI/ChatGPT is now next to Google search, and both are secondary clicks to Apple's visual search, which comes first

- camera adds events to calendar!

"all done on device" and on cloud (though craig doesnt say that haha)

insanely good ideas on ai + phone integrations.

Jul 23, 2024 • 8 tweets • 5 min read

Llama 3: the Synthetic Data model

Llama 3 paper is finally out! by @lvdmaaten and Angela Fan. Quick diffs from yesterday's leaks (+ watch our exclusive @ThomasScialom interview out now!)

- NEW SCALING LAWS! turns out there's a reason why they trained a 405B param model because they had 15T tokens

- full weight class benchmarks table vs Gemma, Mistral, 4o/sonnet! no surprises - 8B and 70B are strongest here, but 405B has solid IFEval and Tool Use
- Multimodal encoder, Vision and Speech Adapter coming
- 15T token data pipeline uses Llama 2 cleaning/filtering, and Deepseek v2 pipelines for code and math!

some pretty fun notes on infra and training - together with full details on learning rates and training recipe.

this is going to make @Teknium1 happy - 3 approaches for syndata explored, apart from the obvious 8B/70B distillation

- 405B teaching itself with code execution feedback

- translating code data to smaller programming languages (like TypeScript and PHP??? this is slander)

- "backtranslation" - 1.2m synthetic dialogs going from documentation/explanations to code, then using LLM as judge to filter (pretty smart!)

For math: let's verify step by step :)

https://x.com/latentspacepod/status/1795196817044594817

Jun 10, 2024 • 4 tweets • 3 min read

a lot of people recapping the WWDC keynote but are any Apple engineers on here sharing insights and behind the scenes?

Apple Intelligence is going to be the largest deployment of tool using AI and i’d like someone to speak at @aidotengineer on the design considerations!

free tix for anyone who introduces an Apple speaker for us!

most detail so far

https://x.com/maxwinebach/status/1800277157135909005?s=46&t=90xQ8sGy63D2OtiaoGJuww

Nov 6, 2023 • 23 tweets • 10 min read

Join @latentspacepod and @thursdai_pod live at DevDay!

Now:

spotted: “New Products Deep Dive” for 45 mins… I wonder what that will be twitter.com/i/spaces/1BRJj…

https://twitter.com/thursdai_pod/status/1721565658566267257

GPT4 Turbo is ~3x cheaper than GPT4!

1. OpenAI's longest ever Context length: 128k
2. Better JSON/function calling
3. Knowledge: built in RAG and April 2023 cutoff
4. Dalle3, GPT4-V, and TTS model all in API today!!!
4b. Whisper V3 open sourced (coming to API)
5. Customization: GPT3 16k, GPT4 finetuning, Custom Models services
6. Higher Rate Limits - 2x tokens per minute, request raises in account settings - plus: Copyright Shield!

"GPT4 Turbo is a smarter model than GPT4" (GPT4.5 confirmed!)

Oct 10, 2023 • 17 tweets • 11 min read

it’s official - I think GitHub Copilot is the first* generative AI product to publicly claim they’ve passed $100m ARR — enough to stand alone as a publicly listed company

Whenever people ask me “is AI a fad” the biggest thing I point to is “follow the money”:

- revenue, not just funding
- RECURRING, not tcosts on hype
- people publicly saying they’d pay 5x the cost

(*there’s likely a few others but none confirmed officially - see Anatomy of Autonomy post on @latentspacepod)

next up is @DedyKredo LIVE CODING a full test suite, making code changes, and automating commit and PR review, all assisted by @CodiumAI . audible “what the fuck” from @eugeneyan.

ends with a powerful message for Israel. we stand with you @itamar_mar. youtube.com/live/qw4PrtyvJ…

Jul 18, 2023 • 17 tweets • 12 min read

That was fast - Llama 2 is out!

and cleared for commercial use! and *destroys* Falcon 40B on @DanHendrycks's MMLU and other top benchmarks

They really meant it when they said "imminently" lol

Scheduled a @latentspacepod at 3pm PT - join @FanaHOVA and… https://t.co/iWFLYJLCJd https://t.co/C0YKJ8snjr https://t.co/TZvfRrz5lK twitter.com/i/spaces/1nAKE…
twitter.com/i/web/status/1…

https://twitter.com/swyx/status/1679925361247911936

@DanHendrycks @latentspacepod @FanaHOVA it seems @mascobot is on top of it - you can try out llama 2 here:

they also have a Llama playground but its not currently working for me https://t.co/cao0EUYWQS replicate.com/a16z-infra/lla…

https://twitter.com/Mascobot/status/1681334370584637441?s=20

Jun 30, 2023 • 5 tweets • 2 min read

🆕 Essay: The Rise of the AI Engineer

Keeping up on AI is becoming a full time job.

Let's get together and define it. https://t.co/KD2lY9FTtm latent.space/p/ai-engineer

Builders need a place to talk turpentine. This is why i'm teaming up with @benghamine to produce @aiDotEngineer, the definitive place to talk AI UX, devtools, infra, and all things AI Engineering.

500 seats.
SF/Virtual, Oct 8-10.

Join us!

https://twitter.com/aiDotEngineer/status/1674828116269891584

Jun 20, 2023 • 6 tweets • 6 min read

The @latentspacepod is excited to publish:

Petaflops to the People:
@realGeorgeHotz's first interview
on his new personal compute cluster company

the tiny corp.

latent.space/p/geohot

We discuss how tiny is taking on Nvidia, Google, and PyTorch with a tiny team and go deep… twitter.com/i/web/status/1…

@latentspacepod @realGeorgeHotz GPT4 is 8 x 220B params = 1.7 Trillion params

https://twitter.com/soumithchintala/status/1671267150101721090?s=20

ok I wasn't sure how widely to spread the rumors on GPT-4 but it seems Soumith is also confirming the same so here's the quick clip!

so yes, GPT4 is technically 10x the size of GPT3, and all the small… twitter.com/i/web/status/1…

Jun 7, 2023 • 5 tweets • 7 min read

this is a trend I'm calling "Code is all you need"

Comparing Bard vs @OpenAI ChatGPT vs @AnthropicAI Claude on Google's own reasoning/math prompts shows the stark contrast once you make your model write and eval code to answer questions. Reminds me of @amasad and @goodside's… twitter.com/i/web/status/1…

@OpenAI @AnthropicAI @amasad @goodside This is part of a broader trend of us slowly discovering the special place of code in language models:

1/ Code Improves LLMs
@Francis_YAO_ et al have repeatedly found that adding code in pretraining data improves LLMs in all benchmarks (https://twitter.com/Francis_YAO_/status/1657985409706762241)

2/ Code LLMs… twitter.com/i/web/status/1…

May 14, 2023 • 5 tweets • 4 min read

Stop building the thing.
Build the thing that builds all the things.

IMO the most important thing every developer could be doing right now on nights and weekends is building a general purpose personal junior dev agent they can control and trust, that they can scale to fleets.… twitter.com/i/web/status/1…

first thing Tony ever built wasn't a flying suit of armor, fancy weapons, or mini fusion reactor

he built the thing that builds the things (and saves his life when the other stuff fails)

Apr 25, 2023 • 4 tweets • 4 min read

.@Replit just announced their own LLaMa style code LLM at their developer day!

replit-code-v1-3b

- 2.7b params
- 20 languages
- 525B tokens (“20x Chinchilla?”)
- beats all open source code models on HumanEval benchmark
- trained in 10 days with @NaveenGRao @MosaicML

and @amasad follows up with a finetuned version - replit-finetune-v1-3b - using @Replit data - and this catapults Replits model *ahead* of @OpenAI codex 🤯

they are matching the performance of >10B LLMs with way smoller 2.7B models

and it will be open source/freely licensed!

Apr 23, 2023 • 4 tweets • 3 min read

I love seeing the birth of a new social network. unsure about its future but its cool that in early days it’s still smol enough you can hold the world “map” in your head and zoom in to see individual people

the internet was a nicer place when it was a neighborhood and not a mob

everyone out here tweeting bsky fomo, i'm in here making @chirperai bots, we are not the same

Apr 19, 2023 • 16 tweets • 10 min read

🧠 The Anatomy of Autonomy 🤖

The fifth killer app of AI is Autonomous Agents.

Presenting
- Summary of #AutoGPT / @babyAGI_
- The 5 stages of "brain" development it took to get from Foundation Models to Autonomous Agents
- Why Full Autonomy is like "Full Self Driving"!

Begin:

@babyAGI_ (this is the obligatory threadooor TLDR of my latest newsletter post, hop over if you like my long form work: latent.space/p/agents)
