swyx
anti-ego ideas for anti-ergodic life - @smol_ai - @aidotengineer - @latentspacepod - @coding_career - @dxtipshq
Dec 14 22 tweets 11 min read
this neurips is really going to be remembered as the "end of pretraining" neurips

notes from doctor @polynoamial's talk on scaling test time compute today

(thank you @oh_that_hat for organizing)
all gains to date have been from scaling data and pretrain compute and yet LLMs cant solve simple problems like tictactoe

however inference costs have scaled much less.
Oct 1 43 tweets 17 min read
Here’s my @OpenAIDevs day thread for those following along. everyone else gotchu with videos and stuff so i will just give personal notes and aha moments thru the day

first observation: @sama MIA

GPT5 still mentioned and on the table



after some nice screenshots of Cocounsel, time for @romainhuet’s legendary live demos. o1 one-shots an ios app and does the frontend/backend to control a drone.

ai controlled drones, what could go wrong?


Sep 30 5 tweets 2 min read
just realized NotebookLM is @GoogleDeepMind's ChatGPT moment

- "low key research preview"/"experimental"
- not monetized
- GPUs/TPUs immediately on fire
- SOTA proprietary new model buried in there with upgrades that weren't previously announced
- new AI UX that cleverly embeds LLM usage natively within the product features

in this case NBLM nailed multimodal RAG and I/O in a way that @ChatGPTapp never did (or for that matter, @GeminiApp). The multiple rounds of preprocessing described by @stevenbjohnson also raise the quality of the audio conversation dramatically at the cost of extreme latency (took an efficient model that was advertised as capable of generating 30s of audio in 0.5s, and slapped on like 200s of LLM latency haha)

@GoogleDeepMind like, i put my podcast into it and it made a podcast of my podcast and... it was good.

do u guys know we spend 1-2 hrs writing up the show notes and now its a button press in NBLM

Sep 18 4 tweets 3 min read
Gemini really took pride topping @lmsysorg for a hot second and then @OpenAI said "oh no u dont" and put out 4 straight bangers pounding everyone into the dust by 50 elo points

V high bar set for Gemini 2, Grok 2.5, and Claude 4 this fall.

Multiple fronts - on reasoning, multiturn chat tuning, instruction following, and coding - to compete.
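For a sense of what a 50 elo point lead means in head-to-head terms, here is a minimal sketch using the textbook Elo expected-score formula. The 400-point scale is the standard chess convention and is an assumption here; the leaderboard's exact rating method may differ. Under that assumption a 50-point gap works out to the stronger model being preferred roughly 57% of the time.

```python
def elo_expected_score(rating_gap: float) -> float:
    """Expected score of the higher-rated side under the standard Elo formula.

    Assumes the conventional 400-point logistic scale (an assumption,
    not something stated in the thread).
    """
    return 1.0 / (1.0 + 10.0 ** (-rating_gap / 400.0))


gap = 50  # the lead quoted in the thread
print(f"{gap}-point gap -> expected win rate {elo_expected_score(gap):.1%}")
```

So a 50-point lead is a clear edge in pairwise votes, but far from a shutout.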
anyway we finally did a @latentspacepod paper club on STaR and friends, swim on by



i hastily sketched out a "paper stack" of what the "literature of reasoning" could look like, but this is amateur work - would love @teortaxesTex or @arattml to map out a full list of likely relevant papers for o1
Sep 11 22 tweets 11 min read
**Frontier AI in your Hands**

my live notes from today’s @MistralAI summit ft Jensen Huang and @arthurmensch and crew here

🧵
first articulation of La Plateforme vision beyond just hosted mistral models

sounds alarmingly familiar tbh


Sep 9 6 tweets 3 min read
wow. Apple might just have fixed Siri.

and beat OpenAI to the first AI phone.

and commoditized OpenAI with Google.

and casually dropped a video understanding model.

incredibly well executed.

(see @smol_ai writeup below for deltas from WWDC)
notable reveals from today's iphone 16 event, especially Apple Visual Intelligence:

- Mail and Notifications will show summaries instead of str[:x]

- Siri now knows iPhone, becomes the ultimate manual on how to use the increasingly complicated iOS 18

and can read your texts (!) to suggest actions with Personal Context Understanding

(also it will try to advertise apple tv shows to you... i'm SURE it will be totally objective and aligned to your preferences amirite)

- new iphone 16 camera control button is PRIME real estate - notice how OpenAI/ChatGPT is now next to Google search, and both are secondary clicks to Apple's visual search, which comes first

- camera adds events to calendar!

"all done on device" and on cloud (though craig doesnt say that haha)

insanely good ideas on ai + phone integrations.
Jul 23 8 tweets 5 min read
Llama 3: the Synthetic Data model

Llama 3 paper is finally out! by @lvdmaaten and Angela Fan. Quick diffs from yesterday's leaks (+ watch our exclusive @ThomasScialom interview out now!)

- NEW SCALING LAWS! turns out there's a reason they trained a 405B param model: they had 15T tokens

- full weight class benchmarks table vs Gemma, Mistral, 4o/sonnet! no surprises - 8B and 70B are strongest here, but 405B has solid IFEval and Tool Use
- Multimodal encoder, Vision and Speech Adapter coming
- 15T token data pipeline uses Llama 2 cleaning/filtering, and Deepseek v2 pipelines for code and math!

some pretty fun notes on infra and training - together with full details on learning rates and training recipe.
this is going to make @Teknium1 happy - 3 approaches for syndata explored, apart from the obvious 8B/70B distillation

- 405B teaching itself with code execution feedback

- translating code data to smaller programming languages (like TypeScript and PHP??? this is slander)

- "backtranslation" - 1.2m synthetic dialogs going from documentation/explanations to code, then using LLM as judge to filter (pretty smart!)

For math: let's verify step by step :)
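The "backtranslation" bullet above (explanation to code, then LLM as judge to filter) can be sketched roughly as a generate-then-filter loop. This is a toy illustration, not the pipeline from the Llama 3 paper: `backtranslate`, `toy_generate`, `toy_judge`, and the 0.5 threshold are all hypothetical names and values, and the two toy functions stand in for real model calls.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List


@dataclass
class SyntheticDialog:
    explanation: str  # documentation or explanation text (the source side)
    code: str         # code generated from that explanation


def backtranslate(
    explanations: Iterable[str],
    generate_code: Callable[[str], str],
    judge_score: Callable[[str, str], float],
    threshold: float = 0.5,  # hypothetical cutoff, not from the paper
) -> List[SyntheticDialog]:
    """Generate code for each explanation and keep pairs the judge accepts."""
    kept = []
    for text in explanations:
        code = generate_code(text)
        if judge_score(text, code) >= threshold:
            kept.append(SyntheticDialog(text, code))
    return kept


# Toy stand-ins so the sketch runs without a model (hypothetical, not real APIs).
def toy_generate(explanation: str) -> str:
    if "add" in explanation:
        return "def add(a, b):\n    return a + b"
    return ""  # simulate a failed generation


def toy_judge(explanation: str, code: str) -> float:
    # A real judge would be an LLM scoring faithfulness; this only checks shape.
    return 1.0 if code.strip().startswith("def ") else 0.0


docs = ["add two numbers and return the sum", "reverse a linked list in place"]
dialogs = backtranslate(docs, toy_generate, toy_judge)
print(len(dialogs), "of", len(docs), "pairs kept")
```

The point of the filter step is that generation can be cheap and noisy as long as the judge is strict enough to discard the bad pairs.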
Jun 10 4 tweets 3 min read
a lot of people recapping the WWDC keynote but are any Apple engineers on here sharing insights and behind the scenes?

Apple Intelligence is going to be the largest deployment of tool using AI and i’d like someone to speak at @aidotengineer on the design considerations!

free tix for anyone who introduces an Apple speaker for us!
most detail so far
Nov 6, 2023 23 tweets 10 min read
Join @latentspacepod and @thursdai_pod live at DevDay!

Now:

spotted: “New Products Deep Dive” for 45 mins… I wonder what that will be twitter.com/i/spaces/1BRJj…

GPT4 Turbo is ~3x cheaper than GPT4!

1. OpenAI's longest ever Context length: 128k
2. Better JSON/function calling
3. Knowledge: built in RAG and April 2023 cutoff
4. Dalle3, GPT4-V, and TTS model all in API today!!!
4b. Whisper V3 open sourced (coming to API)
5. Customization: GPT3.5 16k, GPT4 finetuning, Custom Models services
6. Higher Rate Limits - 2x tokens per minute, request raises in account settings - plus: Copyright Shield!
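To make the "~3x cheaper" claim concrete, here is a rough cost comparison. The per-1K-token prices are the launch list prices as best recalled (GPT-4 8K at $0.03 input / $0.06 output, GPT-4 Turbo at $0.01 / $0.03); they are not stated in the thread, so treat them as assumptions. The helper names are hypothetical.

```python
# USD per 1K tokens. Assumed launch list prices, not taken from the thread.
GPT4_PRICE = {"input": 0.03, "output": 0.06}
GPT4_TURBO_PRICE = {"input": 0.01, "output": 0.03}


def request_cost(price: dict, input_tokens: int, output_tokens: int) -> float:
    """Cost in USD of one request at the given per-1K-token prices."""
    return (input_tokens * price["input"] + output_tokens * price["output"]) / 1000


def savings_ratio(input_tokens: int, output_tokens: int) -> float:
    """How many times cheaper GPT-4 Turbo is than GPT-4 for this token mix."""
    old = request_cost(GPT4_PRICE, input_tokens, output_tokens)
    new = request_cost(GPT4_TURBO_PRICE, input_tokens, output_tokens)
    return old / new


# A prompt-heavy request: 3000 tokens in, 500 tokens out (made-up mix).
print(f"{savings_ratio(3000, 500):.2f}x cheaper")
```

Under these assumed prices the saving is 3x on input tokens and 2x on output tokens, so the blended figure depends on how prompt-heavy the workload is.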

"GPT4 Turbo is a smarter model than GPT4" (GPT4.5 confirmed!)


Oct 10, 2023 17 tweets 11 min read
it’s official - I think GitHub Copilot is the first* generative AI product to publicly claim they’ve passed $100m ARR — enough to stand alone as a publicly listed company

Whenever people ask me “is AI a fad” the biggest thing I point to is “follow the money”:

- revenue, not just funding
- RECURRING, not tcosts on hype
- people publicly saying they’d pay 5x the cost

(*there’s likely a few others but none confirmed officially - see Anatomy of Autonomy post on @latentspacepod)
next up is @DedyKredo LIVE CODING a full test suite, making code changes, and automating commit and PR review, all assisted by @CodiumAI . audible “what the fuck” from @eugeneyan.



ends with a powerful message for Israel. we stand with you @itamar_mar. youtube.com/live/qw4PrtyvJ…
Jul 18, 2023 17 tweets 12 min read
That was fast - Llama 2 is out!

and cleared for commercial use! and *destroys* Falcon 40B on @DanHendrycks's MMLU and other top benchmarks

They really meant it when they said "imminently" lol



Scheduled a @latentspacepod at 3pm PT - join @FanaHOVA and… https://t.co/iWFLYJLCJd https://t.co/C0YKJ8snjr https://t.co/TZvfRrz5lK twitter.com/i/spaces/1nAKE…
twitter.com/i/web/status/1…



@DanHendrycks @latentspacepod @FanaHOVA it seems @mascobot is on top of it - you can try out llama 2 here:

they also have a Llama playground but its not currently working for me https://t.co/cao0EUYWQS replicate.com/a16z-infra/lla…
Jun 30, 2023 5 tweets 2 min read
🆕 Essay: The Rise of the AI Engineer



Keeping up on AI is becoming a full time job.

Let's get together and define it. https://t.co/KD2lY9FTtm latent.space/p/ai-engineer
Builders need a place to talk turpentine. This is why i'm teaming up with @benghamine to produce @aiDotEngineer, the definitive place to talk AI UX, devtools, infra, and all things AI Engineering.

500 seats.
SF/Virtual, Oct 8-10.

Join us!

Jun 20, 2023 6 tweets 6 min read
The @latentspacepod is excited to publish:

Petaflops to the People:
@realGeorgeHotz's first interview
on his new personal compute cluster company

the tiny corp.

latent.space/p/geohot

We discuss how tiny is taking on Nvidia, Google, and PyTorch with a tiny team and go deep… twitter.com/i/web/status/1…

@latentspacepod @realGeorgeHotz GPT4 is 8 x 220B params = 1.7 Trillion params



ok I wasn't sure how widely to spread the rumors on GPT-4 but it seems Soumith is also confirming the same so here's the quick clip!

so yes, GPT4 is technically 10x the size of GPT3, and all the small… twitter.com/i/web/status/1…
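The arithmetic behind the rumor is easy to check. This sketch just multiplies the rumored figures from the thread (8 experts of 220B each) and compares against GPT-3's published 175B parameters; it naively assumes the experts share no parameters, which is a simplification and not something the thread states.

```python
EXPERTS = 8                 # rumored expert count, from the thread
PARAMS_PER_EXPERT = 220e9   # rumored size of each expert, from the thread
GPT3_PARAMS = 175e9         # published GPT-3 parameter count

# Naive total: assumes no parameters are shared between experts.
total_params = EXPERTS * PARAMS_PER_EXPERT
ratio_to_gpt3 = total_params / GPT3_PARAMS

print(f"total: {total_params / 1e12:.2f}T params")
print(f"vs GPT-3: {ratio_to_gpt3:.1f}x")
```

The naive product is 1.76T, in line with the "1.7 Trillion" and "10x the size of GPT3" figures quoted above.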
Jun 7, 2023 5 tweets 7 min read
this is a trend I'm calling "Code is all you need"

Comparing Bard vs @OpenAI ChatGPT vs @AnthropicAI Claude on Google's own reasoning/math prompts shows the stark contrast once you make your model write and eval code to answer questions. Reminds me of @amasad and @goodside's… twitter.com/i/web/status/1…

@OpenAI @AnthropicAI @amasad @goodside This is part of a broader trend of us slowly discovering the special place of code in language models:

1/ Code Improves LLMs
@Francis_YAO_ et al have repeatedly found that adding code in pretraining data improves LLMs in all benchmarks

2/ Code LLMs… twitter.com/i/web/status/1…
May 14, 2023 5 tweets 4 min read
Stop building the thing.
Build the thing that builds all the things.

IMO the most important thing every developer could be doing right now on nights and weekends is building a general purpose personal junior dev agent they can control and trust, that they can scale to fleets.… twitter.com/i/web/status/1…

first thing Tony ever built wasn't a flying suit of armor, fancy weapons, or mini fusion reactor

he built the thing that builds the things (and saves his life when the other stuff fails)
Apr 25, 2023 4 tweets 4 min read
.@Replit just announced their own LLaMa style code LLM at their developer day!

replit-code-v1-3b

- 2.7b params
- 20 languages
- 525B tokens (“20x Chinchilla?”)
- beats all open source code models on HumanEval benchmark
- trained in 10 days with @NaveenGRao @MosaicML

and @amasad follows up with a finetuned version - replit-finetune-v1-3b - using @Replit data - and this catapults Replit's model *ahead* of @OpenAI codex 🤯

they are matching the performance of >10B LLMs with way smoller 2.7B models

and it will be open source/freely licensed!
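The “20x Chinchilla?” aside in the spec list can be sanity-checked from the quoted numbers. This sketch computes tokens per parameter from the 2.7b params and 525B tokens above, then compares against a compute-optimal baseline of about 20 tokens per parameter. That baseline is the commonly cited rule of thumb and is an assumption here, not something from the thread, so the resulting multiple shifts with whichever baseline you pick.

```python
PARAMS = 2.7e9   # model size quoted in the thread
TOKENS = 525e9   # training tokens quoted in the thread

# Assumption: ~20 training tokens per parameter as the Chinchilla rule of thumb.
CHINCHILLA_TOKENS_PER_PARAM = 20

tokens_per_param = TOKENS / PARAMS
multiple_of_baseline = tokens_per_param / CHINCHILLA_TOKENS_PER_PARAM

print(f"{tokens_per_param:.0f} tokens per parameter")
print(f"{multiple_of_baseline:.1f}x the assumed baseline")
```

Either way the model is trained far past the compute-optimal point, which fits the thread's point about small models matching much larger ones.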
Apr 23, 2023 4 tweets 3 min read
I love seeing the birth of a new social network. unsure about its future but its cool that in early days it’s still smol enough you can hold the world “map” in your head and zoom in to see individual people

the internet was a nicer place when it was a neighborhood and not a mob

everyone out here tweeting bsky fomo, i'm in here making @chirperai bots, we are not the same
Apr 19, 2023 16 tweets 10 min read
🧠 The Anatomy of Autonomy 🤖

The fifth killer app of AI is Autonomous Agents.

Presenting
- Summary of #AutoGPT / @babyAGI_
- The 5 stages of "brain" development it took to get from Foundation Models to Autonomous Agents
- Why Full Autonomy is like "Full Self Driving"!

Begin:

@babyAGI_ (this is the obligatory threadooor TLDR of my latest newsletter post, hop over if you like my long form work: latent.space/p/agents)
Apr 19, 2023 4 tweets 4 min read
Writing my recap / thoughts on AI Agent mania today for the newsletter.

- if you've used @babyAGI_ / @AutoGpt for something interesting: what's a good usecase?

- if you're highly skeptical: why?

- if you want to see more: elaborate?

using chatgpt to rip apart @yoheinakajima's code haha

this feels like cheating. i cant look at any new codebase without this visualization again (cc @ShaneaLeven or @danlovesproofs maybe already has a smarter take on this)
Mar 27, 2023 4 tweets 3 min read
Incredible how Stephen Wolfram toiled away in relative obscurity for ~15 years, only to wake up one day and find that Wolfram|Alpha is literally the perfect bridge from agentic AI to real world knowledge, errors included.

“You can't connect the dots looking forward; you can only… twitter.com/i/web/status/1…

if you had asked me in January how long it would take us to blend symbolic ai and generative ai i would have said 5 years… took 10 lines of json with the new chatgpt plugins system
Mar 23, 2023 8 tweets 6 min read
ChatGPT casually dropped an APP STORE 🤯

It can now:
- browse the web (RIP Bing waitlist, cutoff)
- write and run Python (RIP replit?)
- access org info (RIP docsearch startups)
- add third party plugins (from OpenTable, Wolfram, Instacart, Zapier, etc)
- developer SDK in preview

When I talked about the AI Red Wedding last year I was talking about AI offerings undercutting existing human-based or manual business processes.

Now the AI Red Wedding is coming for companies building atop foundation model companies.

@OpenAI is… twitter.com/i/web/status/1…