Aidan McLaughlin
research scientist @openai
Mar 4, 2025
my trump headcanon is that he just doesn't grok positive sum games

in his mind, for you to win, someone else has to lose. for america to prosper, canada must suffer. for Trump Inc to profit, plebs must stupidly gamble their savings. for our market to expand, someone needs to find a mine of valuabletronium
Feb 27, 2025
welcome, gpt-4.5

i've spent a lot of time playing with this model recently, and it's left me feeling the agi

some thoughts

two quick things outta the way:
>gpt-4.5 is not a reasoner
>gpt-4.5 is a low-key research preview

for intense math, code, or instruction-following, you should prob use o1/o3-mini

gpt-4.5 is not a benchmark-killer; there will be rough spots but we wanted to get this out asap
Dec 28, 2024
you should basically pretend that getting a model to think for longer is the same as building a bigger model

following the math is quite fun and uncovers some neat things about industry progress (a quick sketch of that math follows the list below)

gpt-2
incoherent

gpt-3.5-turbo
very fluent elementary schooler

gpt-4
crazy well-read middle-schooler
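
one way to follow the math (a standard back-of-envelope approximation, not numbers from the thread): per-query inference compute for a dense transformer is roughly

\[
C_{\text{infer}} \approx 2NT
\]

where $N$ is parameter count and $T$ is tokens generated. so letting a model think $10\times$ longer spends about the same forward-pass FLOPs as running a $10\times$ bigger model for the original $T$, which is why the two knobs look interchangeable to first order.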
Dec 18, 2024
o1 aidanbench results
it's the best model in the world

aidanbench was at neurips; we rewrote the entire benchmark and made new questions (still work to do!)

after lmsys, this is likely one of the most expensive benchmarks in the world

thanks to co-authors @jam3scampbell and @heyanuja

repo is public at github.com/aidanmclaughli…
Nov 8, 2024
the only two models in my toolbox rn are:
>o1-preview (sota)
>gpt-4o-mini (pareto optimal)

they're the best, and sadly it's not even close imo

review thread

—— o1-preview ——

this model leads by a mile on horsepower

i don't care what your coding bench says, o1-preview has insane big-model smell. it nails ood tasks better than opus or gpt-4-0314

it feels like a gpt-3.5 -> gpt-4 jump; it's notably more 'awake' and less hallucinated
Sep 19, 2024
fact check: incorrect.

o1-mini is not better because it thinks longer

it’s just a better model

thread
openrouter released recent data on o1 tokens spent on reasoning vs response

on average, o1-preview actually reasons longer than o1-mini (despite having a shorter output limit)
Aug 6, 2024
-- benchmark --

Aidan Bench measures creativity, reliability, attention, and instruction following.

>mistral large 2 wins by a lot???
>gpt-4o sucks confirmed
>sonnet-3.5 remains very strong
>gpt-4-0314 shows old man strength

github.com/aidanmclaughli…

notably, Aidan Bench scores are largely uncorrelated with lmsys scores.

>gpt-4o, gpt-4o-mini, and gemini-flash score quite well on lmsys but poorly on Aidan Bench
>gpt-4-0314, mistral large 2, and deepseekcoderv2 all do better on Aidan Bench than on lmsys
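
a toy sketch of the kind of loop a novelty benchmark like this can run (my paraphrase under stated assumptions, not the repo's actual code; `ask_model` and `embed` are hypothetical stand-ins for a chat API and an embedding API):

```python
# hypothetical sketch: keep asking one open-ended question until the
# model stops producing novel answers; score = how many it managed.
# ask_model and embed are placeholders, not a specific vendor API.

import numpy as np

def ask_model(question: str, previous: list[str]) -> str:
    """Placeholder for an LLM call told to avoid all prior answers."""
    raise NotImplementedError

def embed(text: str) -> np.ndarray:
    """Placeholder for an embedding call."""
    raise NotImplementedError

def novelty(answer: str, previous: list[str]) -> float:
    # novelty = 1 - max cosine similarity to any earlier answer
    if not previous:
        return 1.0
    v = embed(answer)
    sims = [
        float(np.dot(v, embed(p)) / (np.linalg.norm(v) * np.linalg.norm(embed(p))))
        for p in previous
    ]
    return 1.0 - max(sims)

def score_question(question: str, threshold: float = 0.15) -> int:
    answers: list[str] = []
    while True:
        candidate = ask_model(question, answers)
        if novelty(candidate, answers) < threshold:
            return len(answers)  # model ran out of new ideas
        answers.append(candidate)
```

a loop like this rewards exactly the things listed above: creativity (new answers), reliability (coherent ones), and instruction following (actually avoiding repeats).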
Aug 3, 2024
i’m gonna get so much hate for this, but llms are obviously conscious

got a lottttt of thoughts here; hopefully not a midwit thread

first, i'm a physicalist.

dualism is obviously dumb (explain physical interaction) and panpsychism is a nothingburger (p zombies are not verifiably different and therefore untenable)
Jul 31, 2024
prompting is a fantastic (maybe optimal?) way of steering llms, but no serious researcher would ever admit it, for fear that their 65 years of pytorch experience and 3 centuries of cuda pain might've been obsoleted by clever 21-year-olds Just Talking To A Model

i'm unswayed by "but tuning/training gives you more control!!!" arguments.

as context lengths go to infinity, we should expect the set of steerable behaviors to converge with the set accessible through training
Jul 29, 2024
>>Continuous Learning Model (CLM) by Topology<<

The CLM is a new model that remembers interactions, learns skills autonomously, and thinks in its free time, just like humans.

The CLM just wants to learn.

Try it at topologychat.com
LLMs are stateless.
>CLM remembers and references all chats

LLMs don’t have an inner-life.
>CLM forms ideas by mulling over memories in its free time

LLMs have no soul.
>CLM actively organizes memories/ideas, granting it an emergent personality
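
a minimal sketch of the general memory + idle-reflection pattern described above (hypothetical illustration, not Topology's actual implementation; `call_model` stands in for any LLM API):

```python
# hypothetical sketch of a memory + idle-reflection agent loop.
# call_model is a placeholder, not Topology's or any vendor's API.

from datetime import datetime

def call_model(prompt: str) -> str:
    """Placeholder for a real LLM call."""
    raise NotImplementedError

class ContinuousAgent:
    def __init__(self):
        self.memories: list[str] = []   # every chat is retained
        self.ideas: list[str] = []      # formed during idle time

    def chat(self, user_msg: str) -> str:
        # condition on all prior memories and ideas, not just this turn
        context = "\n".join(self.memories + self.ideas)
        reply = call_model(f"{context}\nuser: {user_msg}\nassistant:")
        self.memories.append(
            f"[{datetime.now()}] user: {user_msg} | assistant: {reply}"
        )
        return reply

    def idle_reflect(self) -> None:
        # "thinks in its free time": mull over memories, keep new ideas
        idea = call_model(
            "reflect on these memories and note one new idea:\n"
            + "\n".join(self.memories)
        )
        self.ideas.append(idea)
```

the point of the pattern: state lives outside the stateless model, and the reflection pass is what turns raw logs into the organized memories/ideas claimed above.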
Jul 12, 2024
something obviously true to me that nobody believes:

90% of frontier ai research is already on arxiv, x, or company blog posts.

q* is just STaR
search is just GoT/MCTS
continuous learning is clever graph retrieval
+1 oom efficiency gains in deepseek-coder paper

nobody admits this because we're all suffering from "no-adults-in-the-room-ism"