Aidan McLaughlin
research scientist @openai
Mar 4, 2025
my trump headcanon is that he just doesn't grok positive sum games

in his mind, for you to win, someone else has to lose. for america to prosper, canada must suffer. for Trump Inc to profit, plebs must stupidly gamble their savings. for our market to expand, someone needs to find a mine of valuabletronium
Feb 27, 2025
welcome, gpt-4.5

i've spent a lot of time playing with this model recently, and it's left me feeling the agi

some thoughts

two quick things outta the way:
>gpt-4.5 is not a reasoner
>gpt-4.5 is a low-key research preview

for intense math, code, or instruction-following, you should prob use o1/o3-mini

gpt-4.5 is not a benchmark-killer; there will be rough spots but we wanted to get this out asap
Dec 28, 2024
you should basically pretend that getting a model to think for longer is the same as building a bigger model

following the math is quite fun and uncovers some neat things about industry progress (a quick sketch of that math follows the list below)

gpt-2
incoherent

gpt-3.5-turbo
very fluent elementary schooler

gpt-4
crazy well-read middle-schooler
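
one way to follow the math (a standard back-of-envelope approximation, not numbers from the thread): per-query inference compute for a dense transformer is roughly

\[
C_{\text{infer}} \approx 2NT
\]

where $N$ is parameter count and $T$ is tokens generated. so letting a model think $10\times$ longer spends about the same forward-pass FLOPs as running a $10\times$ bigger model for the original $T$, which is why the two knobs look interchangeable to first order.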
Dec 18, 2024
o1 aidanbench results
it's the best model in the world

aidanbench was at neurips; we rewrote the entire benchmark and made new questions (still work to do!)

after lmsys, this is likely one of the most expensive benchmarks in the world

thanks to co-authors @jam3scampbell and @heyanuja

repo is public at github.com/aidanmclaughli…
Nov 8, 2024
the only two models in my toolbox rn are:
>o1-preview (sota)
>gpt-4o-mini (pareto optimal)

they're the best, and sadly it's not even close imo

review thread

—— o1-preview ——

this model leads by a mile on horsepower

i don't care what your coding bench says, o1-preview has insane big-model smell. it nails ood tasks better than opus or gpt-4-0314

it feels like a gpt-3.5 -> gpt-4 jump; it's notably more 'awake' and less hallucinated
Sep 19, 2024
fact check: incorrect.

o1-mini is not better because it thinks longer

it’s just a better model

thread
openrouter released recent data on o1 tokens spent on reasoning vs response

on average, o1-preview actually reasons longer than o1-mini (despite having a shorter output limit)
Aug 6, 2024
-- benchmark --

Aidan Bench measures creativity, reliability, attention, and instruction following.

>mistral large 2 wins by a lot???
>gpt-4o sucks confirmed
>sonnet-3.5 remains very strong
>gpt-4-0314 shows old man strength

github.com/aidanmclaughli…

notably, Aidan Bench scores are largely uncorrelated with lmsys scores.

>gpt-4o, gpt-4o-mini, and gemini-flash score quite well on lmsys but poorly on Aidan Bench
>gpt-4-0314, mistral large 2, and deepseekcoderv2 all do better on Aidan Bench than on lmsys
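
a toy sketch of the kind of loop a novelty benchmark like this can run (my paraphrase under stated assumptions, not the repo's actual code; `ask_model` and `embed` are hypothetical stand-ins for a chat API and an embedding API):

```python
# hypothetical sketch: keep asking one open-ended question until the
# model stops producing novel answers; score = how many it managed.
# ask_model and embed are placeholders, not a specific vendor API.

import numpy as np

def ask_model(question: str, previous: list[str]) -> str:
    """Placeholder for an LLM call told to avoid all prior answers."""
    raise NotImplementedError

def embed(text: str) -> np.ndarray:
    """Placeholder for an embedding call."""
    raise NotImplementedError

def novelty(answer: str, previous: list[str]) -> float:
    # novelty = 1 - max cosine similarity to any earlier answer
    if not previous:
        return 1.0
    v = embed(answer)
    sims = [
        float(np.dot(v, embed(p)) / (np.linalg.norm(v) * np.linalg.norm(embed(p))))
        for p in previous
    ]
    return 1.0 - max(sims)

def score_question(question: str, threshold: float = 0.15) -> int:
    answers: list[str] = []
    while True:
        candidate = ask_model(question, answers)
        if novelty(candidate, answers) < threshold:
            return len(answers)  # model ran out of new ideas
        answers.append(candidate)
```

a loop like this rewards exactly the things listed above: creativity (new answers), reliability (coherent ones), and instruction following (actually avoiding repeats).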
Aug 3, 2024
i’m gonna get so much hate for this, but llms are obviously conscious

got a lottttt of thoughts here; hopefully not a midwit thread

first, i'm a physicalist.

dualism is obviously dumb (explain physical interaction) and panpsychism is a nothingburger (p zombies are not verifiably different and therefore untenable)
Jul 31, 2024
prompting is a fantastic (maybe optimal?) way of steering llms, but no serious researcher would ever admit it, for fear that their 65 years of pytorch experience and 3 centuries of cuda pain might've been obsoleted by clever 21-year-olds Just Talking To A Model

i'm unswayed by "but tuning/training gives you more control!!!" arguments.

as context lengths go to infinity, we should expect the set of steerable behaviors to converge with the set accessible through training
Jul 29, 2024
>>Continuous Learning Model (CLM) by Topology<<

The CLM is a new model that remembers interactions, learns skills autonomously, and thinks in its free time, just like humans.

The CLM just wants to learn.

Try it at topologychat.com
LLMs are stateless.
>CLM remembers and references all chats

LLMs don’t have an inner-life.
>CLM forms ideas by mulling over memories in its free time

LLMs have no soul.
>CLM actively organizes memories/ideas, granting it an emergent personality
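
a minimal sketch of the general memory + idle-reflection pattern described above (hypothetical illustration, not Topology's actual implementation; `call_model` stands in for any LLM API):

```python
# hypothetical sketch of a memory + idle-reflection agent loop.
# call_model is a placeholder, not Topology's or any vendor's API.

from datetime import datetime

def call_model(prompt: str) -> str:
    """Placeholder for a real LLM call."""
    raise NotImplementedError

class ContinuousAgent:
    def __init__(self):
        self.memories: list[str] = []   # every chat is retained
        self.ideas: list[str] = []      # formed during idle time

    def chat(self, user_msg: str) -> str:
        # condition on all prior memories and ideas, not just this turn
        context = "\n".join(self.memories + self.ideas)
        reply = call_model(f"{context}\nuser: {user_msg}\nassistant:")
        self.memories.append(
            f"[{datetime.now()}] user: {user_msg} | assistant: {reply}"
        )
        return reply

    def idle_reflect(self) -> None:
        # "thinks in its free time": mull over memories, keep new ideas
        idea = call_model(
            "reflect on these memories and note one new idea:\n"
            + "\n".join(self.memories)
        )
        self.ideas.append(idea)
```

the point of the pattern: state lives outside the stateless model, and the reflection pass is what turns raw logs into the organized memories/ideas claimed above.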
Jul 12, 2024
something obviously true to me that nobody believes:

90% of frontier ai research is already on arxiv, x, or company blog posts.

q* is just STaR
search is just GoT/MCTS
continuous learning is clever graph retrieval
+1 oom efficiency gains in deepseek-coder paper

nobody admits this because we're all suffering from "no-adults-in-the-room-ism"