Some reflection on what today's reasoning launch really means:
New Paradigm
I really hope people understand that this is a new paradigm: don't expect the same pace, schedule, or dynamics as the pre-training era.
I believe the rate of improvement on evals with our reasoning models has been the fastest in OpenAI history.
It's going to be a wild year.
Generalization across Domains
o1 isn't just strong at math, coding, and problem solving; it's also the best model I've ever used for answering nuanced questions, teaching me new things, giving medical advice, or solving esoteric problems.
This shouldn't be taken for granted!
Safety by Reasoning
The fact that our reasoning models also improve on safety behavior and safety reasoning is very much non-trivial.
For years (a decade?) the boogeyman of the AI world was reinforcement learning agents which were incredibly adept at game playing but completely incapable of reasoning or understanding human values!
o1 is a strong point of evidence against that fear.
Scaling inference-time compute can compete with scaling training compute!
The fact that o1-mini is better than o1 on some evals is very very remarkable. The implications of this I'll leave as an exercise for the reader.
Multimodal Reasoning
It's kind of crazy that reasoning improves on multimodal evals as well! See MMMU and MathVista: these aren't small improvements.
To be clear I'm not one of the contributors to the o1 project: this has been the absolutely incredible work of the reasoning & related teams.
The rate of progress has just been faster than anything I've ever seen: it's absurd how fast the team has climbed the scaling OOMs just after discovering this paradigm.
Less seriously now:
I do want to also give a word of caution to the schizos, the hypemen, the fans and the haters:
This is a new paradigm. As with all nascent projects, there will be holes, bugs, and issues to fix. Don't expect everything to be perfect instantly!
But you should take seriously the rate of progress, the fact that we're solving problems that seemed miles away under the pretraining scaling laws, and the fact that we now have visibility into solving many of the things people have said LLMs could never do.
There's lots of quirks and benefits of the pretraining paradigm that might not exist in the reasoning paradigm, and vice versa. As a random example, I do believe there will be more examples of inverse scaling here than in the pre-training world (in which there were surprisingly few).
Onwards!
One thing to remember: this is not gpt-o1, it is o1, a new thing.
i think people are misunderstanding gpt-4o. it isn't a text model with a voice or image attachment. it's a natively multimodal token in, multimodal token out model.
you want it to talk fast? just prompt it to. need to translate into whale noises? just use few shot examples.
every trick in the book that you've been using for text also works for audio in, audio out, image perception, video perception, and image generation.
for example, you can do character consistent image generation just by conditioning on previous images. (see the blog post for more)
Starting from this image prompt:
This is Sally, a mail delivery person: Sally is standing facing the camera with a smile on her face.
Now Sally is being chased by a dog. Sally is running down the sidewalk as a golden retriever chases her.
Uh oh, Sally has tripped!
Sally has tripped over a branch that was blocking the sidewalk, and she is trying to stand up. The dog is still chasing her in the background.
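The Sally sequence above can be sketched as a few-shot message list. This is a hypothetical illustration, not the blog post's actual code: the message shape mirrors the chat-completions format, while the helper name, system prompt, and image URLs are all invented for the example. The idea is simply that each previously generated image is fed back in as context, so the next generation stays character-consistent.

```python
# Illustrative sketch: conditioning each new image on the images so far.
# Helper name, system prompt, and URLs are assumptions for the example.

def build_character_consistent_prompt(scene_descriptions, previous_image_urls):
    """Build a chat-style message list where each earlier scene pairs its
    text prompt with the image that came back for it."""
    messages = [{"role": "system",
                 "content": "You generate images of the same character, Sally."}]
    for desc, url in zip(scene_descriptions, previous_image_urls):
        # Earlier scene: the prompt we sent...
        messages.append({"role": "user",
                         "content": [{"type": "text", "text": desc}]})
        # ...and the image the model produced for it.
        messages.append({"role": "assistant",
                         "content": [{"type": "image_url",
                                      "image_url": {"url": url}}]})
    # The newest scene has no image yet; the model generates it,
    # conditioned on everything above.
    next_scene = scene_descriptions[len(previous_image_urls)]
    messages.append({"role": "user",
                     "content": [{"type": "text", "text": next_scene}]})
    return messages

scenes = ["Sally smiling at the camera.",
          "Sally chased by a golden retriever.",
          "Sally tripping over a branch."]
msgs = build_character_consistent_prompt(
    scenes,
    ["https://example.com/sally1.png", "https://example.com/sally2.png"])
```

The same pattern generalizes to the audio tricks mentioned above: few-shot examples are just prior user/assistant turns, whatever the modality of their content.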
announcing... starlinkmap dot org
real-time map of every starlink satellite. tracks upcoming launches, other constellations, orbital updates, etc.
finally launching this after a while! more details below.
starlink is, imo, one of the most exciting technologies of our generation.
today, only 65% of the world has access to the internet at all (and far fewer have high-speed internet).
with direct-to-cell coming, soon every device, anywhere on Earth, will be connected together.
there's lots of stats on the website. here are some of the best:
- over 5,600 starlinks orbiting right now, just under 6,000 ever launched
- as of march: ~2.6 million starlink customers worldwide
- in the last year, there's been a starlink launch on average every 5.2 days!
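A quick sanity check on those stats (the 365-day year is the only assumption added here):

```python
# One launch every ~5.2 days implies roughly 70 launches in the past year.
days_per_launch = 5.2
launches_per_year = 365 / days_per_launch
print(round(launches_per_year))   # → 70

# ~5,600 in orbit out of just under 6,000 ever launched: ~93% still up.
still_in_orbit = 5600 / 6000
print(f"{still_in_orbit:.0%}")    # → 93%
```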
DALLE-3 is the best product I've seen since GPT-4, super easy to just get sucked in for hours generating images. No need for prompt engineering, since GPT-4 writes the prompts for you.
Let me know if you have requests for prompts below. Here are some examples of what it can do:
It's shockingly good at styles that require consistent patterning like Pixel Art, mosaics, or dot matrices.
FIGMA-OS: The first Turing-complete Figma file.
SPECS: 8-bit architecture, 512 bits of RAM, 16 bytes of program memory, a MISC instruction set of 16 opcodes, 10 Hz clock speed, 4 fast-access registers, binary-tree RAM/ROM memory.
MOTIVATIONS: For the meme.
HOW: Explained below.
FIGMA-OS has every feature that any modern, enterprising technologist could possibly need:
► A stunning and detailed user manual.
► Useful pre-installed programs like: Fibonacci Numbers.
► An award-winning graphical user interface.
FIGMA-OS has been generously open-sourced to serve all your computing needs, live on the Figma Community today.
▼ Try our demo ▼
▼ Duplicate FIGMA-OS and see it for yourself ▼ figma.com/community/file…
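To make the specs concrete, here's a hypothetical sketch of a MISC-style machine with the same shape (8-bit values, 4 registers, a handful of the 16 opcodes). The opcode names, encoding, and the Fibonacci program are invented for illustration; the real instruction set lives in the Figma file itself.

```python
# Hypothetical MISC-style 8-bit machine in the spirit of the specs above.
# Opcode names/encoding and the demo program are my own invention.

LOADI, ADD, SUB, JMP, JMPZ, HALT = range(6)  # 6 of the 16 opcodes

def run(program, max_steps=1000):
    regs = [0, 0, 0, 0]               # 4 fast-access registers
    pc = 0                            # program counter
    for _ in range(max_steps):
        op, a, b = program[pc]
        pc += 1
        if op == LOADI:
            regs[a] = b & 0xFF                     # load immediate
        elif op == ADD:
            regs[a] = (regs[a] + regs[b]) & 0xFF   # 8-bit wraparound
        elif op == SUB:
            regs[a] = (regs[a] - regs[b]) & 0xFF
        elif op == JMP:
            pc = b                                 # unconditional jump
        elif op == JMPZ:
            if regs[a] == 0:
                pc = b                             # jump if reg is zero
        elif op == HALT:
            return regs
    raise RuntimeError("step limit exceeded")

# A "Fibonacci Numbers" program, re-imagined: r0 and r1 leapfrog through
# the sequence, r2 counts loop iterations, r3 holds the constant 1.
fib = [
    (LOADI, 0, 1),   # r0 = 1
    (LOADI, 1, 1),   # r1 = 1
    (LOADI, 2, 4),   # r2 = 4 loop iterations
    (LOADI, 3, 1),   # r3 = 1 (decrement)
    (ADD,   0, 1),   # r0 += r1
    (ADD,   1, 0),   # r1 += r0
    (SUB,   2, 3),   # r2 -= 1
    (JMPZ,  2, 9),   # if r2 == 0, jump to HALT
    (JMP,   0, 4),   # loop back
    (HALT,  0, 0),
]
print(run(fib)[1])   # → 55
```

In the Figma version, the same fetch/decode/execute loop has to be expressed with interactive components and variable bindings instead of Python control flow, but the register/RAM/opcode structure is the same idea.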
do you have any hobbies?
yeah making computers out of things that shouldn't be computers. watch me be the first to bring turing completeness to figma
(edit: going to build this tonight so scroll for my live tweeting of a computer) https://t.co/a07l9Ib0Qn
ok simple clock working seems promising. add/sub/mult/div already implemented for numbers already by figma, seems like there might be more ops for other types which is great
ok time to test limits and max out these variables. numbers represented as signed 32 bit ints, and will overflow to min int. interesting
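The overflow behavior described above is standard two's-complement wraparound. A minimal sketch of it in Python (the helper name is mine; Figma presumably does this natively in its number type):

```python
# Signed 32-bit wraparound: max int + 1 overflows to min int,
# matching the behavior observed in Figma's number variables.

def wrap_i32(x):
    """Interpret an integer as a signed 32-bit two's-complement value."""
    x &= 0xFFFFFFFF
    return x - (1 << 32) if x >= (1 << 31) else x

MAX_I32 = 2**31 - 1                # 2147483647
print(wrap_i32(MAX_I32 + 1))       # → -2147483648 (min int)
```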