💡Add next-level autocomplete into your website.
💡Easily fine-tune GPT on your private data.
💡Crowd-sourced distributed model training across thousands of devices.
Interested? I've open-sourced the code on GitHub: github.com/0hq/WebGPT
If you have Chrome Canary (or v113), try it here: kmeans.org
Keep in mind: I started this knowing nothing about transformers or attention, GPUs or matmuls, so the code is still rough/unoptimized. Contributions welcome!
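For readers new to transformers: the core operation a GPU implementation like this has to express as compute shaders is scaled dot-product attention, which is just a couple of matmuls and a softmax. Here's a minimal NumPy reference sketch of that math (a conceptual illustration, not WebGPT's actual WGSL kernels):

```python
import numpy as np

def attention(q, k, v):
    """Scaled dot-product attention: softmax(q @ k.T / sqrt(d_k)) @ v.
    This is the op a WebGPU port implements as matmul + softmax shaders."""
    d_k = q.shape[-1]
    scores = q @ k.T / np.sqrt(d_k)              # (seq, seq) similarity matrix
    scores -= scores.max(axis=-1, keepdims=True) # numerically stable softmax
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ v                           # weighted sum of value rows

rng = np.random.default_rng(0)
q = rng.standard_normal((4, 8))  # 4 tokens, head dim 8
k = rng.standard_normal((4, 8))
v = rng.standard_normal((4, 8))
out = attention(q, k, v)
print(out.shape)  # (4, 8): one output row per token
```

On the GPU, each of these three steps (matmul, softmax, matmul) becomes its own compute pass, which is where most of the optimization effort goes.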
Shoutout to @asciidiego for pitching me this project at a party two weeks ago (+ @karpathy for the lectures, as always).
open source will win, in the end. in the meantime, labs should focus on pushing the frontier as far as possible. packaging and distributing ai to the world for the sake of maximizing access is a secondary, though important, goal.
in the unlikely case that the gap between public and private models grows over time, instead of shrinking as it is now, responsible parties should publish research to close it.
progress can feel simultaneously fragile and inevitable. so many breakthroughs look obvious and fated to be discovered in hindsight, but looking forward is daunting and uncertain. i lean towards the former: true innovation is rare and brittle, and should be preserved at all costs.
Some reflection on what today's reasoning launch really means:
New Paradigm
I really hope people understand that this is a new paradigm: don't expect the same pace, schedule, or dynamics of pre-training era.
I believe the rate of improvement on evals with our reasoning models has been the fastest in OpenAI history.
It's going to be a wild year.
Generalization across Domains
o1 isn't just strong at math, coding, problem solving, and the like: it's also the best model I've ever used for answering nuanced questions, teaching me new things, giving medical advice, or solving esoteric problems.
This shouldn't be taken for granted!
Safety by Reasoning
The fact that our reasoning models also improve on safety behavior and safety reasoning is very much non-trivial.
For years (a decade?) the boogeyman of the AI world was reinforcement learning agents which were incredibly adept at game playing but completely incapable of reasoning or understanding human values!
o1 is a strong point of evidence against that fear.
Scaling inference-time compute can compete with scaling training compute!
The fact that o1-mini beats o1 on some evals is truly remarkable. The implications of this I'll leave as an exercise for the reader.
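One simple way to see how inference-time compute can substitute for training compute is majority voting (self-consistency): sample the model several times and return the most common answer. The sketch below uses a toy stand-in model (the 60% accuracy figure and the answers are made up for illustration), not any real OpenAI model:

```python
import random
from collections import Counter

def majority_vote(sample_answer, n):
    """Draw n independent samples and return the most common answer --
    the simplest way to spend more inference compute per question."""
    votes = Counter(sample_answer() for _ in range(n))
    return votes.most_common(1)[0][0]

# Toy model: answers "42" correctly 60% of the time, else a wrong answer.
random.seed(0)
def toy_model():
    return "42" if random.random() < 0.6 else random.choice(["41", "43"])

trials = 2000
acc_1  = sum(toy_model() == "42" for _ in range(trials)) / trials
acc_15 = sum(majority_vote(toy_model, 15) == "42" for _ in range(trials)) / trials
print(acc_1, acc_15)  # voting over 15 samples beats a single sample
```

Reasoning models go far beyond this trick, but even this crude version shows the knob: holding the weights fixed, spending more tokens per question buys accuracy.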
Multimodal Reasoning
It's kind of crazy that reasoning improves on multimodal evals as well! See MMMU and MathVista: these aren't small improvements.
To be clear I'm not one of the contributors to the o1 project: this has been the absolutely incredible work of the reasoning & related teams.
The rate of progress has just been faster than anything I've ever seen: it's absurd how fast the team has climbed the scaling OOMs just after discovering this paradigm.
Less seriously now:
I also want to give a word of caution to the schizos, the hypemen, the fans, and the haters:
This is a new paradigm. As with all nascent projects, there will be holes, bugs, and issues to fix. Don't expect everything to be perfect instantly!
But you should take seriously the rate of progress, the fact that we're solving problems that seemed miles away under the pretraining scaling laws, and the fact that we now have visibility into solving many of the things people said LLMs could never do.
There are lots of quirks and benefits of the pretraining paradigm that might not exist in the reasoning paradigm, and vice versa. As a random example, I do believe there will be more cases of inverse scaling here than in the pre-training world (where there were surprisingly few).
Onwards!
i think people are misunderstanding gpt-4o. it isn't a text model with a voice or image attachment. it's a natively multimodal token in, multimodal token out model.
you want it to talk fast? just prompt it to. need to translate into whale noises? just use few shot examples.
every trick in the book that you've been using for text also works for audio in, audio out, image perception, video perception, and image generation.
for example, you can do character consistent image generation just by conditioning on previous images. (see the blog post for more)
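The conditioning trick above amounts to keeping every previously generated image in the conversation before the next generation request. Here's a sketch of that loop; `generate_image` is a hypothetical stand-in (not a real API call), just enough to show how the context accumulates:

```python
# Sketch of character-consistent generation by conditioning on prior images.
# `generate_image` is a HYPOTHETICAL stand-in for a multimodal model call:
# a real model would return image tokens conditioned on the full `context`.

def generate_image(context):
    return f"<image conditioned on {len(context)} prior messages>"

story_beats = [
    "This is Sally, a mail delivery person, smiling at the camera.",
    "Now Sally is being chased by a golden retriever down the sidewalk.",
    "Uh oh, Sally has tripped over a branch and is trying to stand up.",
]

context = []
for beat in story_beats:
    context.append({"role": "user", "content": beat})
    image = generate_image(context)  # model sees all prior text AND images
    context.append({"role": "assistant", "content": image})

print(len(context))  # 6 messages: each beat plus the image it produced
```

Because each new image is generated with every earlier image in context, Sally stays recognizably the same character across panels; drop the history and consistency falls apart.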
Starting from this image prompt:
This is Sally, a mail delivery person: Sally is standing facing the camera with a smile on her face.
Now Sally is being chased by a dog. Sally is running down the sidewalk as a golden retriever chases her.
Uh oh, Sally has tripped!
Sally has tripped over a branch that was blocking the sidewalk, and she is trying to stand up. The dog is still chasing her in the background.
announcing... starlinkmap dot org
real-time map of every starlink satellite. tracks upcoming launches, other constellations, orbital updates, etc.
finally launching this after a while! more details below.
starlink is, imo, one of the most exciting technologies of our generation.
today, only 65% of the world has access to the internet at all (and far fewer have high-speed internet).
with direct-to-cell coming, soon every device, anywhere on Earth, will be connected together.
there's lots of stats on the website. here are some of the best:
- over 5,600 starlinks orbiting right now, just under 6,000 ever launched.
- as of march: ~2.6 million starlink customers worldwide
- in the last year, there's been a starlink launch on average every 5.2 days!
DALLE-3 is the best product I've seen since GPT-4; it's super easy to get sucked in for hours generating images. No need for prompt engineering, since GPT-4 writes the prompts for you.
Let me know if you have requests for prompts below. Here are some examples of what it can do:
It's shockingly good at styles that require consistent patterning, like pixel art, mosaics, or dot matrices.