The expected value of your impact on the world is like a vector.
It is defined by two things: direction and magnitude. That’s it.
Direction is what you choose to work on. Almost no one spends enough time thinking about this. A useful framework for this is to think on a long-but-not-too-long timescale (10-20 years seems to work),
to think about where the world is going to go if current exponentials continue on (which is harder to do than it sounds like it should be), to think about what you’re genuinely interested in, and to think about what you can do better than anyone else
(someone will ~always be better than you at any one thing—the easiest way to do something no one else can is to be 95th percentile at several skills, and to do something at their intersection). You also have to learn to trust yourself when people don’t see what you see.
Magnitude is how hard you push in your chosen direction. Most people don’t push nearly hard enough—they give up too quickly, or care too much about what other people think, or don’t work hard enough, or something like that.
Pushing hard is often uncomfortable, but it is how things get moved. Developing an early and strong sense of self-belief (but not so strong you don’t adapt to feedback and new data) is critical to this. Getting people to join you in your quest, and inspiring them to outperform,
is usually critical—most really important things can only be done by teams. The easiest way to push hard over a long period of time seems to be to really care a lot about the work itself and the outcome you’re striving towards.
I find it liberating that you only have to get two big things right!
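The analogy maps cleanly onto standard notation (my gloss, not the author's): any nonzero vector factors into exactly two pieces, a scalar magnitude and a unit direction vector.

```latex
% any nonzero vector v decomposes uniquely into
% a magnitude (a scalar) and a direction (a unit vector):
\vec{v} \;=\; \underbrace{\lVert\vec{v}\rVert}_{\text{magnitude}} \,\cdot\, \underbrace{\hat{u}}_{\text{direction}},
\qquad \hat{u} = \frac{\vec{v}}{\lVert\vec{v}\rVert}
```

Nothing else appears in the decomposition, which is the point: fix the direction and the magnitude and the vector is fully determined.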
• • •
first, GPT-5 is an integrated model, meaning no more model switcher; it decides when it needs to think harder or not.
it is very smart, intuitive, and fast.
it is available to everyone, including the free tier, w/reasoning!
evals aren't the most important thing--the most important thing is how useful we think the model will be--but it does well on evals. for example, a new high on SWE-bench and many other metrics.
it is by far our most reliable and factual model ever.
rolling out today for free, plus, pro, and team users. next week to enterprise and edu.
making this available in the free tier is a big deal to us; PhD-level intelligence for everyone!
• • •
it can go use the internet, do complex research and reasoning, and give you back a report.
it is really good, and can do tasks that would take hours/days and cost hundreds of dollars.
people will post lots of great examples, but here is a fun one:
i am in japan right now and looking for an old NSX. i spent hours searching unsuccessfully for the perfect one. i was about to give up and deep research just...found it.
it is very compute-intensive and slow, but it's the first ai system that can do such a wide variety of complex, valuable tasks.
going live in our pro tier now, with 100 queries per month.
plus, team, and enterprise will come soon, and then free tier.
• • •
here is o1, a series of our most capable and aligned models yet:
o1 is still flawed, still limited, and it still seems more impressive on first use than it does after you spend more time with it. openai.com/index/learning…
but also, it is the beginning of a new paradigm: AI that can do general-purpose complex reasoning.
o1-preview and o1-mini are available today (ramping over some number of hours) in ChatGPT for plus and team users and our API for tier 5 users.
screenshot of eval results in the tweet above and more in the blog post, but worth especially noting:
a fine-tuned version of o1 scored at the 49th percentile in the IOI under competition conditions! and it achieved gold-medal performance when allowed 10k submissions per problem.
• • •
it is still flawed, still limited, and it still seems more impressive on first use than it does after you spend more time with it.
it is more creative than previous models, it hallucinates significantly less, and it is less biased. it can pass a bar exam and score a 5 on several AP exams. there is a version with a 32k token context.
we are previewing visual input for GPT-4; we will need some time to mitigate the safety challenges.