Say hello to GPT-4o, our new flagship model which can reason across audio, vision, and text in real time:
Text and image input are rolling out today in the API and ChatGPT, with voice and video to follow in the coming weeks. openai.com/index/hello-gp…
Two GPT-4os interacting and singing
Realtime translation with GPT-4o
Lullabies and whispers with GPT-4o
Happy birthday with GPT-4o
@BeMyEyes with GPT-4o
Dad jokes with GPT-4o
Meeting AI with GPT-4o
Sarcasm with GPT-4o
Math problems with GPT-4o and @khanacademy
Point and learn Spanish with GPT-4o
Rock, Paper, Scissors with GPT-4o
Harmonizing with two GPT-4os
Interview prep with GPT-4o
Fast counting with GPT-4o
Dog meets GPT-4o
Live demo of GPT-4o realtime conversational speech
Live demo of GPT-4o voice variation
Live demo of GPT-4o vision
Live demo of coding assistance and desktop app
Live audience request for GPT-4o realtime translation
Live audience request for GPT-4o vision capabilities
All users will start to get access to GPT-4o today. In the coming weeks, we’ll begin rolling out the new voice and vision capabilities we demoed today to ChatGPT Plus.
Introducing ChatGPT Health — a dedicated space for health conversations in ChatGPT. You can securely connect medical records and wellness apps so responses are grounded in your own health information.
Designed to help you navigate medical care, not replace it.
ChatGPT Health can help you navigate everyday questions and spot patterns over time, so you feel more informed, prepared, and confident for important medical conversations.
If you choose, ChatGPT Health lets you securely connect medical records and apps like Apple Health, MyFitnessPal, and Peloton to give personalized responses.
To preserve chain-of-thought (CoT) monitorability, we must be able to measure it.
We built a framework + evaluation suite to measure CoT monitorability — 13 evaluations across 24 environments — so that we can actually tell when models verbalize targeted aspects of their internal reasoning. openai.com/index/evaluati…
Monitoring a model’s chain-of-thought is far more effective than watching only its actions or final answers.
The more a model “thinks” (longer CoTs), the easier it is to spot issues.
RL at today’s frontier doesn’t seem to wreck monitorability and can help early reasoning steps. But there’s a tradeoff: smaller models run with higher reasoning effort can be easier to monitor at similar capability — at the cost of extra inference compute (a “monitorability tax”).
Accelerating scientific progress is one of the most impactful ways AI can benefit society. Models can already help researchers reason through hard problems — but doing this well means testing models on tougher evaluations and in real scientific workflows grounded in experiments.
We’re releasing a new eval to measure expert-level scientific reasoning: FrontierScience.
This benchmark measures PhD-level scientific reasoning across physics, chemistry, and biology.
It contains hard, expert-written questions (both olympiad-style problems and longer research-style tasks) designed to reveal where models succeed and where they fall short. openai.com/index/frontier…
GPT-5.2 is our strongest model on the FrontierScience eval, showing clear gains on hard scientific tasks.
But the benchmark also reveals a gap between strong performance on structured problems and the open-ended, iterative reasoning that real research requires.
GPT-5.2 Instant, Thinking, and Pro are rolling out today, starting with Plus, Pro, Business, and Enterprise plans. Free and Go users will get access tomorrow.
Introducing shopping research, a new experience in ChatGPT that does the research to help you find the right products.
It’s everything you like about deep research but with an interactive interface to help you make smarter purchasing decisions.
Shopping research asks smart clarifying questions, researches deeply across the internet, reviews quality sources, and builds on ChatGPT’s understanding of you from past conversations and memory to deliver a personalized buyer’s guide in minutes.
Most neural networks today are dense and highly entangled, making it difficult to understand what each part is doing.
In our new research, we train “sparse” models—with fewer, simpler connections between neurons—to see whether their computations become easier to understand.
Unlike with normal models, we often find that we can pull out simple, understandable parts of our sparse models that perform specific tasks, such as ending strings correctly in code or tracking variable types.
We also show promising early signs that our method could potentially scale to understand more complex behaviors.
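To make "fewer, simpler connections" concrete, here is a minimal sketch of one way to sparsify a weight matrix. The thread does not say how sparsity is enforced during training, so the magnitude-based rule below is an assumption chosen for illustration, and `sparsify` and `count_connections` are hypothetical helper names, not part of the research.

```python
def sparsify(weights, keep):
    """Zero all but the `keep` largest-magnitude entries of a weight matrix.

    Assumption: simple magnitude-based selection, used here only to
    illustrate what a sparse connection pattern looks like.
    """
    # Record each weight's magnitude together with its position.
    cells = [(abs(w), i, j)
             for i, row in enumerate(weights)
             for j, w in enumerate(row)]
    # Largest magnitudes first; the sort is stable, so ties keep input order.
    cells.sort(key=lambda c: -c[0])
    kept = {(i, j) for _, i, j in cells[:max(keep, 0)]}
    return [[w if (i, j) in kept else 0.0 for j, w in enumerate(row)]
            for i, row in enumerate(weights)]


def count_connections(weights):
    """Count the nonzero weights, i.e. the active connections."""
    return sum(1 for row in weights for w in row if w != 0.0)


# A toy dense layer: 2 output neurons, 3 inputs, 6 connections in total.
dense = [[0.9, -0.05, 0.1],
         [0.02, -0.8, 0.03]]
sparse = sparsify(dense, keep=2)

print(count_connections(dense), "->", count_connections(sparse))
```

With only two connections left, each output neuron reads from a single input, which is the kind of structure that is easier to inspect by hand than a fully connected layer.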