Post

This Thread may be Removed Anytime!

Twitter may remove this content at anytime! Save it as PDF for later use!

More from @OpenAI

OpenAI

@OpenAI

Oct 21

Meet our new browser—ChatGPT Atlas.

Available today on macOS: chatgpt.com/atlas

ChatGPT can see the page you’re on and answer your questions right there via the Ask ChatGPT sidebar.

ChatGPT can offer suggestions wherever you’re typing on the web.

Read 6 tweets

OpenAI

@OpenAI

Oct 6

Everything shipped at DevDay [2025] 🧵

https://x.com/OpenAI/status/1975261587280961675

https://x.com/OpenAI/status/1975261587280961675

https://x.com/OpenAIDevs/status/1975261988751351868

https://x.com/OpenAIDevs/status/1975261988751351868

Read 8 tweets

OpenAI

@OpenAI

Sep 25

Today we’re introducing GDPval, a new evaluation that measures AI on real-world, economically valuable tasks.

Evals ground progress in evidence instead of speculation and help track how AI improves at the kind of work that matters most.
openai.com/index/gdpval-v0

GDPval spans 44 occupations selected from the top 9 sectors contributing to U.S. Gross Domestic Product (GDP).

Tasks are constructed from the representative work of industry professionals with an average of 14 years of experience.

Read 6 tweets

OpenAI

@OpenAI

Sep 17

Today we’re releasing research with @apolloaievals.

In controlled tests, we found behaviors consistent with scheming in frontier models—and tested a way to reduce it.

While we believe these behaviors aren’t causing serious harm today, this is a future risk we’re preparing for. openai.com/index/detectin…

Scheming = when an AI behaves one way on the surface while hiding its true goals.

Today’s deployed systems have little opportunity to scheme in ways that could cause serious harm. The most common failures are simple deceptions—like pretending to complete a task without doing it. We’ve studied and mitigated these issues and made meaningful improvements in GPT-5 over earlier models.

But as AIs take on more complex, long-term tasks with real-world impact, the potential for harmful scheming will grow—so our safeguards and testing must grow with it.

Typically, as models become smarter, their problems become easier to address—for example, smarter models hallucinate less and follow instructions more reliably.

However, AI scheming is different.

As we train models to get smarter and follow directions, they may either better internalize human goals or just get better at hiding their existing true goals.

The core of anti-scheming research is to distinguish between these two, which requires understanding the reasoning behind a model's behavior.

Read 7 tweets

OpenAI

@OpenAI

Aug 5

We released two open-weight reasoning models—gpt-oss-120b and gpt-oss-20b—under an Apache 2.0 license.

Developed with open-source community feedback, these models deliver meaningful advancements in both reasoning capabilities & safety.

openai.com/index/introduc…

gpt-oss-120b matches OpenAI o4-mini on core benchmarks and exceeds it in narrow domains like competitive math or health-related questions, all while fitting on a single 80GB GPU (or high-end laptop).

gpt-oss-20b fits on devices as small as 16GB, while matching or exceeding OpenAI o3-mini.

These models are trained for agentic workflows—supporting function calling, web search, Python execution, configurable reasoning effort, and full raw chain-of-thought access. github.com/openai/gpt-oss

Read 7 tweets

OpenAI

@OpenAI

Jul 17

ChatGPT can now do work for you using its own computer.

Introducing ChatGPT agent—a unified agentic system combining Operator’s action-taking remote browser, deep research’s web synthesis, and ChatGPT’s conversational strengths.

ChatGPT agent starts rolling out today to Pro, Plus, and Team users.

Pro users will get access by the end of day, while Plus and Team users will get access over the next few days.

Enterprise and Edu users will get access in the coming weeks. openai.com/index/introduc…