OpenAI Developers Profile picture
Apr 8 3 tweets 1 min read Read on X
Introducing the Evals API.

You can now programmatically define tests, automate evaluation runs, and quickly iterate on prompts.

Evals are still available in the dashboard—and now through the API, so you can integrate them anywhere in your workflow. Image
Eval configuration, test data, and parameters for test runs are all just as configurable via API as they are in the evals UI in the dashboard.

Learn how the API works in the documentation: platform.openai.com/docs/guides/ev…
Having good evals in place for your applications helps you improve the quality of your model responses systematically.

Learn to test your prompts for regressions in the OpenAI Cookbook: cookbook.openai.com/examples/evalu…

• • •

Missing some Tweet in this thread? You can try to force a refresh
 

Keep Current with OpenAI Developers

OpenAI Developers Profile picture

Stay in touch and get notified when new unrolls are available from this author!

Read all threads

This Thread may be Removed Anytime!

PDF

Twitter may remove this content at anytime! Save it as PDF for later use!

Try unrolling a thread yourself!

how to unroll video
  1. Follow @ThreadReaderApp to mention us!

  2. From a Twitter thread mention us with a keyword "unroll"
@threadreaderapp unroll

Practice here first or read more on our help page!

More from @OpenAIDevs

Jun 9
Evals now supports tool use. 🛠️

You can now use tools and Structured Outputs when completing eval runs, and evaluate tool calls based on the arguments passed and responses returned. This supports tools that are OpenAI-hosted, MCP, and non-hosted. Read more in our guides below.
Read 5 tweets
Jun 3
We’re rolling out a few updates to Codex today:

1. Codex is rolling out to ChatGPT Plus users today. It includes generous usage limits for a limited time, but during periods of high demand, we might set rate limits for Plus users so that Codex remains widely available.
2. Next, our top requested feature: You can now give Codex access to the internet during task execution to install base dependencies, run tests that need external resources, upgrade or install packages needed to build new features, and more.
3. Internet access is off by default, and can be enabled when creating a new environment or by editing an existing one. You have full control over the domains and HTTP methods Codex can use during task execution. Learn more about usage and risks in the docs: platform.openai.com/docs/codex/age…Image
Read 5 tweets
Jun 3
🆕 Four updates to building agents with OpenAI: Agents SDK in TypeScript, a new RealtimeAgent feature for voice agents, Traces support for the Realtime API, and improvements to our speech-to-speech model.
The Agents SDK is now available in TypeScript and supports handoffs, guardrails, tracing, MCP, and other core agent primitives, just like the Python version.

openai.github.io/openai-agents-…Image
It includes new support for human-in-the-loop approvals, allowing you to pause tool execution, serialize and store the agent state, approve or reject specific calls, and resume the agent run.

openai.github.io/openai-agents-…
Read 11 tweets
Apr 23
Image gen is now available in the API!

We’re launching gpt-image-1, making ChatGPT’s powerful image generation capabilities available to developers worldwide starting today.

✅ More accurate, high fidelity images
🎨 Diverse visual styles
✏️ Precise image editing
🌎 Rich world knowledge
🔤 Consistent text rendering

Check out the guide to start building: platform.openai.com/docs/guides/im…Image
You can also use gpt-image-1 in the Playground to quickly iterate on prompts and images: platform.openai.com/playground/ima…
We’ve been collaborating closely with developers to understand where image gen can be most useful in the real world. Here are some examples from early adopters across domains like creative tools, consumer apps, enterprise software, and more below. 👇
Read 9 tweets
Mar 11
We're launching new tools to help developers build reliable and powerful AI agents. 🤖🔧

Timestamps:
01:54 Web search
02:41 File search
03:22 Computer use
04:07 Responses API
10:17 Agents SDK
Our new API primitive: the Responses API. Combining the simplicity of Chat Completions with the tool-use of Assistants, this new foundation provides more flexibility in building agents. Web search, file search, or computer use are a couple lines of code!

platform.openai.com/docs/quickstar…
🔎 Web search—integrate with a few lines of code and your agent can get up-to-date answers from the web (with citations).

Available in preview: platform.openai.com/docs/guides/to…Image
Read 7 tweets
Feb 13
We’ve put together some best practices on using our o-series models:

🤔 Differences between our reasoning and non-reasoning models
💡 When to use reasoning models
💬 How to approach prompting reasoning models effectively
GPT models focus on fast, cost‑efficient, straightforward task execution—“the workhorses.” 🐎

o‑series models excel at handling ambiguous, multi‑step tasks in domains such as math, engineering, legal, and finance—“the planners.” 🧠
Use o-series models to process unstructured data, find a needle in a haystack, improve code, or handle other complex tasks. For example, o1’s vision capabilities can analyze detailed architectural drawings. In this image, o1 recognized that “PT” wood posts were pressure-treated. Image
Read 4 tweets

Did Thread Reader help you today?

Support us! We are indie developers!


This site is made by just two indie developers on a laptop doing marketing, support and development! Read more about the story.

Become a Premium Member ($3/month or $30/year) and get exclusive features!

Become Premium

Don't want to be a Premium member but still want to support us?

Make a small donation by buying us coffee ($5) or help with server cost ($10)

Donate via Paypal

Or Donate anonymously using crypto!

Ethereum

0xfe58350B80634f60Fa6Dc149a72b4DFbc17D341E copy

Bitcoin

3ATGMxNzCUFzxpMCHL5sWSt4DVtS8UqXpi copy

Thank you for your support!

Follow Us!

:(