Behind the scenes, the model still reasons with its full thoughts; a separate summarization model then translates those full thoughts into the summary, preserving as much detail as possible.
The summarization technique will likely change over time, and we are working to give developers more control (e.g. choosing how you want to summarize).
Summaries are experimental at the moment and free to enable.
Current model pricing doesn’t change: the thought tokens in “usage metadata” refer to the full thoughts (which is what you pay for), and the summaries are just free (and experimental).
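If you want to see what this looks like in practice, here’s a minimal sketch using the google-genai Python SDK; the API key and prompt are placeholders, and `include_thoughts` is the config flag that turns summaries on:

```python
# Minimal sketch (google-genai Python SDK): enable thought summaries and
# read the full-thought token count, which is what you actually pay for.
from google import genai
from google.genai import types

client = genai.Client(api_key="YOUR_API_KEY")  # placeholder key

response = client.models.generate_content(
    model="gemini-2.5-pro-preview-05-06",
    contents="How many prime numbers are there below 100?",
    config=types.GenerateContentConfig(
        thinking_config=types.ThinkingConfig(include_thoughts=True)
    ),
)

# Parts flagged as `thought` carry the summary, not the raw reasoning.
for part in response.candidates[0].content.parts:
    if part.thought:
        print("Thought summary:", part.text)
    else:
        print("Answer:", part.text)

# Billing is based on the full thoughts reported in usage metadata.
print("Thought tokens (billed):", response.usage_metadata.thoughts_token_count)
```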
Overall, excited to get this out into the hands of devs. We still have a lot of exploration to do here and are in feedback + iteration mode, so pls share thoughts on what we can do to make this more useful.
And yes, I know people want full thoughts : )
Gemini 2.5 Pro just got an upgrade & is now even better at coding, with significant gains in front-end web dev, editing, and transformation.
We also fixed a bunch of function calling issues that folks have been reporting; it should now be much more reliable. More details in 🧵
The new model, "gemini-2.5-pro-preview-05-06", is the direct successor to / replacement for the previous version (03-25). If you are using the old model, no change is needed; it should auto-route to the new version with the same price and rate limits.
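For anyone re-testing function calling on the new model, here’s a hedged sketch using the google-genai Python SDK’s automatic function calling; the `get_weather` tool is a toy function invented for this example:

```python
# Sketch: automatic function calling against the new model (google-genai SDK).
# `get_weather` is a toy tool invented for this example.
from google import genai
from google.genai import types

def get_weather(city: str) -> str:
    """Return a fake weather report for the given city."""
    return f"It is sunny and 22C in {city}."

client = genai.Client(api_key="YOUR_API_KEY")  # placeholder key

# Passing a Python callable as a tool lets the SDK run the function the
# model calls and feed the result back before producing the final text.
response = client.models.generate_content(
    model="gemini-2.5-pro-preview-05-06",
    contents="What's the weather in Zurich right now?",
    config=types.GenerateContentConfig(tools=[get_weather]),
)
print(response.text)
```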
“The updated Gemini 2.5 Pro achieves leading performance on our junior-dev evals. It was the first-ever model that solved one of our evals involving a larger refactor of a request routing backend. It felt like a more senior developer because it was able to make correct judgement calls and choose good abstractions.”
Today, we are rolling out three experimental models:
- A new smaller variant, Gemini 1.5 Flash-8B
- A stronger Gemini 1.5 Pro model (better on coding & complex prompts)
- A significantly improved Gemini 1.5 Flash model
For context, we are releasing experimental models to gather feedback and get our latest updates into the hands of developers. What we learn from experimental launches informs how we release models more widely. (2/N)
So let's talk 1.5 Flash-8B!
When the Gemini 1.5 technical report was released, we showcased some of the Google DeepMind team's early work creating an even smaller 8-billion-parameter variant of the Gemini 1.5 Flash model. Today, we are making an improved version of that model accessible to developers for testing and feedback. This experimental model is intended for everything from high-volume multimodal use cases to long-context summarization tasks.
Gemini 1.5 Flash-8B experimental is available to test for free today in Google AI Studio and via the Gemini API as “gemini-1.5-flash-8b-exp-0827”. We are excited to see what you think and to hear how this model might unlock even more new multimodal use cases.
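A quick way to kick the tires, sketched with the google-genai Python SDK (API key and prompt are placeholders):

```python
# Sketch: an ordinary generate_content call pointed at the experimental 8B model.
from google import genai

client = genai.Client(api_key="YOUR_API_KEY")  # placeholder key

response = client.models.generate_content(
    model="gemini-1.5-flash-8b-exp-0827",
    contents="Summarize this support ticket in two sentences: ...",
)
print(response.text)
```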
We are giving developers 1,500,000,000 tokens for free every day in the Gemini API
There is no stronger developer value proposition out there 🧵 (1/4)
Gemini 1.5 Flash free tier comes with:
- 15 RPM (requests per minute)
- 1 million TPM (tokens per minute)
- 1,500 RPD (requests per day)
- free context caching, up to 1 million tokens of storage per hour (sketch after this list)
- free fine-tuning
That’s 1.5 billion tokens free, every day (1,500 requests × up to 1 million tokens each).
(2/4)
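As promised above, here’s a sketch of the context caching flow with the google-genai Python SDK; the file name, TTL, and pinned model version are placeholders for illustration:

```python
# Sketch of context caching with the google-genai Python SDK.
# File name, TTL, and model version are placeholders for illustration.
from google import genai
from google.genai import types

client = genai.Client(api_key="YOUR_API_KEY")  # placeholder key

# Upload a large document once and cache it for an hour.
# Note: caching requires a minimum cached token count, so this only
# pays off for genuinely large prompts.
doc = client.files.upload(file="big_transcript.txt")
cache = client.caches.create(
    model="gemini-1.5-flash-001",
    config=types.CreateCachedContentConfig(
        contents=[doc],
        ttl="3600s",  # storage is billed per token-hour; free on this tier
    ),
)

# Subsequent requests reference the cache instead of resending the tokens.
response = client.models.generate_content(
    model="gemini-1.5-flash-001",
    contents="List the action items from the transcript.",
    config=types.GenerateContentConfig(cached_content=cache.name),
)
print(response.text)
```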
Gemini 1.5 Pro free tier comes with:
- 2 RPM (requests per minute)
- 32,000 TPM (tokens per minute)
- 50 RPD (requests per day)
More modest, but it shows what our higher-intelligence models are capable of.
Developers always ask me why they should bet on @OpenAI and our platform.
Well here’s the simple answer (skeptics welcome):
- compute (models)
- mission
- team
- focus
I’ll go into each below. 🧵
Compute: the scaling laws are holding up, meaning we can keep making things better with bigger models (while in many cases also making them more efficient at inference). We are going to keep making the best models and give developers more flexibility.
No one is making an order-of-magnitude bet on compute like we are.
Everyone else just woke up and realized they need xxx,000 GPUs.
People like Sam and others at OpenAI have known this for years.
Further, most platforms are just hitting scale now (or are still moving towards it); we have been living and breathing the platform-scaling problem for a year and a half now.
The battle scars and lessons learned are important.
Mission: making AGI benefit all of humanity requires that we build for developers. Plain and simple: we wake up every day and want to make a platform that devs can feel confident investing in for their mission-critical infrastructure.
Every decision we make has developers and builders in mind.
You see this manifest itself across so many small things. Candidly, we aren’t perfect, despite trying to be. But we are trying to make something developers love, and I like to think we are succeeding.