Logan Kilpatrick
May 27 · 6 tweets · 2 min read
We just rolled out “thought summaries” in the Gemini API: now you can see what the model is thinking and make use of that info!

A thread with the details and request for feedback 🧵
To start, you can enable this with 2 lines of code (only 1 if you are already using thinking budgets).

Link to docs: ai.google.dev/gemini-api/doc…
Behind the scenes, the model still reasons with full thoughts; a separate summarization model then translates the full thoughts into a summary while preserving as much detail as possible.
The summarization technique will likely change over time, and we are working to give developers more control (e.g. choosing how you want the thoughts summarized).

Summaries are experimental at the moment and free to enable.
Current model pricing doesn’t change: the thought tokens in “usage metadata” refer to the full thoughts (which is what you pay for), and the summaries are free (and experimental).
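Concretely, the billing split works out like the hypothetical helper below. The field names (`thoughts_token_count`, `candidates_token_count`) follow the Gemini API's usage metadata as described in the docs; treat them as assumptions, not a spec:

```python
# Hypothetical helper illustrating the billing described above: you pay for
# the full thought tokens reported in usage metadata plus the visible answer;
# the summary itself adds nothing to the bill.
def billed_output_tokens(usage: dict) -> int:
    """Output-side tokens you pay for, from a usage-metadata-like dict."""
    thoughts = usage.get("thoughts_token_count", 0)   # full thoughts, not summary length
    answer = usage.get("candidates_token_count", 0)   # the final response
    return thoughts + answer
```

So a response with 1,200 thought tokens and 300 answer tokens bills 1,500 output tokens, regardless of how short the summary is.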
Overall, excited to get this out into the hands of devs. We still have a lot of exploration to do here and are in feedback + iteration mode, so please share thoughts on what we can do to make this more useful.

And yes, I know people want full thoughts : )

More from @OfficialLoganK

May 6
Gemini 2.5 Pro just got an upgrade & is now even better at coding, with significant gains in front-end web dev, editing, and transformation.

We also fixed a bunch of function calling issues that folks have been reporting; it should now be much more reliable. More details in 🧵
The new model, "gemini-2.5-pro-preview-05-06", is the direct successor to (and replacement for) the previous version (03-25). If you are using the old model, no change is needed: it will auto-route to the new version with the same price and rate limits.

developers.googleblog.com/en/gemini-2-5-…
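The auto-routing behavior amounts to a server-side alias from the old preview id to its successor. A minimal sketch of that idea (my illustration of the behavior described above, not Google's implementation):

```python
# Illustrative sketch of the alias routing described above: requests for the
# retired preview id are served by its successor at the same price and limits.
MODEL_ALIASES = {
    "gemini-2.5-pro-preview-03-25": "gemini-2.5-pro-preview-05-06",
}

def resolve_model(requested: str) -> str:
    """Map a requested model id to the version that actually serves it."""
    return MODEL_ALIASES.get(requested, requested)
```

The practical upshot: code pinned to "gemini-2.5-pro-preview-03-25" keeps working unmodified, but gets the new model's behavior.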
And don't just take our word for it:

“The updated Gemini 2.5 Pro achieves leading performance on our junior-dev evals. It was the first-ever model that solved one of our evals involving a larger refactor of a request routing backend. It felt like a more senior developer because it was able to make correct judgement calls and choose good abstractions.”

– Silas Alberti, Founding Team, Cognition
Aug 27, 2024
Today, we are rolling out three experimental models:

- A new smaller variant, Gemini 1.5 Flash-8B
- A stronger Gemini 1.5 Pro model (better on coding & complex prompts)
- A significantly improved Gemini 1.5 Flash model

Try them on aistudio.google.com, details in 🧵
For context, we are releasing experimental models to gather feedback and get our latest updates into the hands of developers. What we learn from experimental launches informs how we release models more widely. (2/N)
So let's talk 1.5 Flash-8B!

When the Gemini 1.5 technical report was released, we showcased some of the Google DeepMind team's early work creating an even smaller 8 billion parameter variant of the Gemini 1.5 Flash model. Today, we are making an improved version of that model accessible to developers for testing and feedback. This experimental model is intended for everything from high volume multimodal use cases to long context summarization tasks.

Gemini 1.5 Flash-8B experimental is available to test for free via Google AI Studio and the Gemini API today via “gemini-1.5-flash-8b-exp-0827”. We are excited to see what you think and to hear how this model might unlock even more new multimodal use cases.

Tech report: arxiv.org/pdf/2403.05530 (3/N)
Aug 19, 2024
We are giving developers 1,500,000,000 tokens for free every day in the Gemini API

There is no stronger developer value proposition out there 🧵 (1/4)
Gemini 1.5 Flash free tier comes with:

- 15 RPM (requests per minute)
- 1 million TPM (tokens per minute)
- 1,500 RPD (requests per day)
- free context caching, up to 1 million tokens of storage per hour
- free fine-tuning

That’s 1.5 billion tokens free, every day.

(2/4)
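The headline number follows from the limits above — my arithmetic, assuming each of the 1,500 free daily requests can use the full 1M-token window (actual daily throughput is also bounded by the per-minute limits):

```python
# Where 1.5 billion comes from: 1,500 free requests/day on the Flash tier,
# each with up to a 1M-token context window.
requests_per_day = 1_500
tokens_per_request = 1_000_000
free_tokens_per_day = requests_per_day * tokens_per_request
print(f"{free_tokens_per_day:,}")  # 1,500,000,000
```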
Gemini 1.5 Pro free tier comes with:

- 2 RPM (requests per minute)
- 32,000 TPM (tokens per minute)
- 50 RPD (requests per day)

More modest, but it shows what our higher-intelligence models are capable of.

(3/4)
Feb 24, 2024
Developers always ask me why they should bet on @OpenAI and our platform.

Well here’s the simple answer (skeptics welcome):

- compute (models)
- mission
- team
- focus

I’ll go into each below. 🧵
Compute: the scaling laws are holding up, meaning we can keep making things better with bigger models (while in many cases making them more efficient at inference). We are going to keep making the best models and give more flexibility to developers.

No one is making an order of magnitude bet on compute like we are.

Everyone else just woke up and realized they need xxx,000 GPUs.

People like Sam and others at OpenAI have known this for years.

Further, most platforms are just hitting scale now (or are still moving toward it); we have been living and breathing the platform scaling problem for a year and a half now.

The battle scars and lessons learned are important.
Mission: making AGI benefit all of humanity requires that we build for developers. Plain and simple, we wake up every day and want to make a platform that devs can feel confident investing in for their mission-critical infrastructure.

Every decision we make has developers and builders in mind.

You see this manifest itself across so many small things. Candidly, we aren’t perfect despite trying to be. But we are trying, and I like to think succeeding, at making something developers love.
Nov 14, 2023
Great news for devs who were not at @OpenAI Dev Day, the breakout sessions are now live on YouTube! 🎉

Check out sessions on:

- Maximizing LLM performance
- The New Stack and Ops for AI
- The Business of AI

And more! Details below 👇
A great deep dive on the new stack for LLM Ops by @sherwinwu and @shyamalanadkat:

A Survey of Techniques for Maximizing LLM Performance (including RAG and fine-tuning) with @colintjarvis and John Allard (I wish John was on Twttr):

Nov 6, 2023
Today is the biggest day ever for developers building with @OpenAI

We are releasing new models, APIs, open source models, and more. Full details in 🧵
Announcing GPT-4 Turbo, our latest and most powerful foundation model. 🔥

It comes with:

- lower prices (2 - 3x)
- 128,000 token context
- April 2023 knowledge cutoff
- 2x higher rate limits

And is available to all developers in the next couple of hours
More model updates: 📈

Both GPT-4 Turbo and the updated version of 3.5 Turbo come with:

- JSON mode
- better instruction following
- reproducible outputs and log probs

3.5 Turbo now comes standard with 16k context in the API.
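JSON mode is the feature here with the most direct code impact. A sketch of how it's used (the request shape assumes the OpenAI Python SDK and the `response_format` parameter announced above; model id and prompts are illustrative), with a runnable stand-in for the response:

```python
import json

# Sketch of JSON mode (assumes the OpenAI Python SDK; `response_format`
# is the new parameter, the rest is illustrative):
#
#   response = client.chat.completions.create(
#       model="gpt-4-1106-preview",
#       response_format={"type": "json_object"},
#       messages=[
#           {"role": "system", "content": "Reply in JSON."},
#           {"role": "user", "content": "List three primes."},
#       ],
#   )
#   reply = response.choices[0].message.content
#
# With JSON mode enabled, the message content is guaranteed to be valid
# JSON, so parsing never needs a fallback:
reply = '{"primes": [2, 3, 5]}'  # stand-in for a real reply
data = json.loads(reply)
```

Note that you still describe the desired schema in the prompt; JSON mode guarantees well-formed JSON, not any particular structure.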