Oriol Vinyals
Nov 18
The secret behind Gemini 3?

Simple: Improving pre-training & post-training 🤯

Pre-training: Contra the popular belief that scaling is over—which we discussed in our NeurIPS '25 talk with @ilyasut and @quocleix—the team delivered a drastic jump. The delta between 2.5 and 3.0 is as big as we've ever seen. No walls in sight!

Post-training: Still a total greenfield. There's lots of room for algorithmic progress and improvement, and 3.0 hasn't been an exception, thanks to our stellar team.

Congratulations to the whole team 💙💙💙

More from @OriolVinyalsML

Feb 12, 2023
Chain ⛓️ Rule(s) rules! Appreciation thread of one of the most interesting coincidences in machine learning. Two rules, both named "Chain Rule", happen to be absolutely critical to recent advances in ML & AI. A 🧵 on the Chain Rule of Probability & the Chain Rule of Calculus👇
The Chain Rule of Probability is a powerful tool behind recent advances in Large Language Models. By multiplying together the probabilities of many smaller events, we can compute the probability of a complex event made up of those smaller events.
p(abc) = p(c|ab) * p(b|a) * p(a)
By smaller events here, we refer to the probability of a token, given past tokens, p(c|ab). In probabilistic language modeling, a “token” is a single unit of text, like a word or part of a word. Modern language models consider a vocabulary size of ~100K tokens.
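To make the factorization concrete, here is a minimal sketch in Python. It assumes a made-up table of conditional probabilities over a three-token vocabulary; the table `COND` and the function names are illustrative and not from the thread. A real language model would produce these conditionals with a neural network over a vocabulary of ~100K tokens, and would typically sum log-probabilities, as done here, to avoid numerical underflow on long sequences.

```python
import math

# Hypothetical toy model: maps a context (tuple of past tokens)
# to a distribution over the next token. All values are invented.
COND = {
    (): {"a": 0.5, "b": 0.25, "c": 0.25},            # p(x1)
    ("a",): {"a": 0.25, "b": 0.5, "c": 0.25},        # p(x2 | a)
    ("a", "b"): {"a": 0.25, "b": 0.25, "c": 0.5},    # p(x3 | ab)
}


def sequence_log_prob(tokens, cond=COND):
    """Chain rule in log space: sum of log p(token_i | tokens before i)."""
    total = 0.0
    for i, tok in enumerate(tokens):
        context = tuple(tokens[:i])
        total += math.log(cond[context][tok])
    return total


def sequence_prob(tokens, cond=COND):
    """p(sequence) recovered from the summed log-probabilities."""
    return math.exp(sequence_log_prob(tokens, cond))


if __name__ == "__main__":
    # p(abc) = p(a) * p(b|a) * p(c|ab)
    print(sequence_prob(["a", "b", "c"]))
```

With this toy table, p(abc) is the product of the three conditionals p(a) = 0.5, p(b|a) = 0.5, and p(c|ab) = 0.5.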
Read 7 tweets
Oct 2, 2022
This neural network architecture that was showcased at the @Tesla AI day is a perfect example of Deep Learning at its finest. Mix and match all the greatest innovations to do something drastic and super ambitious. Congrats!
Treating the job of figuring out valid "lanes" from images as language is brilliant. Combining CNNs, transformers, attention, pointer networks, etc., you essentially write a set of instructions to build up the graph by connecting the dots, start new lanes, set curvature, etc.
This isn't ML-new, but who cares? Applied at the level of ambition of full-scale real world impact, with the right team, execution, (and compute/data!), you can do things that felt impossible before. Both the architecture and cool use of language heavily reminded me of AlphaStar.
Read 4 tweets
Jan 5, 2022
2021 personal highlights, a🧵. Despite it being a challenging year globally due to the pandemic 😷🦠, thanks to many incredible collaborators it's been an exciting year research-wise 🤖 Some highlights below.👇
Diversity and inclusion. I kept engaged through our efforts @DeepMind, mentorship and as a member of @Khipu_AI community, supporting AI in Latin America, where we had a fireside chat w/ @geoffreyhinton. I was also a mentor for: docs.google.com/spreadsheets/d…
Perceiver. Being able to treat every modality as a sequence of bytes has been a personal deep learning dream. Perceiver is a transformer-derived architecture proposing a few modifications to achieve this.

Papers: arxiv.org/abs/2103.03206 & arxiv.org/abs/2107.14795
Read 11 tweets
