Jeff Dean (@🏡)
Chief Scientist, Google DeepMind and Google Research. Co-designer/implementor of things like @TensorFlow, MapReduce, Bigtable, Spanner, Gemini .. (he/him)
Feb 21 • 7 tweets • 4 min read
Introducing Gemma - a family of lightweight, state-of-the-art open models for their class, built from the same research & technology used to create the Gemini models.

Blog post: blog.google/technology/dev…

Tech report: goo.gle/GemmaReport

This thread explores some of the performance characteristics of these models.

The Gemma-7B model exceeds the performance of the widely used Llama-2 7B and 13B models on 8 of 8 benchmarks covering general language understanding, reasoning, math, and coding.
Feb 15 • 17 tweets • 15 min read
Gemini 1.5 Pro - A highly capable multimodal model with a 10M token context length

Today we are releasing the first demonstrations of the capabilities of the Gemini 1.5 series, with the Gemini 1.5 Pro model. One of the key differentiators of this model is its incredibly long context capabilities, supporting millions of tokens of multimodal input. The multimodal capabilities of the model mean you can interact in sophisticated ways with entire books, very long document collections, codebases of hundreds of thousands of lines across hundreds of files, full movies, entire podcast series, and more.

Gemini 1.5 was built by an amazing team of people from @GoogleDeepMind, @GoogleResearch, and elsewhere at @Google. @OriolVinyals (my co-technical lead for the project) and I are incredibly proud of the whole team, and we’re so excited to be sharing this work and what long context and in-context learning can mean for you today!

There’s a lot of material about this, some of which is linked below.

Main blog post: blog.google/technology/ai/…

Technical report, “Gemini 1.5: Unlocking multimodal understanding across millions of tokens of context”: goo.gle/GeminiV1-5

Videos of interactions with the model that highlight its long-context abilities:
Understanding the three.js codebase:
Analyzing a 45-minute Buster Keaton movie:
Apollo 11 transcript interaction:

Starting today, we’re offering a limited preview of 1.5 Pro to developers and enterprise customers via AI Studio and Vertex AI. Read more about this on these blogs:
Google for Developers blog: developers.googleblog.com/2024/02/gemini…
Google Cloud blog: cloud.google.com/blog/products/…

We’ll also introduce 1.5 Pro with a standard 128,000-token context window when the model is ready for a wider release. Coming soon, we plan to introduce pricing tiers that start at the standard 128,000-token context window and scale up to 1 million tokens as we improve the model.

Early testers can try the 1 million token context window at no cost during the testing period. We’re excited to see what developers’ creativity unlocks with a very long context window.

Let me walk you through the capabilities of the model and what I’m excited about!

Needle in a Haystack Tests Out to 10M Tokens

First, let’s take a quick glance at a needle-in-a-haystack test across many different modalities to exercise Gemini 1.5 Pro’s ability to retrieve information from its very long context. In these tests, green is good and red is not good, and these are almost entirely green (>99.7% recall), even out to 10M tokens. Great! A bit more on needle-in-a-haystack tests later in the thread.
Dec 6, 2023 • 20 tweets • 11 min read
I’m very excited to share our work on Gemini today! Gemini is a family of multimodal models that demonstrate really strong capabilities across the image, audio, video, and text domains. Our most-capable model, Gemini Ultra, advances the state of the art in 30 of 32 benchmarks, including 10 of 12 popular text and reasoning benchmarks, 9 of 9 image understanding benchmarks, 6 of 6 video understanding benchmarks, and 5 of 5 speech recognition and speech translation benchmarks. Gemini Ultra is the first model to achieve human-expert performance on MMLU across 57 subjects with a score above 90%. It also achieves a new state-of-the-art score of 62.4% on the new MMMU multimodal reasoning benchmark, outperforming the previous best model by more than 5 percentage points.

Gemini was built by an awesome team of people from @GoogleDeepMind, @GoogleResearch, and elsewhere at @Google, and is one of the largest science and engineering efforts we’ve ever undertaken. As one of the two overall technical leads of the Gemini effort, along with my colleague @OriolVinyalsML, I am incredibly proud of the whole team, and we’re so excited to be sharing our work with you today!

There’s quite a lot of different material about Gemini available, starting with:

Main blog post:

60-page technical report authored by the Gemini Team:

In this thread, I’ll walk you through some of the highlights.

The multimodal and reasoning capabilities of Gemini are quite strong. The benchmark results, which I’ll discuss in a moment, are nice, but I’m most excited by demonstrations of what it can do.

Consider the image below. A teacher has drawn a physics problem of a skier going down a slope, and a student has worked through a solution for computing the speed of the skier at the bottom of the slope. Using Gemini’s multimodal reasoning capabilities, the model is able to read the messy handwriting, correctly understand the problem formulation, convert both the problem and solution to mathematical typesetting, identify the specific step of reasoning where the student went wrong, and then give a worked-through correct solution to the problem. The possibilities in education alone are exciting, and these multimodal and reasoning capabilities of Gemini models could have dramatic applications across many fields.
Jan 18, 2023 • 5 tweets • 3 min read
Excited to share the first of a series of @GoogleAI blog posts summarizing our research work from 2022. This covers language & multimodal models, computer vision, and generative models. We'll have ~7 posts covering other areas over the next few weeks!

ai.googleblog.com/2023/01/google…

A huge thanks to all of the @GoogleResearch community whose great work is represented in this post as well as the subsequent posts in the series. 🙏
May 11, 2022 • 6 tweets • 2 min read
Today at #GoogleIO @sundarpichai showed some examples of the capabilities of the PaLM 540B language model. For example, you can prompt the model with:

"I will ask a question in Bengali and get English and Bengali answers"

And then give it two examples of this behavior.

You can then ask novel questions in Bengali and get surprisingly good answers in both English and Bengali:
Jan 11, 2022 • 16 tweets • 6 min read
As in past years, I've spent part of the holiday break summarizing much of the work we've done in @GoogleResearch over the last year. On behalf of @Google's research community, I'm delighted to share this writeup (this year grouped into five themes).

ai.googleblog.com/2022/01/google…

The material covered in the post represents the work of a tremendous number of people from all across @GoogleResearch, @Google, and beyond (through our many collaborations).

And a huge thank you to the many people listed at the end of the post who helped with this!
Jul 5, 2021 • 8 tweets • 5 min read
It's often helpful to have people to turn to for advice, suggestions, etc., no matter where you are in your career. Many great programs exist to help with this in CS, AI/ML, and related fields. I encourage people to look at any or all of these that might be appropriate: ...
@DevColorOrg: A* program: devcolor.org/a-program/

@_LXAI: latinxinai.org/mentorship-pro…
Jan 12, 2021 • 4 tweets • 1 min read
On behalf of the entire Google Research & @GoogleAI communities, I'm excited to share an overview of some of our research in 2020.

Thanks to everyone who helped make this work possible!

ai.googleblog.com/2021/01/google…

The post touches on many areas, including COVID-19, Health, Weather & Climate Change, Accessibility, Responsible AI, Natural Language Understanding, ML Algorithms, Applications of ML to Science, Machine Perception, Robotics, Algorithmic Theory, Open Datasets, and more.
Jul 7, 2020 • 24 tweets • 13 min read
AI is full of promise, with the potential to revolutionize so many different areas of modern society.

In order to realize its true potential, our field needs to be welcoming to all people. As it stands today, it is definitely not.

Our field has a problem with inclusiveness. Too many in the field see those who are different as people to be belittled, demeaned, harassed, gaslit, or otherwise made to feel unwelcome or question whether they “belong”.
Mar 8, 2020 • 6 tweets • 2 min read
"It was full of... misinformation about the virus & the US response. That’s particularly painful coming from inside the CDC, a longtime powerhouse in global public health now reduced to being a backdrop for grubby politics."

Having worked at @CDCgov & @WHO, it pains me to see this. When I worked at @WHO, I was part of the Global Programme on AIDS (now @UNAIDS), created to help the world tackle the HIV/AIDS pandemic. The staff there were dedicated doctors and scientists intensely focused on helping address that crisis.
Sep 15, 2018 • 33 tweets • 16 min read
A reminder that some people in our field are alienating our female colleagues by flirting in settings that are meant to be professional and by trying to turn what should be topic- & setting-appropriate conversations into dates.

Please don't do this. I'd like to draw attention to some of the retweets and replies that indicate how widespread an issue this is. Here's one, where @jeggers relates her experience (expand the thread).