Yao Fu
Research Scientist at @GoogleDeepMind I study complex, multimodal, interactive reasoning. Opinions are my own
Apr 25, 2024 7 tweets 3 min read
From Claude 100K to Gemini 10M, we are in the era of long-context language models. Why and how can a language model utilize information at any input location within a long context? We discover retrieval heads, a special type of attention head responsible for long-context factuality.

Retrieval heads are universal and sparse: all transformer language models we study, whether base or chat, short- or long-context, small or large, dense or MoE, have a small set of retrieval heads as long as they pass needle-in-a-haystack.
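A retrieval head can be characterized by how often its strongest attention lands on the needle while the model copies the needle out. Below is a minimal sketch of that scoring, assuming you have already extracted per-step attention weights; the function name and tensor layout are my own illustration, not the paper's code:

```python
import numpy as np

def retrieval_scores(attn, needle_span, copied_steps):
    """Score each attention head for retrieval behavior.

    attn:         [layers, heads, steps, ctx_len] attention weights
                  recorded at each decoding step (hypothetical layout).
    needle_span:  (start, end) token indices of the needle in the context.
    copied_steps: decoding steps whose output token was copied verbatim
                  from the needle.

    Returns a [layers, heads] array: the fraction of copied steps on which
    a head's top-attended position falls inside the needle span.
    """
    start, end = needle_span
    # Top-attended context position per head at each copied step.
    top = attn[:, :, copied_steps, :].argmax(axis=-1)  # [layers, heads, |copied|]
    hits = (top >= start) & (top < end)
    return hits.mean(axis=-1)
```

Heads with scores near 1.0 would be the candidate retrieval heads; most heads, on this metric, score near 0.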
Jun 8, 2023 4 tweets 2 min read
Is Falcon really better than LLaMA?
Short take: probably not.

Longer take: we reproduced the LLaMA 65B eval on MMLU and got 61.4, close to the official number (63.4), much higher than its Open LLM Leaderboard number (48.8), and clearly higher than Falcon's (52.7).

Code and prompt… twitter.com/i/web/status/1…

Update:
@BlancheMinerva pointed out that a fair comparison is to also run Falcon on MMLU with the default settings. She is right, and we are running this now; results are likely in a day or so.
So stay tuned, and see how the numbers compare.

Again, my personal… twitter.com/i/web/status/1…
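For context on why prompt settings move MMLU scores so much: MMLU is conventionally evaluated 5-shot, with a subject header, five answered dev examples, and the test question left for the model to complete with a letter. A minimal sketch of that prompt construction (helper names are my own, not the official harness):

```python
def format_example(question, choices, answer=None):
    """Format one MMLU question; `answer` is 'A'..'D' for few-shot demos,
    or None to leave the answer for the model to complete."""
    lines = [question]
    for letter, choice in zip("ABCD", choices):
        lines.append(f"{letter}. {choice}")
    lines.append(f"Answer: {answer}" if answer is not None else "Answer:")
    return "\n".join(lines)

def build_prompt(subject, dev_examples, test_example):
    """Standard 5-shot MMLU prompt: subject header, five answered dev
    demos, then the test question ending with 'Answer:'."""
    header = ("The following are multiple choice questions (with answers) "
              f"about {subject}.\n\n")
    demos = "\n\n".join(format_example(q, c, a) for q, c, a in dev_examples[:5])
    return header + demos + "\n\n" + format_example(test_example[0], test_example[1])
```

Scoring then checks whether the model's next token matches the gold letter; small deviations in this format (header wording, demo count, answer extraction) can shift reported accuracy by many points, which is the heart of the leaderboard discrepancy.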
May 18, 2023 7 tweets 4 min read
Introducing GPT-Bargaining: Claude + Cohere + AI21 vs. GPT

Who can win the better deal 💵?

We ask two LLMs to bargain over a 🎈, then have a third LLM provide AI feedback, let the players improve from that feedback, and see whether LLMs can continuously and autonomously evolve

arxiv.org/abs/2305.10142

We are very much surprised 😮 by the superhuman (or at least better-than-me 😅) bargaining techniques suggested by the AI critic, and by how the models improve from this feedback 🎉
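The loop described above can be sketched as: play a bargaining game, ask a critic LLM for one improvement, fold it into the buyer's instructions, and repeat. Below is a minimal sketch with hypothetical callables standing in for the LLM APIs; this is an illustration of the idea, not the paper's implementation:

```python
def play_game(buyer, seller, buyer_instructions, max_turns=6):
    """Alternate buyer and seller messages until one side closes a deal.
    `buyer` and `seller` are hypothetical callables wrapping LLM APIs:
    they take a prompt string and return a reply string."""
    transcript = ""
    for turn in range(max_turns):
        is_buyer = turn % 2 == 0
        speaker = buyer if is_buyer else seller
        prompt = (buyer_instructions + "\n" + transcript) if is_buyer else transcript
        message = speaker(prompt)
        transcript += ("Buyer: " if is_buyer else "Seller: ") + message + "\n"
        if "deal" in message.lower():
            break
    return transcript

def bargain_with_ai_feedback(buyer, seller, critic, n_rounds=3):
    """Self-improvement loop: after each game, a critic LLM suggests a
    bargaining technique, which is appended to the buyer's instructions
    for the next game. Returns a (transcript, feedback) pair per round."""
    instructions = "You are a buyer. Negotiate the lowest price for a balloon."
    history = []
    for _ in range(n_rounds):
        transcript = play_game(buyer, seller, instructions)
        feedback = critic("Here is a bargaining transcript:\n" + transcript
                          + "\nSuggest one technique for the buyer to get a better deal.")
        instructions += "\nTechnique from AI feedback: " + feedback
        history.append((transcript, feedback))
    return history
```

The interesting question the paper studies is whether scores actually keep improving round over round, and which models can both give and exploit such feedback.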
May 3, 2023 7 tweets 3 min read
The core difference between GPT-4 and GPT-3.5 is the ability to perform complex tasks. In this post, we present a complete roadmap towards LLM complex reasoning abilities, covering the full development stack: pretraining, SFT, RL, CoT prompting, and eval. yaofu.notion.site/Towards-Comple…

Motivation: complex reasoning is the core differentiator between small and large models, while chitchat is simple for both; complex reasoning is the foundation for LLMs to become the next-generation operating systems, opening endless opportunities for new app ecosystems
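Of the stages listed, CoT prompting is the one you can try directly: include worked reasoning in the demonstrations so the model produces intermediate steps before its final answer. A minimal sketch (the demo content and parsing helper are illustrative, not from the post):

```python
# Few-shot chain-of-thought prompt: each demonstration shows intermediate
# reasoning before the final answer, which elicits step-by-step reasoning
# on the new question instead of a bare guess.
COT_PROMPT = """\
Q: Roger has 5 tennis balls. He buys 2 more cans of 3 tennis balls each. \
How many tennis balls does he have now?
A: Roger started with 5 balls. 2 cans of 3 balls each is 6 balls. 5 + 6 = 11. \
The answer is 11.

Q: A jug holds 4 liters. How many jugs are needed to carry 10 liters?
A:"""

def extract_answer(completion):
    """Parse the final answer from a CoT completion that ends with the
    convention 'The answer is N.'"""
    return completion.rsplit("The answer is", 1)[-1].strip(" .\n")
```

The same prompt-and-parse pattern underlies most CoT evals: accuracy is computed on the extracted final answer, while the intermediate steps are what actually carry the gain on complex tasks.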