Yao Fu
Research Scientist at @GoogleDeepMind I study complex, multimodal, interactive reasoning. Opinions are my own
Apr 25, 2024 7 tweets 3 min read
From Claude 100K to Gemini 10M, we are in the era of long-context language models. Why and how can a language model utilize information at any input location within a long context? We discover retrieval heads, a special type of attention head responsible for long-context factuality.

Retrieval heads are universal and sparse: all transformer language models we study, whether base or chat, short- or long-context, small or large, dense or MoE, have a small set of retrieval heads as long as they pass needle-in-a-haystack.
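A retrieval head can be characterized by how often its strongest attention lands on the needle while the model copies the needle out. Below is a minimal sketch of that scoring, assuming you have already extracted per-step attention weights; the function name and tensor layout are my own illustration, not the paper's code:

```python
import numpy as np

def retrieval_scores(attn, needle_span, copied_steps):
    """Score each attention head for retrieval behavior.

    attn:         [layers, heads, steps, ctx_len] attention weights
                  recorded at each decoding step (hypothetical layout).
    needle_span:  (start, end) token indices of the needle in the context.
    copied_steps: decoding steps whose output token was copied verbatim
                  from the needle.

    Returns a [layers, heads] array: the fraction of copied steps on which
    a head's top-attended position falls inside the needle span.
    """
    start, end = needle_span
    # Top-attended context position per head at each copied step.
    top = attn[:, :, copied_steps, :].argmax(axis=-1)  # [layers, heads, |copied|]
    hits = (top >= start) & (top < end)
    return hits.mean(axis=-1)
```

Heads with scores near 1.0 would be the candidate retrieval heads; most heads, on this metric, score near 0.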
Jun 8, 2023 4 tweets 2 min read
Is Falcon really better than LLaMA?
Short take: probably not.

Longer take: we reproduced the LLaMA 65B eval on MMLU and got 61.4, close to the official number (63.4), much higher than its Open LLM Leaderboard number (48.8), and clearly higher than Falcon's (52.7).

Code and prompt… twitter.com/i/web/status/1…

Update:
@BlancheMinerva pointed out that a fair comparison is to also run Falcon on MMLU with the default settings. She is right, and we are running this now; results are likely in a day or so.
So stay tuned, and see how the numbers compare.

Again, my personal… twitter.com/i/web/status/1…
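For context on why prompt settings move MMLU scores so much: MMLU is conventionally evaluated 5-shot, with a subject header, five answered dev examples, and the test question left for the model to complete with a letter. A minimal sketch of that prompt construction (helper names are my own, not the official harness):

```python
def format_example(question, choices, answer=None):
    """Format one MMLU question; `answer` is 'A'..'D' for few-shot demos,
    or None to leave the answer for the model to complete."""
    lines = [question]
    for letter, choice in zip("ABCD", choices):
        lines.append(f"{letter}. {choice}")
    lines.append(f"Answer: {answer}" if answer is not None else "Answer:")
    return "\n".join(lines)

def build_prompt(subject, dev_examples, test_example):
    """Standard 5-shot MMLU prompt: subject header, five answered dev
    demos, then the test question ending with 'Answer:'."""
    header = ("The following are multiple choice questions (with answers) "
              f"about {subject}.\n\n")
    demos = "\n\n".join(format_example(q, c, a) for q, c, a in dev_examples[:5])
    return header + demos + "\n\n" + format_example(test_example[0], test_example[1])
```

Scoring then checks whether the model's next token matches the gold letter; small deviations in this format (header wording, demo count, answer extraction) can shift reported accuracy by many points, which is the heart of the leaderboard discrepancy.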
May 18, 2023 7 tweets 4 min read
Introducing GPT-Bargaining: Claude + Cohere + AI21 vs. GPT

Who can win the better deal 💵?

We ask two LLMs to bargain over a 🎈, then have a third LLM provide AI feedback, let the players improve from that feedback, and see whether LLMs can continuously and autonomously evolve

arxiv.org/abs/2305.10142

We are very much surprised 😮 by the superhuman (or at least better-than-me 😅) bargaining techniques suggested by the AI critic, and by how the models improve from this feedback 🎉
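The loop described above can be sketched as: play a bargaining game, ask a critic LLM for one improvement, fold it into the buyer's instructions, and repeat. Below is a minimal sketch with hypothetical callables standing in for the LLM APIs; this is an illustration of the idea, not the paper's implementation:

```python
def play_game(buyer, seller, buyer_instructions, max_turns=6):
    """Alternate buyer and seller messages until one side closes a deal.
    `buyer` and `seller` are hypothetical callables wrapping LLM APIs:
    they take a prompt string and return a reply string."""
    transcript = ""
    for turn in range(max_turns):
        is_buyer = turn % 2 == 0
        speaker = buyer if is_buyer else seller
        prompt = (buyer_instructions + "\n" + transcript) if is_buyer else transcript
        message = speaker(prompt)
        transcript += ("Buyer: " if is_buyer else "Seller: ") + message + "\n"
        if "deal" in message.lower():
            break
    return transcript

def bargain_with_ai_feedback(buyer, seller, critic, n_rounds=3):
    """Self-improvement loop: after each game, a critic LLM suggests a
    bargaining technique, which is appended to the buyer's instructions
    for the next game. Returns a (transcript, feedback) pair per round."""
    instructions = "You are a buyer. Negotiate the lowest price for a balloon."
    history = []
    for _ in range(n_rounds):
        transcript = play_game(buyer, seller, instructions)
        feedback = critic("Here is a bargaining transcript:\n" + transcript
                          + "\nSuggest one technique for the buyer to get a better deal.")
        instructions += "\nTechnique from AI feedback: " + feedback
        history.append((transcript, feedback))
    return history
```

The interesting question the paper studies is whether scores actually keep improving round over round, and which models can both give and exploit such feedback.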
May 3, 2023 7 tweets 3 min read
The core difference between GPT-4 and GPT-3.5 is the ability to perform complex tasks. In this post, we present a complete roadmap towards LLM complex reasoning abilities, covering the full development stack: pretraining, SFT, RL, CoT prompting, and eval. yaofu.notion.site/Towards-Comple…

Motivation: complex reasoning is the core differentiator between small and large models, while chitchat is simple for both; complex reasoning is the foundation for LLMs to become the next-generation operating systems, opening endless opportunities for new app ecosystems
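Of the stages listed, CoT prompting is the one you can try directly: include worked reasoning in the demonstrations so the model produces intermediate steps before its final answer. A minimal sketch (the demo content and parsing helper are illustrative, not from the post):

```python
# Few-shot chain-of-thought prompt: each demonstration shows intermediate
# reasoning before the final answer, which elicits step-by-step reasoning
# on the new question instead of a bare guess.
COT_PROMPT = """\
Q: Roger has 5 tennis balls. He buys 2 more cans of 3 tennis balls each. \
How many tennis balls does he have now?
A: Roger started with 5 balls. 2 cans of 3 balls each is 6 balls. 5 + 6 = 11. \
The answer is 11.

Q: A jug holds 4 liters. How many jugs are needed to carry 10 liters?
A:"""

def extract_answer(completion):
    """Parse the final answer from a CoT completion that ends with the
    convention 'The answer is N.'"""
    return completion.rsplit("The answer is", 1)[-1].strip(" .\n")
```

The same prompt-and-parse pattern underlies most CoT evals: accuracy is computed on the extracted final answer, while the intermediate steps are what actually carry the gain on complex tasks.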