Latest Twitter Threads by @allen_ai on Thread Reader App

Aug 19 • 6 tweets • 2 min read

📢 New paper from Ai2: Signal & Noise asks a simple question—can language model benchmarks detect a true difference in model performance? 🧵

🧪 After analyzing 30 benchmarks & 465 open-weight models, the verdict is clear: a simple metric, signal-to-noise ratio (SNR), can reveal which benchmarks are actually informative for making decisions between two models.
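
The thread does not spell out how SNR is computed, so the following is only a minimal sketch under an assumed definition: signal is the spread of final scores across different models, and noise is the spread of one model's score across its last few training checkpoints. The function name and the sample numbers are hypothetical, chosen only for illustration.

```python
from statistics import pstdev


def signal_to_noise(final_scores, checkpoint_scores):
    """Assumed definition, not necessarily the paper's exact formula.

    final_scores: one benchmark score per model (how well models are separated).
    checkpoint_scores: scores of a single model over nearby checkpoints
    (how much the benchmark jitters when nothing meaningful changes).
    """
    signal = pstdev(final_scores)
    noise = pstdev(checkpoint_scores)
    if noise == 0:
        return float("inf")
    return signal / noise


# Hypothetical benchmark that separates models well and is stable in training.
informative = signal_to_noise([0.2, 0.4, 0.6, 0.8], [0.49, 0.50, 0.51])

# Hypothetical benchmark where models barely differ but checkpoints swing a lot.
uninformative = signal_to_noise([0.50, 0.51, 0.52], [0.4, 0.5, 0.6])

print(informative > uninformative)
```

Under this assumed definition, a benchmark with a higher ratio is the safer one to use when choosing between two models, because a measured gap is less likely to be checkpoint-to-checkpoint jitter.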

Apr 15 • 8 tweets • 3 min read

Ever wonder how LLM developers choose their pretraining data? It’s not guesswork: all AI labs create small-scale models as experiments, but the models and their data are rarely shared.
DataDecide opens up the process: 1,050 models, 30k checkpoints, 25 datasets & 10 benchmarks 🧵

🔮DataDecide measures how accurately small experiments (1B parameters, 100B tokens, 3 seeds) predict the real ranking of large runs. This helps us make the most cost-effective decisions for our training runs. 💸
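
One way to picture "how accurately small experiments predict the real ranking" is a pairwise agreement score: for every pair of candidate datasets, check whether the small-scale run and the large-scale run pick the same winner. The sketch below assumes that framing; the function name, dataset labels, and scores are hypothetical and are not taken from DataDecide.

```python
from itertools import combinations


def decision_accuracy(small_scores, large_scores):
    """Fraction of dataset pairs ranked the same way at both scales.

    Both arguments map a dataset name to a benchmark score. Ties at either
    scale count as disagreement in this sketch.
    """
    pairs = list(combinations(small_scores, 2))
    agree = sum(
        1
        for a, b in pairs
        if (small_scores[a] - small_scores[b]) * (large_scores[a] - large_scores[b]) > 0
    )
    return agree / len(pairs)


# Hypothetical scores for three pretraining datasets at two scales.
small = {"A": 0.30, "B": 0.40, "C": 0.35}
large = {"A": 0.50, "B": 0.70, "C": 0.45}

# The small runs get A-vs-B and B-vs-C right, but A-vs-C wrong.
print(decision_accuracy(small, large))
```

A score near 1.0 would mean the cheap experiments almost always pick the same dataset the expensive run would have picked.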

Apr 9 • 6 tweets • 2 min read

For years it’s been an open question — how much is a language model learning and synthesizing information, and how much is it just memorizing and reciting?

Introducing OLMoTrace, a new feature in the Ai2 Playground that begins to shed some light. 🔦

OLMoTrace connects phrases or even whole sentences in the language model’s output back to verbatim matches in its training data. It does this by searching billions of documents and trillions of tokens in real time and highlighting where it finds compelling matches.
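
To make "verbatim matches" concrete, here is a toy sketch that greedily finds the longest word spans of a model output that appear word-for-word in a small list of documents. It is a brute-force illustration under simplifying assumptions (whitespace tokenization, in-memory documents); it is not how OLMoTrace is implemented, and a search over trillions of tokens would need a dedicated index. The function name and the sample texts are hypothetical.

```python
def verbatim_spans(output, documents, min_tokens=3):
    """Return (phrase, matching_document_indices) for each matched span."""
    # Pad with spaces so matches align to whole words, not word fragments.
    padded_docs = [" " + " ".join(doc.split()) + " " for doc in documents]
    tokens = output.split()
    spans = []
    i = 0
    while i < len(tokens):
        match = None
        # Try the longest candidate starting at i first, then shrink it.
        for j in range(len(tokens), i + min_tokens - 1, -1):
            phrase = " ".join(tokens[i:j])
            hits = [d for d, doc in enumerate(padded_docs) if f" {phrase} " in doc]
            if hits:
                match = (phrase, hits)
                i = j  # continue scanning after the matched span
                break
        if match:
            spans.append(match)
        else:
            i += 1
    return spans


docs = [
    "the quick brown fox jumps over the lazy dog",
    "a stitch in time saves nine",
]
text = "I think the quick brown fox is fast and a stitch in time helps"
for phrase, hits in verbatim_spans(text, docs):
    print(phrase, hits)
```

The `min_tokens` threshold stands in for the idea of keeping only "compelling" matches: very short overlaps are common by chance and say little about memorization.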

Mar 26 • 6 tweets • 2 min read

Meet Ai2 Paper Finder, an LLM-powered literature search system.

Searching for relevant work is a multi-step process that requires iteration. Paper Finder mimics this workflow — and helps researchers find more papers than ever 🔍

Paper Finder breaks down your query into relevant components, such as searching for papers, following citations, evaluating for relevance, and running follow-up queries based on the results. It then presents not only the papers, but also short summaries of why the paper is relevant to your specific query.

Jan 30 • 4 tweets • 3 min read

Here is Tülu 3 405B 🐫, our open-source post-training model that surpasses the performance of DeepSeek-V3! The last member of the Tülu 3 family demonstrates that our recipe, which includes Reinforcement Learning from Verifiable Rewards (RLVR), scales to 405B, with performance on par with GPT-4o and surpassing prior open-weight post-trained models of the same size, including Llama 3.1.

Benchmarking Tülu 3. Interesting finding: the Reinforcement Learning from Verifiable Rewards (RLVR) framework improved MATH performance more significantly at a larger scale, i.e., 405B compared to 70B and 8B, similar to the findings in the DeepSeek-R1 report.

Nov 21, 2024 • 8 tweets • 5 min read

Meet Tülu 3 -- a set of state-of-the-art instruct models with fully open data, eval code, and training algorithms.

We invented new methods for fine-tuning language models with RL and built upon best practices in the community to scale synthetic instruction and preference data.

Demo, GitHub, technical report, and models below 👇

8B model: huggingface.co/allenai/Llama-…
70B model: huggingface.co/allenai/Llama-…
Try it out: playground.allenai.org
Learn more: allenai.org/tulu
First, try our models via our free demo, or grab them on Hugging Face.

Feb 1, 2024 • 4 tweets • 2 min read

OLMo is here! And it’s 100% open.

It’s a state-of-the-art LLM and we are releasing it with all pre-training data and code. Let’s get to work on understanding the science behind LLMs. Learn more about the framework and how to access it here:
blog.allenai.org/olmo-open-lang…

Huge shout out to all our partners including @databricks, @AMD, @LUMIhpc and @KempnerInst for their support in making the OLMo framework possible. We are excited to build a future together where AI is truly open.

Nov 7, 2021 • 4 tweets • 2 min read

How large an emergency fund do I need? Do I have enough time to grab lunch before my next meeting?
We intuitively solve questions like these every day. Renowned physicist Enrico Fermi had a particular knack for it, and these questions have become well known as Fermi Problems. 1/N
Solving Fermi Problems requires recursive decomposition, science/commonsense reasoning, abstraction, and creativity. The inherent complexity of these problems makes them an ideal candidate for #AI reasoning. 2/N
