Ai2
Breakthrough AI to solve the world's biggest problems. › Join us: https://t.co/MjUpZpKPXJ › Newsletter: https://t.co/k9gGznstwj
Nov 20, 2025 10 tweets 4 min read
Announcing Olmo 3, a leading fully open LM suite built for reasoning, chat, & tool use, and an open model flow—not just the final weights, but the entire training journey.
Best fully open 32B reasoning model & best 32B base model. 🧵

Most models ship as a single opaque snapshot. Olmo 3 opens the model flow – pretraining, mid-training, & post-training – plus data recipes & code so you can see how capabilities are built + customize any stage.
Nov 18, 2025 7 tweets 3 min read
Today we’re releasing Deep Research Tulu (DR Tulu)—the first fully open, end-to-end recipe for long-form deep research, plus an 8B agent you can use right away. Train agents that plan, search, synthesize, & cite across sources, making expert research more accessible. 🧭📚

Complex questions rarely have a straightforward answer. A deep research agent has to decide what to look up, read across many sources, figure out what actually matters, & then explain it clearly with citations—not just spit out a summary.
Nov 4, 2025 8 tweets 4 min read
Introducing OlmoEarth 🌍, state-of-the-art AI foundation models paired with ready-to-use open infrastructure to turn Earth data into clear, up-to-date insights within hours—not years.

By applying AI to a planet’s worth of data, the OlmoEarth Platform is already empowering communities to act faster & with confidence to secure a sustainable future. 🌲
OlmoEarth delivers intelligence to people on the ground for anything from aiding restoration efforts to protecting natural resources and communities.
Aug 19, 2025 6 tweets 2 min read
📢 New paper from Ai2: Signal & Noise asks a simple question—can language model benchmarks detect a true difference in model performance? 🧵

🧪 After analyzing 30 benchmarks & 465 open-weight models, the verdict is clear: a simple metric, signal-to-noise ratio (SNR), can reveal which benchmarks are actually informative for making decisions between two models.
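In code, the idea is tiny. A minimal sketch (our illustration, not the paper's exact formulation), assuming signal is the spread of final scores across models and noise is one model's score fluctuation over its last few checkpoints; all numbers are made up:

```python
import numpy as np

def snr(final_scores_across_models, last_checkpoint_scores):
    """Signal-to-noise ratio of a benchmark (illustrative sketch).

    final_scores_across_models: one final score per model; their spread
        is the "signal" the benchmark can express between models.
    last_checkpoint_scores: one model's scores over its last few training
        checkpoints; their variation is the "noise".
    """
    signal = max(final_scores_across_models) - min(final_scores_across_models)
    noise = np.std(last_checkpoint_scores)
    return signal / noise

# High SNR: scores separate models widely and barely wobble across
# checkpoints, so the benchmark can reliably rank two models.
print(snr([0.42, 0.55, 0.61, 0.70], [0.60, 0.61, 0.61, 0.62]))  # ~40
```

A low-SNR benchmark can flip its ranking of two models from one checkpoint to the next, so a measured difference there may be noise rather than signal.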
Apr 15, 2025 8 tweets 3 min read
Ever wonder how LLM developers choose their pretraining data? It’s not guesswork: AI labs create small-scale models as experiments, but those models and their data are rarely shared.
DataDecide opens up the process: 1,050 models, 30k checkpoints, 25 datasets & 10 benchmarks 🧵

[Plot: compute spent on predicting a ranking of datasets vs. how accurately that ranking reflects performance at the target 1B scale of models pretrained from scratch on those datasets.]

🔮 DataDecide measures how accurately small experiments (1B parameters, 100B tokens, 3 seeds) predict the real ranking of large runs. This helps us make the most cost-effective decisions for our training runs. 💸
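One concrete way to score "predicting the real ranking": check, for every pair of datasets, whether the small-scale experiment orders them the same way the target-scale run does. A minimal sketch; the dataset names and scores below are hypothetical:

```python
from itertools import combinations

def decision_accuracy(small_scores, large_scores):
    """Fraction of dataset pairs ranked the same at both scales.

    small_scores / large_scores: dict mapping dataset name -> score of a
    model pretrained on that dataset, at the small and target scale.
    """
    pairs = list(combinations(small_scores, 2))
    agree = sum(
        (small_scores[a] > small_scores[b]) == (large_scores[a] > large_scores[b])
        for a, b in pairs
    )
    return agree / len(pairs)

# Hypothetical scores for three candidate pretraining datasets.
small = {"dataset_a": 0.44, "dataset_b": 0.40, "dataset_c": 0.43}
large = {"dataset_a": 0.58, "dataset_b": 0.50, "dataset_c": 0.55}
print(decision_accuracy(small, large))  # 1.0 -- small runs got every pair right
```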
Apr 9, 2025 6 tweets 2 min read
For years it’s been an open question — how much is a language model learning and synthesizing information, and how much is it just memorizing and reciting?

Introducing OLMoTrace, a new feature in the Ai2 Playground that begins to shed some light. 🔦

OLMoTrace connects phrases or even whole sentences in the language model’s output back to verbatim matches in its training data. It does this by searching billions of documents and trillions of tokens in real time and highlighting where it finds compelling matches.

[Screenshot: OLMoTrace in the Ai2 Playground. The prompt asks, "Who is Celine Dion?"; the model’s four-paragraph output has several text spans highlighted, and the OLMoTrace panel on the right shows training-data documents matching those spans.]
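The production system queries an index over the full training corpus in real time; the core matching step, though, is easy to sketch. A toy version, with an in-memory set of token n-grams standing in for that index:

```python
def verbatim_spans(output_tokens, corpus_ngrams, n=3):
    """Return (start, end) spans of the output whose token n-grams all
    appear verbatim in the training corpus (toy sketch)."""
    spans = []
    for i in range(len(output_tokens) - n + 1):
        if tuple(output_tokens[i:i + n]) in corpus_ngrams:
            if spans and i <= spans[-1][1]:
                # Overlaps the previous match: extend that span.
                spans[-1] = (spans[-1][0], i + n)
            else:
                spans.append((i, i + n))
    return spans

corpus = "the quick brown fox jumps over the lazy dog".split()
ngrams = {tuple(corpus[i:i + 3]) for i in range(len(corpus) - 2)}

output = "a quick brown fox jumps here".split()
print(verbatim_spans(output, ngrams))  # [(1, 5)] -> "quick brown fox jumps"
```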
Mar 26, 2025 6 tweets 2 min read
Meet Ai2 Paper Finder, an LLM-powered literature search system.

Searching for relevant work is a multi-step process that requires iteration. Paper Finder mimics this workflow — and helps researchers find more papers than ever 🔍

[Screenshot: the Ai2 Paper Finder interface.]

Paper Finder breaks down your query into relevant components, such as searching for papers, following citations, evaluating for relevance, and running follow-up queries based on the results. It then presents not only the papers, but also short summaries of why each paper is relevant to your specific query.

[Screenshot: part of Ai2 Paper Finder’s reasoning process while it searches a query.]
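That search-expand-judge loop is straightforward to picture in code. A hypothetical sketch (not Ai2's implementation); `search`, `cited_by`, and `is_relevant` stand in for a retrieval backend, a citation graph, and an LLM relevance judge:

```python
def find_papers(query, search, cited_by, is_relevant, rounds=2):
    """Iterative literature search (hypothetical interface).

    search(query)         -> initial candidate papers
    cited_by(paper)       -> papers linked to this one by citations
    is_relevant(p, query) -> bool, e.g. an LLM judging relevance
    """
    found = set()
    frontier = search(query)
    for _ in range(rounds):
        relevant = [p for p in frontier if is_relevant(p, query)]
        found.update(relevant)
        # Follow citations of what was judged relevant, then iterate;
        # a real system would also rewrite the query between rounds.
        frontier = [q for p in relevant for q in cited_by(p) if q not in found]
    return found
```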
Jan 30, 2025 4 tweets 3 min read
Here is Tülu 3 405B 🐫 our open-source post-training model that surpasses the performance of DeepSeek-V3! The last member of the Tülu 3 family demonstrates that our recipe, which includes Reinforcement Learning from Verifiable Rewards (RLVR), scales to 405B, with performance on par with GPT-4o and surpassing prior open-weight post-trained models of the same size, including Llama 3.1.

[Image: the Tülu 3 405B logo.]
[Image: benchmarking Tülu 3.]

Interesting finding: the Reinforcement Learning from Verifiable Rewards (RLVR) framework improved MATH performance more significantly at larger scale, i.e. 405B compared to 70B and 8B, similar to the findings in the DeepSeek-R1 report.

[Image: a table comparing Tülu 3 405B to other current models across several evaluation benchmarks.]
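What makes a reward "verifiable" is that it comes from a programmatic checker instead of a learned reward model. A toy sketch for math problems; the answer-extraction heuristic here is our illustration, not the Tülu 3 recipe's:

```python
import re

def math_reward(model_answer: str, gold_answer: str) -> float:
    """Verifiable reward (toy sketch): 1.0 iff the model's final answer
    checks out against a known ground truth -- no learned reward model."""
    # Treat the last number in the response as the final answer.
    numbers = re.findall(r"-?\d+(?:\.\d+)?", model_answer)
    return 1.0 if numbers and numbers[-1] == gold_answer else 0.0

print(math_reward("2 dozen eggs minus 6 leaves 18, so the answer is 18", "18"))  # 1.0
print(math_reward("I'll guess the answer is 17", "18"))                          # 0.0
```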
Nov 21, 2024 8 tweets 5 min read
Meet Tülu 3 -- a set of state-of-the-art instruct models with fully open data, eval code, and training algorithms.

We invented new methods for fine-tuning language models with RL and built upon best practices in the community to scale synthetic instruction and preference data.

Demo, GitHub, technical report, and models below 👇

8B model: huggingface.co/allenai/Llama-…
70B model: huggingface.co/allenai/Llama-…
Try it out: playground.allenai.org
Learn more: allenai.org/tulu
First, try our models via our free demo, or grab them on Hugging Face.
Feb 1, 2024 4 tweets 2 min read
OLMo is here! And it’s 100% open.

It’s a state-of-the-art LLM and we are releasing it with all pre-training data and code. Let’s get to work on understanding the science behind LLMs. Learn more about the framework and how to access it here:
blog.allenai.org/olmo-open-lang…
Huge shout out to all our partners including @databricks, @AMD, @LUMIhpc and @KempnerInst for their support in making the OLMo framework possible. We are excited to build a future together where AI is truly open.
Nov 7, 2021 4 tweets 2 min read
How large an emergency fund do I need? Do I have enough time to grab lunch before my next meeting?
We intuitively solve questions like these every day. Renowned physicist Enrico Fermi had a particular knack for it — these questions have become well known as Fermi Problems.
1/N
Solving Fermi Problems requires recursive decomposition, science/commonsense reasoning, abstraction, and creativity. The inherent complexity of these problems makes them ideal candidates for #AI reasoning.
2/N
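To see what recursive decomposition looks like, here is the classic Fermi Problem in miniature: estimating the number of piano tuners in Chicago by breaking it into quantities you can plausibly guess (all numbers are rough assumptions):

```python
def piano_tuners_in_chicago():
    """Recursive decomposition, Fermi-style: replace one hard question
    with several easier ones and multiply the guesses together."""
    population = 3_000_000               # people in Chicago (rough)
    people_per_household = 2
    pianos_per_household = 1 / 20        # ~5% of households own a piano
    tunings_per_piano_per_year = 1
    tunings_per_tuner_per_year = 4 * 5 * 50   # 4/day, 5 days/wk, 50 wks/yr

    pianos = population / people_per_household * pianos_per_household
    yearly_demand = pianos * tunings_per_piano_per_year
    return yearly_demand / tunings_per_tuner_per_year

print(round(piano_tuners_in_chicago()))  # ~75
```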