Apr 15 • 8 tweets • 3 min read
Ever wonder how LLM developers choose their pretraining data? It’s not guesswork: all AI labs run small-scale models as experiments, but those models and their data are rarely shared.
DataDecide opens up the process: 1,050 models, 30k checkpoints, 25 datasets & 10 benchmarks 🧵
🔮DataDecide measures how accurately small experiments (1B parameters, 100B tokens, 3 seeds) predict the real ranking of large runs. This helps us make the most cost-effective decisions for our training runs. 💸
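A minimal sketch of the comparison this enables, assuming hypothetical benchmark scores (the dataset names and numbers below are illustrative, not DataDecide's actual results): rank candidate pretraining datasets by small-scale accuracy, then check, pair by pair, how often that ranking agrees with the ranking observed at the target scale.

```python
from itertools import combinations

# Hypothetical benchmark accuracies for a few candidate pretraining
# datasets: small-scale experiments vs. large-scale target runs.
# All values are illustrative, not DataDecide's actual numbers.
small = {"dolma": 0.41, "c4": 0.38, "fineweb": 0.43, "pile": 0.36}
large = {"dolma": 0.62, "c4": 0.55, "fineweb": 0.66, "pile": 0.57}

def decision_accuracy(small_scores, large_scores):
    """Fraction of dataset pairs whose small-scale ranking matches
    the ranking observed at the large (target) scale."""
    pairs = list(combinations(small_scores, 2))
    agree = sum(
        (small_scores[a] > small_scores[b]) == (large_scores[a] > large_scores[b])
        for a, b in pairs
    )
    return agree / len(pairs)

print(f"decision accuracy: {decision_accuracy(small, large):.2f}")
```

A score of 1.0 would mean the small runs perfectly predict the large-scale ordering, so every data decision made cheaply would hold at full scale.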
Apr 9 • 6 tweets • 2 min read
For years it’s been an open question — how much is a language model learning and synthesizing information, and how much is it just memorizing and reciting?
Introducing OLMoTrace, a new feature in the Ai2 Playground that begins to shed some light. 🔦
OLMoTrace connects phrases or even whole sentences in the language model’s output back to verbatim matches in its training data. It does this by searching billions of documents and trillions of tokens in real time and highlighting where it finds compelling matches.
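At its core this is exact-match span search. A toy, runnable sketch of the idea over a tiny in-memory corpus (the real system queries an index over trillions of tokens rather than scanning documents, and merges overlapping hits into maximal spans):

```python
def ngrams(tokens, n):
    """Map each n-gram of a token list to one of its positions."""
    return {tuple(tokens[i:i + n]): i for i in range(len(tokens) - n + 1)}

def verbatim_spans(output, corpus, min_len=6):
    """Return (doc_id, matched span) pairs where a run of at least
    min_len consecutive output tokens appears verbatim in a document.
    A toy stand-in for OLMoTrace's indexed trillion-token search."""
    out_toks = output.split()
    hits = []
    for doc_id, doc in enumerate(corpus):
        doc_grams = ngrams(doc.split(), min_len)
        for i in range(len(out_toks) - min_len + 1):
            gram = tuple(out_toks[i:i + min_len])
            if gram in doc_grams:
                hits.append((doc_id, " ".join(gram)))
    return hits

corpus = ["the quick brown fox jumps over the lazy dog near the river"]
print(verbatim_spans("a quick brown fox jumps over the lazy dog today", corpus))
```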
Mar 26 • 6 tweets • 2 min read
Meet Ai2 Paper Finder, an LLM-powered literature search system.
Searching for relevant work is a multi-step process that requires iteration. Paper Finder mimics this workflow — and helps researchers find more papers than ever 🔍
Paper Finder breaks your search down into steps such as searching for papers, following citations, evaluating relevance, and running follow-up queries based on the results. It then presents not only the papers but also a short summary of why each one is relevant to your specific query.
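A minimal, runnable sketch of the kind of loop described above, over a toy in-memory corpus. Every name here (PAPERS, keyword_search, follow_citations, judge_relevance) is an illustrative stand-in, not Paper Finder's actual API:

```python
PAPERS = {
    "p1": {"title": "Dense retrieval for literature search", "cites": ["p2"]},
    "p2": {"title": "Citation graphs as search signals", "cites": []},
    "p3": {"title": "Fermi estimation with transformers", "cites": ["p1"]},
}

def keyword_search(q):
    return [pid for pid, p in PAPERS.items() if q in p["title"].lower()]

def follow_citations(found):
    return [c for pid in found for c in PAPERS[pid]["cites"]]

def judge_relevance(pid, query):
    # Stand-in for an LLM relevance judgment with a short rationale.
    hit = any(w in PAPERS[pid]["title"].lower() for w in query.split())
    return hit, f"title mentions terms from '{query}'"

def find_papers(query, rounds=2):
    sub_queries = query.lower().split()  # stand-in for LLM decomposition
    found, summaries = [], {}
    for _ in range(rounds):
        candidates = [pid for q in sub_queries for pid in keyword_search(q)]
        candidates += follow_citations(found)  # expand via the citation graph
        for pid in candidates:
            relevant, why = judge_relevance(pid, query)
            if relevant and pid not in found:
                found.append(pid)
                summaries[pid] = why  # why it matters for *this* query
    return found, summaries

print(find_papers("retrieval search"))
```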
Jan 30 • 4 tweets • 3 min read
Here is Tülu 3 405B 🐫, our open-source post-trained model that surpasses the performance of DeepSeek-V3! The last member of the Tülu 3 family demonstrates that our recipe, which includes Reinforcement Learning with Verifiable Rewards (RLVR), scales to 405B, with performance on par with GPT-4o and surpassing prior open-weight post-trained models of the same size, including Llama 3.1.
Benchmarking Tülu 3. Interesting finding: the Reinforcement Learning with Verifiable Rewards (RLVR) framework improved MATH performance more significantly at larger scale, i.e. at 405B compared to 70B and 8B, similar to the findings in the DeepSeek-R1 report.
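The core of RLVR is a binary reward that fires only when an answer can be checked directly against ground truth. A minimal sketch, assuming GSM8K-style outputs that end in "#### &lt;answer&gt;" (the output format and function name are assumptions; Tülu 3's actual verifiers are task-specific):

```python
import re

def verifiable_reward(model_output: str, gold_answer: str) -> float:
    """RLVR-style binary reward: 1.0 only if the extracted final
    answer matches ground truth exactly, else 0.0."""
    match = re.search(r"####\s*(-?[\d,\.]+)\s*$", model_output.strip())
    if match is None:
        return 0.0
    predicted = match.group(1).replace(",", "")
    return 1.0 if predicted == gold_answer else 0.0

print(verifiable_reward("Half of 18 is 9.\n#### 9", "9"))  # 1.0
print(verifiable_reward("The answer is nine.", "9"))       # 0.0
```

During RL training, a check like this stands in for a learned reward model on tasks where correctness can be verified directly.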
Nov 21, 2024 • 8 tweets • 5 min read
Meet Tülu 3, a set of state-of-the-art instruct models with fully open data, eval code, and training algorithms.
We invented new methods for fine-tuning language models with RL and built upon best practices in the community to scale synthetic instruction and preference data.
OLMo is a state-of-the-art LLM and we are releasing it with all pre-training data and code. Let’s get to work on understanding the science behind LLMs. Learn more about the framework and how to access it here: blog.allenai.org/olmo-open-lang…
Huge shout out to all our partners including @databricks, @AMD, @LUMIhpc and @KempnerInst for their support in making the OLMo framework possible. We are excited to build a future together where AI is truly open.
Nov 7, 2021 • 4 tweets • 2 min read
How large an emergency fund do I need? Do I have enough time to grab lunch before my next meeting?
We intuitively solve questions like these every day. Renowned physicist Enrico Fermi had a particular knack for it — these questions have become well known as Fermi Problems.
1/N
Solving Fermi Problems requires recursive decomposition, science/commonsense reasoning, abstraction, and creativity. The inherent complexity of these problems makes them an ideal candidate for #AI reasoning.
2/N
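To make the decomposition idea concrete, here is a toy Fermi estimate in code, using the classic "piano tuners in Chicago" problem. The structure and every number are illustrative assumptions, not part of any dataset from the thread:

```python
def fermi_product(question, factors):
    """Combine sub-estimates by multiplication, the core move in a
    recursive Fermi decomposition, printing the reasoning chain."""
    value = 1.0
    for name, v in factors:
        print(f"  {name}: {v:g}")
        value *= v
    print(f"{question} ≈ {value:g}")
    return value

fermi_product("piano tuners in Chicago", [
    ("people in Chicago", 3e6),
    ("pianos per person", 1 / 100),
    ("tunings per piano per year", 1.0),
    ("tuner-years per tuning (a tuner does ~1000 tunings/year)", 1 / 1000),
])
```

Each factor is itself an estimate that could be decomposed further; multiplying many rough factors tends to cancel errors, which is why Fermi estimates are often surprisingly close.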