Latest Twitter Threads by @dair_ai on Thread Reader App

Aug 3 • 11 tweets • 4 min read

Top AI Papers of The Week (July 28 - August 3):

- GEPA
- Graph-R1
- AlphaEarth
- Self-Evolving Agents
- Hierarchical Reasoning Model
- Efficient Attention Mechanisms
- Geometric-Mean Policy Optimization

Read on for more: 1. AlphaEarth Foundations

AlphaEarth Foundations (AEF) introduces a task-agnostic geospatial foundation model that learns a compact, time-continuous embedding field of Earth’s surface.

https://x.com/GoogleDeepMind/status/1950563700286398965

Jul 20 • 11 tweets • 4 min read

Top AI Papers of The Week (July 14 - 20):

- Agentic-R1
- Context Rot
- Scaling up RL
- A Survey of AIOps
- Chain-of-Thought Monitorability
- One Token to Fool LLM-as-a-Judge
- A Survey of Context Engineering for LLMs

Read on for more: 1. One Token to Fool LLM-as-a-Judge

Investigates the surprising fragility of LLM-based reward models used in Reinforcement Learning with Verifiable Rewards (RLVR).

https://x.com/omarsar0/status/1944778174493343771

Jul 13 • 11 tweets • 4 min read

Top AI Papers of The Week (July 7 - 13):

- H-Net
- HIRAG
- Kimi K2
- MemAgent
- Adaptive Branching MCTS
- A Survey on Latent Reasoning
- What Has a Foundation Model Found?

Read on for more: 1. Kimi K2

Moonshot AI introduces Kimi K2, a 1T parameter Mixture-of-Experts model (32B active) optimized not just for knowledge tasks but for agentic capabilities: models that act, not just respond.

https://x.com/Kimi_Moonshot/status/1943687594560332025

Jul 6 • 11 tweets • 4 min read

Top AI Papers of The Week (June 30 - July 6):

- xLSTMAD
- AI4Research
- Deep Research Agents
- SLMs are the Future of Agentic AI
- Chain-of-Thought Is Not Explainability
- Survey on Evaluation of LLM-based Agents

Read on for more: 1. Small Language Models are the Future of Agentic AI

This position paper argues that small language models (SLMs), defined as those runnable on consumer-grade hardware, are not only sufficient but superior for many agentic AI applications.

https://x.com/omarsar0/status/1940038438746718698

Jun 29 • 11 tweets • 4 min read

Here are the top AI Papers of The Week (June 23 - 29):

- MEM1
- AlphaGenome
- Diffusion Steering via RL
- Towards AI Search Paradigm
- AI Agent Communication Protocols
- Ultra-Fast Diffusion-based Language Models

Full list below: 1. Ultra-Fast Diffusion-based Language Models

This paper introduces Mercury, a family of large-scale diffusion-based language models (dLLMs) optimized for ultra-fast inference.

https://x.com/omarsar0/status/1937600372430045494

Jun 22 • 11 tweets • 3 min read

Here are the top AI Papers of The Week (June 16-22):

- RAG+
- ALE-Agent
- From Bytes to Ideas
- Agentic Misalignment
- Future of Work with AI Agents
- Eliciting Reasoning with Cognitive Tools
... 1. RAG+

Introduces RAG+, a modular framework that improves traditional RAG systems by explicitly incorporating application-level reasoning into the retrieval and generation pipeline.

https://x.com/omarsar0/status/1934667096828399641

Jun 8 • 11 tweets • 4 min read

Here are the top AI Papers of The Week:

- Open Thoughts
- RewardBench 2
- The Illusion of Thinking
- Knowledge or Reasoning
- From Tokens to Thoughts
- Self-Challenging Language Model Agents

Read on for more: 1. The Illusion of Thinking

Investigates the capabilities and limitations of frontier Large Reasoning Models like Claude 3.7, DeepSeek-R1, and OpenAI’s o-series by analyzing their performance on reasoning tasks as a function of problem complexity.

https://x.com/omarsar0/status/1931333830985883888

Jun 1 • 11 tweets • 4 min read

From a new lens on RAG to self-improving AI

Here are the top AI papers of the week:

- QwenLong-L1
- Spurious Rewards
- New Lens on RAG
- Self-Improving Agents
- Scalable Agentic Reasoning
- Learn to Reason without External Rewards

Here is the full list: 1. New Lens on RAG Systems

Introduces a new conceptual and empirical framework for analyzing RAG systems through the lens of sufficient context: whether the retrieved content alone is enough to answer a query.

https://x.com/omarsar0/status/1927737131478188295

May 25 • 11 tweets • 4 min read

Here are the top AI Papers of the Week (May 19 - 25):

- ARC-AGI-2
- AdaptThink
- EfficientLLM
- Visual Planning
- The Pitfalls of Reasoning
- Teaching MLLMs to Think with Images

Here is the full list: 1. Visual Planning

Proposes a novel reasoning paradigm that replaces language-based planning with image-based reasoning.

https://x.com/_yixu/status/1924497238908375072

May 4 • 11 tweets • 3 min read

Here are the top AI Papers of the Week (April 28 - May 4):

- Kimi-Audio
- UniversalRAG
- LLM for Engineering
- DeepSeek-Prover-V2
- Phi-4-Mini-Reasoning
- Advances and Challenges in Foundation Agents

Read on for more: 1. Phi-4-Mini-Reasoning

Microsoft released Phi-4-Mini-Reasoning to explore small reasoning language models for math.

https://x.com/omarsar0/status/1917954418173247909

Apr 27 • 11 tweets • 4 min read

Here are the top AI Papers of the Week (April 21 - 27):

- UXAgent
- BitNet b1.58 2B4T
- Describe Anything
- General-Reasoner
- Tiny Reasoning Models
- Test-Time Reinforcement Learning

Read on for more: 1. Does RL Incentivize Reasoning in LLMs Beyond the Base Model?

By analyzing models across tasks (math, code, vision), the authors find that RLVR improves sample efficiency but does not expand reasoning capacity beyond the base model.

https://x.com/DaveShapi/status/1915408405201629684

Apr 6 • 11 tweets • 4 min read

Here are the top AI Papers of the Week (Mar 31 - April 6):

- PaperBench
- Command A
- CodeScientist
- MedAgentSim
- Open Deep Search
- Retrieval-Augmented Reasoning Model

Read on for more: 1). PaperBench

OpenAI introduces a new benchmark, PaperBench, to test whether AI agents can replicate cutting-edge machine learning research papers from scratch.

https://x.com/OpenAI/status/1907481490457506235

Mar 30 • 11 tweets • 4 min read

Here are the top AI Papers of the Week (Mar 24-30):

- AgentRxiv
- Play2Prompt
- Chain-of-Tools
- Qwen2.5-Omni
- Tracing the Thoughts of LLMs
- Synthetic Data Generation Using LLMs

Read on for more: 1). Tracing the Thoughts of LLMs

Anthropic researchers unveil new interpretability tools for peering inside LLMs, using Claude 3.5 Haiku. Their two new papers show how to trace model internals like circuits, plans, and conceptual thinking in real time.

https://x.com/AnthropicAI/status/1905303835892990278

Mar 23 • 11 tweets • 4 min read

Here are the top AI Papers of the Week (Mar 17-23):

- DAPO
- DeepMesh
- Thinking Machines
- A Review of DeepSeek Models
- A Survey on Efficient Reasoning
- Agentic Memory for LLM Agents

Read on for more: 1). A Review of DeepSeek Models

Provides an in-depth review of the cutting-edge techniques behind DeepSeek's open-source LLMs—DeepSeek-V3 and DeepSeek-R1.

https://arxiv.org/abs/2503.11486

Mar 16 • 11 tweets • 4 min read

Here are the top AI Papers of the Week (Mar 10-16):

- Gemma 3
- Search-R1
- Gemini Robotics
- Post Training of LLMs
- Improving Planning of Agents
- Transformers without Normalization

Read on for more: 1). Gemma 3

Gemma 3 is a lightweight open model family (1B–27B parameters) that integrates vision understanding, multilingual coverage, and extended context windows (up to 128K tokens).

https://x.com/omarsar0/status/1899828483888762948

Mar 15 • 7 tweets • 2 min read

Many developers are still writing long prompts or post-processing LLM outputs to get consistent results.

Using structured outputs is a better option in many LLM applications.

Let's discuss a few important points:

What are structured outputs useful for?

Structured outputs enable LLMs to generate responses in a specific schema like JSON, often leveraging data validation tools like Pydantic. This ensures consistent formatting, predictable data structures and type safety.
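
As a rough illustration of the idea above, the sketch below validates a model's JSON reply against a fixed schema before the rest of the application uses it. It uses only the Python standard library in place of Pydantic, which performs this kind of check far more thoroughly. The `PaperSummary` schema, its field names, and the `parse_summary` helper are hypothetical and chosen only for this example.

```python
import json
from dataclasses import dataclass
from typing import List


@dataclass
class PaperSummary:
    # Hypothetical schema for illustration; not taken from the thread.
    title: str
    year: int
    topics: List[str]


def parse_summary(raw: str) -> PaperSummary:
    """Parse a JSON string and check it against the PaperSummary schema."""
    data = json.loads(raw)  # raises a ValueError subclass on malformed JSON
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object")
    expected = {"title": str, "year": int, "topics": list}
    missing = expected.keys() - data.keys()
    if missing:
        raise ValueError(f"missing fields: {sorted(missing)}")
    for name, typ in expected.items():
        if not isinstance(data[name], typ):
            raise ValueError(f"field {name!r} should be {typ.__name__}")
    if not all(isinstance(t, str) for t in data["topics"]):
        raise ValueError("every topic should be a string")
    return PaperSummary(**{name: data[name] for name in expected})


# Usage: a well-formed reply becomes a typed object; a malformed one fails loudly.
reply = '{"title": "Gemma 3", "year": 2025, "topics": ["vision", "multilingual"]}'
summary = parse_summary(reply)
print(summary.title, summary.year, summary.topics)
```

Failing at the parsing boundary, rather than deep inside downstream code, is what makes the output predictable enough to skip long prompts or ad hoc post-processing.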

Mar 9 • 11 tweets • 4 min read

Here are the top AI Papers of the Week (Mar 3-9):

- Agentic Reward Modeling
- Fractal Generative Models
- Cognitive Behaviors in LRMs
- Overview of Reasoning LLMs
- Conversational Speech Model
- A Few Tokens Are All You Need

Read on for more: 1). A Few Tokens Are All You Need

Proposes a new approach to boost reasoning in LLMs by only fine-tuning on the first few tokens of generated solutions.

https://x.com/omarsar0/status/1897334301462815001

Mar 2 • 11 tweets • 4 min read

Here are the top AI Papers of the Week (Feb 24 - Mar 2):

- GPT-4.5
- PlanGEN
- Protein LLMs
- Chain-of-Draft
- Claude 3.7 Sonnet
- Emergent Misalignment

Read on for more: 1). Claude 3.7 Sonnet

Anthropic releases a system card for its latest hybrid reasoning model, Claude 3.7 Sonnet, detailing safety measures, evaluations, and a new "extended thinking" mode.

https://x.com/AnthropicAI/status/1894092430560965029

Feb 23 • 12 tweets • 4 min read

Here are the top AI Papers of the Week (Feb 17-23):

- AI Co-Scientist
- Open-Reasoner-Zero
- The AI CUDA Engineer
- Native Sparse Attention
- The Danger of Overthinking
- Large Language Diffusion Model

Read on for more: 1). AI Co-Scientist

Google introduces AI co-scientist, a multi-agent AI system built with Gemini 2.0 to help accelerate scientific breakthroughs.

https://x.com/omarsar0/status/1892223515660579219

Feb 16 • 11 tweets • 4 min read

Here are the top AI Papers of the Week (Feb 10-16):

- Latent Reasoning
- Large Memory Models
- Brain-to-Text Decoding
- Enhancing Reasoning to Adapt LLMs
- Reinforcement Learning via Self-Play
- Competitive Programming with Large Reasoning Models

Read on for more: 1). Scaling up Test-Time Compute with Latent Reasoning

This work introduces a latent recurrent-depth transformer, a model that scales test-time reasoning without relying on additional token generation.

https://x.com/omarsar0/status/1890506648772571452

Feb 9 • 11 tweets • 4 min read

Here are the top AI Papers of the Week (Feb 3-9):

- s1
- OmniHuman-1
- Less Is More for Reasoning
- Advancing Reasoning in LLMs
- Rethinking Mixture-of-Agents
- Chain-of-Associated-Thoughts

Read on for more: 1). s1: Simple test-time scaling

Introduces s1, a method to boost LLM performance by using extra compute at inference (“test-time scaling”). A new decoding trick appends the token “Wait” when the model tries to stop, forcing it to think longer.
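
The "Wait" trick described above can be sketched with a toy decoding loop. This is only an illustration under stated assumptions, not the paper's implementation: the `END` marker, the `min_tokens` threshold, the `generate_with_wait` function, and the `toy_step` stand-in for a real model are all hypothetical.

```python
from typing import Callable, List

END = "<end>"  # hypothetical marker the model emits when it wants to stop


def generate_with_wait(
    step: Callable[[List[str]], str],
    prompt: List[str],
    min_tokens: int,
    max_tokens: int,
) -> List[str]:
    """Decode token by token, replacing early stops with the token "Wait"."""
    tokens = list(prompt)
    generated = 0
    while generated < max_tokens:
        tok = step(tokens)
        if tok == END:
            if generated >= min_tokens:
                break  # the thinking budget is spent, so allow the stop
            tok = "Wait"  # suppress the stop and nudge the model to continue
        tokens.append(tok)
        generated += 1
    return tokens[len(prompt):]


def toy_step(tokens: List[str]) -> str:
    # Stand-in for a real model: it tries to stop right after each "think".
    return END if tokens and tokens[-1] == "think" else "think"


out = generate_with_wait(toy_step, ["question"], min_tokens=4, max_tokens=10)
print(out)  # ['think', 'Wait', 'think', 'Wait', 'think']
```

Raising `min_tokens` forces more "Wait" insertions and therefore more generated tokens, which is the sense in which extra inference compute is spent.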

https://x.com/omarsar0/status/1886428631041225030
