DAIR.AI
Democratizing AI research, education, and technologies. Learn how to build with AI in our new AI Academy: https://t.co/zQXQt0Pem8
Aug 3 11 tweets 4 min read
Top AI Papers of The Week (July 28 - August 3):

- GEPA
- Graph-R1
- AlphaEarth
- Self-Evolving Agents
- Hierarchical Reasoning Model
- Efficient Attention Mechanisms
- Geometric-Mean Policy Optimization

Read on for more:

1. AlphaEarth Foundations

AlphaEarth Foundations (AEF) introduces a task-agnostic geospatial foundation model that learns a compact, time-continuous embedding field of Earth’s surface.

Jul 20 11 tweets 4 min read
Top AI Papers of The Week (July 14 - 20):

- Agentic-R1
- Context Rot
- Scaling up RL
- A Survey of AIOps
- Chain-of-Thought Monitorability
- One Token to Fool LLM-as-a-Judge
- A Survey of Context Engineering for LLMs

Read on for more:

1. One Token to Fool LLM-as-a-Judge

Investigates the surprising fragility of LLM-based reward models used in Reinforcement Learning with Verifiable Rewards (RLVR).

Jul 13 11 tweets 4 min read
Top AI Papers of The Week (July 7 - 13):

- H-Net
- HIRAG
- Kimi K2
- MemAgent
- Adaptive Branching MCTS
- A Survey on Latent Reasoning
- What Has a Foundation Model Found?

Read on for more:

1. Kimi K2

Moonshot AI introduces Kimi K2, a 1T-parameter Mixture-of-Experts model (32B active) optimized not just for knowledge tasks but for agentic capabilities: models that act, not just respond.
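To make "1T total, 32B active" concrete, here is a minimal, hypothetical top-k Mixture-of-Experts routing sketch. It is a generic illustration, not Kimi K2's actual router: the toy experts, random gate vectors, `k=2`, and helper names are all assumptions. Only the chosen experts run for a given token, which is why the active parameter count (32B, about 3.2% of 1T) is far smaller than the total.

```python
import math
import random

def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def moe_forward(x, experts, gate_vectors, k=2):
    """Send one token vector x through only its top-k experts (hypothetical toy router)."""
    # Router score per expert: dot product between the token and that expert's gate vector.
    scores = [sum(g_i * x_i for g_i, x_i in zip(g, x)) for g in gate_vectors]
    chosen = sorted(range(len(experts)), key=lambda i: scores[i], reverse=True)[:k]
    mix = softmax([scores[i] for i in chosen])  # renormalize over the chosen experts only
    out = [0.0] * len(x)
    for weight, i in zip(mix, chosen):
        y = experts[i](x)  # the other experts are never evaluated for this token
        out = [o + weight * y_j for o, y_j in zip(out, y)]
    return out, chosen

random.seed(0)
dim, num_experts = 4, 8
# Toy experts: each one just scales its input by a different constant.
experts = [lambda v, s=s: [s * v_j for v_j in v] for s in range(1, num_experts + 1)]
gate_vectors = [[random.gauss(0, 1) for _ in range(dim)] for _ in range(num_experts)]

out, chosen = moe_forward([1.0, 0.5, -0.2, 0.3], experts, gate_vectors, k=2)
print(len(chosen), "of", num_experts, "experts used")  # 2 of 8 experts used

# Active-parameter share for the reported sizes: 32B active out of 1T total.
active_fraction = 32e9 / 1e12
print(f"{active_fraction:.1%} of parameters active per token")  # 3.2% ...
```

Real MoE layers batch this routing across many tokens and add load-balancing terms; the sketch keeps only the sparse-activation idea.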

Jul 6 11 tweets 4 min read
Top AI Papers of The Week (June 30 - July 6):

- xLSTMAD
- AI4Research
- Deep Research Agents
- SLMs are the Future of Agentic AI
- Chain-of-Thought Is Not Explainability
- Survey on Evaluation of LLM-based Agents

Read on for more:

1. Small Language Models are the Future of Agentic AI

This position paper argues that small language models (SLMs), defined as those runnable on consumer-grade hardware, are not only sufficient but superior for many agentic AI applications.

Jun 29 11 tweets 4 min read
Here are the top AI Papers of The Week (June 23 - 29):

- MEM1
- AlphaGenome
- Diffusion Steering via RL
- Towards AI Search Paradigm
- AI Agent Communication Protocols
- Ultra-Fast Diffusion-based Language Models

Full list below:

1. Ultra-Fast Diffusion-based Language Models

This paper introduces Mercury, a family of large-scale diffusion-based language models (dLLMs) optimized for ultra-fast inference.
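A rough illustration of why diffusion LMs can decode quickly: instead of emitting one token per step, a masked-diffusion decoder proposes every masked position at once and commits the most confident ones each step. The sketch assumes a hypothetical `toy_denoiser` in place of a trained model and a simple confidence-based unmasking schedule; it is not Mercury's actual sampler, whose details this summary does not give.

```python
MASK = "<mask>"
TARGET = ["diffusion", "models", "refine", "many", "tokens", "in", "parallel", "."]

def toy_denoiser(tokens):
    """Hypothetical stand-in for a trained dLLM: proposes (token, confidence) for every masked slot."""
    # Toy rule: earlier positions are predicted with higher confidence.
    return {i: (TARGET[i], 1.0 / (1 + i)) for i, t in enumerate(tokens) if t == MASK}

def diffusion_decode(length, steps, denoiser):
    tokens = [MASK] * length
    per_step = -(-length // steps)  # ceiling division: slots committed per step
    for _ in range(steps):
        proposals = denoiser(tokens)  # one call predicts all masked positions in parallel
        if not proposals:
            break
        best = sorted(proposals.items(), key=lambda kv: kv[1][1], reverse=True)[:per_step]
        for i, (tok, _conf) in best:
            tokens[i] = tok  # commit; lower-confidence slots stay masked for the next step
    return tokens

print(" ".join(diffusion_decode(length=8, steps=3, denoiser=toy_denoiser)))
```

Here eight tokens come out of three denoiser calls rather than eight sequential autoregressive steps; fewer steps per token is where the speed comes from.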

Jun 22 11 tweets 3 min read
Here are the top AI Papers of The Week (June 16-22):

- RAG+
- ALE-Agent
- From Bytes to Ideas
- Agentic Misalignment
- Future of Work with AI Agents
- Eliciting Reasoning with Cognitive Tools
...

1. RAG+

Introduces RAG+, a modular framework that improves traditional RAG systems by explicitly incorporating application-level reasoning into the retrieval and generation pipeline.

Jun 8 11 tweets 4 min read
Here are the top AI Papers of The Week:

- Open Thoughts
- RewardBench 2
- The Illusion of Thinking
- Knowledge or Reasoning
- From Tokens to Thoughts
- Self-Challenging Language Model Agents

Read on for more:

1. The Illusion of Thinking

Investigates the capabilities and limitations of frontier Large Reasoning Models like Claude 3.7, DeepSeek-R1, and OpenAI’s o-series by analyzing their performance on reasoning tasks as a function of problem complexity.

Jun 1 11 tweets 4 min read
From a new lens on RAG to self-improving AI

Here are the top AI papers of the week:

- QwenLong-L1
- Spurious Rewards
- New Lens on RAG
- Self-Improving Agents
- Scalable Agentic Reasoning
- Learn to Reason without External Rewards

Here is the full list:

1. New Lens on RAG Systems

Introduces a new conceptual and empirical framework for analyzing RAG systems through the lens of sufficient context: whether the retrieved content alone is enough to answer a query.

May 25 11 tweets 4 min read
Here are the top AI Papers of the Week (May 19 - 25):

- ARC-AGI-2
- AdaptThink
- EfficientLLM
- Visual Planning
- The Pitfalls of Reasoning
- Teaching MLLMs to Think with Images

Here is the full list:

1. Visual Planning

Proposes a novel reasoning paradigm that replaces language-based planning with image-based reasoning.

May 4 11 tweets 3 min read
Here are the top AI Papers of the Week (April 28 - May 4):

- Kimi-Audio
- UniversalRAG
- LLM for Engineering
- DeepSeek-Prover-V2
- Phi-4-Mini-Reasoning
- Advances and Challenges in Foundation Agents

Read on for more:

1. Phi-4-Mini-Reasoning

Microsoft released Phi-4-Mini-Reasoning to explore small reasoning language models for math.

Apr 27 11 tweets 4 min read
Here are the top AI Papers of the Week (April 21 - 27):

- UXAgent
- BitNet b1.58 2B4T
- Describe Anything
- General-Reasoner
- Tiny Reasoning Models
- Test-Time Reinforcement Learning

Read on for more:

1. Does RL Incentivize Reasoning in LLMs Beyond the Base Model?

By analyzing models across tasks (math, code, vision), the authors find that RLVR improves sample efficiency but does not expand reasoning capacity beyond the base model.
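Comparisons of "sample efficiency" versus "reasoning capacity" are usually made with pass@k: the probability that at least one of k sampled answers is correct. A minimal sketch of the standard unbiased estimator, 1 - C(n-c, k)/C(n, k) from n samples with c correct, follows; the counts in the usage lines are made-up numbers for illustration, not results from the paper.

```python
from math import comb

def pass_at_k(n, c, k):
    """Unbiased pass@k estimate given n samples, c of them correct."""
    if n - c < k:
        return 1.0  # every possible set of k samples contains a correct one
    return 1.0 - comb(n - c, k) / comb(n, k)

# Hypothetical per-problem counts out of n=16 samples (not from the paper).
rl_correct, base_correct = 8, 2
for k in (1, 16):
    print(k, pass_at_k(16, rl_correct, k), pass_at_k(16, base_correct, k))
# 1 0.5 0.125
# 16 1.0 1.0
```

With these invented counts the RL-tuned model wins clearly at k=1, yet both reach 1.0 at k=16: better single-sample accuracy without solving anything the base model could not eventually solve.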

Apr 6 11 tweets 4 min read
Here are the top AI Papers of the Week (Mar 31 - April 6):

- PaperBench
- Command A
- CodeScientist
- MedAgentSim
- Open Deep Search
- Retrieval-Augmented Reasoning Model

Read on for more:

1. PaperBench

OpenAI introduces a new benchmark, PaperBench, to test whether AI agents can replicate cutting-edge machine learning research papers from scratch.

Mar 30 11 tweets 4 min read
Here are the top AI Papers of the Week (Mar 24-30):

- AgentRxiv
- Play2Prompt
- Chain-of-Tools
- Qwen2.5-Omni
- Tracing the Thoughts of LLMs
- Synthetic Data Generation Using LLMs

Read on for more:

1. Tracing the Thoughts of LLMs

Anthropic researchers unveil new interpretability tools for peering inside LLMs, using Claude 3.5 Haiku. Their two new papers show how to trace model internals like circuits, plans, and conceptual thinking in real time.

Mar 23 11 tweets 4 min read
Here are the top AI Papers of the Week (Mar 17-23):

- DAPO
- DeepMesh
- Thinking Machines
- A Review of DeepSeek Models
- A Survey on Efficient Reasoning
- Agentic Memory for LLM Agents

Read on for more:

1. A Review of DeepSeek Models

Provides an in-depth review of the cutting-edge techniques behind DeepSeek's open-source LLMs: DeepSeek-V3 and DeepSeek-R1.

arxiv.org/abs/2503.11486
Mar 16 11 tweets 4 min read
Here are the top AI Papers of the Week (Mar 10-16):

- Gemma 3
- Search-R1
- Gemini Robotics
- Post Training of LLMs
- Improving Planning of Agents
- Transformers without Normalization

Read on for more:

1. Gemma 3

Gemma 3 is a lightweight open model family (1B–27B parameters) that integrates vision understanding, multilingual coverage, and extended context windows (up to 128K tokens).

Mar 15 7 tweets 2 min read
Many developers are still writing long prompts or post-processing LLM outputs to get consistent results.

Using structured outputs is a better option in many LLM applications.

Let's discuss a few important points:

What are structured outputs useful for?

Structured outputs enable LLMs to generate responses in a specific schema like JSON, often leveraging data validation tools like Pydantic. This ensures consistent formatting, predictable data structures and type safety.
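A minimal sketch of the idea using only Python's standard library (`json` and `dataclasses`) rather than Pydantic, so it runs without extra packages. The `PaperSummary` schema and `parse_structured_output` helper are hypothetical; in practice Pydantic models, or a provider's native structured-output mode, do this validation more thoroughly.

```python
import json
from dataclasses import dataclass, fields

@dataclass
class PaperSummary:
    """Hypothetical schema the LLM is asked to fill in."""
    title: str
    year: int
    topics: list

def parse_structured_output(raw: str) -> PaperSummary:
    """Parse a JSON reply and check it against the schema; raise ValueError on mismatch."""
    data = json.loads(raw)  # json.JSONDecodeError is a ValueError subclass
    expected = {f.name: f.type for f in fields(PaperSummary)}
    if not isinstance(data, dict) or set(data) != set(expected):
        raise ValueError(f"reply keys do not match schema {sorted(expected)}")
    for name, typ in expected.items():
        if not isinstance(data[name], typ):
            raise ValueError(f"field {name!r} should be {typ.__name__}")
    return PaperSummary(**data)

reply = '{"title": "Gemma 3", "year": 2025, "topics": ["vision", "multilingual"]}'
summary = parse_structured_output(reply)
print(summary.title, summary.year)

try:
    parse_structured_output('{"title": "Gemma 3", "year": "2025", "topics": []}')
except ValueError as err:
    print("rejected:", err)  # field 'year' should be int
```

Rejecting a malformed reply at the boundary is what gives downstream code the predictable types the thread mentions, instead of post-processing free text.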
Mar 9 11 tweets 4 min read
Here are the top AI Papers of the Week (Mar 3-9):

- Agentic Reward Modeling
- Fractal Generative Models
- Cognitive Behaviors in LRMs
- Overview of Reasoning LLMs
- Conversational Speech Model
- A Few Tokens Are All You Need

Read on for more:

1. A Few Tokens Are All You Need

Proposes a new approach to boost reasoning in LLMs by fine-tuning only on the first few tokens of generated solutions.
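One way to picture "fine-tuning only on the first few tokens" is as a loss mask: prompt tokens and every solution token after the first k are excluded from the loss. The helpers below are a hypothetical sketch under that assumption; how the paper chooses k and builds its training samples is not covered here.

```python
def prefix_loss_mask(prompt_len, solution_len, k):
    """Return 1 for tokens that count toward the loss, 0 for ignored ones."""
    mask = [0] * prompt_len  # prompt tokens never contribute
    mask += [1 if i < k else 0 for i in range(solution_len)]  # only the first k solution tokens do
    return mask

def masked_mean_nll(token_nlls, mask):
    """Average per-token negative log-likelihood over unmasked positions only."""
    kept = [nll for nll, m in zip(token_nlls, mask) if m]
    return sum(kept) / len(kept) if kept else 0.0

mask = prefix_loss_mask(prompt_len=3, solution_len=6, k=2)
print(mask)  # [0, 0, 0, 1, 1, 0, 0, 0, 0]
nlls = [9.0, 9.0, 9.0, 0.5, 1.5, 7.0, 7.0, 7.0, 7.0]  # made-up per-token losses
print(masked_mean_nll(nlls, mask))  # 1.0
```

The large made-up losses on later tokens have no effect on the result, which is the point: gradients flow only through the solution prefix.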

Mar 2 11 tweets 4 min read
Here are the top AI Papers of the Week (Feb 24 - Mar 2):

- GPT-4.5
- PlanGEN
- Protein LLMs
- Chain-of-Draft
- Claude 3.7 Sonnet
- Emergent Misalignment

Read on for more:

1. Claude 3.7 Sonnet

Anthropic releases a system card for its latest hybrid reasoning model, Claude 3.7 Sonnet, detailing safety measures, evaluations, and a new "extended thinking" mode.

Feb 23 12 tweets 4 min read
Here are the top AI Papers of the Week (Feb 17-23):

- AI Co-Scientist
- Open-Reasoner-Zero
- The AI CUDA Engineer
- Native Sparse Attention
- The Danger of Overthinking
- Large Language Diffusion Model

Read on for more:

1. AI Co-Scientist

Google introduces AI co-scientist, a multi-agent AI system built with Gemini 2.0 to help accelerate scientific breakthroughs.

Feb 16 11 tweets 4 min read
Here are the top AI Papers of the Week (Feb 10-16):

- Latent Reasoning
- Large Memory Models
- Brain-to-Text Decoding
- Enhancing Reasoning to Adapt LLMs
- Reinforcement Learning via Self-Play
- Competitive Programming with Large Reasoning Models

Read on for more:

1. Scaling up Test-Time Compute with Latent Reasoning

This work introduces a latent recurrent-depth transformer, a model that scales test-time reasoning without relying on additional token generation.
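A toy picture of recurrent depth: one shared block is applied repeatedly to a latent state, re-injecting the embedded input each time, so extra test-time compute means more iterations rather than more generated tokens. `shared_block` is a hypothetical stand-in for a transformer block, and the zero initial state is an assumption chosen for determinism.

```python
import math

def shared_block(state, embedded_input):
    """Hypothetical stand-in for one transformer block; the same weights are reused every iteration."""
    return [math.tanh(0.5 * s + x) for s, x in zip(state, embedded_input)]

def recurrent_depth_forward(embedded_input, num_iterations):
    """Iterate the shared block in latent space; no extra tokens are generated."""
    state = [0.0] * len(embedded_input)  # assumption: zero initial state for determinism
    for _ in range(num_iterations):
        state = shared_block(state, embedded_input)  # input is re-injected at every step
    return state

x = [0.2, -1.0, 0.7]
shallow = recurrent_depth_forward(x, num_iterations=4)
deep = recurrent_depth_forward(x, num_iterations=32)
print(max(abs(a - b) for a, b in zip(shallow, deep)))
```

Because this toy block is a contraction, extra iterations change the state less and less; a real model offers no such guarantee, but the knob being turned (the iteration count) is the same.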

Feb 9 11 tweets 4 min read
Here are the top AI Papers of the Week (Feb 3-9):

- s1
- OmniHuman-1
- Less Is More for Reasoning
- Advancing Reasoning in LLMs
- Rethinking Mixture-of-Agents
- Chain-of-Associated-Thoughts

Read on for more:

1. s1: Simple test-time scaling

Introduces s1, a method to boost LLM performance by using extra compute at inference (“test-time scaling”). A new decoding trick appends the token “Wait” when the model tries to stop, forcing it to think longer.
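The "Wait" trick (the s1 paper calls it budget forcing) can be sketched as a decoding loop that intercepts the end-of-thinking signal. `step_fn`, `toy_step_fn`, and the `END` marker are hypothetical stand-ins for a real model and its end-of-thinking delimiter; for simplicity the budget here is counted in chunks rather than tokens.

```python
END = "<end_of_thinking>"  # hypothetical end-of-thinking marker

def budget_forced_generate(step_fn, min_thinking_steps, max_steps=100):
    """Keep the model thinking until at least min_thinking_steps chunks exist."""
    trace = []
    for _ in range(max_steps):  # hard cap so the loop always terminates
        chunk = step_fn(trace)
        if chunk == END:
            if len(trace) < min_thinking_steps:
                trace.append("Wait")  # suppress the stop and nudge the model to keep reasoning
                continue
            break
        trace.append(chunk)
    return trace

def toy_step_fn(trace):
    """Hypothetical model that tries to stop after two chunks since the last "Wait"."""
    since_wait = 0
    for chunk in reversed(trace):
        if chunk == "Wait":
            break
        since_wait += 1
    return END if since_wait >= 2 else f"step{len(trace)}"

print(budget_forced_generate(toy_step_fn, min_thinking_steps=0))  # ['step0', 'step1']
print(budget_forced_generate(toy_step_fn, min_thinking_steps=5))  # ['step0', 'step1', 'Wait', 'step3', 'step4']
```

Without forcing, the toy model stops after two chunks; with a minimum budget, the appended "Wait" makes it produce more reasoning before it is allowed to finish.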