DAIR.AI
Democratizing AI research, education, and technologies. Learn how to build with AI in our new AI Academy: https://t.co/zQXQt0Pem8
Mar 30 11 tweets 4 min read
Here are the top AI Papers of the Week (Mar 24-30):

- AgentRxiv
- Play2Prompt
- Chain-of-Tools
- Qwen2.5-Omni
- Tracing the Thoughts of LLMs
- Synthetic Data Generation Using LLMs

Read on for more:

1). Tracing the Thoughts of LLMs

Anthropic researchers unveil new interpretability tools for peering inside LLMs, using Claude 3.5 Haiku. Their two new papers show how to trace model internals like circuits, plans, and conceptual thinking in real time.

Mar 23 11 tweets 4 min read
Here are the top AI Papers of the Week (Mar 17-23):

- DAPO
- DeepMesh
- Thinking Machines
- A Review of DeepSeek Models
- A Survey on Efficient Reasoning
- Agentic Memory for LLM Agents

Read on for more:

1). A Review of DeepSeek Models

Provides an in-depth review of the cutting-edge techniques behind DeepSeek's open-source LLMs—DeepSeek-V3 and DeepSeek-R1.

arxiv.org/abs/2503.11486
Mar 16 11 tweets 4 min read
Here are the top AI Papers of the Week (Mar 10-16):

- Gemma 3
- Search-R1
- Gemini Robotics
- Post Training of LLMs
- Improving Planning of Agents
- Transformers without Normalization

Read on for more:

1). Gemma 3

Gemma 3 is a lightweight open model family (1B–27B parameters) that integrates vision understanding, multilingual coverage, and extended context windows (up to 128K tokens).

Mar 15 7 tweets 2 min read
Many developers are still writing long prompts or post-processing LLM outputs to get consistent results.

Using structured outputs is a better option in many LLM applications.

Let's discuss a few important points:

What are structured outputs useful for?

Structured outputs enable LLMs to generate responses in a specific schema like JSON, often leveraging data validation tools like Pydantic. This ensures consistent formatting, predictable data structures and type safety.
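To make the idea concrete, here is a minimal sketch of validating a model's JSON response against a fixed schema. It deliberately uses only the Python standard library instead of Pydantic so it runs without extra dependencies; the `PaperSummary` fields, the `SCHEMA` mapping, and the `parse_structured_output` helper are illustrative assumptions, not part of the original thread or of any library.

```python
import json
from dataclasses import dataclass

# Hypothetical schema for illustration: field name -> required Python type.
SCHEMA = {"title": str, "year": int, "topics": list}


@dataclass
class PaperSummary:
    title: str
    year: int
    topics: list


def parse_structured_output(raw: str) -> PaperSummary:
    """Parse a JSON string and reject anything that does not match SCHEMA."""
    data = json.loads(raw)  # raises json.JSONDecodeError (a ValueError) on bad JSON
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object")
    missing = SCHEMA.keys() - data.keys()
    extra = data.keys() - SCHEMA.keys()
    if missing or extra:
        raise ValueError(f"missing={sorted(missing)} extra={sorted(extra)}")
    for key, expected_type in SCHEMA.items():
        if not isinstance(data[key], expected_type):
            raise ValueError(f"field {key!r} should be {expected_type.__name__}")
    return PaperSummary(**data)


if __name__ == "__main__":
    # Stand-in for an LLM response; a real application would get this from a model.
    response = '{"title": "Gemma 3", "year": 2025, "topics": ["vision"]}'
    print(parse_structured_output(response))
```

In practice a validation library such as Pydantic would replace the hand-written checks; the point of the sketch is only that downstream code receives a typed object or a clear error, never free-form text.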
Mar 9 11 tweets 4 min read
Here are the top AI Papers of the Week (Mar 3-9):

- Agentic Reward Modeling
- Fractal Generative Models
- Cognitive Behaviors in LRMs
- Overview of Reasoning LLMs
- Conversational Speech Model
- A Few Tokens Are All You Need

Read on for more:

1). A Few Tokens Are All You Need

Proposes a new approach to boost reasoning in LLMs by only fine-tuning on the first few tokens of generated solutions.

Mar 2 11 tweets 4 min read
Here are the top AI Papers of the Week (Feb 24 - Mar 2):

- GPT-4.5
- PlanGEN
- Protein LLMs
- Chain-of-Draft
- Claude 3.7 Sonnet
- Emergent Misalignment

Read on for more:

1). Claude 3.7 Sonnet

Anthropic releases a system card for its latest hybrid reasoning model, Claude 3.7 Sonnet, detailing safety measures, evaluations, and a new "extended thinking" mode.

Feb 23 12 tweets 4 min read
Here are the top AI Papers of the Week (Feb 17-23):

- AI Co-Scientist
- Open-Reasoner-Zero
- The AI CUDA Engineer
- Native Sparse Attention
- The Danger of Overthinking
- Large Language Diffusion Model

Read on for more:

1). AI Co-Scientist - Google introduces AI co-scientist, a multi-agent AI system built with Gemini 2.0 to help accelerate scientific breakthroughs.

Feb 16 11 tweets 4 min read
Here are the top AI Papers of the Week (Feb 10-16):

- Latent Reasoning
- Large Memory Models
- Brain-to-Text Decoding
- Enhancing Reasoning to Adapt LLMs
- Reinforcement Learning via Self-Play
- Competitive Programming with Large Reasoning Models

Read on for more:

1). Scaling up Test-Time Compute with Latent Reasoning

This work introduces a latent recurrent-depth transformer, a model that scales test-time reasoning without relying on additional token generation.

Feb 9 11 tweets 4 min read
Here are the top AI Papers of the Week (Feb 3-9):

- s1
- OmniHuman-1
- Less Is More for Reasoning
- Advancing Reasoning in LLMs
- Rethinking Mixture-of-Agents
- Chain-of-Associated-Thoughts

Read on for more:

1). s1: Simple test-time scaling

Introduces s1, a method to boost LLM performance by using extra compute at inference (“test-time scaling”). A new decoding trick appends the token “Wait” when the model tries to stop, forcing it to think longer.
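The "Wait" trick can be sketched with a toy decoding loop. This is a rough illustration under stated assumptions, not the paper's implementation: `step` is a hypothetical next-token function standing in for a real model, `END` is a made-up end-of-thinking marker, and the `min_tokens` threshold is an assumed way of deciding when stopping is allowed.

```python
from typing import Callable, List

END = "<end>"  # hypothetical end-of-thinking marker, not a real model token


def generate_with_wait(
    step: Callable[[List[str]], str],
    prompt: List[str],
    min_tokens: int,
    max_tokens: int,
) -> List[str]:
    """Decode token by token; if the model tries to stop before
    min_tokens have been generated, append "Wait" instead of stopping."""
    tokens = list(prompt)
    generated = 0
    while generated < max_tokens:
        nxt = step(tokens)
        if nxt == END:
            if generated >= min_tokens:
                break  # enough thinking has happened, allow the stop
            nxt = "Wait"  # suppress the stop and nudge the model to continue
        tokens.append(nxt)
        generated += 1
    return tokens


def toy_step(tokens: List[str]) -> str:
    """Toy stand-in for a model: emits "b", then tries to stop right after."""
    return END if tokens[-1] == "b" else "b"


if __name__ == "__main__":
    print(generate_with_wait(toy_step, ["a"], min_tokens=4, max_tokens=10))
```

With the toy model above, the loop overrides two early stop attempts before the minimum budget is reached; `max_tokens` caps the total so the loop always terminates.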

Feb 2 11 tweets 4 min read
Here are the top AI Papers of the Week (Jan 27 - Feb 2):

- o3-mini
- Janus-Pro
- Qwen2.5-1M
- Diverse Preference Optimization
- Improving RAG through Multi-Agent RL
- Usage Recommendation for DeepSeek-R1

Read on for more:

1). o3-mini

o3-mini is a new cost-efficient reasoning model, available in ChatGPT and the API. The model excels in STEM-related tasks, particularly in science, math, and coding, while maintaining the low cost and reduced latency of its predecessor o1-mini.

Jan 26 11 tweets 4 min read
Here are the top AI Papers of the Week (Jan 20-26):

- DeepSeek-R1
- Can LLMs Plan?
- Chain-of-Agents
- Scaling RL with LLMs
- Humanity’s Last Exam
- Agentic RAG Overview

Read on for more:

1). DeepSeek-R1

Represents an advancement in reasoning capabilities achieved through RL. It involves two key models: DeepSeek-R1-Zero, which uses pure RL without supervised fine-tuning, and DeepSeek-R1, which combines RL with cold-start data.

Jan 19 11 tweets 4 min read
Here are the top AI Papers of the Week (Jan 13-19):

- VideoRAG
- MiniMax-01
- Enhancing RAG
- Self-Adaptive LLMs
- Foundations of LLMs
- Learning to Memorize at Test Time

Read on for more:

1). Self-Adaptive LLMs - introduces Transformer^2, a novel self-adaptation framework that adapts LLMs for unseen tasks in real-time by selectively adjusting singular components of their weight matrices...

Dec 29, 2024 11 tweets 4 min read
Here are the top ML Papers of the Week (Dec 23-29):

- DRT-o1
- LearnLM
- DeepSeek-V3
- Large Concept Models
- Explore Theory-of-Mind
- Reinforcement Learning Overview

Read on for more:

1). DeepSeek-V3 - a 671B-parameter MoE language model that activates 37B parameters per token, utilizing MLA and DeepSeekMoE architectures for efficient operation.

Dec 22, 2024 11 tweets 4 min read
Here are the top ML Papers of the Week (Dec 16-22):

- Genesis
- AutoFeedback
- TheAgentCompany
- Alignment Faking in LLMs
- Qwen-2.5 Technical Report
- Precise Length Control in LLMs

Read on for more:

1). Genesis - a new universal physics simulation platform that combines a high-performance physics engine with generative AI capabilities...

Aug 4, 2024 11 tweets 4 min read
The Top ML Papers of the Week (July 29 - August 4):

- MindSearch
- Refusal in LLMs
- Constrained-CoT
- Meta-Rewarding LLMs
- Evaluating Persona Agents
- Improved RAG with Self-Reasoning
...

1/ Meta-Rewarding LLMs - proposes a self-improving alignment technique (no human supervision) where the LLM judges its own judgments and uses the feedback to improve its judgment skills...

May 26, 2024 11 tweets 4 min read
The Top ML Papers of the Week (May 20 - May 26):

- Guide for Evaluating LLMs
- Efficient Multimodal LLMs
- Scientific Applications of LLMs
- Enhancing Answer Selection in LLMs
- Claude 3 Sonnet Interpretable Features
- Agent Planning with World Knowledge Model
...

1/ Extracting Interpretable Features from Claude 3 - presents an effective method to extract millions of abstract features from an LLM that represent specific concepts; these concepts could represent people, places, programming abstractions, and more...

Mar 31, 2024 11 tweets 4 min read
The Top ML Papers of the Week (March 25 - March 31):

- DBRX
- Grok-1.5
- LLM2LLM
- Mini-Gemini
- Agent Lumos
- Long-form factuality in LLMs
...

1). DBRX - a new 132B parameter open LLM that outperforms all the established open-source models on common benchmarks like MMLU and GSM8K; DBRX was pretrained on 12T tokens (text and code) and uses a mixture-of-experts (MoE) architecture.

Feb 25, 2024 11 tweets 4 min read
The Top ML Papers of the Week (Feb 19 - Feb 25):

- LoRA+
- Gemma
- Stable Diffusion 3
- OpenCodeInterpreter
- Revisiting REINFORCE in RLHF
- CoT Reasoning without Prompting
...

1/ Stable Diffusion 3 - a suite of image generation models ranging from 800M to 8B parameters; combines diffusion transformer architecture and flow matching for improved performance in multi-subject prompts, image quality, and spelling abilities.

Nov 19, 2023 11 tweets 4 min read
Top ML Papers of the Week (Nov 13 - Nov 19):

- JARVIS-1
- Chain-of-Note
- Contrastive CoT Prompting
- LLMs for Scientific Discovery
- Learning to Filter Context for RAG
- A Survey on Language Models for Code
...

1/ Emu Video and Emu Edit - present powerful new models for controlled image editing and text-to-video generation based on diffusion models.

Oct 1, 2023 11 tweets 4 min read
Top ML Papers of the Week (Sep 25 - Oct 1):

- MentaLLaMA
- Boolformer
- The Reversal Curse in LLMs
- Long-Context Scaling with LLMs
- Graph Neural Prompting with LLMs
- Vision Transformers Need Registers
...

1/ The Reversal Curse - finds that LLMs trained on sentences of the form “A is B” will not automatically generalize to the reverse direction “B is A”; shows the effect across model sizes and model families.

x.com/OwainEvans_UK/…
Sep 3, 2023 11 tweets 4 min read
Top ML Papers of the Week (Aug 28 - Sep 3):

- AnomalyGPT
- SAM-Med2D
- Graph of Thoughts
- Factuality Detection in LLMs
- Large Language and Speech Model
- Vector Search with OpenAI Embeddings
...

1/ Large Language and Speech Model - proposes a large language and speech model with cross-modal conversational abilities that supports speech-and-language instructions, enabling more natural interactions with AI systems.