MemAgent-14B is trained on 32K-length documents with an 8K context window.
Achieves >76% accuracy even at 3.5M tokens!
That consistency is crazy!
Here are my notes:
Overview
Introduces an RL-driven memory agent that enables transformer-based LLMs to handle documents up to 3.5 million tokens with near-lossless performance, linear complexity, and no architectural modifications.
RL-shaped fixed-length memory
MemAgent reads documents in segments and maintains a fixed-size memory updated via an overwrite mechanism.
This lets it process arbitrarily long inputs with O(N) inference cost while avoiding context window overflows.
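To make the mechanism concrete, here's a minimal sketch of the segment-plus-overwrite loop, assuming a generic `llm` text-completion callable; the function name and prompt wording are illustrative, not the paper's actual API:

```python
# Illustrative sketch of MemAgent-style segmented reading
# (names and prompts are hypothetical, not the paper's API).

def read_long_document(llm, document: str, query: str,
                       segment_len: int = 5000, memory: str = "") -> str:
    """Process a document of arbitrary length with a fixed-size memory.

    Each step sees only (memory + one segment), so the context never
    grows: cost is O(N) in document length, not O(N^2).
    """
    segments = [document[i:i + segment_len]
                for i in range(0, len(document), segment_len)]
    for segment in segments:
        # The model OVERWRITES its memory each step: it rewrites the
        # full memory, keeping what matters for the query and dropping
        # the rest, so memory size stays constant.
        memory = llm(
            f"Query: {query}\n"
            f"Current memory:\n{memory}\n"
            f"Next segment:\n{segment}\n"
            "Rewrite the memory so it keeps everything needed to answer "
            "the query. Output only the updated memory."
        )
    # The final answer is produced from the compact memory alone.
    return llm(f"Query: {query}\nMemory:\n{memory}\nAnswer the query.")
```

Since every step fits comfortably inside the 8K window, document length is bounded only by how many segments you're willing to stream through.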
This is a really good example of integrating agentic reasoning into RAG.
Leads to better personalization and improved recommendations.
Here are my notes:
Overview
This work introduces a multi-agent framework, ARAG, that enhances traditional RAG systems with reasoning agents tailored to user modeling and contextual ranking.
It reframes recommendation as a structured coordination problem between LLM agents.
Instead of relying on static similarity-based retrieval, ARAG comprises four agents:
- User Understanding Agent synthesizes user preferences from long-term and session behavior.
- NLI Agent evaluates semantic alignment between candidate items and user intent.
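To make the coordination concrete, here's a hypothetical sketch of how the two agents above might chain; the function names, prompts, and scoring scheme are my illustration, not the paper's actual interfaces:

```python
# Hypothetical ARAG-style agent chain (prompts and names are
# illustrative; the paper's actual interfaces may differ).

def arag_rank(llm, user_history: list[str], session: list[str],
              candidates: list[str]) -> list[str]:
    # User Understanding Agent: synthesize a preference profile from
    # long-term history plus the current session.
    profile = llm(
        "Summarize this user's preferences.\n"
        f"Long-term history: {user_history}\n"
        f"Current session: {session}"
    )
    # NLI Agent: score semantic alignment between each candidate item
    # and the inferred intent, instead of raw embedding similarity.
    scored = []
    for item in candidates:
        verdict = llm(
            f"User intent: {profile}\nCandidate item: {item}\n"
            "On a 0-10 scale, how well does the item match the intent? "
            "Reply with a single number."
        )
        scored.append((float(verdict), item))
    # Downstream agents would consume these alignment scores;
    # here we simply sort by them.
    return [item for _, item in sorted(scored, reverse=True)]
```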
AI for Science is where I spend most of my time exploring with AI agents.
This 120+ page report does a good job of highlighting why all the big names like OpenAI and Google DeepMind are pursuing AI4Science.
Bookmark it!
My notes below:
There are five key areas:
(1) AI for Scientific Comprehension
(2) AI for Academic Survey
(3) AI for Scientific Discovery
(4) AI for Academic Writing
(5) AI for Academic Peer Review
Just look at the large body of work that's been happening in the space.
Small Language Models are the Future of Agentic AI
Lots to gain from building agentic systems with small language models.
Capabilities are increasing rapidly!
AI devs should be exploring SLMs.
Here are my notes:
Overview
This position paper argues that small language models (SLMs), defined pragmatically as those runnable on consumer-grade hardware, are not only sufficient but superior for many agentic AI applications, especially when tasks are narrow, repetitive, or tool-oriented.
The authors propose that shifting from LLM-first to SLM-first architectures will yield major gains in efficiency, modularity, and sustainability.
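One way to picture an SLM-first architecture: a dispatcher sends narrow, tool-oriented calls to a local SLM and escalates only open-ended work to a large model. The routing heuristic and task taxonomy below are my assumptions for illustration, not from the paper:

```python
# Illustrative SLM-first dispatch (task categories and heuristic are
# assumptions, not prescribed by the paper).

NARROW_TASKS = {"extract_json", "classify_intent", "fill_template",
                "call_tool"}

def dispatch(task_type: str, prompt: str, slm, llm) -> str:
    """Route narrow, repetitive, tool-oriented work to a local SLM;
    escalate open-ended requests to a larger model."""
    if task_type in NARROW_TASKS:
        return slm(prompt)  # cheap, fast, runs on consumer hardware
    return llm(prompt)      # reserve the big model for hard cases
```

Because agentic workloads are dominated by the narrow bucket, most calls never touch the expensive model, which is where the claimed efficiency and sustainability gains come from.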
> throughputs of 1109 tokens/sec and 737 tokens/sec
> outperforms speed-optimized frontier models by up to 10× on average
Diffusion LLMs are early, but could be huge.
More in my notes below:
✦ Overview
This paper introduces Mercury, a family of large-scale diffusion-based language models (dLLMs) optimized for ultra-fast inference.
Unlike standard autoregressive LLMs, Mercury models generate multiple tokens in parallel via a coarse-to-fine refinement process.
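A toy sketch of that coarse-to-fine parallel decoding, assuming a masked-token formulation; the confidence-based commit schedule is illustrative, not Mercury's actual recipe:

```python
import torch

def diffusion_decode(model, prompt_ids, gen_len, mask_id, steps=8):
    """Toy coarse-to-fine parallel decoder (schedule is illustrative).

    All generated positions start as [MASK]; each step predicts every
    position in parallel and commits the most confident ones, so a full
    sequence needs only `steps` forward passes instead of `gen_len`
    autoregressive ones.
    """
    seq = torch.cat(
        [prompt_ids, torch.full((1, gen_len), mask_id, dtype=torch.long)],
        dim=1)
    n = prompt_ids.size(1)
    for step in range(steps):
        logits = model(seq)                      # (1, len, vocab), one pass
        conf, pred = logits.softmax(-1).max(-1)  # per-position confidence
        # Commit a growing fraction of the most confident positions;
        # low-confidence positions stay masked and get refined later.
        k = max(1, gen_len * (step + 1) // steps)
        idx = conf[:, n:].topk(k, dim=-1).indices
        seq[:, n:].scatter_(1, idx, pred[:, n:].gather(1, idx))
    return seq[:, n:]
```

The key design choice is that each forward pass refines the whole sequence at once, which is what lets throughput scale with the number of refinement steps rather than with output length.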
✦ Achieves higher throughput without sacrificing output quality
The release focuses on code generation, with Mercury Coder Mini and Small models achieving up to 1109 and 737 tokens/sec, respectively, on NVIDIA H100s.
Outperforms speed-optimized frontier models by up to 10×.