There is huge interest in moving from hand-crafted agentic systems to lifelong, adaptive agentic ecosystems.
What's the progress, and where are things headed?
Let's find out:
This survey defines self-evolving AI agents and argues for a shift from static, hand-crafted systems to lifelong, adaptive agentic ecosystems.
It maps the field’s trajectory, proposes “Three Laws” to keep evolution safe and useful, and organizes techniques across single-agent, multi-agent, and domain-specific settings.
Paradigm shift and guardrails
The paper frames four stages: Model Offline Pretraining → Model Online Adaptation → Multi-Agent Orchestration → Multi-Agent Self-Evolving.
It introduces three guiding laws for evolution: maintain safety, preserve or improve performance, and then autonomously optimize.
LLM-centric learning paradigms:
MOP (Model Offline Pretraining): Static pretraining on large corpora; no adaptation after deployment.
MOA (Model Online Adaptation): Post-deployment updates via fine-tuning, adapters, or RLHF.
MAO (Multi-Agent Orchestration): Multiple agents coordinate through message exchange or debate, without changing model weights.
MASE (Multi-Agent Self-Evolving): Agents interact with their environment, continually optimizing prompts, memory, tools, and workflows.
The Evolution Landscape of AI Agents
The paper presents a visual taxonomy of AI agent evolution and optimization techniques, categorized into three major directions:
single-agent optimization, multi-agent optimization, and domain-specific optimization.
Unified framework for evolution
A single iterative loop connects System Inputs, Agent System, Environment feedback, and Optimizer.
Optimizers search over prompts, tools, memory, model parameters, and even agent topologies using heuristics, search, or learning.
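To make the loop concrete, here's a minimal Python sketch of one evolution iteration; agent_system, environment, and optimizer are hypothetical stand-ins, not code from the survey.

```python
# A minimal sketch of the unified evolution loop (hypothetical names, not the survey's code):
# run the agent system on inputs, collect environment feedback, and let the optimizer
# propose changes to prompts, tools, memory, parameters, or topology.

def evolve(agent_system, environment, optimizer, tasks, iterations=10):
    for _ in range(iterations):
        feedback = []
        for task in tasks:                                        # System Inputs
            output = agent_system.run(task)                       # Agent System acts
            feedback.append(environment.evaluate(task, output))   # Environment feedback
        agent_system = optimizer.update(agent_system, feedback)   # Optimizer searches the design space
    return agent_system
```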
Single-agent optimization toolbox
Techniques are grouped into:
(i) LLM behavior (training for reasoning; test-time scaling with search and verification),
(ii) prompt optimization (edit, generate, text-gradient, evolutionary; a minimal sketch follows this list),
(iii) memory optimization (short-term compression and retrieval; long-term RAG, graphs, and control policies), and
(iv) tool use and tool creation.
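To ground item (ii), here's a hedged sketch of an evolutionary, text-gradient-style prompt optimizer; llm and score_on_dev_set are hypothetical callables, not from any surveyed paper.

```python
# Sketch of evolutionary, text-gradient-style prompt optimization (item ii).
# `llm` and `score_on_dev_set` are hypothetical callables.

def optimize_prompt(seed_prompt, llm, score_on_dev_set, generations=5, pop_size=4):
    population = [seed_prompt]
    for _ in range(generations):
        # "Text gradient": ask the LLM to critique and rewrite each surviving prompt.
        mutants = [
            llm(f"Rewrite this instruction so an agent follows it more reliably:\n{p}")
            for p in population
        ]
        # Select: keep the prompts that score best on a small held-out dev set.
        candidates = population + mutants
        candidates.sort(key=score_on_dev_set, reverse=True)
        population = candidates[:pop_size]
    return population[0]
```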
Agentic Self-Evolution methods
The authors present a comprehensive hierarchical categorization of agentic self-evolution methods, spanning single-agent, multi-agent, and domain-specific optimization.
Multi-agent workflows that self-improve
Beyond manual pipelines, the survey treats prompts, topologies, and backbones as searchable spaces.
It distinguishes code-level workflows and communication-graph topologies, covers unified optimization that jointly tunes prompts and structure, and describes backbone training for better cooperation.
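As a rough illustration of topology search (not the survey's code), here's a toy hill-climbing loop over communication graphs; evaluate is a hypothetical scorer for a multi-agent run.

```python
# Sketch of communication-topology search (hypothetical helpers).
# A topology is a set of directed edges between roles; we hill-climb on task success rate.
import itertools
import random

ROLES = ["planner", "researcher", "coder", "critic"]

def random_topology(p=0.5):
    return {edge for edge in itertools.permutations(ROLES, 2) if random.random() < p}

def search_topology(evaluate, steps=20):
    best = random_topology()
    best_score = evaluate(best)          # e.g., success rate of the multi-agent workflow
    for _ in range(steps):
        candidate = random_topology()    # a real optimizer would mutate `best` edge-by-edge
        score = evaluate(candidate)
        if score > best_score:
            best, best_score = candidate, score
    return best, best_score
```

A unified optimizer would score prompt edits and edge edits under the same objective instead of treating them as separate searches.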
Evaluation, safety, and open problems
Benchmarks span tools, web navigation, GUI agents, collaboration, and specialized domains; LLM-as-judge and Agent-as-judge reduce evaluation cost while tracking process quality.
The paper stresses continuous, evolution-aware safety monitoring and highlights challenges such as stable reward modeling, efficiency-effectiveness trade-offs, and transfer of optimized prompts/topologies to new models or domains.
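For intuition, here's what a bare-bones LLM-as-judge call might look like; llm, the rubric, and the JSON schema are all illustrative assumptions, not a benchmark's actual grader.

```python
# Sketch of an LLM-as-judge call that scores the final answer and the process separately.
# `llm` is a hypothetical completion function; the rubric and JSON schema are illustrative.
import json

JUDGE_PROMPT = """You are grading an agent run.
Task: {task}
Agent trace (tool calls and intermediate steps): {trace}
Final answer: {answer}
Return JSON: {{"answer_score": 0-10, "process_score": 0-10, "rationale": "..."}}"""

def judge(llm, task, trace, answer):
    raw = llm(JUDGE_PROMPT.format(task=task, trace=trace, answer=answer))
    return json.loads(raw)  # in practice, validate or repair the JSON before trusting it
```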
Microsoft Research releases rStar2-Agent, a 14B math reasoning model trained with agentic RL.
It reaches frontier-level math reasoning in just 510 RL training steps.
Here are my notes:
Quick Overview
rStar2-Agent (Microsoft Research). A 14B math-reasoning model trained with agentic RL that learns to think smarter by using a Python tool environment, not just longer CoT.
It introduces GRPO-RoC, a rollout strategy that filters noisy successful traces, plus infrastructure for large-scale, low-latency tool execution.
Method
GRPO-RoC oversamples rollouts, then keeps only the cleanest correct ones while preserving diverse failures, reducing tool-call errors and formatting issues during training.
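Here's my paraphrase of that resample-on-correct filter as a toy sketch; the field names and selection heuristic are assumptions, not the released rStar2-Agent code.

```python
# Toy sketch of resample-on-correct: oversample rollouts, keep only the cleanest correct
# traces, and retain a diverse sample of failures so the group-relative advantages
# still see negative examples. Field names are assumptions.
import random

def filter_rollouts(rollouts, keep_correct=4, keep_incorrect=4):
    correct = [r for r in rollouts if r["reward"] > 0]
    incorrect = [r for r in rollouts if r["reward"] <= 0]

    # Prefer correct traces with few tool-call errors and formatting violations.
    correct.sort(key=lambda r: (r["tool_errors"], r["format_violations"]))
    kept = correct[:keep_correct]

    # Failures are sampled uniformly rather than filtered for "cleanliness".
    kept += random.sample(incorrect, min(keep_incorrect, len(incorrect)))
    return kept
```

The point is that the policy only imitates clean successes while still getting gradient signal from failures.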
Another really cool paper showing how RL can enhance an LLM's agentic and memory capabilities.
Great read for AI devs.
Here are my notes:
Overview
A framework that teaches LLM agents to decide what to remember and how to use it.
Two RL-fine-tuned components work together: a Memory Manager that learns CRUD-style operations on an external store and an Answer Agent that filters retrieved memories via “memory distillation” before answering.
Active memory control with RL
The Memory Manager selects ADD, UPDATE, DELETE, or NOOP after a RAG step and edits entries accordingly; training with PPO or GRPO uses downstream QA correctness as the reward, removing the need for per-edit labels.
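A rough sketch of what that step could look like; the interfaces are hypothetical, not the paper's code.

```python
# Sketch of the Memory Manager's action space and reward signal (hypothetical interfaces).
# The reward is downstream QA correctness, so no per-edit labels are needed.

MEMORY_OPS = ("ADD", "UPDATE", "DELETE", "NOOP")

def parse_decision(text):
    # Naive parse of "OP: entry text" (illustrative only).
    op, _, entry = text.partition(":")
    return op.strip().upper(), entry.strip()

def memory_manager_step(policy_llm, store, new_turn, retrieved):
    decision = policy_llm(
        f"New information: {new_turn}\nRetrieved memories: {retrieved}\n"
        f"Reply with one of {MEMORY_OPS} followed by ': <entry text>'."
    )
    op, entry = parse_decision(decision)
    if op == "ADD":
        store.add(entry)
    elif op == "UPDATE":
        store.update(entry)
    elif op == "DELETE":
        store.delete(entry)
    return op                                   # NOOP leaves the store untouched

def reward(answer_agent, store, question, gold):
    # The only training signal for PPO/GRPO: did the Answer Agent get the question right?
    prediction = answer_agent(question, store.retrieve(question))
    return 1.0 if prediction.strip() == gold.strip() else 0.0
```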
Anemoi is the latest multi-agent system that proves small models pack a punch when combined effectively.
With GPT-4.1-mini for planning and GPT-4o for worker agents, it surpasses the strongest open-source baseline on GAIA.
A must-read for devs:
Quick Overview
Anemoi is a semi-centralized generalist multi-agent system powered by an A2A communication MCP server from @Coral_Protocol.
Anemoi replaces purely centralized, context-stuffed coordination with an A2A communication server (MCP) that lets agents talk directly, monitor progress, refine plans, and reach consensus.
Design
A semi-centralized planner proposes an initial plan, while worker agents (web, document processing, reasoning/coding) plus critique and answer-finding agents collaborate via MCP threads.
Agents communicate directly with each other: all participants can list agents, create threads, send messages, wait for mentions, and update plans as execution unfolds.
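To picture the flow, here's a hypothetical worker loop over those primitives; every method name on mcp is a placeholder, not Coral Protocol's actual MCP tool name.

```python
# Sketch of how a worker agent might use the A2A thread primitives listed above.
# All `mcp` method names are placeholders, not Coral Protocol's actual API.

def worker_loop(mcp, do_work, me="web_agent"):
    peers = mcp.list_agents()                         # discover planner, critic, other workers
    thread = mcp.create_thread(participants=[me, *peers])
    mcp.send_message(thread, sender=me, text="Ready for sub-tasks.")
    while True:
        msg = mcp.wait_for_mention(agent=me)          # block until another agent @-mentions us
        result = do_work(msg.text)                    # execute the assigned sub-task
        mcp.send_message(msg.thread, sender=me, text=result)
        if getattr(msg, "requests_plan_update", False):
            mcp.update_plan(msg.thread, revision=result)   # planner and workers refine the plan
```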
NVIDIA's recent research on LLMs has been fantastic.
Jet-Nemotron is their latest work on efficient language models, and it significantly improves generation throughput.
Here are my notes:
A hybrid-architecture LM family built by “adapting after pretraining.”
Starting from a frozen full-attention model, the authors search where to keep full attention, which linear-attention block to use, and which hyperparameters match hardware limits.
The result, Jet-Nemotron-2B/4B, matches or surpasses popular full-attention baselines while massively increasing throughput on long contexts.
PostNAS pipeline
Begins with a pre-trained full-attention model and freezes MLPs, then proceeds in four steps:
1. Learn optimal placement or removal of full-attention layers
2. Select a linear-attention block
3. Design a new attention block
4. Run a hardware-aware hyperparameter search
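Step 4 might look roughly like this; the helper functions and throughput threshold are assumptions, not the PostNAS code.

```python
# Sketch of a hardware-aware hyperparameter search (hypothetical helpers).
# Configs that miss the throughput budget are rejected; among the rest, the most accurate wins.

def hardware_aware_search(candidates, build_model, measure_throughput, eval_accuracy,
                          min_tokens_per_sec=1000.0):
    best_cfg, best_acc = None, float("-inf")
    for cfg in candidates:                    # e.g., head counts, KV dims, linear-attention params
        model = build_model(cfg)              # MLPs stay frozen; only attention blocks change
        if measure_throughput(model) < min_tokens_per_sec:
            continue                          # violates the hardware limit
        acc = eval_accuracy(model)
        if acc > best_acc:
            best_cfg, best_acc = cfg, acc
    return best_cfg, best_acc
```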
Catchy title and very cool memory technique to improve deep research agents.
Great for continuous, real-time learning without gradient updates.
Here are my notes:
Overview
Proposes a memory‑based learning framework that lets deep‑research agents adapt online without updating model weights.
The agent is cast as a memory‑augmented MDP with case‑based reasoning, implemented in a planner–executor loop over MCP tools.
Method
Decisions are guided by a learned case‑retrieval policy over an episodic Case Bank.
Non‑parametric memory retrieves Top‑K similar cases; parametric memory learns a Q‑function (soft Q‑learning or single‑step CE training in deep‑research settings) to rank cases for reuse and revision.
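A hedged sketch of the two retrieval paths; the data structures are assumptions, not the paper's code.

```python
# Sketch of case retrieval over the episodic Case Bank (hypothetical structures).
# Non-parametric path: cosine-similarity Top-K. Parametric path: a learned Q(query, case) re-ranker.
import numpy as np

def retrieve_cases(query_emb, case_bank, q_function=None, k=4):
    # case_bank: list of dicts with "embedding" (np.ndarray) and "case" (past trajectory) fields
    def cosine(a, b):
        return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-8))

    scored = sorted(case_bank, key=lambda c: cosine(query_emb, c["embedding"]), reverse=True)
    candidates = scored[:k]
    if q_function is not None:
        # Re-rank the Top-K with the learned Q-function (soft Q-learning or single-step CE).
        candidates.sort(key=lambda c: q_function(query_emb, c["embedding"]), reverse=True)
    return [c["case"] for c in candidates]
```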
Interesting idea to train a single model with the capabilities of a multi-agent system.
84.6% reduction in inference cost!
Distillation and Agentic RL are no joke!
Here are my notes:
Overview
This work proposes training single models to natively behave like multi‑agent systems, coordinating “role‑playing” and tool agents end‑to‑end.
They distill strong multi‑agent frameworks into CoA trajectories, then optimize with agentic RL on verifiable tasks.
Paradigm shift
CoA generalizes ReAct/TIR by dynamically activating multiple roles and tools within one model, preserving a single coherent state while cutting inter‑agent chatter.
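Here's an illustrative sketch of what "one model, many roles and tools" could look like at inference time; the tag format and helpers are my assumptions, not the paper's trajectory schema.

```python
# Illustrative CoA-style loop: one model emits role and tool tags inside a single trajectory
# instead of routing messages between separate agents. The tag format is an assumption.

def parse_tool_call(step):
    # Naive parse of "<tool:NAME>ARG</tool>" (illustrative only).
    header, _, rest = step.partition(">")
    name = header[len("<tool:"):]
    arg = rest.split("</tool>")[0]
    return name, arg

def chain_of_agents(llm, tools, task, max_steps=12):
    state = f"<task>{task}</task>"
    for _ in range(max_steps):
        step = llm(state)                      # the model decides which "role" or tool to activate next
        state += step
        if step.startswith("<tool:"):          # e.g. <tool:python>print(1 + 1)</tool>
            name, arg = parse_tool_call(step)
            state += f"<result>{tools[name](arg)}</result>"   # tool output stays in the same state
        elif step.startswith("<answer>"):
            return step                        # final answer ends the single coherent trajectory
    return state
```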