elvis
Aug 31 · 10 tweets · 4 min read
Overview of Self-Evolving Agents

There is a huge interest in moving from hand-crafted agentic systems to lifelong, adaptive agentic ecosystems.

What's the progress, and where are things headed?

Let's find out:
This survey defines self-evolving AI agents and argues for a shift from static, hand-crafted systems to lifelong, adaptive agentic ecosystems.

It maps the field’s trajectory, proposes “Three Laws” to keep evolution safe and useful, and organizes techniques across single-agent, multi-agent, and domain-specific settings.
Paradigm shift and guardrails

The paper frames four stages: Model Offline Pretraining → Model Online Adaptation → Multi-Agent Orchestration → Multi-Agent Self-Evolving.

It introduces three guiding laws for evolution: maintain safety, preserve or improve performance, and then autonomously optimize.
LLM-centric learning paradigms:

MOP (Model Offline Pretraining): Static pretraining on large corpora; no adaptation after deployment.

MOA (Model Online Adaptation): Post-deployment updates via fine-tuning, adapters, or RLHF.

MAO (Multi-Agent Orchestration): Multiple agents coordinate through message exchange or debate, without changing model weights.

MASE (Multi-Agent Self-Evolving): Agents interact with their environment, continually optimizing prompts, memory, tools, and workflows.
The Evolution Landscape of AI Agents

The paper presents a visual taxonomy of AI agent evolution and optimization techniques, categorized into three major directions:
single-agent optimization, multi-agent optimization, and domain-specific optimization.
Unified framework for evolution

A single iterative loop connects System Inputs, Agent System, Environment feedback, and Optimizer.

Optimizers search over prompts, tools, memory, model parameters, and even agent topologies using heuristics, search, or learning.
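The loop described above (System Inputs → Agent System → Environment feedback → Optimizer) can be sketched in miniature. This is an illustrative skeleton, not the paper's implementation; all function names and the scoring logic are invented for the example:

```python
import random

random.seed(0)

def run_agent(prompt, task):
    """Toy Agent System: stand-in for an LLM call (illustrative only)."""
    return f"{prompt} -> answer for {task}"

def environment_feedback(output):
    """Toy Environment: returns a scalar reward for the agent's output."""
    return len(output) % 7 / 7.0  # placeholder score in [0, 1]

def optimize(prompt, reward, history):
    """Toy Optimizer: heuristic search that mutates the best prompt so far."""
    history.append((reward, prompt))
    _, best_prompt = max(history)
    return best_prompt + random.choice([" step by step", " concisely", ""])

def self_evolve(task, prompt="Solve:", iterations=5):
    history = []
    for _ in range(iterations):
        output = run_agent(prompt, task)            # Agent System
        reward = environment_feedback(output)       # Environment feedback
        prompt = optimize(prompt, reward, history)  # Optimizer updates the system
    return max(history)  # (best_reward, best_prompt)

best = self_evolve("2+2")
```

A real MASE optimizer would search over memory, tools, and topology as well as the prompt, but the control flow is the same closed loop.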
Single-agent optimization toolbox

Techniques are grouped into:

(i) LLM behavior (training for reasoning; test-time scaling with search and verification),

(ii) prompt optimization (edit, generate, text-gradient, evolutionary),

(iii) memory optimization (short-term compression and retrieval; long-term RAG, graphs, and control policies), and

(iv) tool use and tool creation.
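To make the evolutionary flavor of prompt optimization (ii) concrete, here is a minimal sketch. The fitness function and mutation set are invented stand-ins; real systems score prompts on held-out tasks with an LLM:

```python
import random

random.seed(0)

MUTATIONS = [" Think step by step.", " Be concise.", " Cite evidence."]

def score(prompt):
    """Stand-in fitness: favors longer, reasoning-flavored prompts (illustrative)."""
    return prompt.count("step") + 0.1 * len(prompt.split())

def mutate(prompt):
    return prompt + random.choice(MUTATIONS)

def evolve_prompt(seed_prompt, population_size=6, generations=4):
    population = [mutate(seed_prompt) for _ in range(population_size)]
    for _ in range(generations):
        # Select the fitter half, then refill by mutating survivors
        population.sort(key=score, reverse=True)
        survivors = population[: population_size // 2]
        population = survivors + [
            mutate(random.choice(survivors))
            for _ in range(population_size - len(survivors))
        ]
    return max(population, key=score)

best_prompt = evolve_prompt("Answer the question.")
```

Edit, generate, and text-gradient methods differ only in how `mutate` proposes candidates (targeted edits, fresh generations, or LLM-written critiques treated as gradients).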
Agentic Self-Evolution methods

The authors present a comprehensive hierarchical categorization of agentic self-evolution methods, including single-agent, multi-agent, and domain-specific optimization categories.
Multi-agent workflows that self-improve

Beyond manual pipelines, the survey treats prompts, topologies, and backbones as searchable spaces.

It distinguishes code-level workflows and communication-graph topologies, covers unified optimization that jointly tunes prompts and structure, and describes backbone training for better cooperation.
Evaluation, safety, and open problems

Benchmarks span tools, web navigation, GUI agents, collaboration, and specialized domains; LLM-as-judge and Agent-as-judge reduce evaluation cost while tracking process quality.

The paper stresses continuous, evolution-aware safety monitoring and highlights challenges such as stable reward modeling, efficiency-effectiveness trade-offs, and transfer of optimized prompts/topologies to new models or domains.

Paper: arxiv.org/abs/2508.07407

More from @omarsar0

Aug 28
Memory-R1

Another really cool paper showing how RL can enhance an LLM's agentic and memory capabilities.

Great read for AI devs.

Here are my notes:
Overview

A framework that teaches LLM agents to decide what to remember and how to use it.

Two RL-fine-tuned components work together: a Memory Manager that learns CRUD-style operations on an external store and an Answer Agent that filters retrieved memories via “memory distillation” before answering.
Active memory control with RL

The Memory Manager selects ADD, UPDATE, DELETE, or NOOP after a RAG step and edits entries accordingly; training with PPO or GRPO uses downstream QA correctness as the reward, removing the need for per-edit labels.
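The four operations can be sketched as follows. The store and the rule-based policy here are toy stand-ins for illustration; in Memory-R1 the policy is an RL-trained LLM rewarded by downstream QA correctness:

```python
from dataclasses import dataclass, field

@dataclass
class MemoryStore:
    entries: dict = field(default_factory=dict)

    def apply(self, op, key, value=None):
        """Apply a CRUD-style memory operation chosen by the manager."""
        if op == "ADD" and key not in self.entries:
            self.entries[key] = value
        elif op == "UPDATE" and key in self.entries:
            self.entries[key] = value
        elif op == "DELETE":
            self.entries.pop(key, None)
        # "NOOP" leaves the store unchanged

def toy_manager_policy(retrieved, new_fact):
    """Illustrative stand-in for the RL-trained Memory Manager:
    pick an operation by comparing the new fact to retrieved memories."""
    key, value = new_fact
    if key not in retrieved:
        return "ADD"
    if retrieved[key] != value:
        return "UPDATE"
    return "NOOP"

store = MemoryStore()
for fact in [("pet", "dog"), ("pet", "cat"), ("city", "Paris")]:
    op = toy_manager_policy(store.entries, fact)
    store.apply(op, *fact)
```

After the three facts, the store holds the updated pet and the added city, with no per-edit supervision needed.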
Aug 27
Don't sleep on small models!

Anemoi is the latest multi-agent system that proves small models pack a punch when combined effectively.

GPT-4.1-mini (for planning) and GPT-4o (for worker agents) surpass the strongest open-source baseline on GAIA.

A must-read for devs:
Quick Overview

Anemoi is a semi-centralized generalist multi-agent system powered by an A2A communication MCP server from @Coral_Protocol.

Anemoi replaces purely centralized, context-stuffed coordination with an A2A communication server (MCP) that lets agents talk directly, monitor progress, refine plans, and reach consensus.
Design

A semi-centralized planner proposes an initial plan, while worker agents (web, document processing, reasoning/coding) plus critique and answer-finding agents collaborate via MCP threads.

Agents communicate directly with each other.

All participants can list agents, create threads, send messages, wait for mentions, and update plans as execution unfolds.
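Those primitives (list agents, threads, messages, mentions) can be sketched as a tiny in-process server. This is a hedged illustration of the interaction pattern, not Coral Protocol's actual MCP server API:

```python
from collections import defaultdict

class ThreadServer:
    """Toy stand-in for an A2A communication server: named threads of
    (sender, message) pairs, with mention-based notification."""

    def __init__(self):
        self.threads = defaultdict(list)
        self.agents = set()

    def register(self, agent):
        self.agents.add(agent)

    def list_agents(self):
        return sorted(self.agents)

    def send(self, thread, sender, message):
        self.threads[thread].append((sender, message))

    def mentions(self, thread, agent):
        """Messages in a thread that mention the given agent."""
        return [m for s, m in self.threads[thread] if f"@{agent}" in m]

server = ThreadServer()
for a in ["planner", "web", "critique"]:
    server.register(a)
server.send("task-1", "planner", "@web fetch the GAIA leaderboard")
server.send("task-1", "web", "@critique please verify these numbers")
```

The semi-centralized part is just convention on top of this: the planner opens the thread and proposes a plan, while workers and the critique agent converse directly instead of routing everything through the planner's context.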
Aug 27
Efficient Language Model with PostNAS

NVIDIA's recent research on LLMs has been fantastic.

Jet-Nemotron is their latest efficient language model family, and it significantly improves generation throughput.

Here are my notes:
A hybrid-architecture LM family built by “adapting after pretraining.”

Starting from a frozen full-attention model, the authors search where to keep full attention, which linear-attention block to use, and which hyperparameters match hardware limits.

The result, Jet-Nemotron-2B/4B, matches or surpasses popular full-attention baselines while massively increasing throughput on long contexts.
PostNAS pipeline

Begins with a pre-trained full-attention model and freezes MLPs, then proceeds in four steps:

1. Learn optimal placement or removal of full-attention layers
2. Select a linear-attention block
3. Design a new attention block
4. Run a hardware-aware hyperparameter search
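Step 4 can be illustrated with a toy constrained search: filter candidate configurations by a hardware budget, then maximize quality among the survivors. The proxies and numbers below are invented for the sketch; PostNAS measures real throughput and evaluates trained blocks:

```python
from itertools import product

def throughput(config):
    """Toy proxy for measured generation throughput (illustrative)."""
    heads, kv_dim = config
    return 1000.0 / (heads * kv_dim)

def accuracy_proxy(config):
    """Toy proxy for task accuracy (illustrative)."""
    heads, kv_dim = config
    return min(1.0, 0.5 + 0.01 * heads + 0.001 * kv_dim)

def hardware_aware_search(head_opts, kv_opts, min_throughput):
    """Keep only configs meeting the hardware budget, then pick
    the highest-scoring one."""
    feasible = [c for c in product(head_opts, kv_opts)
                if throughput(c) >= min_throughput]
    return max(feasible, key=accuracy_proxy) if feasible else None

best = hardware_aware_search([4, 8, 16], [64, 128], min_throughput=1.5)
```

The key idea carried over from the paper is searching under a hardware constraint rather than optimizing accuracy alone, with the MLPs frozen so only attention choices are explored.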
Aug 25
Fine-tuning LLM Agents without Fine-tuning LLMs

Catchy title and very cool memory technique to improve deep research agents.

Great for continuous, real-time learning without gradient updates.

Here are my notes:
Overview

Proposes a memory‑based learning framework that lets deep‑research agents adapt online without updating model weights.

The agent is cast as a memory‑augmented MDP with case‑based reasoning, implemented in a planner–executor loop over MCP tools.
Method

Decisions are guided by a learned case‑retrieval policy over an episodic Case Bank.

Non‑parametric memory retrieves Top‑K similar cases; parametric memory learns a Q‑function (soft Q‑learning or single‑step CE training in deep‑research settings) to rank cases for reuse and revision.
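The non-parametric half of this design is just Top-K retrieval over a case bank. Here is a hedged sketch; the string-similarity measure stands in for the learned embedding-based retrieval policy, and the case entries are invented:

```python
from difflib import SequenceMatcher

def similarity(a, b):
    """Toy similarity; a real system would embed tasks and compare vectors."""
    return SequenceMatcher(None, a, b).ratio()

def retrieve_top_k(case_bank, query, k=2):
    """Non-parametric memory: return the k most similar past cases."""
    ranked = sorted(case_bank,
                    key=lambda case: similarity(case["task"], query),
                    reverse=True)
    return ranked[:k]

case_bank = [
    {"task": "summarize a PDF report", "plan": "parse -> outline -> summarize"},
    {"task": "compare two papers", "plan": "read both -> extract claims -> diff"},
    {"task": "summarize a web article", "plan": "fetch -> strip HTML -> summarize"},
]
cases = retrieve_top_k(case_bank, "summarize a PDF whitepaper")
```

The parametric variant replaces the fixed `similarity` with a trained Q-function that scores (query, case) pairs, so retrieval itself improves from experience while the LLM's weights stay frozen.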
Aug 20
Chain-of-Agents

Interesting idea to train a single model with the capabilities of a multi-agent system.

84.6% reduction in inference cost!

Distillation and Agentic RL are no joke!

Here are my notes:
Overview

This work proposes training single models to natively behave like multi‑agent systems, coordinating “role‑playing” and tool agents end‑to‑end.

They distill strong multi‑agent frameworks into CoA trajectories, then optimize with agentic RL on verifiable tasks.
Paradigm shift

CoA generalizes ReAct/TIR by dynamically activating multiple roles and tools within one model, preserving a single coherent state while cutting inter‑agent chatter.
Aug 19
Has GPT-5 Achieved Spatial Intelligence?

GPT-5 sets SoTA but not human‑level spatial intelligence.

My notes below:
This report introduces a unified view of spatial intelligence (SI) for multimodal models and evaluates GPT‑5 and strong baselines across eight fresh SI benchmarks.

GPT‑5 leads overall but is still short of human skill, especially on mentally reconstructing shapes, changing viewpoints, and deformation/assembly tasks.
Unified SI schema and fair eval setup

The authors consolidate prior work into six core SI capabilities (Metric Measurement, Mental Reconstruction, Spatial Relations, Perspective‑taking, Deformation & Assembly, Comprehensive Reasoning) and standardize prompts, answer extraction, and metrics to reduce evaluation variance across datasets.
