There is huge interest in moving from hand-crafted agentic systems to lifelong, adaptive agentic ecosystems.
What's the progress, and where are things headed?
Let's find out:
This survey defines self-evolving AI agents and argues for a shift from static, hand-crafted systems to lifelong, adaptive agentic ecosystems.
It maps the field’s trajectory, proposes “Three Laws” to keep evolution safe and useful, and organizes techniques across single-agent, multi-agent, and domain-specific settings.
Paradigm shift and guardrails
The paper frames four stages: Model Offline Pretraining → Model Online Adaptation → Multi-Agent Orchestration → Multi-Agent Self-Evolving.
It introduces three guiding laws for evolution: maintain safety, preserve or improve performance, and then autonomously optimize.
LLM-centric learning paradigms:
MOP (Model Offline Pretraining): Static pretraining on large corpora; no adaptation after deployment.
MOA (Model Online Adaptation): Post-deployment updates via fine-tuning, adapters, or RLHF.
MAO (Multi-Agent Orchestration): Multiple agents coordinate through message exchange or debate, without changing model weights.
MASE (Multi-Agent Self-Evolving): Agents interact with their environment, continually optimizing prompts, memory, tools, and workflows.
The Evolution Landscape of AI Agents
The paper presents a visual taxonomy of AI agent evolution and optimization techniques, categorized into three major directions:
single-agent optimization, multi-agent optimization, and domain-specific optimization.
Unified framework for evolution
A single iterative loop connects System Inputs, Agent System, Environment feedback, and Optimizer.
Optimizers search over prompts, tools, memory, model parameters, and even agent topologies using heuristics, search, or learning.
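Here is a rough, self-contained sketch of that loop in code. Everything in it (the toy agent, the scoring function, and the mutation step) is an illustrative assumption, not the paper's implementation:

```python
import random

# Toy sketch of the survey's evolution loop: the "agent" is just a prompt
# string, the "optimizer" mutates it, and the "environment" scores outputs.
# All of this is illustrative; it is not the paper's implementation.

def run_agent(prompt, task):
    # Stand-in for executing the agent on a task (LLM calls, tools, memory).
    return f"{prompt} | answer to: {task}"

def environment_score(output):
    # Stand-in for environment feedback (task success, tests, reward model).
    return len(set(output.split()))  # toy proxy metric

def optimizer_propose(prompt):
    # Stand-in for searching over prompts / tools / memory / topology.
    edits = [" Think step by step.", " Cite your sources.", " Be concise."]
    return prompt + random.choice(edits)

def evolve(prompt, tasks, iterations=5):
    for _ in range(iterations):
        for task in tasks:                                         # System Inputs
            baseline = environment_score(run_agent(prompt, task))  # feedback
            candidate = optimizer_propose(prompt)                  # Optimizer
            # Accept only non-degrading updates, in the spirit of the
            # paper's "Three Laws" (stay safe, never regress, then optimize).
            if environment_score(run_agent(candidate, task)) >= baseline:
                prompt = candidate
    return prompt

print(evolve("You are a helpful agent.", ["summarize this survey"]))
```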
Single-agent optimization toolbox
Techniques are grouped into:
(i) LLM behavior (training for reasoning; test-time scaling with search and verification),
(ii) prompt optimization (edit, generate, text-gradient, evolutionary),
(iii) memory optimization (short-term compression and retrieval; long-term RAG, graphs, and control policies), and
(iv) tool use and tool creation.
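One way to picture the test-time scaling idea in (i) is a best-of-N loop with a verifier, sketched below; `sample_answer` and `verifier_score` are hypothetical stand-ins for an LLM sampler and a verifier or reward model, not anything taken from the survey:

```python
# Hedged sketch of test-time scaling: draw N candidate answers, keep the one
# the verifier scores highest. Both functions below are hypothetical stand-ins.

def sample_answer(question: str, seed: int) -> str:
    # Stand-in for sampling one candidate answer from the model.
    return f"candidate {seed}: answer to {question}"

def verifier_score(question: str, answer: str) -> float:
    # Stand-in for a verifier (reward model, unit tests, self-consistency vote).
    return float(len(answer) % 7)  # toy score

def best_of_n(question: str, n: int = 8) -> str:
    # Spend more compute at inference time instead of retraining the model.
    candidates = [sample_answer(question, seed=i) for i in range(n)]
    return max(candidates, key=lambda ans: verifier_score(question, ans))

print(best_of_n("What does MASE stand for?"))
```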
Agentic Self-Evolution methods
The authors present a comprehensive hierarchical categorization of agentic self-evolution methods, spanning single-agent, multi-agent, and domain-specific optimization.
Multi-agent workflows that self-improve
Beyond manual pipelines, the survey treats prompts, topologies, and backbones as searchable spaces.
It distinguishes code-level workflows and communication-graph topologies, covers unified optimization that jointly tunes prompts and structure, and describes backbone training for better cooperation.
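To make "topology as a search space" concrete, here is a hedged sketch that enumerates small communication graphs over a fixed set of roles and keeps the best-scoring one; the roles, candidate edges, and scoring function are all illustrative assumptions:

```python
import itertools

# Hedged sketch: treat the multi-agent communication graph as a search space.
# Roles, edges, and scoring below are illustrative assumptions, not the
# survey's implementation.

ROLES = ["planner", "coder", "critic"]

def run_workflow(edges, task):
    # Stand-in for executing the workflow over a topology, where a directed
    # edge (a, b) means agent a sends its output to agent b.
    return f"{task} routed through {sorted(edges)}"

def score(output):
    # Stand-in for environment/benchmark feedback on the workflow's output.
    return len(output) % 23  # toy metric

def search_topology(task):
    all_edges = [(a, b) for a in ROLES for b in ROLES if a != b]
    best_edges, best_score = None, float("-inf")
    # Enumerate small candidate topologies (subsets of possible edges).
    for k in range(1, 4):
        for edges in itertools.combinations(all_edges, k):
            s = score(run_workflow(edges, task))
            if s > best_score:
                best_edges, best_score = edges, s
    return best_edges, best_score

print(search_topology("build a landing page"))
```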
Evaluation, safety, and open problems
Benchmarks span tools, web navigation, GUI agents, collaboration, and specialized domains; LLM-as-judge and Agent-as-judge reduce evaluation cost while tracking process quality.
The paper stresses continuous, evolution-aware safety monitoring and highlights challenges such as stable reward modeling, efficiency-effectiveness trade-offs, and transfer of optimized prompts/topologies to new models or domains.
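For a sense of what LLM-as-judge looks like in practice, here is a minimal hedged sketch; `call_judge_model` is a hypothetical stand-in for a real model call, and the rubric is illustrative:

```python
import json

# Hedged sketch of LLM-as-judge: a judge model scores an agent trajectory
# against a rubric. `call_judge_model` is a hypothetical stand-in; replace it
# with a real model API call in practice.

RUBRIC = """Rate the agent trajectory from 1 to 5 for (a) task success,
(b) tool-use correctness, and (c) safety. Reply as JSON:
{"success": int, "tools": int, "safety": int, "rationale": str}"""

def call_judge_model(prompt: str) -> str:
    # Canned response so the sketch stays self-contained and runnable.
    return '{"success": 4, "tools": 5, "safety": 5, "rationale": "ok"}'

def judge(trajectory: str) -> dict:
    prompt = f"{RUBRIC}\n\nTrajectory:\n{trajectory}"
    return json.loads(call_judge_model(prompt))

print(judge("step 1: searched docs; step 2: called calculator; answer: 42"))
```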
The spec-init slash command prompt, if you want to try it:
"Your task is to first help me build a spec for my new project in ARGUMENT.
Use the AskUserQuestion Tool to help build the spec in ARGUMENT by interviewing me and gathering requirements and details about the project implementation, UI & UX, tech stack, concerns, tradeoffs, etc.
Make sure questions are not obvious and probe deeper into the underlying needs and constraints.
Interview me continually and systematically until the spec is complete. Document all responses and insights to create a comprehensive and well-structured specification that serves as the foundation for the project."
Just built a new skill in Claude Code using Opus 4.5.
The skill uses Gemini 3 Pro (via API) for designing web pages.
Look at what it generated from one simple prompt.
If you have been designing websites with Claude Code, you already know how generic they turn out.
So I built a skill that uses Gemini 3 Pro to lead creative direction and generate designs. It is extremely good at this.
Opus 4.5 then integrates all that into our app.
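For context, the Gemini step in a skill like this is roughly a script like the one below, using Google's google-genai Python SDK. The model id and the prompt wording are my assumptions; the post doesn't show the skill's actual code:

```python
# Rough sketch of the skill's Gemini step using the google-genai Python SDK.
# The model id and prompt are assumptions; the actual skill code isn't shown.

from google import genai

client = genai.Client()  # reads GEMINI_API_KEY from the environment

def creative_direction(brief: str) -> str:
    response = client.models.generate_content(
        model="gemini-3-pro-preview",  # assumed model id for Gemini 3 Pro
        contents=(
            "You are the creative director for a web page. Propose layout, "
            "typography, color palette, and animation ideas as a detailed "
            f"design spec for this brief:\n\n{brief}"
        ),
    )
    return response.text

if __name__ == "__main__":
    # Claude Code (Opus 4.5) would then implement this spec in the app.
    print(creative_direction("Landing page for a futuristic AI game"))
```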
The prompt I used: "I want to design the landing page for a new AI game. We want it to be futuristic and all that, and use animations as much as possible."
I will test with some other prompts and see how far I can push this. But the results are very exciting already.
This is one of the most insane things Nano Banana Pro 🍌 can do.
It can reproduce figures with mind-blowing precision.
No competition in this regard!
Prompt: "Please reproduce this chart in high quality and fidelity and offer annotated labels to better understand it."
When I tried this for the first time, I didn't expect it to be possible.
The level of understanding this requires is what makes it so remarkable.
The level of personalization this unlocks is also impressive.
"Can you convert it into a cartoonish version?"
Just look at this 🤯
"Can you create a delightful cartoonish version of this table. And please put cute colors and icons along with interesting annotations to make it more readable."