researchers from Meta, Yale, Stanford, Google DeepMind, and Microsoft laid out what we know about agents in a 264-page paper (more of a book).
here are some of their key findings:
they map agent components such as perception, memory, and world modelling to regions of the human brain, and compare the two:
- the brain is far more energy-efficient
- agents have no genuine experience
- the brain learns continuously, while agents stay largely static after training
an agent is broken down into:
- Perception: the agent's input mechanism. it can be improved with multi-modality, feedback mechanisms (e.g., human corrections), etc.
- Cognition: learning, reasoning, planning, memory. LLMs are key in this part.
- Action: agent's output and tool use.
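to make the perception → cognition → action split concrete, here is a minimal Python sketch of one agent step. the `Agent` class, the `TOOLS` registry, and `shouting_policy` are hypothetical names, and the policy is a toy rule standing in for the LLM that would normally handle cognition, so treat it as an illustration of the structure, not a real agent framework.

```python
from dataclasses import dataclass, field
from typing import Callable

# Hypothetical tool registry: tool name -> plain Python function.
TOOLS: dict[str, Callable[[str], str]] = {
    "echo": lambda text: text,
    "upper": lambda text: text.upper(),
}

@dataclass
class Agent:
    # Stand-in for the LLM: maps (observation, memory) to (tool_name, tool_arg).
    policy: Callable[[str, list[str]], tuple[str, str]]
    memory: list[str] = field(default_factory=list)

    def perceive(self, raw_input: str) -> str:
        # Perception: normalize raw input into an observation.
        return raw_input.strip()

    def think(self, observation: str) -> tuple[str, str]:
        # Cognition: remember the observation, then let the policy choose an action.
        self.memory.append(observation)
        return self.policy(observation, self.memory)

    def act(self, decision: tuple[str, str]) -> str:
        # Action: run the chosen tool and return its output.
        tool_name, tool_arg = decision
        return TOOLS[tool_name](tool_arg)

    def step(self, raw_input: str) -> str:
        return self.act(self.think(self.perceive(raw_input)))


def shouting_policy(observation: str, memory: list[str]) -> tuple[str, str]:
    # Toy rule-based policy in place of an LLM call.
    return ("upper", observation)

agent = Agent(policy=shouting_policy)
print(agent.step("  hello agents  "))  # HELLO AGENTS
```

swapping `shouting_policy` for a function that calls a model is where the LLM would plug in; perception and action stay the same.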
agentic memory is split into:
- Sensory memory: brief holding of raw inputs, which gets little emphasis in agents.
- Short-term memory: the LLM's context window.
- Long-term memory: external storage accessed through RAG or knowledge graphs.
open directions for improving agent memory:
- increasing the amount of stored information
- how to retrieve the most relevant info
- combining context-window memory with external memory
- deciding what to forget or update in memory
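the tiers and research directions above can be sketched as a two-tier memory: a bounded short-term window standing in for the context window, and a long-term list standing in for an external RAG store. `AgentMemory` is a hypothetical class; capacity is counted in items rather than tokens, and retrieval uses word overlap instead of embeddings. both are simplifying assumptions.

```python
from collections import deque
from typing import Callable

class AgentMemory:
    """Hypothetical two-tier memory: a bounded short-term window plus a long-term store."""

    def __init__(self, window_size: int = 3) -> None:
        self.window_size = window_size
        self.short_term: deque[str] = deque()  # stands in for the context window
        self.long_term: list[str] = []         # stands in for an external RAG store

    def add(self, item: str) -> None:
        # Storing: when the window is full, move the oldest item to long-term storage.
        if len(self.short_term) == self.window_size:
            self.long_term.append(self.short_term.popleft())
        self.short_term.append(item)

    def retrieve(self, query: str, k: int = 1) -> list[str]:
        # Retrieval: rank long-term items by words shared with the query (embedding stand-in).
        query_words = set(query.lower().split())
        scored = [(len(query_words & set(doc.lower().split())), doc) for doc in self.long_term]
        relevant = [pair for pair in scored if pair[0] > 0]
        relevant.sort(key=lambda pair: pair[0], reverse=True)
        return [doc for _, doc in relevant[:k]]

    def forget(self, is_stale: Callable[[str], bool]) -> None:
        # Forgetting: drop long-term items the caller marks as stale.
        self.long_term = [doc for doc in self.long_term if not is_stale(doc)]

    def build_context(self, query: str) -> list[str]:
        # Combining: retrieved long-term memories first, then the current window.
        return self.retrieve(query) + list(self.short_term)

memory = AgentMemory(window_size=2)
for note in ["user prefers metric units", "user lives in Berlin", "user asked about weather"]:
    memory.add(note)
print(memory.long_term)  # ['user prefers metric units']
print(memory.build_context("which units should I use"))
# ['user prefers metric units', 'user lives in Berlin', 'user asked about weather']
```

the oldest note fell out of the window but came back into the context because the query was relevant to it, which is the context-window + external-memory combination in miniature.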
an agent also needs a world model: it must simulate or predict future states of the environment to plan and make decisions.
AI world models are much simpler than humans', which draw on causal (cause-and-effect) reasoning and physical intuition.
LLM world models are mostly implicit, embedded in the model's weights rather than represented explicitly.
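for contrast with the implicit world models inside LLMs, here is a minimal sketch of an explicit one used for planning. the number-line environment, `world_model`, and `plan` are hypothetical: the agent imagines every short action sequence with its model, scores each imagined trajectory by its total distance to the goal, and executes only the first action before re-planning.

```python
from itertools import product

ACTIONS = (-1, +1)  # hypothetical toy environment: step left or right on a number line

def world_model(state: int, action: int) -> int:
    # Explicit transition model: predicts the next state, clamped to positions 0..10.
    return max(0, min(10, state + action))

def plan(state: int, goal: int, depth: int = 3) -> int:
    """Simulate every action sequence of length `depth` with the world model and
    return the first action of the sequence with the lowest total distance to the goal."""
    best_action, best_cost = ACTIONS[0], float("inf")
    for sequence in product(ACTIONS, repeat=depth):
        simulated, cost = state, 0
        for action in sequence:
            simulated = world_model(simulated, action)  # imagined step, nothing is executed
            cost += abs(goal - simulated)
        if cost < best_cost:
            best_action, best_cost = sequence[0], cost
    return best_action

# Usage: re-plan after every real step. In this toy, the real environment
# happens to behave exactly like the model.
state, trajectory = 2, [2]
while state != 7:
    state = world_model(state, plan(state, goal=7))
    trajectory.append(state)
print(trajectory)  # [2, 3, 4, 5, 6, 7]
```

summing the distance over the whole imagined trajectory (rather than only the final state) rewards reaching the goal sooner and avoids oscillating when the goal is an odd number of steps away.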
EMOTIONS are a deep part of being human, helping with social interaction, decision-making, and learning.
agents must understand emotions to interact with us better.
but rather than actually feeling emotions, agents only model them at a surface level.
Perception is the process by which an agent receives and interprets raw data from its surroundings.
human perception is far richer, while AI perception is mostly limited to text and vision, though research is incorporating more modalities (e.g., audio).
the paper goes on to explore multi-agent systems and the approaches of key players such as MetaGPT, @CamelAIOrg, @huggingface, and ChatDev.
It also touches on online active learning, design of multi-agent systems, and different agent collaboration paradigms.
I only covered Part I of the paper. It has 4 comprehensive parts that cover almost everything crucial to know about agents.
Anthropic just dropped a full masterclass on building tools for your agents. here's the gist:
> evaluate your tools religiously
> limit the number of tools
> namespace your tools
> return meaningful context from tools
> prompt-engineer your tool descriptions
what each means:
1. evaluate your tools
use agents to generate a test set of realistic tasks, then evaluate your tools on this benchmark. refine the tool descriptions and arguments based on the results, and also evaluate on a held-out test set so you don't overfit to the first one. measure tool performance to make sure the tools actually work.
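a minimal Python harness for this loop might look like the sketch below. the `Task` type, the toy tasks, and `toy_agent` are hypothetical stand-ins (in practice an agent would generate the tasks and a real model would call your tools); the fixed-seed split keeps a held-out set you never tune descriptions against.

```python
import random
from dataclasses import dataclass
from typing import Callable

@dataclass
class Task:
    prompt: str
    expected: str

def split(tasks: list[Task], holdout_fraction: float = 0.3, seed: int = 0) -> tuple[list[Task], list[Task]]:
    # Shuffle once with a fixed seed, then set aside a held-out slice you never tune against.
    shuffled = tasks[:]
    random.Random(seed).shuffle(shuffled)
    cut = round(len(shuffled) * (1 - holdout_fraction))
    return shuffled[:cut], shuffled[cut:]

def evaluate(run_agent: Callable[[str], str], tasks: list[Task]) -> float:
    """Fraction of tasks where the agent's final answer matches the expected answer."""
    if not tasks:
        return 0.0
    correct = sum(run_agent(task.prompt).strip() == task.expected for task in tasks)
    return correct / len(tasks)

# Hypothetical tasks; in practice an agent would generate them from real workflows.
tasks = [Task(f"add {a} and {b}", str(a + b)) for a in range(5) for b in range(4)]

def toy_agent(prompt: str) -> str:
    # Stand-in for a model calling an "add" tool: parses "add A and B".
    _, a, _, b = prompt.split()
    return str(int(a) + int(b))

dev, holdout = split(tasks)
print(len(dev), len(holdout))                                  # 14 6
print(evaluate(toy_agent, dev), evaluate(toy_agent, holdout))  # 1.0 1.0
```

iterate on tool descriptions using the `dev` score only, and check `holdout` at the end to confirm the improvement carries over.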
2. don't overflow the agent with tools
more tools don't lead to better outcomes, and they fill the precious context window very fast. build a few highly optimized tools for high-impact workflows.
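to see how tool definitions eat into the context, the sketch below estimates the tokens that serialized tool schemas occupy on every request. the schema shape loosely follows Anthropic's tool definitions (`name`, `description`, `input_schema`), the ~4-characters-per-token rule is a rough assumption rather than a real tokenizer, and all tool names are hypothetical.

```python
import json

def estimate_tokens(text: str) -> int:
    # Rough heuristic (assumption): about 4 characters per token; real tokenizers differ.
    return max(1, len(text) // 4)

def make_tool(name: str, description: str, params: dict[str, str]) -> dict:
    # Hypothetical schema shaped like Anthropic's tool definitions.
    return {
        "name": name,
        "description": description,
        "input_schema": {
            "type": "object",
            "properties": {param: {"type": kind} for param, kind in params.items()},
            "required": list(params),
        },
    }

def tool_context_cost(tools: list[dict]) -> int:
    """Approximate tokens the serialized tool definitions add to every request."""
    return sum(estimate_tokens(json.dumps(tool)) for tool in tools)

# Three narrow tools versus one workflow-level tool (all names hypothetical).
narrow = [
    make_tool("list_users", "List all users in the workspace.", {}),
    make_tool("list_events", "List calendar events for one user.", {"user_id": "string"}),
    make_tool("create_event", "Create a calendar event.",
              {"user_id": "string", "start": "string", "end": "string"}),
]
consolidated = [
    make_tool("schedule_event", "Find a free slot for all attendees and book it.",
              {"attendees": "array", "duration_minutes": "integer"}),
]
print(tool_context_cost(narrow), tool_context_cost(consolidated))
```

the exact numbers depend on the tokenizer; the point is that every tool is paid for in context on each call, so folding narrow tools into one workflow-level tool shrinks that cost.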
🧵 SFT memorizes, RL generalizes.
from OpenAI o1 and DeepSeek R1 we know that RL helps models with reasoning, but this paper (dropped today) explores:
> how do SFT and RL each affect the model's generalization to different rules?
> is SFT necessary for RL training?
In short, the paper argues that supervised fine-tuning (SFT) helps the model memorize and align with certain outputs, while reinforcement learning (RL) helps the model generalize to out-of-distribution (OOD) tasks.
the experiments are run in both textual and visual environments:
the textual task is a card game: the model is shown a set of cards and prompted to write an equation that uses each card's number exactly once and equals a target number. as a rule variation, the 'J', 'Q', and 'K' cards are assigned different values.
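here is a minimal sketch of a checker for this kind of card task, the sort of rule-based verifier that could serve as an RL reward. the face-card values for the two rule variants, the target of 24, and all helper names are assumptions for illustration, not the paper's code. arithmetic is parsed with `ast` and computed with `Fraction`, so nothing is `eval`-ed and division stays exact.

```python
import ast
from collections import Counter
from fractions import Fraction

# Two hypothetical rule variants for face cards (values are assumptions for illustration).
RULES = {
    "face_cards_are_10": {"J": 10, "Q": 10, "K": 10},
    "face_cards_are_11_12_13": {"J": 11, "Q": 12, "K": 13},
}

def card_value(card: str, rule: str) -> int:
    # Face cards take their value from the active rule; number cards are read literally.
    return RULES[rule][card] if card in RULES[rule] else int(card)

def _evaluate(node: ast.AST, used: list[int]) -> Fraction:
    # Evaluate only +, -, *, / over integer literals, recording every number used.
    if isinstance(node, ast.Constant) and type(node.value) is int:
        used.append(node.value)
        return Fraction(node.value)
    if isinstance(node, ast.BinOp):
        left, right = _evaluate(node.left, used), _evaluate(node.right, used)
        if isinstance(node.op, ast.Add):
            return left + right
        if isinstance(node.op, ast.Sub):
            return left - right
        if isinstance(node.op, ast.Mult):
            return left * right
        if isinstance(node.op, ast.Div):
            return left / right
    raise ValueError("unsupported expression")

def reward(cards: list[str], equation: str, target: int, rule: str) -> int:
    """Return 1 if the equation uses each card's value exactly once and equals the target, else 0."""
    try:
        used: list[int] = []
        result = _evaluate(ast.parse(equation, mode="eval").body, used)
    except (SyntaxError, ValueError, ZeroDivisionError):
        return 0
    expected = Counter(card_value(card, rule) for card in cards)
    return int(Counter(used) == expected and result == target)

cards = ["Q", "Q", "5", "5"]
print(reward(cards, "12+12+5-5", 24, "face_cards_are_11_12_13"))  # 1
print(reward(cards, "12+12+5-5", 24, "face_cards_are_10"))        # 0: Q is worth 10 here
print(reward(cards, "5*5-10/10", 24, "face_cards_are_10"))        # 1
```

the same equation flips from correct to wrong when the rule changes, which is exactly the kind of shift a model that memorized one rule set would fail on.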