ℏεsam
Apr 5, 2025 · 11 tweets · 4 min read
the best researchers from Meta, Yale, Stanford, Google DeepMind, and Microsoft laid out all we know about Agents in a 264-page paper [book],

here are some of their key findings:
they build a mapping of different agent components, such as perception, memory, and world modelling, to different regions of the human brain and compare them:

- brain is much more energy-efficient
- no genuine experience in agents
- brain learns continuously, agent is static
an agent is broken down into:
- Perception: the agent's input mechanism. can be improved with multi-modality, feedback mechanisms (e.g., human corrections), etc.
- Cognition: learning, reasoning, planning, memory. LLMs are key in this part.
- Action: the agent's output and tool use.
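
a minimal sketch of the perception → cognition → action breakdown above. everything here is hypothetical and not from the paper: the `Agent` class, the `"tool_name:argument"` decision format, and `toy_cognition`, which is a rule-based stand-in for where an LLM call would go.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict


@dataclass
class Agent:
    # Cognition: any function from an observation to a decision.
    # In a real agent this would wrap an LLM call (assumption).
    cognition: Callable[[str], str]
    # Action side: named tools the agent may invoke.
    tools: Dict[str, Callable[[str], str]] = field(default_factory=dict)

    def perceive(self, raw_input: str) -> str:
        # Perception: turn raw input into a normalized observation.
        return raw_input.strip().lower()

    def act(self, decision: str) -> str:
        # Action: a decision of the form "tool_name:argument" calls a tool;
        # anything else is returned as plain text output.
        name, _, arg = decision.partition(":")
        tool = self.tools.get(name)
        return tool(arg) if tool else decision

    def step(self, raw_input: str) -> str:
        observation = self.perceive(raw_input)
        decision = self.cognition(observation)
        return self.act(decision)


def toy_cognition(observation: str) -> str:
    # Hypothetical stand-in for LLM reasoning: route "shout ..." to a tool.
    if observation.startswith("shout "):
        return "upper:" + observation[len("shout "):]
    return observation


agent = Agent(cognition=toy_cognition, tools={"upper": str.upper})
print(agent.step("  Shout hello "))
print(agent.step("Hi"))
```

the point of the split is that each stage can be swapped independently: richer perception (e.g. images), a stronger model for cognition, or more tools for action.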
agentic memory is represented as:
- Sensory memory: the brief holding of raw inputs, which is not emphasized much in agents.
- Short-term memory: the LLM context window.
- Long-term memory: external storage, such as RAG or knowledge graphs.
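
the three tiers above can be sketched roughly like this. `AgentMemory` and its method names are made up for illustration: a bounded `deque` stands in for the context window and a plain dict stands in for the external store (a real system would use a vector database or knowledge graph).

```python
from collections import deque


class AgentMemory:
    def __init__(self, context_size: int = 3):
        # Sensory memory: only the most recent raw input is held.
        self.sensory = None
        # Short-term memory: bounded buffer, a stand-in for the context window.
        self.short_term = deque(maxlen=context_size)
        # Long-term memory: stand-in for external storage (key -> fact).
        self.long_term = {}

    def observe(self, raw: str) -> None:
        # New input overwrites sensory memory and enters the context;
        # the oldest context entry is dropped once the buffer is full.
        self.sensory = raw
        self.short_term.append(raw)

    def remember(self, key: str, fact: str) -> None:
        # Persist a fact beyond the lifetime of the context window.
        self.long_term[key] = fact

    def context(self) -> list:
        return list(self.short_term)


memory = AgentMemory(context_size=2)
for message in ["a", "b", "c"]:
    memory.observe(message)
memory.remember("user_name", "Sam")
print(memory.context())
```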
the memory in agents can be improved and researched in terms of:
- increasing the amount of stored information
- how to retrieve the most relevant info
- combining context-window memory with external memory
- deciding what to forget or update in memory
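
two of these directions, retrieving the most relevant info and deciding what to forget, can be illustrated with a toy store. this is only a sketch under my own assumptions, not the paper's method: relevance is plain word-overlap (Jaccard) instead of embeddings, and forgetting is least-recently-used eviction. `LongTermStore` and `relevance` are hypothetical names.

```python
def relevance(query: str, text: str) -> float:
    # Jaccard similarity over lowercase word sets (a stand-in for embeddings).
    q, t = set(query.lower().split()), set(text.lower().split())
    union = q | t
    return len(q & t) / len(union) if union else 0.0


class LongTermStore:
    def __init__(self, capacity: int):
        self.capacity = capacity
        self.items = []  # each item is [text, last_used_tick]
        self.tick = 0

    def add(self, text: str) -> None:
        self.tick += 1
        if len(self.items) >= self.capacity:
            # Forgetting policy: drop the least recently used item.
            oldest = min(range(len(self.items)), key=lambda i: self.items[i][1])
            self.items.pop(oldest)
        self.items.append([text, self.tick])

    def retrieve(self, query: str, k: int = 1) -> list:
        self.tick += 1
        ranked = sorted(
            self.items, key=lambda item: relevance(query, item[0]), reverse=True
        )[:k]
        for item in ranked:
            item[1] = self.tick  # retrieval counts as use
        return [item[0] for item in ranked]


store = LongTermStore(capacity=2)
store.add("the cat sat on the mat")
store.add("stock prices rose today")
first_hit = store.retrieve("where is the cat")
store.add("rain is expected tomorrow")  # evicts the unused "stock prices" note
remaining = [text for text, _ in store.items]
print(first_hit, remaining)
```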
the agent must simulate or predict the future states of the environment for planning and decision-making.

AI world models are much simpler than humans', which include causal reasoning (cause-and-effect) and physical intuition.

LLM world models are mostly implicit and embedded.
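
to make "simulate future states for planning" concrete, here is a toy with an explicit world model (unlike the implicit ones in LLMs). the 1-D environment, `world_model`, `rollout`, and `plan` are all invented for illustration: the agent imagines every action sequence up to a short horizon and picks the one that ends closest to the goal.

```python
from itertools import product


def world_model(state: int, action: int) -> int:
    # Hypothetical environment: a position on a line clamped to [0, 10].
    return max(0, min(10, state + action))


def rollout(state: int, actions) -> int:
    # Simulate a sequence of actions without touching the real environment.
    for action in actions:
        state = world_model(state, action)
    return state


def plan(state: int, goal: int, actions=(-1, 1, 2), horizon: int = 3):
    # Exhaustive lookahead: try every action sequence of length `horizon`
    # and keep the first one whose predicted end state is nearest the goal.
    best_seq, best_dist = None, None
    for seq in product(actions, repeat=horizon):
        dist = abs(goal - rollout(state, seq))
        if best_dist is None or dist < best_dist:
            best_seq, best_dist = seq, dist
    return best_seq


chosen = plan(state=0, goal=6)
print(chosen, rollout(0, chosen))
```

exhaustive search only works because the toy is tiny; the number of sequences grows exponentially with the horizon.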
EMOTIONS are a deep aspect of humans, helping them with social interactions, decision-making, or learning.

agents must understand emotions to better interact with us.

but rather than encoding the feeling of emotions, they have a surface-level modelling of emotions.
Perception is the process by which an agent receives and interprets raw data from its surroundings.

human perception is complex, while AI's perception is mostly limited to textual and visual data, though research is finding ways to incorporate more (e.g., audio).
the paper goes on to explore multi-agent systems and the approaches of key players such as MetaGPT, @CamelAIOrg, @huggingface, and ChatDev.

It also touches on online active learning, design of multi-agent systems, and different agent collaboration paradigms.
I only covered Part I of the paper. It has 4 comprehensive parts that cover almost everything crucial to know about agents.

Read Paper: huggingface.co/papers/2504.01…
CORRECTION: the paper is not affiliated with META but @MetaGPT_
