Robert Youssef
Oct 30, 2025 • 9 tweets • 3 min read
🚨 This might be the biggest leap in AI agents since ReAct.

Researchers just dropped DeepAgent, a reasoning model that can think, discover tools, and act completely on its own.

No pre-scripted workflows. No fixed tool lists. Just pure autonomous reasoning.

It introduces something wild called Memory Folding: the agent literally "compresses" its past thoughts into structured episodic, working, and tool memories… like a digital brain taking a breath before thinking again.

They also built a new RL method called ToolPO, which rewards the agent not just for finishing tasks, but for how it used tools along the way.

The results? DeepAgent beats GPT-4-level agents on almost every benchmark (WebShop, ALFWorld, GAIA), even with open-set tools it's never seen.

It's the first real step toward general reasoning agents that can operate like humans: remembering, adapting, and learning how to think.

The agent era just leveled up.
DeepAgent absolutely destroys other agents across every benchmark.

It beats ReAct-GPT-4o, CodeAct, and WebThinker on both:

→ Tool use tasks (ToolBench, Spotify, TMDB)
→ Real-world apps (WebShop, GAIA, HLE)
The attached figure shows how DeepAgent rethinks what an AI agent even is.

(a) Traditional agents = pre-planned scripts
(b) Deep research agents = limited tool use
(c) DeepAgent = free-form reasoning that dynamically finds & calls tools mid-thought
The attached figure shows the full reasoning loop (thinking, tool search, tool call, and memory folding), all integrated into one coherent process.
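A rough way to picture that loop in code, as a minimal sketch and not the paper's implementation: the keyword-overlap tool search, the one-line "fold", and every name below (`TOOLS`, `search_tools`, `run_agent`, `fold_every`) are assumptions made up for illustration.

```python
from typing import Callable, Optional

# Hypothetical tool registry: name -> (description, function). Not from the paper.
TOOLS: dict[str, tuple[str, Callable[[str], str]]] = {
    "weather": ("get the weather forecast for a city", lambda q: f"sunny in {q}"),
    "movies": ("search movie titles by keyword", lambda q: f"3 movies match {q}"),
}

def search_tools(query: str) -> Optional[str]:
    """Toy tool search: pick the tool whose description shares the most words with the query."""
    words = set(query.lower().split())
    best, best_overlap = None, 0
    for name, (desc, _) in TOOLS.items():
        overlap = len(words & set(desc.split()))
        if overlap > best_overlap:
            best, best_overlap = name, overlap
    return best

def run_agent(steps: list[tuple[str, str]], fold_every: int = 2) -> list[str]:
    """Each step is (need, argument). Interleaves think -> tool search -> tool call,
    and folds the history into a one-line summary every `fold_every` steps."""
    history: list[str] = []
    for i, (need, arg) in enumerate(steps, start=1):
        history.append(f"think: I need to {need}")
        tool = search_tools(need)
        if tool is None:
            history.append("no tool found")
        else:
            history.append(f"call {tool} -> {TOOLS[tool][1](arg)}")
        if i % fold_every == 0:
            # Stand-in for memory folding: replace the raw history with a summary.
            history = [f"folded summary of {len(history)} entries"]
    return history
```

The point of the sketch is the ordering: the tool is looked up mid-loop from whatever the agent currently needs, instead of being picked from a fixed list up front.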
DeepAgent-32B outperforms GPT-4o-based ReAct agents by +15–25% on ToolBench, API-Bank, Spotify, and ToolHop.
On downstream tasks like ALFWorld, WebShop, and GAIA, DeepAgent achieves the highest success rate and reasoning depth among all 32B models.
The secret sauce: ToolPO, their custom reinforcement learning method.

Figure 3's lower section shows how ToolPO uses a tool simulator + fine-grained reward attribution to train stable, efficient agents.
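One way to read "fine-grained reward attribution" is as per-token credit assignment: every token shares the task-level signal, and only the tokens that make up a tool call get the extra tool-use signal. The sketch below assumes exactly that reading. The function name, the mask, and the numbers are illustrative and not taken from the paper.

```python
def attribute_advantages(
    is_tool_call_token: list[bool],
    task_advantage: float,
    tool_advantage: float,
) -> list[float]:
    """Return one advantage value per token of a trajectory.

    Assumption: the task-level advantage applies to all tokens, while the
    tool-level advantage is added only where the token belongs to a tool call.
    """
    return [
        task_advantage + (tool_advantage if is_tool else 0.0)
        for is_tool in is_tool_call_token
    ]

# Toy trajectory: 2 reasoning tokens, 2 tool-call tokens, 1 answer token.
mask = [False, False, True, True, False]
advantages = attribute_advantages(mask, task_advantage=0.5, tool_advantage=0.25)
```

Under this reading, a trajectory that finishes the task with sloppy tool calls and one that finishes it with clean tool calls get different training signals on the tool-call tokens, which matches the thread's "how it used tools along the way".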
And finally, the "Memory Folding" mechanism might be the most brainlike system ever built for LLMs.

It compresses past thoughts into structured episodic, working, and tool memories.
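To make the three memory types concrete, here is a minimal sketch of what a folded memory could look like as a data structure. The field names and the rule-based `fold` function are assumptions for illustration. In DeepAgent the model itself does the compressing, which this toy version replaces with simple string rules.

```python
from dataclasses import dataclass, field

@dataclass
class FoldedMemory:
    episodic: list[str] = field(default_factory=list)   # key events so far
    working: list[str] = field(default_factory=list)    # current sub-goal
    tool: dict[str, int] = field(default_factory=dict)  # tool name -> number of calls

def fold(history: list[str]) -> FoldedMemory:
    """Compress a raw interaction history into the three structured memories.

    Toy rules (assumed): 'call <tool>' lines update tool memory, 'done:' lines
    become episodic events, and the last 'plan:' line is kept as working memory.
    """
    memory = FoldedMemory()
    for line in history:
        if line.startswith("call "):
            name = line.split()[1]
            memory.tool[name] = memory.tool.get(name, 0) + 1
        elif line.startswith("done:"):
            memory.episodic.append(line[len("done:"):].strip())
        elif line.startswith("plan:"):
            memory.working = [line[len("plan:"):].strip()]
    return memory

history = [
    "plan: find a cheap flight",
    "call search_flights",
    "done: found 3 flights under $200",
    "call search_flights",
    "plan: compare baggage fees",
]
memory = fold(history)
```

Five raw history lines collapse into one event, one current plan, and one tool counter, which is the "taking a breath" idea in miniature.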
DeepAgent isn't another research toy.

It's a prototype of what comes next:

→ Continuous reasoning
→ Dynamic tool discovery
→ Autonomous adaptation

Full paper: arxiv.org/abs/2510.21618

More from @rryssf_

Feb 12
SemiAnalysis just published data showing 4% of all public GitHub commits are now authored by Claude Code.

their projection: 20%+ by year-end 2026.

in the same week, Goldman Sachs revealed it embedded Anthropic engineers for 6 months to build autonomous accounting agents.

a thread on the week ai stopped being a tool and started being a coworker:
let's start with the Goldman story because it's the one that should make every back-office professional pause.

Goldman's CIO told CNBC they were "surprised" at how capable Claude was beyond coding. accounting, compliance, client onboarding, KYC, AML.

his exact framing: "digital co-workers for professions that are scaled, complex, and very process intensive."

not chatbots answering FAQs. autonomous agents parsing trade records, applying regulatory rules, routing approvals.

they started with an ai coding tool called Devin. then realized Claude's reasoning engine works the same way on rules-based financial tasks as it does on code.

the quiet part: Goldman's CEO already announced plans to constrain headcount growth during the shift. no mass layoffs yet. but "slower headcount growth" is how corporations say "we're replacing the next hire, not the current one."
now the SemiAnalysis numbers.

4% of GitHub public commits. Claude Code. right now. not projected. not theoretical. measured.

the tool has been live for roughly a year. it went from research preview to mass platform impact faster than almost any dev tool in history.

and that 20% projection isn't hype math. SemiAnalysis tracks autonomous task horizons doubling every 4-7 months. each doubling unlocks more complex work: snippet completion at 30 minutes, module refactoring at 4.8 hours, full audits at multi-day horizons.
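The doubling claim can be checked with a quick back-of-the-envelope calculation. Taking the thread's own figures (a 30-minute horizon, a 4.8-hour horizon, and a doubling every 4 to 7 months), the sketch below counts how many doublings separate the two horizons and how long that would take. It assumes clean exponential growth, which is a simplification.

```python
import math

def months_to_reach(start_hours: float, target_hours: float, doubling_months: float) -> float:
    """Months needed to grow from start to target if the horizon doubles every `doubling_months`."""
    doublings = math.log2(target_hours / start_hours)
    return doublings * doubling_months

doublings = math.log2(4.8 / 0.5)       # about 3.26 doublings
fast = months_to_reach(0.5, 4.8, 4)    # about 13 months
slow = months_to_reach(0.5, 4.8, 7)    # about 23 months
```

So on these numbers, moving from snippet-level to module-level work takes roughly one to two years.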

the implication isn't "developers are getting faster." it's that the definition of "developer" is expanding to include anyone who can describe a problem clearly.
Feb 11
MIT researchers taught an LLM to write its own training data, finetune itself, and improve without human intervention

the paper is called SEAL (Self-Adapting Language Models) and the core idea is genuinely clever

but "GPT-6 might be alive" is not what this paper says. not even close.

here's what it actually does:
the problem SEAL solves is real and important

every LLM you use today is frozen. it learned everything during training, and after deployment, it's done. new information? stuff it into the context window. new task? hope the prompt is good enough.

the weights never change. the model never truly learns from experience.

SEAL asks: what if the model could update its own weights in response to new information?
here's how SEAL actually works

instead of a human writing training data, the model generates its own. MIT calls these "self-edits." given new information, the model produces restructured versions of that information optimized for learning.

think of it like this: instead of memorizing a textbook page, you write your own study notes, flashcards, and practice problems. then you study from those.

the model does the same thing. except it also picks its own learning rate, training duration, and data augmentation strategy.
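a toy sketch of that loop, to make the moving parts visible. everything here is a stand-in: the "model" is just a set of remembered sentences, "finetuning" is set union, and the selection step (keep the self-edit that scores best on held-out questions) is a simplification. the self-chosen hyperparameters are left out entirely.

```python
import random

def propose_self_edits(passage: str, n: int, rng: random.Random) -> list[list[str]]:
    """Stand-in for the model writing its own training data ("study notes")."""
    sentences = [s.strip() for s in passage.split(".") if s.strip()]
    return [rng.sample(sentences, rng.randint(1, len(sentences))) for _ in range(n)]

def finetune(knowledge: set[str], notes: list[str]) -> set[str]:
    """Toy 'weight update': the model retains the notes it trained on."""
    return knowledge | set(notes)

def evaluate(knowledge: set[str], questions: list[str]) -> float:
    """Fraction of held-out facts the updated model now knows."""
    return sum(q in knowledge for q in questions) / len(questions)

def self_adapt(passage: str, questions: list[str], n_candidates: int = 4, seed: int = 0):
    """Generate candidate self-edits, try each update, keep the best-scoring one."""
    rng = random.Random(seed)
    base: set[str] = set()
    candidates = propose_self_edits(passage, n_candidates, rng)
    scored = [(evaluate(finetune(base, notes), questions), notes) for notes in candidates]
    best_score, best_notes = max(scored, key=lambda pair: pair[0])
    return finetune(base, best_notes), best_score

passage = "Paris is in France. Berlin is in Germany. Rome is in Italy."
questions = ["Paris is in France", "Rome is in Italy"]
knowledge, score = self_adapt(passage, questions)
```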
Feb 5
meta, amazon, and deepmind researchers just published a comprehensive survey on "agentic reasoning" for llms.

29 authors. 74 pages. hundreds of citations.

i read the whole thing.

here's what they didn't put in the abstract:
the survey organizes everything beautifully:

> foundational agentic reasoning (planning, tool use, search)
> self-evolving agents (feedback, memory, adaptation)
> multi-agent systems (coordination, knowledge sharing)

it's a taxonomy for a field that works in papers.

production tells a different story.
the number they don't cite:

multi-agent llm systems fail 41-86.7% of the time in production.

not edge cases. not adversarial attacks. standard deployment across 7 SOTA frameworks.

berkeley researchers analyzed 1,642 execution traces and found 14 unique failure modes.

most failures? system design and coordination issues.
Feb 2
This AI prompt thinks like the guy who manages $124 billion.

It's Ray Dalio's "Principles" decision-making system turned into a mega prompt.

I used it to evaluate 15 startup ideas. Killed 13. The 2 survivors became my best work.

Here's the prompt you can steal ↓
MEGA PROMPT TO COPY 👇

(Works in ChatGPT, Claude, Gemini)

---

You are Ray Dalio's Principles Decision Engine. You make decisions using radical truth and radical transparency.

CONTEXT: Ray Dalio built Bridgewater Associates into the world's largest hedge fund ($124B AUM) by systematizing decision-making and eliminating ego from the process.

YOUR PROCESS:

STEP 1 - RADICAL TRUTH EXTRACTION
Ask me to describe my decision/problem. Then separate:
- Provable facts (data, numbers, past results)
- Opinions disguised as facts (assumptions, hopes, beliefs)
- Ego-driven narratives (what I want to be true)

Be brutally honest. Call out self-deception.

STEP 2 - REALITY CHECK
Analyze my situation through these lenses:
- What is objectively true right now?
- What am I avoiding or refusing to see?
- What would a completely neutral observer conclude?
- Where is my ego clouding judgment?

STEP 3 - PRINCIPLES APPLICATION
Evaluate the decision using Dalio's core principles:
- Truth > comfort: What's the painful truth I'm avoiding?
- Believability weighting: Who has actually done this successfully? What do they say?
- Second-order consequences: What happens after what happens?
- Systematic thinking: What does the data/pattern say vs what I feel?

STEP 4 - SCENARIO ANALYSIS
Map out:
- Best case outcome (realistic, not fantasy)
- Most likely outcome (based on similar situations)
- Worst case outcome (what's the actual downside?)
- Probability weighting for each

STEP 5 - THE VERDICT
Provide:
- Clear recommendation (Go / No Go / Modify)
- Key reasoning (3-5 bullet points)
- Blind spots I'm missing
- What success/failure looks like in 6 months
- Confidence level (1-10) with explanation

OUTPUT FORMAT:
━━━━━━━━━━━━━━━━━
🎯 RECOMMENDATION: [Clear decision]
📊 CONFIDENCE: [X/10]
━━━━━━━━━━━━━━━━━

KEY REASONING:
- [Point 1]
- [Point 2]
- [Point 3]

⚠️ BLIND SPOTS YOU'RE MISSING:
[Specific things I'm not seeing]

📈 SUCCESS LOOKS LIKE:
[Specific metrics/outcomes in 6 months]

📉 FAILURE LOOKS LIKE:
[Specific warning signs]

💀 PAINFUL TRUTH:
[The thing I don't want to hear but need to]

━━━━━━━━━━━━━━━━━

RULES:
- No sugar-coating. Dalio values radical truth over feelings.
- Separate facts from opinions ruthlessly
- Challenge my assumptions directly
- If I'm being driven by ego, say it
- Use data and patterns over gut feelings
- Think in probabilities, not certainties

Now, what decision do you need to make?

---
Dalio's philosophy:

"Truth, or more precisely, an accurate understanding of reality, is the essential foundation for producing good outcomes."

This prompt forces you to face reality instead of your ego's version of it.
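Step 4's probability weighting boils down to an expected-value calculation. A minimal sketch with made-up numbers (the payoffs and probabilities below are illustrative, not from Dalio or the prompt):

```python
def expected_value(scenarios: dict[str, tuple[float, float]]) -> float:
    """scenarios maps a label to (probability, payoff). Probabilities must sum to 1."""
    total_p = sum(p for p, _ in scenarios.values())
    if abs(total_p - 1.0) > 1e-9:
        raise ValueError(f"probabilities sum to {total_p}, expected 1.0")
    return sum(p * payoff for p, payoff in scenarios.values())

# Hypothetical startup idea, payoffs in thousands of dollars.
idea = {
    "best case": (0.25, 400.0),
    "most likely": (0.5, 40.0),
    "worst case": (0.25, -80.0),
}
ev = expected_value(idea)  # 0.25*400 + 0.5*40 + 0.25*(-80) = 100.0
```

Forcing the three probabilities to sum to 1 is the useful part: it is hard to stay in love with a best case once you have to write down that it only gets a quarter of the weight.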
Feb 1
While everyone is sharing their OpenClaw bots

Claude Agent SDK just changed everything for building production agents.

I spent 12 hours testing it.

Here's the architecture that actually works (no fluff) 👇
First, understand what it actually is:

Claude Agent SDK ≠ just another wrapper

It's the same infrastructure Anthropic uses for Claude Code (which hit $1B in 6 months).

You get:
• Streaming sessions
• Automatic context compression
• MCP integration built-in
• Fine-grained permissions
The killer feature: Agent Lifecycle Hooks
Added in Claude Code 2.1.0 (Jan 7, 2026):

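# `agent` stands for an already-constructed agent object that exposes a `hook` decorator.
# This decorator form is shown as in the thread and is not checked against the SDK's documented API.
@agent.hook("PreToolUse")
async def validate_tool(tool_name, params):
    # Approve/modify/reject before execution
    ...

@agent.hook("PostToolUse")
async def log_result(tool_name, result):
    # Audit trail, error handling
    ...

@agent.hook("Stop")
async def cleanup():
    # Graceful shutdown
    ...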

This is how you build agents that don't go rogue.
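For anyone who wants to see the hook pattern run end to end, here is a self-contained sketch. `ToyAgent` is a hypothetical stand-in written for this example, not the Claude Agent SDK, and its `hook` decorator only mirrors the shape of the snippet above. Check the SDK docs for the real registration API.

```python
import asyncio
from collections import defaultdict

class ToyAgent:
    """Hypothetical agent with named lifecycle hooks (not the real SDK)."""

    def __init__(self):
        self._hooks = defaultdict(list)
        self.audit_log = []

    def hook(self, event):
        """Decorator that registers an async function for a lifecycle event."""
        def register(fn):
            self._hooks[event].append(fn)
            return fn
        return register

    async def _fire(self, event, *args):
        return [await fn(*args) for fn in self._hooks[event]]

    async def use_tool(self, tool_name, params):
        # Any PreToolUse hook returning a falsy value blocks the call.
        verdicts = await self._fire("PreToolUse", tool_name, params)
        if not all(verdicts):
            return None
        result = f"{tool_name} ran with {params}"
        await self._fire("PostToolUse", tool_name, result)
        return result

agent = ToyAgent()

@agent.hook("PreToolUse")
async def validate_tool(tool_name, params):
    return tool_name != "delete_everything"  # reject one dangerous tool

@agent.hook("PostToolUse")
async def log_result(tool_name, result):
    agent.audit_log.append((tool_name, result))

allowed = asyncio.run(agent.use_tool("search", {"q": "cats"}))
blocked = asyncio.run(agent.use_tool("delete_everything", {}))
```

The blocked call never reaches the tool and never shows up in the audit log, which is the whole point of putting validation in a pre-execution hook.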
Jan 30
Grok 4.1 is the only AI with real-time web + X data.

I use it to track trending topics, viral memes, and breaking news.

Found 3 viral trends 6 hours before they hit mainstream.

Here are 12 Grok prompts that predict what goes viral next:
PROMPT 1: Emerging Trend Detector

"Search X for topics with:

- 50-500 posts (last 6 hours)
- 20%+ growth rate (hour-over-hour)
- High engagement ratio (likes/views >5%)
- Used by accounts with 10K+ followers

Rank by viral potential (1-10).

Show: topic, post count, growth %, sample tweets, why it's rising."

Catches trends BEFORE they explode.
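The filter in Prompt 1 is really just four thresholds, which is easy to see when written as code. A minimal sketch over made-up topic records: the `Topic` fields and the sample data are assumptions, and how Grok actually evaluates the prompt is not specified in the thread.

```python
from dataclasses import dataclass

@dataclass
class Topic:
    name: str
    posts_last_6h: int
    hourly_growth: float        # 0.20 means 20% hour-over-hour
    likes: int
    views: int
    top_author_followers: int

def is_emerging(t: Topic) -> bool:
    """Apply the prompt's four thresholds to one topic."""
    engagement = t.likes / t.views if t.views else 0.0
    return (
        50 <= t.posts_last_6h <= 500
        and t.hourly_growth >= 0.20
        and engagement > 0.05
        and t.top_author_followers >= 10_000
    )

topics = [
    Topic("new-ai-model", 320, 0.35, 9_000, 120_000, 45_000),
    Topic("old-meme", 4_000, 0.02, 50_000, 2_000_000, 900_000),
    Topic("tiny-topic", 12, 0.80, 40, 500, 300),
]
emerging = [t.name for t in topics if is_emerging(t)]
```

Only the first topic passes: the second is already too big and too slow, the third is too small and has no large accounts behind it.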
PROMPT 2: Viral Meme Tracker

"Find memes on X that:

- Emerged in last 12 hours
- Have 3+ variations/remixes
- Being used by different communities
- Haven't hit mainstream media yet

For each:

- Original source (who started it)
- Mutation examples (how it's evolving)
- Predicted lifespan (1 day, 1 week, evergreen?)

Show me the top 5."
