Ryan Hart
Mar 2 · 10 tweets · 4 min read
🚨 This might be the blueprint for true general intelligence 😳

A new paper titled “Real Deep Research for AI, Robotics, and Beyond” redefines what “understanding” means for machines.

Instead of shallow pattern matching, it introduces a framework where AI builds internal research hypotheses, then tests, refines, and reuses them across reasoning, robotics, and multimodal tasks.

The results are insane:

→ Outperforms GPT-4 and Gemini 2.5 on 40+ reasoning benchmarks
→ 3× faster at real-world robotics decision loops
→ Capable of multi-domain self-improvement without fine-tuning

This isn’t another incremental model. It’s AI that actually learns how to do research across digital and physical environments.

If this scales, we’re looking at the blueprint for general intelligence, not just in code but in motion.
The Deep Research Loop:

The paper starts with this core diagram: a 4-stage research loop (Observe → Hypothesize → Experiment → Revise).

Unlike classic LLMs that just predict text, this system iterates like a scientist.

Every loop improves reasoning and robot control accuracy by up to 27%.
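To make the four stages concrete, here is a toy sketch of an Observe → Hypothesize → Experiment → Revise loop. It is not the paper's system: the task (estimating a hidden number from noisy readings), the function names, and every parameter are assumptions chosen only to show how the stages feed each other.

```python
import random


def observe(rng, true_value):
    """Observe: take one noisy measurement of the hidden quantity (toy stand-in)."""
    return true_value + rng.gauss(0, 1.0)


def hypothesize(estimate, step):
    """Hypothesize: propose candidate values around the current best estimate."""
    return [estimate - step, estimate, estimate + step]


def experiment(candidates, observations):
    """Experiment: keep the candidate with the lowest mean squared error."""
    def mse(candidate):
        return sum((o - candidate) ** 2 for o in observations) / len(observations)
    return min(candidates, key=mse)


def revise(step):
    """Revise: narrow the search radius for the next loop."""
    return step * 0.7


def research_loop(true_value=5.0, iterations=20, seed=0):
    """Run the four stages repeatedly and return the final estimate."""
    rng = random.Random(seed)  # seeded so the run is reproducible
    estimate, step, observations = 0.0, 4.0, []
    for _ in range(iterations):
        observations.append(observe(rng, true_value))
        candidates = hypothesize(estimate, step)
        estimate = experiment(candidates, observations)
        step = revise(step)
    return estimate


if __name__ == "__main__":
    print(research_loop())
```

Each pass reuses everything observed so far, so the estimate is refined rather than recomputed from scratch, which is the general idea the diagram describes.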
This blew my mind 🤯

The model literally builds graphs of hypotheses: nodes for ideas, edges for experiments.

You can see clusters forming around new insights, just like a human researcher refining a theory.

That’s not prompting. That’s cognition.
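A hypothesis graph like this could be represented with a very small data structure. The sketch below is an assumption about the shape of such a graph, not the paper's code; the class name, method names, and the example ideas are all made up for illustration.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class HypothesisGraph:
    # Nodes: idea id -> short description of the idea.
    ideas: dict = field(default_factory=dict)
    # Edges: source idea -> list of (target idea, experiment label, supported?).
    experiments: dict = field(default_factory=lambda: defaultdict(list))

    def add_idea(self, idea_id, text):
        self.ideas[idea_id] = text

    def add_experiment(self, source, target, label, supported):
        """Link two existing ideas with the experiment that relates them."""
        if source not in self.ideas or target not in self.ideas:
            raise KeyError("both ideas must exist before linking them")
        self.experiments[source].append((target, label, supported))

    def supported_refinements(self, idea_id):
        """Return the ideas reached from idea_id by experiments that held up."""
        return [t for t, _, ok in self.experiments.get(idea_id, []) if ok]


if __name__ == "__main__":
    g = HypothesisGraph()
    # Hypothetical ideas, invented for this example.
    g.add_idea("h1", "Grasp failures come from gripper angle")
    g.add_idea("h2", "Tilting the gripper improves grasping")
    g.add_idea("h3", "Lighting affects grasping")
    g.add_experiment("h1", "h2", "tilt sweep in simulation", True)
    g.add_experiment("h1", "h3", "vary lighting", False)
    print(g.supported_refinements("h1"))
```

Clusters would then show up as ideas with many supported edges between them.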
They tested the system on 18 robotic tasks (grasping, assembly, navigation).

Performance jumped from 61.3% → 89.7% success rates after 20 research iterations.

No retraining. Just reasoning.

Robots that learn how to learn.
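For scale, the 61.3% → 89.7% jump can be read three ways: as percentage points gained, as a relative gain over the starting success rate, or as the share of failures eliminated. The helper below just does that arithmetic on the two figures quoted in the thread; the function name is made up.

```python
def improvement(before, after):
    """Return (points gained, relative gain %, failure reduction %) for two success rates in percent."""
    points = after - before
    relative = points / before * 100
    failure_reduction = points / (100 - before) * 100
    return round(points, 1), round(relative, 1), round(failure_reduction, 1)


if __name__ == "__main__":
    # Success rates quoted above: 61.3% before, 89.7% after 20 iterations.
    print(improvement(61.3, 89.7))
```

So the quoted figures amount to a gain of 28.4 percentage points, which is about a 46% relative improvement and roughly three quarters of the original failures removed.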
Here’s where it gets wild:

The same model fine-tuned on scientific reasoning transferred to robotics without new data.

Across 7 tasks, it retained 82% of reasoning accuracy, a first in this field. Deep research = reusable intelligence.
Everyone’s chasing scale, but this model scales intelligently.

While GPT-4 burns compute linearly, this one’s compute cost flattens after a few loops.

Efficiency improves by 3.4× per iteration as reasoning stabilizes.

Self-optimization is the new scaling law.
Each square here shows how long the model “remembers” successful hypotheses.

Retention stabilizes at ~74% after 10 research cycles.

That’s memory through reflection, not parameter updates.

It’s how the system learns what’s worth keeping.
They even connected multiple “deep researchers” together.

Each agent worked on a subproblem and merged insights.

Result: 22% faster convergence on shared reasoning benchmarks. It’s literally a scientific community made of AIs.

This one’s unreal.

The model autonomously designed a new robotics experiment, one never seen in training, and executed it in simulation with 92% success.

That’s not “following instructions.”

That’s doing science.
The paper ends with a big-picture figure: a roadmap showing how this approach connects language, robotics, and symbolic reasoning into one unified framework.

It’s literally titled “The Path to Real Deep Research.”

If they’re right, this is the bridge to AGI.

realdeepresearch.github.io
I hope this was helpful to you.

I post AI tools, AI industry news, and AI business-related content.

If you're interested in such posts:

1. Follow me at @thisdudelikesai
2. Repost the post to help others

Thanks for checking...

