Millie Marconi · Oct 9
Holy shit... Stanford just built a system that converts research papers into working AI agents.

It’s called Paper2Agent, and it literally:

• Recreates the method in the paper
• Applies it to your own dataset
• Answers questions like the author

This changes how we do science forever.

Let me explain ↓
The problem is obvious to anyone who’s ever read a “methods” paper:

You find the code. It breaks.
You try the tutorial. Missing dependencies.
You email the authors. Silence.

Science moves fast, but reproducibility is a joke.

Paper2Agent fixes that. It automates the whole conversion: paper → runnable AI agent.
Here’s how it works (and this part is wild):

It reads the paper, grabs the GitHub repo, builds the environment, figures out the methods, then wraps everything as an MCP server.

That’s the Model Context Protocol, a standard interface any LLM (Claude, GPT, Gemini) can talk to.
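
For a feel of what that wrapping step produces, here's a minimal sketch using the official MCP Python SDK. The `cluster_cells` tool and its Scanpy steps are my own illustration, not Paper2Agent's actual generated code.

```python
# Minimal sketch of a paper method exposed as an MCP tool, using the official
# MCP Python SDK (pip install "mcp"). The tool below is an illustrative guess,
# not Paper2Agent's real output.
from mcp.server.fastmcp import FastMCP

mcp = FastMCP("scanpy-paper-agent")

@mcp.tool()
def cluster_cells(h5ad_path: str, resolution: float = 1.0) -> str:
    """Run a standard Scanpy clustering workflow on a user-supplied .h5ad file."""
    import scanpy as sc  # requires scanpy + leidenalg installed

    adata = sc.read_h5ad(h5ad_path)
    sc.pp.normalize_total(adata, target_sum=1e4)  # library-size normalization
    sc.pp.log1p(adata)                            # log-transform
    sc.pp.pca(adata)
    sc.pp.neighbors(adata)                        # kNN graph for clustering
    sc.tl.leiden(adata, resolution=resolution)    # Leiden community detection

    out_path = h5ad_path.replace(".h5ad", ".clustered.h5ad")
    adata.write_h5ad(out_path)
    return f"{adata.obs['leiden'].nunique()} clusters found; saved to {out_path}"

if __name__ == "__main__":
    mcp.run()  # serve over stdio so any MCP-capable LLM can call the tool
```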
So you just ask:

“Run the Scanpy pipeline on my data.h5ad”

and it actually runs it.
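
The other side of that conversation, roughly: an MCP client connecting over stdio and invoking the tool. In practice an LLM like Claude issues this call for you; the server script and tool name here are the hypothetical ones from the sketch above.

```python
# Sketch of an MCP client calling the (hypothetical) paper agent above.
import asyncio

from mcp import ClientSession, StdioServerParameters
from mcp.client.stdio import stdio_client

async def main() -> None:
    server = StdioServerParameters(command="python", args=["scanpy_paper_agent.py"])
    async with stdio_client(server) as (read, write):
        async with ClientSession(read, write) as session:
            await session.initialize()
            tools = await session.list_tools()
            print("exposed tools:", [t.name for t in tools.tools])
            result = await session.call_tool(
                "cluster_cells", {"h5ad_path": "data.h5ad", "resolution": 1.0}
            )
            print(result.content)

asyncio.run(main())
```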
They tested it on three big biology papers:

• AlphaGenome - predicts genetic variant effects
• TISSUE - uncertainty-aware spatial transcriptomics
• Scanpy - single-cell clustering

All converted automatically.
All reproduced the original results exactly.

Zero human setup.
And this is where it gets interesting.

The AlphaGenome agent disagreed with the original authors.

When asked to re-analyze a variant linked to cholesterol, it picked a different causal gene (SORT1) and defended it with plots, quantile scores, and biological reasoning.

An AI agent just reinterpreted a Nature paper.
Think about what that means.

Every paper becomes a living system.
You don’t just read it - you talk to it.
You test it, challenge it, extend it.

And if your paper can’t be turned into an agent?
Maybe it wasn’t reproducible to begin with.
PDFs are static.
Agents are alive.

Paper2Agent hints at a future where discoveries are interactive.

Where AlphaFold could talk to Scanpy.
Where methods become APIs.

Honestly, this might be what “AI co-scientists” actually look like.

More from @Yesterday_work_

Oct 14
Meta just did the unthinkable.

They figured out how to train AI agents without reward signals or human demonstrations, and it actually works better than both.

It’s called 'Early Experience', and it quietly kills the two biggest pain points in agent training:

→ Human demonstrations that don’t scale
→ Reinforcement learning that’s expensive and unstable

Instead of copying experts or chasing reward signals, agents now:

- Take their own actions
- Observe what happens
- Learn directly from consequences — *no external rewards needed*

The numbers are wild:

✅ +18.4% on web navigation (WebShop)
✅ +15.0% on complex planning (TravelPlanner)
✅ +13.3% on scientific reasoning (ScienceWorld)
✅ Works across **8 environments**

And when you add RL afterward?

🔥 +6.4% better than traditional pipelines.

Two key ideas make it work:

1. Implicit World Modeling - agents predict what happens next, forming an internal world model.

2. Self-Reflection - they compare mistakes to experts and explain why the expert choice was better.

Both scale. Both are reward-free.

Efficiency is absurd:

1/8 of expert data
86.9% lower cost
Works from 3B → 70B models

This isn’t incremental.

It’s the bridge between imitation learning and true autonomous experience.

AI agents can now teach themselves - no human hand-holding required.
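
To make idea 1 concrete: implicit world modeling is just supervised prediction of consequences. A toy sketch of how an agent's own rollouts could become training pairs; the `Step` format and prompt template are my own illustration, not Meta's released code.

```python
# Toy sketch of implicit world modeling: rollouts become supervised
# (state, action) -> next_state prediction pairs. No rewards involved;
# the environment's response is the only signal. Format is illustrative.
from dataclasses import dataclass

@dataclass
class Step:
    state: str       # textual observation, e.g. a summary of a web page
    action: str      # the action the agent chose to take
    next_state: str  # what the environment returned afterwards

def world_modeling_examples(rollout: list[Step]) -> list[dict]:
    """Turn each step into one language-model training example."""
    return [
        {
            "prompt": f"State: {s.state}\nAction: {s.action}\nWhat happens next?",
            "target": s.next_state,
        }
        for s in rollout
    ]

rollout = [Step("search box visible", "type 'red shoes'", "results page, 20 items")]
print(world_modeling_examples(rollout)[0])
```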
The problem with current AI agents is brutal.

Imitation Learning: Agents only see expert demos.

When they mess up, they can't recover because they never learned what happens when you take wrong actions.

RL: Needs verifiable rewards. Most real-world environments don't have them.

Early Experience solves both.
Here's how Self-Reflection actually works:

1/ Agent sees an expert action at each state
2/ Agent proposes 3 alternative actions
3/ Environment shows what happens with each
4/ LLM generates reasoning: "Why was the expert choice better?"
5/ Agent trains on this reasoning

It's learning from contrast, not just copying.
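
Here are those five steps as a runnable toy. The `Env` and `AgentLLM` stubs are hypothetical stand-ins for a real environment and model; only the overall structure follows the description above.

```python
# The five Self-Reflection steps as a toy. Env and AgentLLM are stand-ins.
class Env:
    def preview(self, state: str, action: str) -> str:
        return f"outcome of {action!r} in {state!r}"  # stand-in for a real env step

class AgentLLM:
    def propose_actions(self, state: str, n: int) -> list[str]:
        return [f"alternative_{i}" for i in range(n)]  # stand-in for sampled actions

    def explain(self, state, expert, alternatives) -> str:
        return "the expert action reaches the goal; the alternatives dead-end"

def self_reflection_example(env, llm, state: str, expert_action: str, k: int = 3):
    alternatives = llm.propose_actions(state, n=k)               # 2/ propose options
    outcomes = {a: env.preview(state, a) for a in alternatives}  # 3/ see what happens
    expert_outcome = env.preview(state, expert_action)
    reasoning = llm.explain(state, (expert_action, expert_outcome), outcomes)  # 4/
    # 5/ the reasoning plus the expert action becomes one training example
    return {"prompt": state, "target": f"{reasoning}\nAction: {expert_action}"}

print(self_reflection_example(Env(), AgentLLM(), "login page shown", "click Sign in"))
```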
Oct 11
Stanford just pulled off something wild 🤯

They made models smarter without touching a single weight.

The paper’s called Agentic Context Engineering (ACE), and it flips the whole fine-tuning playbook.

Instead of retraining, the model rewrites itself.

It runs a feedback loop (write, reflect, edit) until its own prompt becomes a living system.

Think of it as giving the LLM memory, but without changing the model.
Just evolving the context.

Results are stupid good:

+10.6% better than GPT-4 agents on AppWorld
+8.6% on finance reasoning
86.9% lower cost and latency

The trick?
Everyone’s been obsessed with clean, minimal prompts.
ACE shows the opposite: long, dense, self-growing prompts win.

Fine-tuning was about changing the model.
ACE is about teaching it to change *itself.*

This isn’t prompt engineering anymore.
It’s prompt evolution.
Here’s how ACE works 👇

It splits the model’s brain into 3 roles:

Generator - runs the task
Reflector - critiques what went right or wrong
Curator - updates the context with only what matters

Each loop adds delta updates: small context changes that never overwrite old knowledge.

It’s literally the first agent framework that grows its own prompt.
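
A rough sketch of that loop, assuming a generic chat-model API (`call_llm` is a stub). The prompts and bullet format are guesses at the idea, not the paper's actual implementation.

```python
# Sketch of the three-role ACE loop with append-only delta updates.
def call_llm(prompt: str) -> str:
    return "stub response"  # replace with a real model call

context: list[str] = []  # the evolving playbook; entries are never overwritten

def ace_step(task: str) -> str:
    playbook = "\n".join(context)

    # Generator: attempt the task with the current playbook in-context
    attempt = call_llm(f"Playbook:\n{playbook}\n\nTask: {task}")

    # Reflector: critique what went right or wrong
    critique = call_llm(f"Task: {task}\nAttempt: {attempt}\nWhat worked? What failed?")

    # Curator: distill the critique into small delta bullets
    deltas = call_llm(f"Critique: {critique}\nWrite new playbook bullets, one per line:")

    # Delta update: append, never rewrite, so old knowledge can't collapse
    context.extend(line for line in deltas.splitlines() if line.strip())
    return attempt

ace_step("book a flight in AppWorld")
print(len(context), "playbook entries so far")
```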
Every prior method had one fatal flaw: context collapse.

Models rewrite their entire prompt each time → it gets shorter → details vanish → accuracy tanks.

In the paper, one model’s accuracy fell from 66.7 → 57.1 after a single rewrite.

ACE fixes that by never rewriting the full context - only updating what changed.
Oct 6
I just read the most important AI paper of 2025.

A research team achieved what OpenAI couldn't with $100M using just 78 training samples.

The entire industry is about to flip upside down.

Here's everything you need to know:
Today, most AI labs follow the same playbook: more data = better agents.

LIMI's researchers say: that's wasteful, unnecessary, and about to change.

Strategic curation beats brute force scaling for agentic intelligence.

They proved it with numbers that will make you rethink everything.

The Agency Efficiency Principle is simple:

Machine autonomy emerges from strategic curation of high-quality demonstrations, not data abundance.

For agentic tasks, quality beats quantity.
LIMI used only 78 carefully designed training samples focused on collaborative coding and research workflows.

Each sample averaged 42.4k tokens, capturing complete human-AI interaction patterns.

Not toy examples. Real collaborative workflows that actually matter.
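
For a feel of what one of those long samples might look like, here's a guessed-at shape; the field names are mine, not LIMI's released format.

```python
# Hypothetical shape of one LIMI-style sample: a single long multi-turn
# human-AI workflow rather than many short snippets. Fields are illustrative.
sample = {
    "task": "debug a failing data pipeline together",
    "turns": [
        {"role": "user", "content": "the nightly ETL job crashed, logs attached"},
        {"role": "assistant", "content": "let's check the schema migration first..."},
        {"role": "tool", "content": "run_tests: 3 failed, 42 passed"},
        # ...dozens more turns; whole samples average ~42.4k tokens
    ],
}
print(sum(len(t["content"]) for t in sample["turns"]), "characters in this toy sample")
```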
Oct 3
I finally understand why Claude 4.5 Sonnet is dominating right now.

After testing it on real marketing campaigns, app builds, and content creation... it blew my mind.

Here are 5 powerful ways to use the new Claude model to automate the tedious tasks:
1. Marketing Automation

Here’s my marketing automation prompt:

"You are now my AI marketing strategist.

Your job is to build powerful growth systems for my business. Think like Neil Patel, Seth Godin, and Alex Hormozi combined.

I want you to:

Build full-funnel strategies (top to bottom)

Write ad copy, landing pages, and email sequences

Recommend automation tools, lead magnets, and channel tactics

Prioritize fast ROI, data-driven decisions, and creative thinking

Always ask clarifying questions before answering. Think long-term and execute short-term.

Do marketing like experts do. Ask: 'What would Hormozi, Seth, or Neil do?'"

Copy the prompt and paste it into a new Claude chat.

After that, start asking it questions.
2. Writing Content (Blogs + Social)

My go-to content prompt:

"You are now my AI ghostwriter and content machine.

Write like a mix of Naval Ravikant, Ann Handley, and David Ogilvy.

Your job is to:

Write viral threads, blogs, and newsletters

Break down ideas clearly, with hooks and storytelling

Create repurposable content across Twitter, LinkedIn, and blogs

Always follow this rule: Clarity beats cleverness.

Act like a content genius who asks: 'How would Naval tweet this? Would Ogilvy approve this headline?'"
Sep 26
Every AI agent demo you've seen is basically fraud.

Google just dropped their internal agent playbook and exposed how broken the entire space is.

That "autonomous AI employee" your startup demoed last week? It's three ChatGPT calls wrapped in marketing copy. Google's real agents need four evaluation layers, full DevOps infrastructure, and security protocols most teams have never heard of.

While founders pitch "agents that think," Google ships AgentOps with Terraform configs and CI/CD pipelines. They're building distributed systems. Everyone else is building expensive chatbots.

The gap is insane. Startups demo function calls. Google deploys sequential workflows, parallel processing, and loop agents with ACID compliance.

Most brutal part: the security requirements. These agents access internal APIs and databases. One prompt injection and your company data is gone. Most builders treat this like an afterthought.

Google's playing chess while everyone else plays checkers. Let startups burn VC money on agent toys, then dominate when they need actual production infrastructure.

The agent revolution isn't happening until people stop confusing demos with systems.
The guide reveals Google's three-path strategy for agent development.

Most teams are randomly picking tools without understanding these architectural choices.
Here's what nobody talks about: agent evaluation isn't just "does it work?" Google breaks it into 4 layers that expose how shallow most current agents really are.
Sep 19
This is the report that rewrites AI history.

OpenAI analyzed 700M people using ChatGPT.

And the results are nothing like the narrative.

Here's everything you need to know in 3 minutes:
"ChatGPT is mainly for work"

Reality check: Only 27% of ChatGPT usage is work-related. 73% is personal. And the gap is widening every month.

The productivity revolution narrative completely misses how people actually use AI.
Top 3 use cases:

Forget coding and business automation. Here's what 700M people actually do:

1. Practical Guidance (29%) - Learning, how-to advice, tutoring
2. Seeking Information (24%) - Replacing Google searches
3. Writing (24%) - Editing emails, documents, content

These three account for 77% of ALL ChatGPT usage.
