All converted automatically.
All reproduced the original results exactly.
Zero human setup.
And this is where it gets interesting.
The AlphaGenome agent disagreed with the original authors.
When asked to re-analyze a variant linked to cholesterol, it picked a different causal gene (SORT1) and defended it with plots, quantile scores, and biological reasoning.
An AI agent just reinterpreted a Nature paper.
Think about what that means.
Every paper becomes a living system.
You don’t just read it - you talk to it.
You test it, challenge it, extend it.
And if your paper can’t be turned into an agent?
Maybe it wasn’t reproducible to begin with.
PDFs are static.
Agents are alive.
Paper2Agent hints at a future where discoveries are interactive.
Where AlphaFold could talk to Scanpy.
Where methods become APIs.
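To make "methods become APIs" concrete: here's a minimal sketch of a paper's method exposed as an MCP-style tool, in the spirit of what Paper2Agent does. The `score_variant` function, its signature, and the server name are hypothetical stand-ins, not code from the paper.

```python
# Sketch: a paper's method exposed as a callable tool via the MCP
# Python SDK. The tool below is a hypothetical placeholder, not
# Paper2Agent's or AlphaGenome's actual code.
from mcp.server.fastmcp import FastMCP

mcp = FastMCP("paper-methods")  # hypothetical server name

@mcp.tool()
def score_variant(chrom: str, pos: int, ref: str, alt: str) -> dict:
    """Score a variant's predicted regulatory effect (stub)."""
    # A real Paper2Agent conversion would call the repository's
    # actual analysis pipeline here instead of returning a stub.
    return {"variant": f"{chrom}:{pos}:{ref}>{alt}", "effect_score": None}

if __name__ == "__main__":
    mcp.run()  # serve over stdio so any MCP-capable agent can call it
```

Once a method is callable like this, "talking to a paper" is just an agent invoking its tools.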
Honestly, this might be what “AI co-scientists” actually looks like.
Stop guessing what your customers want.
TestFeed gives you AI personas of your target customers + expert consultants that:
- See your screen while you work
- Give contextual feedback in real-time
- Think like the actual people you're building for
Researchers figured out how to train AI agents without reward signals, human demos, or supervision, and it actually works better than either standard approach.
It’s called 'Early Experience', and it quietly kills the two biggest pain points in agent training:
→ Human demonstrations that don’t scale
→ Reinforcement learning that’s expensive and unstable
Instead of copying experts or chasing reward signals, agents now:
- Take their own actions
- Observe what happens
- Learn directly from consequences, *no external rewards needed* (rough loop sketched below)
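That loop, in pseudocode (a minimal sketch; the `env` and `agent` interfaces are assumed for illustration, not taken from the paper):

```python
# Sketch of the early-experience collection loop. Note there is
# no reward term anywhere: supervision comes from the states the
# agent's own actions produce.
def collect_early_experience(env, agent, num_steps):
    data = []
    state = env.reset()
    for _ in range(num_steps):
        action = agent.propose(state)    # the agent's own action, not an expert's
        next_state = env.step(action)    # observe the consequence
        data.append((state, action, next_state))
        state = next_state
    return data
```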
The numbers are wild:
✅ +18.4% on web navigation (WebShop)
✅ +15.0% on complex planning (TravelPlanner)
✅ +13.3% on scientific reasoning (ScienceWorld)
✅ Works across **8 environments**
And when you add RL afterward?
🔥 +6.4% better than traditional pipelines.
Two key ideas make it work:
1. Implicit World Modeling - agents predict what happens next, forming an internal world model.
2. Self-Reflection - agents compare their own alternative actions to the expert's and explain why the expert choice was better.
Both scale. Both are reward-free.
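Idea 1 in miniature: turn those (state, action, next state) triples into a next-observation prediction task. A sketch, assuming the data format from the loop above:

```python
# Sketch: implicit world modeling as supervised next-state
# prediction. Fine-tuning on these pairs teaches the policy model
# to anticipate consequences, i.e. an internal world model.
def build_iwm_examples(data):
    examples = []
    for state, action, next_state in data:
        prompt = f"State: {state}\nAction: {action}\nPredict the next state:"
        examples.append({"prompt": prompt, "target": str(next_state)})
    return examples
```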
Efficiency is absurd:
1/8 of expert data
86.9% lower cost
Works from 3B → 70B models
This isn’t incremental.
It’s the bridge between imitation learning and true autonomous experience.
AI agents can now teach themselves - no human hand-holding required.
The problem with current AI agents is brutal.
Imitation Learning: Agents only see expert demos.
When they mess up, they can't recover because they never learned what happens when you take wrong actions.
RL: Needs verifiable rewards. Most real-world environments don't have them.
Early Experience solves both.
Here's how Self-Reflection actually works:
1/ Agent sees an expert action at each state
2/ Agent proposes 3 alternative actions
3/ Environment shows what happens with each
4/ LLM generates reasoning: "Why was the expert choice better?"
5/ Agent trains on this reasoning
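In code, roughly (a sketch; `env.peek`, `llm.propose_actions`, `llm.generate`, and the prompt wording are assumed stand-ins, not the paper's actual setup):

```python
# Sketch of self-reflection data construction, one state at a time.
def build_reflection_example(env, llm, state, expert_action, k=3):
    alternatives = llm.propose_actions(state, n=k)            # step 2
    outcomes = {a: env.peek(state, a) for a in alternatives}  # step 3
    rationale = llm.generate(                                 # step 4
        f"State: {state}\n"
        f"Expert action: {expert_action} -> {env.peek(state, expert_action)}\n"
        f"Alternatives and outcomes: {outcomes}\n"
        "Explain why the expert action was the better choice."
    )
    # Step 5: the agent is fine-tuned to produce this reasoning
    # (and the expert action) when it sees the state.
    return {"state": state, "target": f"{rationale}\n{expert_action}"}
```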
Every AI agent demo you've seen is basically fraud.
Google just dropped their internal agent playbook and exposed how broken the entire space is.
That "autonomous AI employee" your startup demoed last week? It's three ChatGPT calls wrapped in marketing copy. Google's real agents need four evaluation layers, full DevOps infrastructure, and security protocols most teams have never heard of.
While founders pitch "agents that think," Google ships AgentOps with Terraform configs and CI/CD pipelines. They're building distributed systems. Everyone else is building expensive chatbots.
The gap is insane. Startups demo function calls. Google deploys sequential workflows, parallel processing, and loop agents with ACID compliance.
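For anyone unclear on those three patterns, here's a toy sketch in plain Python. Illustrative only, not Google's AgentOps code; an "agent" here is just any callable from text to text.

```python
import concurrent.futures

def sequential(agents, task: str) -> str:
    # Each agent's output feeds the next (a pipeline).
    for agent in agents:
        task = agent(task)
    return task

def parallel(agents, task: str) -> list[str]:
    # All agents work on the same task at once; results are gathered.
    with concurrent.futures.ThreadPoolExecutor() as pool:
        return list(pool.map(lambda a: a(task), list(agents)))

def loop(agent, critic, task: str, max_iters: int = 5) -> str:
    # Re-run until a critic accepts the result or the budget runs out.
    result = agent(task)
    for _ in range(max_iters - 1):
        if critic(result):
            return result
        result = agent(result)
    return result
```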
Most brutal part: the security requirements. These agents access internal APIs and databases. One prompt injection and your company data is gone. Most builders treat this like an afterthought.
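A baseline defense worth showing: never let a model-proposed tool call execute unchecked. A toy sketch (the tool names and patterns are hypothetical, and naive string filtering alone is not a real defense; least privilege and isolation matter more):

```python
# Toy guardrail: gate every model-proposed tool call against an
# explicit allowlist and a crude argument check before executing.
ALLOWED_TOOLS = {"search_docs", "read_ticket"}  # hypothetical tool names
BLOCKED_PATTERNS = ("drop table", "ignore previous instructions")

def execute_tool_call(name: str, args: dict, tools: dict):
    if name not in ALLOWED_TOOLS:
        raise PermissionError(f"Tool '{name}' is not allowlisted")
    if any(p in str(args).lower() for p in BLOCKED_PATTERNS):
        raise ValueError("Suspicious arguments; possible prompt injection")
    return tools[name](**args)
```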
Google's playing chess while everyone else plays checkers. Let startups burn VC money on agent toys, then dominate when they need actual production infrastructure.
The agent revolution isn't happening until people stop confusing demos with systems.
The guide reveals Google's three-path strategy for agent development.
Most teams are randomly picking tools without understanding these architectural choices.
Here's what nobody talks about: agent evaluation isn't just "does it work?" Google breaks it into 4 layers that expose how shallow most current agents really are.
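The excerpt doesn't name the four layers, so treat this as one illustrative decomposition, not Google's: test individual tool calls, whole trajectories, end outcomes, and regressions separately. The agent interface below is assumed.

```python
# Illustrative 4-layer agent eval harness. Layer names and the
# agent API (call_tool, run, .steps, .answer) are assumptions.
def eval_tool_calls(agent, cases):
    # Layer 1: does each individual tool call return what it should?
    hits = sum(agent.call_tool(c["tool"], c["args"]) == c["expected"] for c in cases)
    return hits / len(cases)

def eval_trajectory(agent, task, must_visit):
    # Layer 2: does the step sequence pass through required states?
    steps = agent.run(task).steps
    return all(m in steps for m in must_visit)

def eval_outcome(agent, task, judge):
    # Layer 3: did the final answer actually achieve the goal?
    return judge(task, agent.run(task).answer)

def eval_regression(agent, golden_set):
    # Layer 4: do previously passing cases still pass in CI?
    return all(eval_outcome(agent, task, judge) for task, judge in golden_set)
```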