Alex Prompter
Oct 13 · 7 tweets · 3 min read
Holy shit. MIT just built an AI that can rewrite its own training data to get smarter 🤯

It’s called SEAL (Self-Adapting Language Models).

Instead of humans fine-tuning it, SEAL reads new information, rewrites it in its own words, and runs gradient updates on itself, literally performing self-directed learning.

The results?

✅ +40% boost in factual recall
✅ Outperforms GPT-4.1 using data it generated *itself*
✅ Learns new tasks without any human in the loop

LLMs that finetune themselves are no longer sci-fi.

We just entered the age of self-evolving models.

Paper: jyopari.github.io/posts/seal
Today, most AI models are static once trained; they can’t update themselves.

SEAL flips that.

It runs a reinforcement loop where the model:

1. Generates a “self-edit” (instructions on how to update itself)
2. Tests the result
3. Reinforces only what improves performance

It’s basically RL for self-improvement.
Here’s what self-editing looks like in action 👇

SEAL reads a new passage (say, about the Apollo Program) and rewrites it into logical “implications” like condensed study notes.

Then it finetunes itself on those notes.

The result?

+13.5% factual accuracy without external data.

This is how models start to teach themselves knowledge.
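The loop and the self-editing example above can be mocked up in a few lines. This is a deliberately tiny, deterministic toy under big assumptions: the "model" is just a set of known facts and the "self-edits" are rewriting strategies, whereas the real SEAL generates text with an LLM and runs gradient updates on its own weights. All names here are illustrative, not the paper's API.

```python
# Toy SEAL loop: generate a self-edit, test it, keep only what helps,
# and "reinforce" the edit style that produced the improvement.

def atomic_facts(passage):
    """Self-edit style A: rewrite a passage into atomic statements."""
    return {s.strip() for s in passage.split(".") if s.strip()}

def whole_passage(passage):
    """Self-edit style B: store the passage verbatim (less 'learnable')."""
    return {passage}

def recall(known, questions):
    """Fraction of held-out questions the 'model' can answer."""
    return sum(q in known for q in questions) / len(questions)

def seal_loop(passages, questions, edit_styles):
    known, reinforced = set(), []
    for style in edit_styles:
        for p in passages:
            # 1. Generate a self-edit from the new passage.
            candidate = known | style(p)
            # 2. Test the result on held-out questions.
            if recall(candidate, questions) > recall(known, questions):
                # 3. Keep only edits that improved performance,
                #    and note the style that produced them.
                known = candidate
                if style not in reinforced:
                    reinforced.append(style)
    return known, reinforced

passages = ["Apollo 11 landed in 1969. Armstrong walked on the Moon first."]
questions = ["Apollo 11 landed in 1969", "Armstrong walked on the Moon first"]
known, reinforced = seal_loop(passages, questions, [whole_passage, atomic_facts])
print(recall(known, questions))          # 1.0
print([f.__name__ for f in reinforced])  # ['atomic_facts']
```

Note how the verbatim-passage edit is rejected while the atomic rewrite is kept: that is the thread's point about the model learning to produce more "learnable" data.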
Few-shot learning just got a massive upgrade.

Instead of relying on fixed heuristics, SEAL decides its own training strategy.

It chooses which data augmentations to apply, how to optimize, and even sets its own learning rate.

The outcome:

→ 72.5% success rate
→ 3.6× improvement over standard test-time training

The model is literally designing its own experiments.
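Here is a minimal sketch of what "choosing its own training strategy" means: in the few-shot setting, a self-edit is essentially a training configuration. Everything below is made up for illustration; the scoring function is a toy, not ARC, and the real SEAL samples configurations from the LLM and reinforces the good ones rather than enumerating a grid.

```python
from itertools import product

def toy_score(augmentations, lr):
    """Stand-in for 'fine-tune with this config, then evaluate'.
    The numbers are invented purely to make one config win."""
    base = 0.4
    base += 0.10 * ("rotate" in augmentations)
    base += 0.15 * ("reflect" in augmentations)
    base += 0.10 if lr == 1e-4 else 0.0  # pretend 1e-4 is the sweet spot
    return base

def choose_self_edit():
    """Pick the data augmentations and learning rate that score best."""
    options = [(), ("rotate",), ("reflect",), ("rotate", "reflect")]
    rates = [1e-3, 1e-4, 1e-5]
    best = max(product(options, rates), key=lambda c: toy_score(*c))
    return {"augmentations": best[0], "lr": best[1]}

print(choose_self_edit())
# {'augmentations': ('rotate', 'reflect'), 'lr': 0.0001}
```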
In just two rounds of self-reinforcement, SEAL's self-generated training data surpassed data generated by GPT-4.1.

The model learned to write more “learnable” data for itself, reformulating facts into simple, atomic truths that stick.

It’s not just learning what to know; it’s learning how to learn better.

That’s recursive intelligence in motion.
Even as SEAL self-updates over time, it mostly retains what it learned before, a huge step toward continual learning.

There’s still some forgetting, but the retention curve shows promise.

Imagine future LLMs that grow their knowledge continuously without starting from scratch.

We’re watching self-evolution begin.
Stop wasting hours writing prompts

→ 10,000+ ready-to-use prompts
→ Create your own in seconds
→ Lifetime access. One-time payment.

Claim your copy 👇
godofprompt.ai/pricing

More from @alex_prompter

Oct 11
Holy shit... Google just built an AI that learns from its own mistakes in real time.

New paper dropped on ReasoningBank. The idea is pretty simple but nobody's done it this way before. Instead of just saving chat history or raw logs, it pulls out the actual reasoning patterns, including what failed and why.

Agent fails a task? It doesn't just store "task failed at step 3." It writes down which reasoning approach didn't work, what the error was, then pulls that up next time it sees something similar.

They combine this with MaTTS (memory-aware test-time scaling), but honestly the acronym matters less than what it does. Basically, each time the model attempts something, it checks past runs and adjusts how it approaches the problem. No retraining.

Results: 34% higher success on tasks and 16% fewer interactions to complete them. That's a massive jump for something that doesn't require spinning up new training runs.

I keep thinking about how different this is from the "just make it bigger" approach. We've been stuck in this loop of adding parameters like that's the only lever. But this is more like, the model gets experience. It actually remembers what worked.

Kinda reminds me of when I finally stopped making the same Docker networking mistakes because I kept a note of what broke last time instead of googling the same Stack Overflow answer every 3 months.

If this actually works at scale (big if), then frozen model weights start looking really dumb in hindsight.
Today, most “AI memory” is fake memory.

Agents log old trajectories and replay them later, like watching CCTV of their past mistakes and learning nothing.

ReasoningBank changes that. It extracts reasoning-level lessons from both successes and failures.

That’s the real innovation: distillation, not recollection.
The memory units it stores are structured like human notes:

Title → Description → Content.

Example:

“Prioritize user account sections when retrieving personal data.”

That single rule can transfer across hundreds of tasks, from web admin panels to code automation.

It’s memory as strategy abstraction.
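The Title → Description → Content structure above can be sketched directly. The unit layout follows the thread; the keyword-overlap retrieval below is an assumption for illustration (a real system would likely use embedding similarity), and both example rules are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class MemoryUnit:
    title: str        # short handle: the transferable rule itself
    description: str  # when the lesson applies
    content: str      # the distilled reasoning lesson

def retrieve(bank, task, k=1):
    """Rank stored lessons by word overlap with the new task (toy scheme)."""
    words = set(task.lower().split())
    scored = sorted(
        bank,
        key=lambda u: len(words & set(u.title.lower().split())),
        reverse=True,
    )
    return scored[:k]

bank = [
    MemoryUnit(
        title="Prioritize user account sections when retrieving personal data",
        description="Web tasks that ask for a user's own information",
        content="Navigate to account/profile pages before searching site-wide.",
    ),
    MemoryUnit(
        title="Verify form submission with a confirmation message",
        description="Any task that writes data through a web form",
        content="After submitting, check for a success banner before finishing.",
    ),
]

hits = retrieve(bank, "Retrieve the personal data on the user account page")
print(hits[0].title)  # the account-sections rule wins on overlap
```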
Oct 9
This one paper might kill the “bigger is better” myth in AI.

Samsung just built a 7M-parameter model that out-reasoned GPT-4-class systems using 0.01% of the parameters.

It’s called Tiny Recursive Model (TRM), and it rewrites the scaling laws.

Here’s the full breakdown:
The breakthrough? Recursive reasoning with a single tiny network.

While everyone was scaling to trillions of parameters, these researchers went the opposite direction.

2 layers. 7M parameters. Recursing up to 42 times.

Result: 45% accuracy on ARC-AGI-1, beating most frontier LLMs with 0.01% of the parameters.
Here's what makes TRM different from everything else:

Traditional models: one forward pass → answer
Chain-of-thought models: reasoning steps → answer
TRM: recursively improve reasoning AND answer simultaneously

It starts with a guess, then iterates. Like a human working through a hard problem.
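The control flow, guess first, then recursively refine both a latent "scratchpad" and the answer, can be shown with a numeric analogy. This is only an analogy: the update rules below are a made-up square-root example, not TRM's learned 2-layer network; only the iterate-on-(reasoning, answer) structure mirrors the idea.

```python
def trm_style_solve(x, steps=42):
    """Toy TRM-style recursion: refine latent state z and answer y together.
    Here y estimates sqrt(x) and z tracks the current error (Newton's method)."""
    y = x / 2 or 1.0  # initial guess for the answer
    z = 0.0           # latent "reasoning" state
    for _ in range(steps):      # recurse up to 42 times, like TRM
        z = y * y - x           # refine reasoning given (x, y, z)
        y = y - z / (2 * y)     # refine answer given (y, z)
    return y

print(round(trm_style_solve(2.0), 6))  # ≈ 1.414214
```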
Oct 6
I FINALLY CRACKED THE CODE ON CLAUDE PROMPTS THAT ACTUALLY WORK.

After 6 months of testing, these 10 save me 20+ hours every single week.

Most people waste time with basic prompts.

Here are 10 prompts so powerful they feel illegal:
I'm not talking about basic "write me an email" prompts.

These are strategic automation prompts that:

- Build entire marketing systems
- Generate months of content
- Create pricing strategies
- Design growth experiments

Used by 6, 7, and 8-figure entrepreneurs.
PROMPT #1: The Competitor Intel Dashboard

Copy-paste this:

"Research [competitor name] and create a competitive analysis:

- Their pricing strategy
- Key features/offerings
- Marketing tactics
- Their USPs
- Gaps I can exploit
- What they do better than me

Give actionable insights."

Saves 3 hours of stalking competitors.
Oct 1
Claude 4.5 Sonnet is dangerously good.

But 99% of people are sleeping on what it can actually do.

I’ve used it to build apps, generate content, automate deep research, and more.

Here are 10 ways to use Claude 4.5 Sonnet that feel like cheating:
1. Automated Research Reports (better than $100k consultants)

Claude’s web search + analysis mode lets you do what McKinsey, Gartner, and Deloitte charge six figures for.

You’ll get structured breakdowns, insights, and data points like a private analyst on demand.
Prompt to use:

"You are a world-class strategy consultant trained by McKinsey, BCG, and Bain. Act as if you were hired to provide a $300,000 strategic analysis for a client in the [INDUSTRY] sector.

Here is your mission:

1. Analyze the current state of the [INDUSTRY] market.
2. Identify key trends, emerging threats, and disruptive innovations.
3. Map out the top 3-5 competitors and benchmark their business models, strengths, weaknesses, pricing, distribution, and brand positioning.
4. Use frameworks like SWOT, Porter’s Five Forces, and strategic value chain analysis to assess risks and opportunities.
5. Provide a one-page strategic brief with actionable insights and recommendations for a hypothetical company entering or growing in this space.

Output everything in concise bullet points or tables. Make it structured and ready to paste into slides. Think like a McKinsey partner preparing for a C-suite meeting.

Industry: [INSERT INDUSTRY OR MARKET HERE]"
Sep 30
🚨 Anthropic just shipped the best coding model in the world.

Claude Sonnet 4.5 is live everywhere today. Same price as Sonnet 4. But the gap between this and everything else is brutal.

Here's what actually changed:
SWE-bench Verified: state-of-the-art.

This benchmark tests real software engineering. Not toy problems. Actual GitHub issues that require multi-file edits, dependency management, testing.

Sonnet 4.5 can maintain focus for 30+ hours on complex tasks. That's not a typo.
Computer use just got scary good.

OSWorld scores: Sonnet 4.5 at 61.4%. Four months ago, Sonnet 4 led at 42.2%.

The model can navigate browsers, fill spreadsheets, complete workflows. The Claude for Chrome extension makes this real. Not a demo.
Sep 29
I spent 50 hours intentionally breaking ChatGPT.

What I learned taught me more about prompt engineering than any course ever could.

Here's why you should break AI before you try to use it properly:
Most people try to use AI 'correctly' from day one.

Big mistake.

You learn a tool's TRUE capabilities by finding where it breaks.

Athletes train to failure. Developers test edge cases. You should break your AI.
1. The Contradiction Test

First experiment: Give AI impossible contradictions.

Prompt: 'Write a story about silence that uses only sound words'
Watch what it does. Does it:

Refuse?
Find creative workarounds?
Completely ignore you?

This reveals its constraint-handling logic.
