Holy shit. MIT just built an AI that can rewrite its own code to get smarter 🤯
It’s called SEAL (Self-Adapting Language Models).
Instead of humans fine-tuning it, SEAL reads new info, rewrites it in its own words, and runs gradient updates on itself, literally performing self-directed learning.
The results?
✅ +40% boost in factual recall
✅ Outperforms GPT-4.1 using data it generated *itself*
✅ Learns new tasks without any human in the loop
LLMs that finetune themselves are no longer sci-fi.
We just entered the age of self-evolving models.
Paper: jyopari.github.io/posts/seal
Today, most AI models are static: once trained, they can’t update themselves.
SEAL flips that.
It runs a reinforcement loop where the model:
1. Generates a “self-edit” (instructions on how to update itself)
2. Tests the result
3. Reinforces only what improves performance
It’s basically RL for self-improvement.
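Here’s a minimal sketch of that loop in Python. The helper names (`generate`, `finetune`, `evaluate`) are my stand-ins, not the paper’s code; the point is that the reward signal is just “did the update help on the eval,” so no human labels are needed:

```python
# Sketch of SEAL's outer reinforcement loop: sample candidate self-edits,
# apply each one with a cheap finetune, and reinforce only the edits that
# actually improved downstream performance. All helpers are hypothetical.

def seal_outer_loop(model, contexts, generate, finetune, evaluate, rounds=2):
    for _ in range(rounds):
        winners = []
        baseline = evaluate(model)                  # score before any update
        for ctx in contexts:
            self_edit = generate(model, f"Rewrite as training data:\n{ctx}")
            candidate = finetune(model, self_edit)  # quick LoRA-style update
            if evaluate(candidate) > baseline:      # did the edit help?
                winners.append(self_edit)
        # Reinforce: finetune on the self-edits that improved performance,
        # so the model gets better at writing useful self-edits over time.
        model = finetune(model, winners)
    return model
```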
Here’s what self-editing looks like in action 👇
SEAL reads a new passage (say, about the Apollo Program) and rewrites it into logical “implications” like condensed study notes.
Then it finetunes itself on those notes.
The result?
+13.5% factual accuracy without external data.
This is how models start to teach themselves knowledge.
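A rough sketch of that knowledge-incorporation step (the prompt wording and helpers are illustrative, not the paper’s exact setup):

```python
IMPLICATIONS_PROMPT = (
    "Read the following passage and list the implications that follow "
    "from it, one short standalone fact per line:\n\n{passage}"
)

def self_edit_knowledge(model, passage, generate, finetune):
    # The model rewrites the passage into atomic study notes...
    notes = generate(model, IMPLICATIONS_PROMPT.format(passage=passage))
    facts = [line.strip() for line in notes.splitlines() if line.strip()]
    # ...then takes gradient steps on its own notes instead of the raw text.
    return finetune(model, facts)
```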
Few-shot learning just got a massive upgrade.
Instead of relying on fixed heuristics, SEAL decides its own training strategy.
It chooses which data augmentations to apply, how to optimize, and even sets its own learning rate.
The outcome:
→ 72.5% success rate
→ 3.6× improvement over standard test-time training
The model is literally designing its own experiments.
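In practice the “experiment” is a training recipe the model emits as structured text, which a harness then executes. A toy version (field names are my guess at the shape, not the paper’s schema):

```python
import json

def apply_few_shot_self_edit(model, task, generate, build_dataset, finetune):
    # The model proposes its own training configuration as JSON, e.g.:
    # {"augmentations": ["rotate", "reflect"], "learning_rate": 1e-4, "epochs": 8}
    cfg = json.loads(generate(model, f"Propose a training config for:\n{task}"))
    dataset = build_dataset(task, cfg["augmentations"])  # self-chosen augmentations
    return finetune(model, dataset,
                    lr=cfg["learning_rate"], epochs=cfg["epochs"])
```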
In just two rounds of self-reinforcement, SEAL surpassed GPT-4.1-generated data.
The model learned to write more “learnable” data for itself, reformulating facts into simple, atomic truths that stick.
It’s not just learning what to know; it’s learning how to learn better.
That’s recursive intelligence in motion.
Even as SEAL self-updates over time, it mostly remembers what it learned before, a huge step toward continual learning.
There’s still some forgetting, but the retention curve shows promise.
Imagine future LLMs that grow their knowledge continuously without starting from scratch.
We’re watching self-evolution begin.
Holy shit...Google just built an AI that learns from its own mistakes in real time.
New paper dropped on ReasoningBank. The idea is pretty simple but nobody's done it this way before. Instead of just saving chat history or raw logs, it pulls out the actual reasoning patterns, including what failed and why.
Agent fails a task? It doesn't just store "task failed at step 3." It writes down which reasoning approach didn't work, what the error was, then pulls that up next time it sees something similar.
They combine this with MaTTS, memory-aware test-time scaling, but honestly the acronym matters less than what it does. Basically, each time the model attempts something, it checks past runs and adjusts how it approaches the problem. No retraining.
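Conceptually it looks something like this (all helpers and the memory interface are hypothetical; the paper’s actual setup will differ):

```python
def solve_with_memory(task, memory, generate, judge, distill, k=3):
    # Pull reasoning lessons from similar past tasks into the prompt...
    lessons = memory.retrieve(task, top_n=5)
    prompt = ("Lessons from past attempts:\n" + "\n".join(lessons)
              + f"\n\nTask: {task}")
    # ...then spend extra test-time compute on k attempts and keep the best.
    attempts = [generate(prompt) for _ in range(k)]
    best = max(attempts, key=judge)
    # Write back what worked and what failed for next time. No retraining.
    memory.add(distill(task, attempts))
    return best
```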
Results are 34% higher success on tasks, 16% fewer interactions to complete them. Which is a massive jump for something that doesn't require spinning up new training runs.
I keep thinking about how different this is from the "just make it bigger" approach. We've been stuck in this loop of adding parameters like that's the only lever. But this is more like, the model gets experience. It actually remembers what worked.
Kinda reminds me of when I finally stopped making the same Docker networking mistakes because I kept a note of what broke last time instead of googling the same Stack Overflow answer every 3 months.
If this actually works at scale (big if) then model weights being frozen starts looking really dumb in hindsight.
Today, most “AI memory” is fake memory.
Agents log old trajectories and replay them later, like watching CCTV of their past mistakes and learning nothing.
ReasoningBank changes that. It extracts reasoning-level lessons from both successes and failures.
That’s the real innovation: distillation, not recollection.
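The distillation step is basically an LLM pass over the trajectory. A hedged sketch (the prompt wording is mine, not the paper’s):

```python
DISTILL_PROMPT = """From the trajectory below, extract general reasoning
lessons, not task-specific details. For failures, state which approach
failed and why. Format each lesson as: Title / Description / Content.

Task: {task}
Outcome: {outcome}
Trajectory:
{trajectory}"""

def distill_lessons(generate, task, trajectory, outcome):
    # Store the lesson, not the log: the output is a reusable strategy,
    # useful even when the original task never recurs.
    return generate(DISTILL_PROMPT.format(
        task=task, outcome=outcome, trajectory=trajectory))
```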
The memory units it stores are structured like human notes:
Title → Description → Content.
Example:
“Prioritize user account sections when retrieving personal data.”
That single rule can transfer across hundreds of tasks, from web admin panels to code automation.
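As a data structure, each memory unit is tiny. Something like this, with the schema taken from the Title → Description → Content layout above and the field contents purely illustrative:

```python
from dataclasses import dataclass

@dataclass
class MemoryItem:
    title: str        # one-line takeaway
    description: str  # when the lesson applies
    content: str      # distilled reasoning, including what failed and why

item = MemoryItem(
    title="Prioritize user account sections when retrieving personal data",
    description="Web tasks asking for profile, billing, or settings info",
    content="Scanning top-level menus first wasted steps; jumping straight "
            "to the account section succeeded across admin panels.",
)
```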
But 99% of people are sleeping on what Claude 4.5 Sonnet can actually do.
I’ve used it to build apps, generate content, automate deep research, and more.
Here are 10 ways to use Claude 4.5 Sonnet that feel like cheating:
1. Automated Research Reports (better than $100k consultants)
Claude’s web search + analysis mode lets you do what McKinsey, Gartner, and Deloitte charge six figures for.
You’ll get structured breakdowns, insights, and data points, like a private analyst on demand.
Prompt to use:
"You are a world-class strategy consultant trained by McKinsey, BCG, and Bain. Act as if you were hired to provide a $300,000 strategic analysis for a client in the [INDUSTRY] sector.
Here is your mission:
1. Analyze the current state of the [INDUSTRY] market.
2. Identify key trends, emerging threats, and disruptive innovations.
3. Map out the top 3-5 competitors and benchmark their business models, strengths, weaknesses, pricing, distribution, and brand positioning.
4. Use frameworks like SWOT, Porter’s Five Forces, and strategic value chain analysis to assess risks and opportunities.
5. Provide a one-page strategic brief with actionable insights and recommendations for a hypothetical company entering or growing in this space.
Output everything in concise bullet points or tables. Make it structured and ready to paste into slides. Think like a McKinsey partner preparing for a C-suite meeting."