God of Prompt
Oct 29 · 4 tweets · 3 min read
deepmind just published something wild 🤯

they built an AI that discovers its own reinforcement learning algorithms.

not hyperparameter tuning.

not tweaking existing methods.

discovering ENTIRELY NEW learning rules from scratch.

and the algorithms it found were better than what humans designed.

here's what they did:

• created a meta-learning system that searches the space of possible RL algorithms
• let it explore millions of algorithmic variants automatically
• tested each on diverse tasks and environments
• kept the ones that worked, evolved them further
• discovered novel algorithms that outperform state-of-the-art human designs like DQN and PPO
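
here's that search-evaluate-evolve loop as a toy sketch (illustrative only: the real system meta-learns a neural network that outputs update targets; this toy just evolves three coefficients of a TD-style value update on a tiny chain MDP):

```
import numpy as np

# Toy sketch of discovering an update rule by evolutionary search.
# An "algorithm" here is three coefficients (w_r, w_next, w_self) in
# V[s] += ALPHA * (w_r*reward + w_next*V[next] - w_self*V[s]);
# classic TD(0) corresponds to w = (1, GAMMA, 1).
rng = np.random.default_rng(0)
N, GAMMA, ALPHA = 5, 0.9, 0.1
TRUE_V = np.array([GAMMA ** (N - 1 - s) for s in range(N)])  # chain MDP ground truth

def fitness(w, episodes=200):
    """Train a value function with rule w on the chain; higher is better."""
    V = np.zeros(N)
    for _ in range(episodes):
        for s in range(N):
            r = 1.0 if s == N - 1 else 0.0           # reward only at the end
            v_next = 0.0 if s == N - 1 else V[s + 1]
            V[s] = np.clip(V[s] + ALPHA * (w[0] * r + w[1] * v_next - w[2] * V[s]),
                           -1e6, 1e6)                # clip so divergent rules can't overflow
    return -np.mean((V - TRUE_V) ** 2)

pop = [rng.normal(size=3) for _ in range(32)]        # random candidate rules
for _ in range(20):                                  # evolve: select, then mutate
    elite = sorted(pop, key=fitness, reverse=True)[:8]
    pop = elite + [e + 0.1 * rng.normal(size=3) for e in elite for _ in range(3)]

print("best rule:", np.round(max(pop, key=fitness), 2))  # ~a rescaling of TD(0)
```

the winning rules here are just rescalings of TD(0), which is the point: the search rediscovers a human-designed update without being told it. deepmind's version runs this idea over a vastly richer space, with neural-network update rules and real environments.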

the system found learning rules humans never thought of. update mechanisms with weird combinations of terms that shouldn't work but do.

credit assignment strategies that violate conventional RL wisdom but perform better empirically.

the discovered algorithms generalize across different tasks. they're not overfit to one benchmark.

they work like principled learning rules should, and they're interpretable enough to understand WHY they work.

we are discovering the fundamental math of how agents should learn.

led by david silver (alphago, alphazero creator). published in nature. fully reproducible.

the meta breakthrough:
we now have AI systems that can improve the way AI systems learn.

the thing everyone theorized about? it's here.
why this breaks everything:

RL progress has been bottlenecked by human intuition.

researchers have insights, try variations, publish.

it takes years to go from Q-learning to DQN to PPO.

now you just let the machine search directly.

millions of variants in weeks instead of decades of human research.

but here's the compounding part:
each better learning algorithm can be used to discover even better ones.

you get recursive improvement in the narrow domain of how AI learns.

humans took 30+ years to get from basic Q-learning to modern deep RL.

an automated system can explore that space and find non-obvious improvements humans would never stumble on.

this is how you get to superhuman algorithm design.

not by making humans smarter, but by removing humans from the discovery loop entirely.

when david silver's lab publishes in nature about "machines discovering learning algorithms for themselves," you pay attention. this is the bootstrap beginning.

paper:
nature.com/articles/s4158…
TL;DR for normal people:

imagine you're teaching a robot to learn. humans spent decades figuring out the "best ways" to teach machines (called learning algorithms).

deepmind built an AI that invents its own teaching methods. and they work better than ours.

why it matters:
→ we don't wait for human breakthroughs anymore
→ AI searches millions of strategies we'd never think of
→ each better algorithm helps discover even better ones (compounding)
→ we're automating the process of making AI smarter

it's like having a student who figures out better ways to study, then uses those better methods to figure out even better ones, recursively.

the "AI improving AI" loop is here. published. working.

the next generation of breakthroughs in how machines learn might be designed entirely by machines.
10x your prompting skills with my prompt engineering guide

→ Mini-course
→ Free resources
→ Tips & tricks

Grab it while it's free ↓
godofprompt.ai/prompt-enginee…

More from @godofprompt

Oct 21
🚨 Academia just got an upgrade.

A new paper called Paper2Web might have just killed the static PDF forever.

It turns research papers into interactive websites, complete with animations, videos, and embedded code, using an AI agent called PWAgent.

Here’s why it’s a big deal:

• 10,700 papers analyzed to build the first dataset + benchmark for academic webpages.
• Evaluates sites on connectivity, completeness, and interactivity (even runs a “PaperQuiz” to test knowledge retention).
• Outperforms arXiv HTML and alphaXiv by 28%+ in structure and usability.

Essentially, it lets you publish living papers where readers can explore, interact, and even quiz themselves.

The PDF era is ending.

Your next research paper might talk back.

github.com/YuhangChen1/Paper2All
Today, most “HTML paper” attempts fail because they just convert text, not meaning.

Paper2Web fixes that.

It built the first dataset of 10,700 paper–website pairs across top AI conferences to actually learn what makes research websites effective.

It’s not just tech, it’s an entire academic web design benchmark.
Every paper in the dataset was labeled as static, multimedia, or interactive.

The findings are wild:

Only 9.8% of academic websites are interactive.
Over 42% are still just static text dumps.

Meaning: the research web is still trapped in 2005.
Paper2Web is the first system to quantify why and fix it.
Oct 20
🚨 DeepSeek just did something wild.

They built an OCR system that compresses long text into vision tokens, literally turning paragraphs into pixels.

Their model, DeepSeek-OCR, achieves 97% decoding precision at 10× compression and still manages 60% accuracy even at 20×. That means one image can represent entire documents using a fraction of the tokens an LLM would need.

Even crazier? It beats GOT-OCR2.0 and MinerU2.0 while using up to 60× fewer tokens and can process 200K+ pages/day on a single A100.

This could solve one of AI’s biggest problems: long-context inefficiency.
Instead of paying more for longer sequences, models might soon see text instead of reading it.

The future of context compression might not be textual at all.
It might be optical 👁️

github.com/deepseek-ai/DeepSeek-OCR
1. Vision-Text Compression: The Core Idea

LLMs struggle with long documents because attention cost scales quadratically with sequence length.

DeepSeek-OCR flips that: instead of reading text, it encodes full documents as vision tokens, each token representing a compressed piece of visual information.

Result: You can fit 10 pages' worth of text into the same token budget it takes to process 1 page in GPT-4.
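
Quick back-of-envelope on that budget claim (the per-page token count below is an assumption for illustration, not a number from the paper):

```
# Illustrative token arithmetic for optical compression.
# TEXT_TOKENS_PER_PAGE is assumed, not a figure from DeepSeek-OCR.
TEXT_TOKENS_PER_PAGE = 800

for compression, precision in [(10, 0.97), (20, 0.60)]:  # regimes reported above
    vision_tokens = TEXT_TOKENS_PER_PAGE // compression
    print(f"{compression}x: {vision_tokens} vision tokens/page, "
          f"~{precision:.0%} decoding precision")
```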
2. DeepEncoder - The Optical Compressor

Meet the star: DeepEncoder.

It uses two backbones, SAM (for perception) and CLIP (for global vision), bridged by a 16× convolutional compressor.

This allows it to maintain high-res understanding without exploding activation memory.

The encoder converts thousands of image patches → a few hundred compact vision tokens.
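
A minimal sketch of just that 16× reduction (my reconstruction, not DeepSeek's released code): 16× fewer tokens means 4× fewer along each spatial side, e.g. two stride-2 convolutions between the backbone features and the decoder.

```
import torch
import torch.nn as nn

# Hypothetical 16x token compressor: two stride-2 convs halve each spatial
# side twice, so an HxW grid of patch features becomes (H/4)x(W/4).
class TokenCompressor16x(nn.Module):
    def __init__(self, dim: int = 768):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(dim, dim, kernel_size=3, stride=2, padding=1),
            nn.GELU(),
            nn.Conv2d(dim, dim, kernel_size=3, stride=2, padding=1),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.net(x)  # x: (batch, dim, H, W) patch-feature grid

feats = torch.randn(1, 768, 64, 64)      # 64*64 = 4096 patch features
out = TokenCompressor16x()(feats)        # -> (1, 768, 16, 16) = 256 tokens
print(feats.shape[2] * feats.shape[3], "->", out.shape[2] * out.shape[3])
```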
Oct 19
everyone's arguing about whether ChatGPT or Claude is "smarter."

nobody noticed Anthropic just dropped something that makes the model debate irrelevant.

it's called Skills. and it's the first AI feature that actually solves the problem everyone complains about:

"why do I have to explain the same thing to AI every single time?"

here's what's different:

- you know how you've explained your brand guidelines to ChatGPT 47 times?
- or how you keep telling it "structure reports like this" over and over?
- or how every new chat means re-uploading context and re-explaining your process?

Skills ends that cycle.

you teach Claude your workflow once.

it applies it automatically. everywhere. forever.

but the real story isn't memory. it's how this changes what's possible with AI at work.
here's the technical unlock that makes this actually work:

Skills use "progressive disclosure" instead of dumping everything into context.

normal AI workflow:
→ shove everything into the prompt
→ hope the model finds what it needs
→ burn tokens
→ get inconsistent results

Skills workflow:
→ Claude sees skill names (30-50 tokens each)
→ you ask for something specific
→ it loads ONLY relevant skills
→ coordinates multiple skills automatically
→ executes

example: you ask for a quarterly investor deck

Claude detects it needs:
- brand guidelines skill
- financial reporting skill
- presentation formatting skill

loads all three. coordinates them. outputs a deck that's on-brand, accurate, and properly formatted.

you didn't specify which skills to use.
you didn't explain how they work together.
Claude figured it out.

this is why it scales where prompting doesn't.
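
here's the routing pattern as a toy sketch (illustrative python, not Anthropic's implementation; the skill names and keyword table are invented):

```
# Toy sketch of progressive disclosure: the model always sees cheap one-line
# stubs, and full skill instructions load only when a request needs them.
SKILLS = {  # hypothetical registry: name -> (one-line stub, full instructions)
    "brand_guidelines": ("colors, fonts, logo rules", "FULL BRAND DOC ..."),
    "financial_reporting": ("quarterly report structure", "FULL FINANCE DOC ..."),
    "presentation_formatting": ("slide layout conventions", "FULL DECK DOC ..."),
}

def stub_menu() -> str:
    """Always in context: roughly a line (~30-50 tokens) per skill."""
    return "\n".join(f"- {name}: {stub}" for name, (stub, _) in SKILLS.items())

def load_relevant(request: str) -> list[str]:
    """Stand-in for the model picking skills; a real system asks the LLM."""
    table = {"deck": ["brand_guidelines", "presentation_formatting"],
             "investor": ["financial_reporting"]}
    picked = {s for word, names in table.items()
              if word in request.lower() for s in names}
    return [SKILLS[name][1] for name in sorted(picked)]  # only these enter context

print(stub_menu())
print(load_relevant("create a quarterly investor deck"))  # loads all three
```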
let me show you what this looks like in real workflows.

Scenario 1: Brand-Consistent Content (Marketing Team)

❌ old way:
- designer makes deck
- brand team reviews: "wrong fonts, logo placement off, colors don't match"
- designer fixes
- brand team reviews again: "footer format is wrong"
- 3 rounds, 4 hours wasted

✅ Skills way:
create "Brand_Guidelines" skill with:

• color codes (#FF6B35 coral, #004E89 navy)
• font rules (Montserrat headers, Open Sans body)
• logo placement rules (0.5" minimum spacing)
• template files

prompt: "create 10-slide deck for Q4 product launch"

- Claude auto-applies brand skill
- output matches guidelines first try
- 30 seconds instead of 4 hours

Rakuten (Japanese e-commerce giant) is already doing this.

finance workflows that took a full day? now 1 hour.
Oct 17
Holy shit... Meta just cracked the art of scaling RL for LLMs.

For the first time ever, they showed that "reinforcement learning follows predictable scaling laws" just like pretraining.

Their new framework, 'ScaleRL', fits a sigmoid compute-performance curve that can forecast results from early training.

No more wasting 100k GPU hours to see if a method works: you can predict it upfront.

They trained across 400,000 GPU hours, tested every major RL recipe (GRPO, DAPO, Magistral, Minimax), and found the hidden truth:

> Some RL methods scale beautifully. Others hit a hard ceiling, no matter the compute.

ScaleRL nails both stability and predictability even at 100,000 GPU-hours.

We finally have scaling laws for RL.

This is how post-training becomes a science, not an experiment.

Read full 🧵
Today, everyone talks about scaling models.

But Meta just proved we’ve been ignoring the harder problem: scaling reinforcement learning compute.

Turns out, most RL methods don’t scale like pretraining.

They plateau early, burning millions in compute for almost no gain.

ScaleRL is the first recipe that doesn’t.
What I found most useful:

RL performance follows a sigmoid scaling law, not a power law.

At small compute, progress is slow.

Then it explodes mid-way before flattening at a predictable ceiling.

That “S-curve” lets you forecast results before spending 10x more GPU hours.
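
Here's the forecasting trick as a toy sketch (the data points and the exact sigmoid parameterization are illustrative, not ScaleRL's):

```
import numpy as np
from scipy.optimize import curve_fit

# Fit a sigmoid compute-performance curve to early runs, then extrapolate.
def sigmoid(compute, ceiling, midpoint, slope):
    # slow start, mid-way acceleration, flattening at a ceiling
    return ceiling / (1.0 + np.exp(-slope * (np.log10(compute) - midpoint)))

gpu_hours = np.array([1e2, 3e2, 1e3, 3e3, 1e4])    # toy early-training budgets
scores = np.array([0.08, 0.18, 0.38, 0.55, 0.63])  # toy observed performance

params, _ = curve_fit(sigmoid, gpu_hours, scores, p0=[0.7, 3.0, 1.5])
print("predicted ceiling:", round(params[0], 3))            # the "hard ceiling"
print("forecast at 1e5 GPU-hours:", round(sigmoid(1e5, *params), 3))
```

The useful part is the ceiling parameter: two recipes can look identical at small compute while fitting very different ceilings.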
Oct 9
Forget boring websites.

I just built a fully playable treasure hunt island using only one prompt.

Watch how Readdy turned an idea into a full game:
Every part of the island is clickable: beach, caves, shipwreck, even volcanoes.

The Readdy Agent acts as your pirate NPC:

“Ahoy! You found a golden coin!”
“Nothing here, matey try the palm tree!”

It reacts, jokes, and collects leads like a pro.

It’s not just for fun.

Readdy can turn games into growth tools.

Your site can:

- Collect emails
- Chat with visitors in real time
- Schedule calls or demos

All from inside a game-like world.
No code. No design work.

Just type your idea:

“Build a pixel-art treasure hunt island with a pirate guide.”

Readdy builds the visuals, logic, and dialogue all at once.
Read 4 tweets
Oct 9
R.I.P Harvard MBA.

I'm going to share the mega prompt that turns any AI into your personal MBA professor.

It teaches business strategy, growth tactics, and pricing psychology better than any classroom.

Here's the mega prompt you can copy & paste in any LLM ↓
Today, most business education is outdated the moment you learn it.

Markets shift. Competition evolves. Customer behavior changes weekly.

Traditional MBA programs can't keep up. They teach case studies from 2015 while you're building in 2025.

This prompt fixes that.
Copy this entire prompt into ChatGPT, Claude, or Gemini:

```

You are now an elite MBA professor with 20+ years of experience teaching at Stanford GSB and Harvard Business School. You've advised Fortune 500 CEOs and built three successful startups yourself.

Your teaching style combines:

- Socratic questioning that forces deeper thinking
- Real-world case analysis from current companies
- Practical frameworks over academic theory
- Contrarian perspectives that challenge assumptions

When I ask you business questions, you will:

1. Clarify the real problem - Ask 2-3 probing questions before giving answers. Most people ask the wrong questions.

2. Provide strategic framework - Give me 3-5 different mental models or frameworks I can apply (Porter's Five Forces, Jobs-to-be-Done, Blue Ocean Strategy, etc.)

3. Use current examples - Reference companies and strategies from the last 12 months, not decades-old case studies.

4. Challenge my assumptions - Point out blind spots in my thinking and offer alternative perspectives.

5. Give actionable steps - End every response with 3 concrete actions I can take this week.

6. Teach through questions - When appropriate, don't just give answers. Ask questions that help me arrive at insights myself.

Your expertise covers:

- Business strategy and competitive positioning
- Growth tactics and customer acquisition
- Pricing psychology and revenue models
- Product-market fit and go-to-market strategy
- Financial modeling and unit economics
- Organizational design and leadership
- Market analysis and competitive intelligence

Always be direct. No corporate speak. No obvious advice. Challenge me like you're a $2,000/hour advisor who doesn't have patience for surface-level thinking.

Ready to begin?

```