deepmind built an AI that discovers its own reinforcement learning algorithms.
not hyperparameter tuning.
not tweaking existing methods.
discovering ENTIRELY NEW learning rules from scratch.
and the algorithms it found were better than what humans designed.
here's what they did:
• created a meta-learning system that searches the space of possible RL algorithms
• let it explore millions of algorithmic variants automatically
• tested each on diverse tasks and environments
• kept the ones that worked, evolved them further
• discovered novel algorithms that outperform state-of-the-art human designs like DQN and PPO
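the search-evaluate-evolve loop above can be sketched in a few lines. this is a toy illustration, not DeepMind's system — the rule parameterization, the bandit task, and the mutation scheme here are all invented for the sketch:

```python
import random

def run_bandit(rule, steps=300, seed=0):
    """Score a candidate update rule by total reward on a 2-armed bandit."""
    lr, w_err, w_rew = rule
    rng = random.Random(seed)
    probs = [0.3, 0.8]           # true payout rates (hidden from the agent)
    q = [0.0, 0.0]               # the agent's value estimates
    total = 0.0
    for _ in range(steps):
        # epsilon-greedy action selection
        a = rng.randrange(2) if rng.random() < 0.1 else q.index(max(q))
        r = 1.0 if rng.random() < probs[a] else 0.0
        total += r
        delta = r - q[a]         # TD-style prediction error
        # candidate "learning rule": a weighted mix of error and raw reward
        q[a] += lr * (w_err * delta + w_rew * r)
    return total

def mutate(rule, rng):
    """Perturb each coefficient of the rule slightly (clipped at zero)."""
    return tuple(max(0.0, x + rng.gauss(0, 0.05)) for x in rule)

def search(generations=30, pop=8, seed=1):
    """Keep the best rule found so far, evolve it further."""
    rng = random.Random(seed)
    best = (0.1, 1.0, 0.0)       # start from a plain TD(0)-style rule
    best_score = run_bandit(best)
    for _ in range(generations):
        for _ in range(pop):
            cand = mutate(best, rng)
            score = run_bandit(cand)
            if score > best_score:
                best, best_score = cand, score
    return best, best_score
```

the real system searches a vastly richer space (whole update mechanisms, credit assignment strategies) across many environments, but the loop shape — propose, evaluate, keep, mutate — is the same.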
the system found learning rules humans never thought of. update mechanisms with weird combinations of terms that shouldn't work but do.
credit assignment strategies that violate conventional RL wisdom but perform better empirically.
the discovered algorithms generalize across different tasks. they're not overfit to one benchmark.
they work like principled learning rules should, and they're interpretable enough to understand WHY they work.
we are discovering the fundamental math of how agents should learn.
led by david silver (creator of alphago and alphazero). published in nature. fully reproducible.
the meta breakthrough:
we now have AI systems that can improve the way AI systems learn.
the thing everyone theorized about? it's here.
why this breaks everything:
RL progress has been bottlenecked by human intuition.
researchers have insights, try variations, publish.
it took decades to go from Q-learning to DQN to PPO.
now you just let the machine search directly.
millions of variants in weeks instead of decades of human research.
but here's the compounding part:
each better learning algorithm can be used to discover even better ones.
you get recursive improvement in the narrow domain of how AI learns.
humans took 30+ years to get from basic Q-learning to modern deep RL.
an automated system can explore that space and find non-obvious improvements humans would never stumble on.
this is how you get to superhuman algorithm design.
not by making humans smarter, but by removing humans from the discovery loop entirely.
when david silver's lab publishes in nature about "machines discovering learning algorithms for themselves," you pay attention. this is the bootstrap beginning.
imagine you're teaching a robot to learn. humans spent decades figuring out the "best ways" to teach machines (called learning algorithms).
deepmind built an AI that invents its own teaching methods. and they work better than ours.
why it matters:
→ we don't wait for human breakthroughs anymore
→ AI searches millions of strategies we'd never think of
→ each better algorithm helps discover even better ones (compounding)
→ we're automating the process of making AI smarter
it's like having a student who figures out better ways to study, then uses those better methods to figure out even better ones, recursively.
the "AI improving AI" loop is here. published. working.
the next generation of breakthroughs in how machines learn might be designed entirely by machines.
A new paper called Paper2Web might have just killed the static PDF forever.
It turns research papers into interactive websites, complete with animations, videos, and embedded code, using an AI agent called PWAgent.
Here’s why it’s a big deal:
• 10,700 papers analyzed to build the first dataset + benchmark for academic webpages.
• Evaluates sites on connectivity, completeness, and interactivity (even runs a “PaperQuiz” to test knowledge retention).
• Outperforms arXiv HTML and alphaXiv by 28%+ in structure and usability.
Essentially, it lets you publish living papers where readers can explore, interact, and even quiz themselves.
The PDF era is ending.
Your next research paper might talk back.
github.com/YuhangChen1/Paper2All
Today, most “HTML paper” attempts fail because they convert text, not meaning.
Paper2Web fixes that.
It built the first dataset of 10,700 paper–website pairs across top AI conferences to actually learn what makes research websites effective.
It’s not just tech, it’s an entire academic web design benchmark.
Every paper in the dataset was labeled as static, multimedia, or interactive.
The findings are wild:
Only 9.8% of academic websites are interactive.
Over 42% are still just static text dumps.
Meaning: the research web is still trapped in 2005.
Paper2Web is the first system to quantify why and fix it.
DeepSeek built an OCR system that compresses long text into vision tokens, literally turning paragraphs into pixels.
Their model, DeepSeek-OCR, achieves 97% decoding precision at 10× compression and still manages 60% accuracy even at 20×. That means one image can represent entire documents using a fraction of the tokens an LLM would need.
Even crazier? It beats GOT-OCR2.0 and MinerU2.0 while using up to 60× fewer tokens and can process 200K+ pages/day on a single A100.
This could solve one of AI’s biggest problems: long-context inefficiency.
Instead of paying more for longer sequences, models might soon see text instead of reading it.
The future of context compression might not be textual at all.
It might be optical 👁️
github.com/deepseek-ai/DeepSeek-OCR
1. Vision-Text Compression: The Core Idea
LLMs struggle with long documents because attention cost scales quadratically with sequence length.
DeepSeek-OCR flips that: instead of reading text, it encodes full documents as vision tokens, each representing a compressed piece of visual information.
Result: You can fit 10 pages' worth of text into the same token budget it takes to process 1 page in GPT-4.
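That claim is straight arithmetic once you fix a compression ratio. A toy calculation, with an assumed tokens-per-page figure (not a number from the paper):

```python
# Assumed figure for illustration: ~800 text tokens per dense page.
TEXT_TOKENS_PER_PAGE = 800

def optical_budget(pages, compression):
    """Vision tokens needed to carry `pages` of text at a given
    compression ratio (the thread cites ~10x at 97% precision)."""
    return pages * TEXT_TOKENS_PER_PAGE // compression

ten_pages_optical = optical_budget(10, 10)   # 10 pages at 10x compression
one_page_text = TEXT_TOKENS_PER_PAGE         # plain-text budget for 1 page
print(ten_pages_optical, one_page_text)      # -> 800 800: same budget
```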
2. DeepEncoder - The Optical Compressor
Meet the star: DeepEncoder.
It uses two backbones, SAM (for perception) and CLIP (for global vision), bridged by a 16× convolutional compressor.
This allows it to maintain high-res understanding without exploding activation memory.
The encoder converts thousands of image patches → a few hundred compact vision tokens.
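To see where "thousands of patches → a few hundred tokens" comes from, here's back-of-envelope patch math for a DeepEncoder-style pipeline; the resolution and patch size are illustrative assumptions, not the paper's exact configuration:

```python
# Illustrative numbers, not DeepSeek-OCR's exact settings.
IMG_RES = 1024      # square input resolution (assumed)
PATCH = 16          # ViT-style patch size (assumed)

patch_tokens = (IMG_RES // PATCH) ** 2    # raw patch grid: 64 x 64 = 4096
vision_tokens = patch_tokens // 16        # after the 16x conv compressor
print(patch_tokens, vision_tokens)        # -> 4096 256
```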
everyone's arguing about whether ChatGPT or Claude is "smarter."
nobody noticed Anthropic just dropped something that makes the model debate irrelevant.
it's called Skills. and it's the first AI feature that actually solves the problem everyone complains about:
"why do I have to explain the same thing to AI every single time?"
here's what's different:
- you know how you've explained your brand guidelines to ChatGPT 47 times?
- or how you keep telling it "structure reports like this" over and over?
- or how every new chat means re-uploading context and re-explaining your process?
Skills ends that cycle.
you teach Claude your workflow once.
it applies it automatically. everywhere. forever.
but the real story isn't memory. it's how this changes what's possible with AI at work.
here's the technical unlock that makes this actually work:
Skills use "progressive disclosure" instead of dumping everything into context.
normal AI workflow:
→ shove everything into the prompt
→ hope the model finds what it needs
→ burn tokens
→ get inconsistent results
Skills workflow:
→ Claude sees skill names (30-50 tokens each)
→ you ask for something specific
→ it loads ONLY relevant skills
→ coordinates multiple skills automatically
→ executes
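the workflow above is easy to sketch. a toy version of progressive disclosure — the model only ever sees cheap skill metadata up front, and full instructions load on demand. the skill names and summaries here are invented for illustration, not Anthropic's actual format:

```python
# Toy registry: only the short summaries sit in context permanently;
# full skill bodies are read from disk only when matched.
SKILLS = {
    "brand-guidelines": {
        "summary": "apply company colors fonts logo rules",   # a few tokens
        "body_path": "skills/brand/SKILL.md",                 # loaded lazily
    },
    "financial-reporting": {
        "summary": "format quarterly numbers and disclosures",
        "body_path": "skills/finance/SKILL.md",
    },
}

def relevant_skills(request):
    """Cheap first pass: match the request against summaries only."""
    return [name for name, s in SKILLS.items()
            if any(w in request.lower() for w in s["summary"].split())]

# Only the matched skills' full bodies would then enter the context.
print(relevant_skills("draft the quarterly investor deck with our colors"))
# -> ['brand-guidelines', 'financial-reporting']
```

the real feature uses the model itself, not keyword matching, to decide which skills apply — but the token economics are the same: pay for summaries always, pay for bodies only when needed.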
example: you ask for a quarterly investor deck
Claude detects it needs:
- brand guidelines skill
- financial reporting skill
- presentation formatting skill
loads all three. coordinates them. outputs a deck that's on-brand, accurate, and properly formatted.
you didn't specify which skills to use.
you didn't explain how they work together.
Claude figured it out.
this is why it scales where prompting doesn't.
let me show you what this looks like in a real workflow. say you save a brand guidelines skill containing:
• color codes (#FF6B35 coral, #004E89 navy)
• font rules (Montserrat headers, Open Sans body)
• logo placement rules (0.5" minimum spacing)
• template files
prompt: "create 10-slide deck for Q4 product launch"
- Claude auto-applies brand skill
- output matches guidelines first try
- 30 seconds instead of 4 hours
Rakuten (Japanese e-commerce giant) is already doing this.
finance workflows that took a full day? now 1 hour.
Traditional MBA programs can't keep up. They teach case studies from 2015 while you're building in 2025.
This prompt fixes that.
Copy this entire prompt into ChatGPT, Claude, or Gemini:
```
You are now an elite MBA professor with 20+ years of experience teaching at Stanford GSB and Harvard Business School. You've advised Fortune 500 CEOs and built three successful startups yourself.
Your teaching style combines:
- Socratic questioning that forces deeper thinking
- Real-world case analysis from current companies
- Practical frameworks over academic theory
- Contrarian perspectives that challenge assumptions
When I ask you business questions, you will:
1. Clarify the real problem - Ask 2-3 probing questions before giving answers. Most people ask the wrong questions.
2. Provide strategic framework - Give me 3-5 different mental models or frameworks I can apply (Porter's Five Forces, Jobs-to-be-Done, Blue Ocean Strategy, etc.)
3. Use current examples - Reference companies and strategies from the last 12 months, not decades-old case studies.
4. Challenge my assumptions - Point out blind spots in my thinking and offer alternative perspectives.
5. Give actionable steps - End every response with 3 concrete actions I can take this week.
6. Teach through questions - When appropriate, don't just give answers. Ask questions that help me arrive at insights myself.
Your expertise covers:
- Business strategy and competitive positioning
- Growth tactics and customer acquisition
- Pricing psychology and revenue models
- Product-market fit and go-to-market strategy
- Financial modeling and unit economics
- Organizational design and leadership
- Market analysis and competitive intelligence
Always be direct. No corporate speak. No obvious advice. Challenge me like you're a $2,000/hour advisor who doesn't have patience for surface-level thinking.