BAIDU JUST DROPPED AN ABSOLUTE GAME-CHANGER FOR DOCUMENT AI
It’s called `Unlimited-OCR`, and it can literally transcribe an entire book in a single pass 🤯
Most vision models read one page at a time and forget the context; push them to longer inputs and accuracy degrades while inference slows down.
@Baidu_Inc built this on top of `DeepSeek OCR` but fixed the memory bloat with a single change to the attention mechanism.
The design mimics how a human hand-copies a book.
Instead of trying to hold the entire book in active memory, each token only looks at the current page plus the last 128 words.
This creates a sliding window that keeps memory usage completely flat, no matter how long the output gets.
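Here is a toy sketch of that windowing rule as an attention mask. It is an illustration only, not Baidu's code: the function name `build_mask`, the layout (page tokens first, then output tokens), and reading "128 words" as 128 tokens are all assumptions.

```python
def build_mask(num_page_tokens: int, num_output_tokens: int, window: int = 128) -> list[list[bool]]:
    """mask[i][j] is True when output token i may attend to position j.

    Assumed layout: positions 0..num_page_tokens-1 hold the current page,
    the remaining positions hold the generated output tokens.
    """
    total = num_page_tokens + num_output_tokens
    mask = []
    for i in range(num_output_tokens):
        pos = num_page_tokens + i              # absolute position of output token i
        row = [False] * total
        for j in range(num_page_tokens):       # the current page is always visible
            row[j] = True
        start = max(num_page_tokens, pos - window)
        for j in range(start, pos + 1):        # itself plus at most `window` earlier outputs
            row[j] = True
        mask.append(row)
    return mask


mask = build_mask(num_page_tokens=4, num_output_tokens=10, window=3)
print(sum(mask[0]), sum(mask[9]))  # 5 8
```

Each row has at most `num_page_tokens + window + 1` visible positions, no matter how many tokens have been generated so far.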
The architectural shift delivers three massive upgrades for document parsing:
→ A fixed memory footprint
→ Steady generation speed on massive documents
→ The ability to process dozens of pages per pass
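To see why the footprint stays fixed, here is a minimal sketch of a rolling cache. Everything in it is hypothetical: the class name `RollingCache`, the 256 page entries, and the use of plain integers in place of real key/value tensors.

```python
from collections import deque


class RollingCache:
    """Keeps the page entries plus only the most recent `window` output entries."""

    def __init__(self, page_entries, window=128):
        self.page = list(page_entries)       # fixed while the current page is processed
        self.recent = deque(maxlen=window)   # oldest entries drop off automatically

    def append(self, entry):
        self.recent.append(entry)

    def size(self):
        return len(self.page) + len(self.recent)


cache = RollingCache(page_entries=range(256), window=128)
sizes = []
for step in range(1000):
    cache.append(step)
    sizes.append(cache.size())

print(max(sizes))  # 384
```

After the first 128 steps the size never changes again, which is the "flat memory" behavior described above.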
The numbers back it up.
Unlimited-OCR scores 93% on standard parsing benchmarks, beating the older baseline by a full six points.
Even when pushed past 40 pages, the error rate stays under 0.11.
More importantly, it maintains a flat speed curve where older models suffered a 35% slowdown.
🚨 @karpathy predicted the power of the "LLM Wiki." Google just formalized it.
Meet Open Knowledge Format (OKF): a vendor-neutral standard for giving foundation models the curated context they need.
I can genuinely see this replacing Notion, Obsidian, or traditional wikis for developer teams, and the reason comes down to bookkeeping.
Traditional wikis fail because humans inevitably abandon the tedious work of updating them.
As Andrej Karpathy pointed out recently, LLMs don't get bored.
They don't forget to update a cross-reference, and they can touch 15 files in a single pass.
OKF standardizes the interoperability layer so agents can actually do that heavy lifting autonomously.
Because the format is minimally opinionated, it doesn't dictate what you write, only how it's structured. You get:
→ Human-readable documents that live right alongside your code in version control
→ Cross-links that map out complex entity relationships without needing a graph database
→ A system that survives moving between different tools and organizations
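A rough sketch of how cross-links alone can stand in for a graph database. Big caveat: this does not use the actual OKF link syntax, which is not shown here. It assumes ordinary Markdown links to `.md` files, and the file names and helper functions are made up for illustration.

```python
import re

# Assumed link style: [label](target.md)
LINK = re.compile(r"\[[^\]]*\]\(([^)\s]+\.md)\)")


def link_graph(docs: dict[str, str]) -> dict[str, set[str]]:
    """Map each document name to the set of .md files it links to."""
    return {name: set(LINK.findall(text)) for name, text in docs.items()}


def backlinks(graph: dict[str, set[str]], target: str) -> list[str]:
    """List the documents that link to `target`."""
    return sorted(src for src, outs in graph.items() if target in outs)


docs = {
    "billing.md": "Owned by [Payments](teams/payments.md). Calls [Ledger](ledger.md).",
    "ledger.md": "Used by [Billing](billing.md).",
    "teams/payments.md": "Owns [Billing](billing.md) and [Ledger](ledger.md).",
}
graph = link_graph(docs)
print(backlinks(graph, "ledger.md"))  # ['billing.md', 'teams/payments.md']
```

The relationships live in the text files themselves, so anything that can read the files can rebuild the graph.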
There is no complex compression scheme.
No central registry.
If you can `cat` a file, you can read it.
If you can `git clone` a repo, you can deploy it.
This is how we stop rebuilding context pipelines from scratch every time a new model drops.
🚨 Karpathy’s new setup is the ultimate self-improving second brain, and it takes zero manual editing 🤯
It acts as a living AI knowledge base that actually heals itself.
Let me break it down.
Instead of relying on complex RAG, the LLM pulls raw research directly into an @Obsidian Markdown wiki. It completely takes over:
✦ Index creation
✦ System linting
✦ Native Q&A routing
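One thing "linting" a wiki could mean in practice is catching links that point at notes that don't exist. Below is a minimal sketch of that idea, assuming Obsidian-style `[[wikilinks]]`. The function name and the sample notes are hypothetical, and this is not taken from Karpathy's setup.

```python
import re

# Captures the note name in [[Note]], [[Note|alias]] and [[Note#Heading]]
WIKILINK = re.compile(r"\[\[([^\]|#]+)")


def broken_links(notes: dict[str, str]) -> dict[str, list[str]]:
    """Return, per note, the wikilink targets that match no existing note."""
    known = set(notes)
    report = {}
    for name, text in notes.items():
        missing = sorted({t.strip() for t in WIKILINK.findall(text)} - known)
        if missing:
            report[name] = missing
    return report


notes = {
    "Attention": "See [[Sliding Window|windowed attention]] and [[KV Cache]].",
    "Sliding Window": "Background in [[Attention#Masks]].",
}
print(broken_links(notes))  # {'Attention': ['KV Cache']}
```

An agent could run a check like this after every edit and then write the missing note or fix the link.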
The core process is beautifully simple:
→ You dump raw sources into a folder
→ The LLM auto-compiles an indexed .md wiki
→ You ask complex questions
→ It generates outputs (Marp slides, matplotlib plots) and files them back in
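The "auto-compile an indexed wiki" step can be sketched without any model at all. This is a deterministic stand-in for the part the LLM would do, under made-up assumptions: notes are passed in as a dict of path to Markdown text, and the first `# ` heading serves as the title.

```python
def build_index(notes: dict[str, str]) -> str:
    """Render an index.md body: one bullet per note, titled by its first heading."""
    lines = ["# Index", ""]
    for path in sorted(notes):
        title = path                          # fall back to the path if no heading exists
        for line in notes[path].splitlines():
            if line.startswith("# "):
                title = line[2:].strip()
                break
        lines.append(f"- [{title}]({path})")
    return "\n".join(lines) + "\n"


notes = {
    "b.md": "# Beta\nSome text.",
    "a.md": "No heading here.",
}
print(build_index(notes))
```

In the setup described above, the LLM would also write summaries and cross-links; the point here is only that the index is a plain file you can regenerate and diff.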
The big-picture implication of this is just wild.
When agents maintain their own memory layer, they don’t need massive, expensive context limits.
They really just need two things:
→ Clean file organization
→ The ability to query their own indexes
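A minimal sketch of "querying your own index": pick a few relevant notes from short summaries before loading anything else. The scoring (plain keyword overlap), the function name, and the sample index are assumptions for illustration; a real agent would likely let the model read the index itself.

```python
def select_notes(index: dict[str, str], question: str, k: int = 2) -> list[str]:
    """Rank notes by how many question words appear in their index summary."""

    def words(text: str) -> set[str]:
        return {w.strip("?.,!").lower() for w in text.split()}

    asked = words(question)
    scored = []
    for path, summary in index.items():
        overlap = len(asked & words(summary))
        if overlap:
            scored.append((-overlap, path))   # negative so the best match sorts first
    return [path for _, path in sorted(scored)[:k]]


index = {
    "ocr.md": "sliding window attention for OCR",
    "wiki.md": "LLM maintained wiki index",
    "misc.md": "meeting notes",
}
print(select_notes(index, "How does sliding window attention work?"))  # ['ocr.md']
```

Only the selected files go into the prompt, which is where the cost savings come from.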
Forget stuffing everything into one giant prompt.
This approach is way cheaper, highly scalable... and 100% inspectable!
Wow. Insanely fast turnaround from @himanshustwts!
A full breakdown of @karpathy’s self-improving wiki framework,
walking through every stage from ingestion to what comes next 👀
@himanshustwts @karpathy Omar took a very similar approach with @Obsidian