More from @DataChaz

Charly Wargnier

@DataChaz

Jun 24

BAIDU JUST DROPPED AN ABSOLUTE GAME-CHANGER FOR DOCUMENT AI

It’s called `Unlimited-OCR`, and it can literally transcribe an entire book in a single pass 🤯

Most vision models read a single page, forget the context, and eventually hit a wall where performance degrades and inference slows down.

@Baidu_Inc built this on top of `DeepSeek OCR` but fixed the memory bloat with a single change to the attention mechanism.

The design mimics how a human hand-copies a book.

Instead of trying to hold the entire book in active memory, each token only looks at the current page plus the last 128 words.

This creates a sliding window that keeps memory usage completely flat, no matter how long the output gets.
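To make the windowed attention idea concrete, here is a minimal pure-Python sketch of such a mask. It is not Baidu's code. It assumes the page's image tokens come before the generated text, and it counts the window in tokens, although the thread says "words". `sliding_window_mask` is a hypothetical name. In the model described above, the window would be 128.

```python
def sliding_window_mask(num_page_tokens: int, num_text_tokens: int, window: int) -> list:
    """Boolean attention mask for generated text tokens (one row per text token).

    Assumed layout: [page tokens][text tokens]. Text token i may attend to
    every page token, to itself, and to the `window` text tokens before it.
    Attention among the page tokens themselves is not modeled here.
    """
    total = num_page_tokens + num_text_tokens
    mask = []
    for i in range(num_text_tokens):
        row = [False] * total
        for j in range(num_page_tokens):
            row[j] = True                      # the current page is always visible
        start = max(0, i - window)             # older text falls out of the window
        for j in range(start, i + 1):          # causal: nothing after position i
            row[num_page_tokens + j] = True
        mask.append(row)
    return mask


mask = sliding_window_mask(num_page_tokens=4, num_text_tokens=10, window=3)
print([sum(row) for row in mask])  # [5, 6, 7, 8, 8, 8, 8, 8, 8, 8]
```

No row ever has more than `num_page_tokens + window + 1` visible positions. Once the output is longer than the window, the per-token attention cost stops growing.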

The architectural shift delivers three massive upgrades for document parsing:
→ A fixed memory footprint
→ Steady generation speed on massive documents
→ The ability to process dozens of pages per pass
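The fixed memory footprint comes from what has to be cached during generation. Below is a hypothetical sketch; the class name and layout are assumptions, not the repo's API. The page's key/value entries are stored once. Text entries go into a `deque` with `maxlen`, which evicts the oldest entry automatically, so the cache size is capped regardless of output length.

```python
from collections import deque


class RollingKVCache:
    """Hypothetical bounded cache: fixed page entries plus the last `window` text entries."""

    def __init__(self, page_kv, window: int):
        self.page_kv = list(page_kv)              # encoded once per page, never grows
        self.text_kv = deque(maxlen=window)       # appending past maxlen drops the oldest item

    def append(self, kv) -> None:
        self.text_kv.append(kv)

    def visible(self) -> list:
        # Everything the next generated token can attend to
        return self.page_kv + list(self.text_kv)

    def __len__(self) -> int:
        return len(self.page_kv) + len(self.text_kv)


cache = RollingKVCache(page_kv=range(4), window=128)
for step in range(10_000):
    cache.append(step)  # stand-in for a real (key, value) pair
print(len(cache))  # 132
```

The cost of each decoding step scales with `len(cache.visible())`. Because that length is bounded, generation speed can also stay flat on long documents.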

The numbers back it up.

Unlimited-OCR scores 93% on standard parsing benchmarks, beating the older baseline by a full six points.

Even when pushed past 40 pages, the error rate stays under 0.11.

More importantly, it maintains a flat speed curve where older models suffered a 35% slowdown.

Free and open-source.

Repo, weights and paper in 🧵↓

Repo → github.com/baidu/Unlimite…

Weights → huggingface.co/baidu/Unlimite…


Charly Wargnier

@DataChaz

Jun 13

🚨 @Karpathy predicted the power of the "LLM Wiki." Google just formalized it.

Meet Open Knowledge Format (OKF): a vendor-neutral standard for giving foundation models the curated context they need.

I can genuinely see this replacing Notion, Obsidian, or traditional wikis for developer teams, and the reason comes down to bookkeeping.

Traditional wikis fail because humans inevitably abandon the tedious work of updating them.

As Andrej Karpathy pointed out recently, LLMs don't get bored.

They don't forget to update a cross-reference, and they can touch 15 files in a single pass.

OKF standardizes the interoperability layer so agents can actually do that heavy lifting autonomously.

Because the format is minimally opinionated, it doesn't dictate what you write; it only dictates how it's structured. You get:
→ Human-readable documents that live right alongside your code in version control
→ Cross-links that map out complex entity relationships without needing a graph database
→ A system that survives moving between different tools and organizations
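The thread doesn't show OKF's link syntax. The sketch below assumes cross-links are ordinary relative Markdown links such as `[Orders](orders.md)`. `load_docs`, `build_link_graph` and `backlinks` are hypothetical helpers. The sketch shows how plain files in a repo can become a link graph, including reverse edges, using nothing more than a regex and a dict.

```python
import re
from pathlib import Path

# Assumption: cross-links are plain relative Markdown links, e.g. [Orders](orders.md).
# The real OKF spec may define a different convention.
LINK_RE = re.compile(r"\[[^\]]*\]\(([^)\s]+\.md)\)")


def load_docs(root: str) -> dict:
    """Read every .md file under root (e.g. a git checkout) into {relative path: text}."""
    base = Path(root)
    return {p.relative_to(base).as_posix(): p.read_text(encoding="utf-8")
            for p in base.rglob("*.md")}


def build_link_graph(docs: dict) -> dict:
    """Map each document to the sorted list of .md targets it links to.

    Targets are compared exactly as written; a fuller version would resolve
    them relative to the linking file's folder.
    """
    return {path: sorted(set(LINK_RE.findall(text))) for path, text in docs.items()}


def backlinks(graph: dict, target: str) -> list:
    """Documents that link to target: the reverse edges, computed on the fly."""
    return sorted(path for path, links in graph.items() if target in links)


docs = {
    "customers.md": "See [Orders](orders.md) and [Billing](billing.md).",
    "orders.md": "Placed by a [Customer](customers.md).",
    "billing.md": "No links here.",
}
graph = build_link_graph(docs)
print(backlinks(graph, "customers.md"))  # ['orders.md']
```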

There is no complex compression scheme.

No central registry.

If you can cat a file, you can read it.

If you can git clone a repo, you can deploy it.

This is how we stop rebuilding context pipelines from scratch every time a new model drops.

Announcement + spec file in 🧵↓

Google's blog post: cloud.google.com/blog/products/…

Spec file here: github.com/GoogleCloudPla…


Charly Wargnier

@DataChaz

May 17

🚨 New AI guides drop every single day, yet these 9 official guides from OpenAI, Google, and Anthropic are still the definitive foundation you need.

Bookmark these: 🧵 ↓

1/ 601 GenAI Use Cases – by @Google

The enterprise AI playbook keeps growing!

There are over 600 use cases inside this gigantic guide from Google! 🔥

→ cloud.google.com/transform/101-…


2/ Agents Companion – by @Kaggle

Here's a great playbook filled with tools and reference material for agent builders.

→ kaggle.com/whitepaper-age…


Charly Wargnier

@DataChaz

Apr 3

🚨 Karpathy’s new set-up is the ultimate self-improving second brain, and it takes zero manual editing 🤯

It acts as a living AI knowledge base that actually heals itself.

Let me break it down.

Instead of relying on complex RAG, the LLM pulls raw research directly into an @Obsidian Markdown wiki. It completely takes over:

✦ Index creation
✦ System linting
✦ Native Q&A routing

The core process is beautifully simple:

→ You dump raw sources into a folder
→ The LLM auto-compiles an indexed .md wiki
→ You ask complex questions
→ It generates outputs (Marp slides, matplotlib plots) and files them back in
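Here is a stripped-down version of that loop, covering only the first two steps: dump sources into a folder, then compile an indexed wiki. The `summarize` function is a placeholder for the LLM call and just returns the first non-empty line. The folder layout and function names are assumptions, not Karpathy's actual setup.

```python
from pathlib import Path


def summarize(text: str) -> str:
    # Placeholder for the LLM step: first non-empty line, minus Markdown heading marks.
    for line in text.splitlines():
        if line.strip():
            return line.strip().lstrip("#").strip()
    return "(empty)"


def compile_index(raw_dir: str, wiki_dir: str) -> Path:
    """Copy each raw .md source into wiki_dir and write an index.md linking to all of them."""
    raw, wiki = Path(raw_dir), Path(wiki_dir)
    wiki.mkdir(parents=True, exist_ok=True)
    entries = ["# Index", ""]
    for src in sorted(raw.glob("*.md")):
        text = src.read_text(encoding="utf-8")
        (wiki / src.name).write_text(text, encoding="utf-8")   # file the source into the wiki
        entries.append(f"- [{src.stem}]({src.name}): {summarize(text)}")
    index_path = wiki / "index.md"
    index_path.write_text("\n".join(entries) + "\n", encoding="utf-8")
    return index_path
```

Because the output is just Markdown files, every step stays inspectable. You can open `index.md` and diff it in git.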

The big-picture implication of this is just wild.

When agents maintain their own memory layer, they don’t need massive, expensive context windows.

They really just need two things:

→ Clean file organization
→ The ability to query their own indexes
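The thread doesn't say how the querying works. As a hedged illustration, a plain inverted index over the wiki's pages is one simple way an agent could look things up in its own files without loading everything into the prompt. `build_inverted_index` and `query` are hypothetical names.

```python
import re
from collections import defaultdict


def tokenize(text: str) -> list:
    return re.findall(r"[a-z0-9]+", text.lower())


def build_inverted_index(pages: dict) -> dict:
    """Map each lowercase term to the set of page names that contain it."""
    index = defaultdict(set)
    for name, text in pages.items():
        for term in tokenize(text):
            index[term].add(name)
    return index


def query(index: dict, question: str) -> list:
    """Return pages containing every term in the question (AND semantics), sorted."""
    terms = tokenize(question)
    if not terms:
        return []
    hits = set.intersection(*(index.get(t, set()) for t in terms))
    return sorted(hits)


pages = {
    "attention.md": "Sliding window attention keeps memory flat",
    "okf.md": "Open Knowledge Format for curated context",
    "wiki.md": "LLM maintained wiki with flat files",
}
index = build_inverted_index(pages)
print(query(index, "flat memory"))  # ['attention.md']
```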

Forget stuffing everything into one giant prompt.

This approach is way cheaper, highly scalable... and 100% inspectable!

Wow. Insanely fast turnaround from @himanshustwts!

A full breakdown of @karpathy’s self-improving wiki framework, walking through every stage from ingestion to what comes next 👀


Omar took a very similar approach with @Obsidian.

You can check it out here:

https://x.com/omarsar0/status/2039844072748204246


Charly Wargnier

@DataChaz

Mar 19

With Voicebox, @ElevenLabs just lost its moat.

→ Powered by Alibaba's Qwen3-TTS for near-perfect voice cloning
→ Ships with a DAW-like "Stories Editor"
→ No cloud, runs locally on your machine

100% Open Source. 100% Local.

Link to repo in 🧵↓

It features a full-blown "Stories Editor" (DAW stylee!):

→ Drag & drop multi-track timeline 🎚️
→ Complex conversation mixing
→ Precise inline trimming

Perfect for creating podcasts or multi-speaker narratives locally!

Massive shoutout to @jamiepine for shipping this in open source!

→ voicebox.sh

Mac & Windows builds are already available.

Don't forget to give a ⭐ on GitHub to support Jamie!
→ github.com/jamiepine/voic…


Charly Wargnier

@DataChaz

Mar 17

Someone built the ultimate visual LLM Architecture Gallery, packing 38 models from 2024-2026 into a single hub 🤯

It completely breaks down the complexity for you.

Inside:
→ Annotated diagrams
→ Key design choices
→ Actual code implementations

Link to the gallery in 🧵↓

Here is the full roster!

- Llama 3 8B
- OLMo 2 7B
- DeepSeek V3
- DeepSeek R1
- Gemma 3 27B
- Mistral Small 3.1 24B
- Llama 4 Maverick
- Qwen3 235B-A22B
- Qwen3 32B
- Qwen3 8B
- Qwen3 4B
- SmolLM3 3B
- Kimi K2
- GLM-4.5 355B
- GPT-OSS 20B
- GPT-OSS 120B
- Grok 2.5 270B
- Qwen3 Next 80B-A3B
- MiniMax M2 230B
- Kimi Linear 48B-A3B
- OLMo 3 7B
- OLMo 3 32B
- DeepSeek V3.2
- Mistral 3 Large
- Nemotron 3 Nano 30B-A3B
- Xiaomi MiMo-V2-Flash 309B
- GLM-4.7 355B
- Arcee AI Trinity Large 400B
- GLM-5 744B
- Nemotron 3 Super 120B-A12B
- Step 3.5 Flash 196B
- Nanbeige 4.1 3B
- MiniMax M2.5 230B
- Tiny Aya 3.35B
- Ling 2.5 1T
- Qwen3.5 397B
- Sarvam 105B
- Sarvam 30B

Access the high-resolution gallery and the blog post here:

→ sebastianraschka.com/llm-architectu…

