Charly Wargnier
Feb 11, 2025 · 11 tweets · 5 min read
OpenAI is getting Deepseek’d again.

@Convergence_ai_, a tiny London startup, just built one of the most capable AI agents for the web.

Proxy is outperforming Operator on every benchmark handpicked by @OpenAI.



Let’s dive in! 🧵↓ proxy.convergence.ai
Let’s start with this, proof that funding isn’t everything.

@OpenAI raised $18 billion.

@Convergence_ai_? Just $12 million! That’s about 0.067% of OpenAI’s funding.
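
For anyone who wants to sanity-check that percentage, here is a minimal sketch using only the two funding figures quoted above. The constant and function names are made up for illustration; nothing here comes from either company.

```python
# Funding figures as quoted in the thread (USD); names are illustrative only.
OPENAI_FUNDING = 18_000_000_000
CONVERGENCE_FUNDING = 12_000_000


def funding_share(small: float, large: float) -> float:
    """Return `small` expressed as a percentage of `large`."""
    return small / large * 100


share = funding_share(CONVERGENCE_FUNDING, OPENAI_FUNDING)
print(f"{share:.3f}%")  # prints 0.067%
```

Put differently, that is roughly 1/1500 of the money.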



And yet, Proxy is faster, smarter, and fully autonomous.

Keep scrolling for comparison videos! ↓ tech.eu/2024/09/27/con…
Task 1: Find chicken recipes.

❌ Operator gives up after 4 minutes.
✔️ Proxy completes it twice before Operator finishes.
✔️ Not only that, Proxy also delivers complete results.

3/
Task 2: Get the latest basketball news.

❌ Operator gets stuck on a CAPTCHA and needs human help.
✔️ Proxy bypasses it instantly with no manual input.
❌ Operator provides just a link with minimal info.
✔️ Proxy delivers full details in under a minute.

4/
Task 3: Find top-rated @Tripadvisor stays.

❌ Operator takes over 2 minutes and delivers less information.
✔️ Proxy finds 5 top-rated stays in 56 seconds with prices, ratings, and reviews.

5/
Task 4: Get the latest US economy news.

❌ Operator is slow and less informative.
✔️ Proxy provides a detailed, high-quality summary faster.

6/
What’s more, Proxy is ranked #1 globally on the WebVoyager benchmark, which assesses agentic capabilities across 600 web-based tasks.

7/
... and when it comes to cost, it’s not even a question.

→ Operator costs $200/month.
→ Proxy is free, with a $20/month Pro option.

Check it out:

8/ convergence.ai/#:~:text=Our%2…
But wait… there’s more.

Proxy does stuff that Operator simply can’t.

→ Schedule automations to run on repeat: your own AI agent on autopilot.
→ Instantly share automations on X/Twitter so anyone can run them... with one click! 🤯

9/
That’s a wrap!

Proxy beats Operator where it matters most: speed, autonomy, features, and cost.

Try it for free or go Pro for just $20/month.



10/ proxy.convergence.ai
If this was useful, a quick RT would go a long way toward giving this London startup a voice and some more oomph against the industry giants! 💪

And if you’re into AI agents and LLMs, don’t forget to follow me @DataChaz for more insights! :)

