Hasan Toor

@hasantoxr

May 12 • 7 tweets • 3 min read

I'm replacing every memory layer I've ever built into an agent with this.

SureThing dropped SOTA on LongMemEval.

88.0% overall. 91.0% knowledge update. 76.7% single-session preference.

Number one across every category that actually matters.

Then their own AI walked up to the screen and started explaining the whole thing itself.

Nobody asked it to.

Every memory system I've built before this worked the same way.

Store something. Retrieve it later. Hope the retrieval actually finds the right thing.

Two separate systems pretending to be one.
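To make the store-then-retrieve pattern concrete, here is a minimal sketch of what such a memory layer tends to look like. The class name `NaiveMemory` and the word-overlap scoring are assumptions made up for illustration; this is not SureThing's design or any particular library's API.

```python
from dataclasses import dataclass, field


@dataclass
class NaiveMemory:
    """Hypothetical store-then-retrieve memory layer (illustration only)."""

    notes: list[str] = field(default_factory=list)

    def store(self, note: str) -> None:
        # System one: append the raw text to a store.
        self.notes.append(note)

    def retrieve(self, query: str, k: int = 1) -> list[str]:
        # System two: rank stored notes by word overlap with the query.
        query_words = set(query.lower().split())
        scored = [
            (len(query_words & set(note.lower().split())), note)
            for note in self.notes
        ]
        ranked = sorted(scored, key=lambda pair: pair[0], reverse=True)
        # Notes that share no words with the query are never returned.
        return [note for score, note in ranked[:k] if score > 0]


memory = NaiveMemory()
memory.store("User prefers dark roast coffee")
memory.store("Deploy script lives in tools/deploy.sh")

hit = memory.retrieve("where is the deploy script")
miss = memory.retrieve("favourite beverage")

print(hit)   # the deploy note is found
print(miss)  # empty: the coffee note is relevant but shares no words
```

The second query is the "hope" part: the stored fact is there, but the retriever has no way to connect it to the question.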

@getsurething threw that model out completely.

The memory IS the computation. Fully fused. One architecture, not two bolted together.

That's the difference. That's why the numbers look the way they do.

The benchmark breakdown:

88.0% overall on LongMemEval
91.0% on knowledge update
76.7% on single-session preference

Top of every single category.

They didn't optimize for the benchmark.

The benchmark just revealed what the architecture was already doing.

Here's what this means in practice.

Give the agent a goal. Walk away. Come back to results.

No babysitting. No dying context windows. No starting from scratch every new session.

It remembers what you told it. It remembers what worked. It keeps getting better the longer you run it.

That's not how any other agent I've used behaves.

The demo moment nobody saw coming.

Their AI walked up to the big screen unprompted and started explaining the entire architecture to the room.

Clearer than most engineers could. More accurate than most blog posts I've read.

That's not a party trick.

That's what you get when memory and reasoning are actually the same system.

If you've spent time building memory layers into agents, setting up RAG pipelines, managing context windows, and watching it all fall apart after a few sessions, this is worth paying attention to.

SureThing is live now.

surething.io

https://twitter.com/1506055005992026115/status/2054204118194168014

As always, thank you for reading this.

If you enjoyed this post:

1. Follow me @hasantoxr for more of these
2. RT the tweet below to share this thread with your audience

https://twitter.com/1506055005992026115/status/2054204118194168014

More from @hasantoxr

Hasan Toor

@hasantoxr

May 9

A Chinese lab just dropped a 1 TRILLION parameter thinking model.

For free.

It's called Ring-2.6-1T from InclusionAI and it just made every $200/month "agent" subscription look like a scam.

Here's why this changes everything ↓

The numbers are absurd:

→ 1 Trillion total parameters
→ 63B active (MoE architecture)
→ 262,144 token context window
→ 65,536 max output tokens
→ $0 input. $0 output.
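A quick back-of-the-envelope check on those figures, using only the numbers quoted above. The variable names are my own, and this says nothing about how the model is actually served.

```python
# Figures as quoted in the thread.
total_params = 1_000_000_000_000   # 1 trillion total parameters
active_params = 63_000_000_000     # 63B active per token (MoE)
context_window = 262_144           # tokens
max_output = 65_536                # tokens

# Share of the weights that participate in any single token.
active_fraction = active_params / total_params

# Share of the context window that a maximum-length output would fill.
output_share = max_output / context_window

print(f"Active per token: {active_fraction:.1%}")
print(f"Max output as share of context: {output_share:.0%}")
```

So roughly 6.3% of the parameters are active for a given token, and a full-length output takes up a quarter of the context window.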

This isn't a stripped-down demo. This is the full model.

It's a "thinking" model built specifically for agent workflows.

Not chat. Not Q&A.

Real autonomous execution: coding agents, tool use, and long-horizon tasks where the model has to stay coherent across hours of work.

The kind of thing OpenAI charges $200/mo for.

Hasan Toor

@hasantoxr

May 5

This is genuinely impressive.

Gauth just dropped Atlas and it might be the end of textbooks.

Type any topic like "Silk Road," "how a camera works," "fall of Constantinople" and it builds you a hand-drawn, interactive visual world you can walk through.

No more reading walls of text. You explore knowledge like a map.

Here's how to use it (step by step): ↓

1. Go to gauthmath.com/atlas

No signup wall. No paywall. Works straight in your browser.

This is the same Gauth that hit #1 in Education on the App Store, built by ByteDance and used by millions of students.

Type any subject into the search bar.

Anything works:

→ "The rise of the Roman Empire"
→ "Inside a beehive"
→ "How nuclear reactors work"
→ "The fall of Constantinople"

Too broad, too niche, too specific: it doesn't matter. If you're curious about it, Atlas builds it.

Hasan Toor

@hasantoxr

May 5

GOOGLE QUIETLY BUILT THE SMARTEST LEARNING TOOL ON THE INTERNET

Google's NotebookLM has been free for months and it's better than any tutor I've ever paid for.

But 90% of people are using it completely wrong.

I'll give you 10 NotebookLM prompts to learn anything in record time.

1. The Feynman Decomposer

"Take every major concept in this material and rebuild each one as if you were Richard Feynman teaching a curious 12-year-old. Use only everyday analogies, real-world examples, and zero jargon. After each explanation, list the 3 most common misconceptions students have about this concept and explain exactly why those misconceptions feel intuitive but are wrong. Then test my understanding by asking me one question that forces me to apply the concept in a scenario not covered in the source material."

2. The Exam Predictor

"Act as the professor who wrote this material. Based on the structure, emphasis, repetition patterns, and depth of coverage across the source, predict the 10 most likely exam questions a professor would ask from this content. For each question, explain why it would be asked, which section of the source it pulls from, and what a perfect answer would look like. Then rank the questions from highest probability to lowest based on how heavily the source weights each topic."

Hasan Toor

@hasantoxr

Apr 30

China just open-sourced a trillion-parameter model that burns fewer tokens than your favorite "efficient" US model.

Ling-2.6-1T is now public, inspectable, and benchmarkable.

The closed-model moat just got smaller.

Ant Group dropped this as a flagship, not a research toy.

1T parameters. Non-reasoning architecture. Fast-thinking by design.

It's not built to impress you with long chains of thought.

It's built to finish the task in fewer tokens than the models you're currently paying for.

The core obsession here is useful intelligence per token.

Most frontier models burn tokens narrating their thinking before they do anything.

Ling-2.6-1T skips the theater and goes straight to execution, which is the part that actually moves work forward in production.

Hasan Toor

@hasantoxr

Apr 28

DeepSeek V4 just went live on ZenMux with free versions at launch.

Same coding power as Claude Opus 4.7.
7x cheaper on Pro. 90x cheaper on Flash.
1M native context. MIT licensed.

Here's how to swap it into Claude Code in 3 minutes 👇

First, the numbers everyone's freaking out about.

Claude Opus 4.7 output: $25/M
DeepSeek V4-Pro output: $3.48/M
DeepSeek V4-Flash output: $0.28/M

SWE-bench Verified:
→ Opus 4.7: 80.8%
→ V4-Pro: 80.6%

Tied on coding. Tiny fraction of the bill.
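The "7x" and "90x" claims can be checked directly from the prices listed above. This is just arithmetic on the quoted output prices and scores; the variable names are mine, and input-token pricing is not covered because the thread does not give it.

```python
# Output prices in USD per million tokens, as quoted in the thread.
opus_price = 25.00
v4_pro_price = 3.48
v4_flash_price = 0.28

# How many times cheaper each DeepSeek tier is on output tokens.
pro_ratio = opus_price / v4_pro_price
flash_ratio = opus_price / v4_flash_price

# SWE-bench Verified gap in percentage points (Opus 4.7 minus V4-Pro).
swe_gap = 80.8 - 80.6

print(f"V4-Pro: {pro_ratio:.1f}x cheaper")
print(f"V4-Flash: {flash_ratio:.1f}x cheaper")
print(f"SWE-bench gap: {swe_gap:.1f} points")
```

That works out to about 7.2x for Pro and 89.3x for Flash, so "7x" and "90x" are fair roundings, and the benchmark gap is 0.2 points.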

The architecture that makes it possible.

1.6T total parameters. Only 49B activated per token.
Compute requirements: 27% of the previous generation.
KV cache: slashed to 10%.

This is efficient MoE shipped at frontier scale.

Hasan Toor

@hasantoxr

Apr 27

Wow... A YC-backed startup just turned game development into a single text box.

It's called CodeWisp. Type what you want and it gives you a playable game right in your browser.

No Unity. No Godot. No 5 years of tutorials. Just describe and play.

100% browser-based.

CodeWisp is a browser-based AI game builder backed by Y Combinator.

You describe the game you want in plain English.

It generates the complete code, structure, and assets automatically.

2D games. 3D games. Multiplayer browser games. All from a single prompt.

Here's how the workflow actually runs:

→ Open the browser editor (no download, no install)
→ Describe your game: mechanics, enemies, physics, levels, visuals
→ CodeWisp generates it instantly
→ Prompt edits to refine anything
→ Publish with a shareable link in one click

That's it. That's the whole process.