Hasan Toor
May 12 · 7 tweets · 3 min read
I'm replacing every memory layer I've ever built into an agent with this.

SureThing dropped SOTA on LongMemEval.

88.0% overall. 91.0% knowledge update. 76.7% single-session preference.

Number one across every category that actually matters.

Then their own AI walked up to the screen and started explaining the whole thing itself.

Nobody asked it to.
Every memory system I've built before this worked the same way.

Store something. Retrieve it later. Hope the retrieval actually finds the right thing.

Two separate systems pretending to be one.
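
For anyone who hasn't built one, here's roughly what that store-then-retrieve pattern looks like. This is a minimal sketch of the conventional approach, not SureThing's architecture. The `NaiveMemory` class and its word-overlap scoring are hypothetical stand-ins for a real vector store and embedding search.

```python
from dataclasses import dataclass, field


@dataclass
class NaiveMemory:
    """Hypothetical store-then-retrieve memory layer (not SureThing's design)."""

    notes: list[str] = field(default_factory=list)

    def store(self, text: str) -> None:
        # System one: append the raw note to a list.
        self.notes.append(text)

    def retrieve(self, query: str, k: int = 1) -> list[str]:
        # System two: rank notes by how many query words they share.
        query_words = set(query.lower().split())

        def overlap(note: str) -> int:
            return len(query_words & set(note.lower().split()))

        ranked = sorted(self.notes, key=overlap, reverse=True)
        # Drop notes that share no words with the query at all.
        return [note for note in ranked[:k] if overlap(note) > 0]


memory = NaiveMemory()
memory.store("user prefers dark mode")
memory.store("deploy target is staging")

# Shares "user" and "mode" with the first note, so retrieval succeeds.
hit = memory.retrieve("which mode does the user prefer")

# Same intent, different wording: no shared words, so nothing comes back.
miss = memory.retrieve("what theme should the UI use")
```

The second query is the "hope" part. The fact is in storage, but the retrieval step never surfaces it.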

@getsurething threw that model out completely.

The memory IS the computation. Fully fused. One architecture, not two bolted together.

That's the difference. That's why the numbers look the way they do.
The benchmark breakdown:

88.0% overall on LongMemEval
91.0% on knowledge update
76.7% on single-session preference

Top of every single category.

They didn't optimize for the benchmark.

The benchmark just revealed what the architecture was already doing.
Here's what this means in practice.

Give the agent a goal. Walk away. Come back to results.

No babysitting. No dying context windows. No starting from scratch every new session.

It remembers what you told it. It remembers what worked. It keeps getting better the longer you run it.

That's not how any other agent I've used behaves.
The demo moment nobody saw coming.

Their AI walked up to the big screen unprompted and started explaining the entire architecture to the room.

Clearer than most engineers could. More accurate than most blog posts I've read.

That's not a party trick.

That's what you get when memory and reasoning are actually the same system.
If you've spent time building memory layers into agents, setting up RAG pipelines, managing context windows, and watching it all fall apart after a few sessions, this is worth paying attention to.
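
A quick illustration of the context-window chore mentioned above, as a minimal sketch under stated assumptions. The `trim_history` helper is hypothetical, and a word count stands in for a real tokenizer. It shows the usual trick of keeping only the most recent messages that fit a budget, and what gets lost when you do.

```python
def trim_history(messages: list[str], budget: int) -> list[str]:
    """Keep the most recent messages whose rough word count fits the budget."""
    kept: list[str] = []
    used = 0
    # Walk backwards from the newest message and stop once the budget is full.
    for message in reversed(messages):
        cost = len(message.split())  # crude stand-in for a real tokenizer
        if used + cost > budget:
            break
        kept.append(message)
        used += cost
    # Restore chronological order before returning.
    return list(reversed(kept))


history = [
    "my name is Ada",
    "please draft the report",
    "use a formal tone",
]

# Each message costs 4 words, so a budget of 8 keeps only the last two.
window = trim_history(history, budget=8)
```

The oldest message is the one that falls out of the window, so the agent no longer knows the user's name.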

SureThing is live now.

surething.io
As always, thank you for reading.

If you enjoyed this post:

1. Follow me @hasantoxr for more of these
2. RT the tweet below to share this thread with your audience
