The Adversarial Robustness Toolbox (ART) = a framework for evaluating and defending deep learning models against adversarial attacks
Thread⬇️
Adversarial attacks come in two broad flavors:
+White Box Attacks: the adversary has full access to the model and its training environment, including knowledge of the training algorithm
+Black Box Attacks: the adversary can only query the model and has no additional knowledge of its internals
2/⬇️
The goal of ART = to provide a framework to evaluate the robustness of a neural network.
The current version of ART focuses on four types of adversarial attacks:
+evasion
+inference
+extraction
+poisoning
3/⬇️
ART is a generic Python library. It provides native integration with several deep learning frameworks such as @TensorFlow, @PyTorch, #Keras, and @ApacheMXNet.
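To make this concrete, here's a minimal sketch of an evasion attack with ART against a toy PyTorch model (the model and data are placeholders; exact signatures may vary across ART versions):

```python
import numpy as np
import torch.nn as nn
import torch.optim as optim
from art.estimators.classification import PyTorchClassifier
from art.attacks.evasion import FastGradientMethod

# Toy classifier standing in for a real trained model
model = nn.Sequential(nn.Flatten(), nn.Linear(28 * 28, 10))

# Wrap the model in an ART estimator so attacks can query it
classifier = PyTorchClassifier(
    model=model,
    loss=nn.CrossEntropyLoss(),
    optimizer=optim.Adam(model.parameters(), lr=1e-3),
    input_shape=(1, 28, 28),
    nb_classes=10,
)

# Craft adversarial examples with the Fast Gradient Method (an evasion attack)
x_test = np.random.rand(16, 1, 28, 28).astype(np.float32)
attack = FastGradientMethod(estimator=classifier, eps=0.1)
x_adv = attack.generate(x=x_test)

# Compare predictions on clean vs. adversarial inputs to gauge robustness
clean = classifier.predict(x_test).argmax(axis=1)
adv = classifier.predict(x_adv).argmax(axis=1)
print(f"Predictions flipped on {np.mean(clean != adv):.0%} of inputs")
```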
If you'd like a more in-depth look at ART, click the link below to read TheSequence Edge#7, our educational newsletter. thesequence.substack.com/p/edge7 5/5
▪️ LM2:
- Uses a Transformer architecture with a memory module to improve long-context reasoning.
- Outperforms RMT by 37.1% and excels in multi-hop inference.
▪️ NatureLM:
- Trained across scientific domains.
- Enhances tasks like SMILES-to-IUPAC translation and CRISPR RNA design for cross-domain applications.
▪️ Goedel-Prover:
- Advances formal proof generation.
- Achieves 57.6% Pass@32 on miniF2F using expert iteration and statement formalizers.
Find the links below👇
1. LM2: Large Memory Models by Convergence Labs Ltd.
Our top 7
▪️ Matryoshka Quantization
▪️ LLM Pretraining with Continuous Concepts
▪️ LLMs can easily learn to reason from demonstrations
▪️ Forget what you know about LLMs evaluations – LLMs are like a chameleon
▪️ Exploring the Limit of Outcome Reward for Learning Mathematical Reasoning
▪️ Hephaestus
▪️ SynthDetoxM Dataset
▪️ The Curse of Depth in LLMs
▪️ InfiniteHiP
▪️ Distillation Scaling Laws
▪️ TransMLA: Multi-Head Latent Attention
▪️ Logical reasoning in LLMs: A survey
▪️ ReasonFlux
▪️ How Stanford’s s1 surpasses DeepSeek-R1
▪️ The Stochastic Parrot on LLM’s Shoulder
▪️ Training LMs for Social Deduction with Multi-Agent RL
▪️ Towards Internet-scale training for agents
▪️ WorldGUI
▪️ CoSER: Coordinating LLM-Based Persona Simulation
▪️ Scaling Pre-training to One Hundred Billion Data for VLMs
▪️ Adapting Language-Specific LLMs to Reasoning Models
🧵
1. Matryoshka Quantization from @GoogleDeepMind
Introduces MatQuant, a multi-scale quantization technique that trains a single model whose int8 weights nest int4 and int2 representations, so one checkpoint can be served at multiple precisions (or mix precisions across layers) for efficient deployment.
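A toy sketch of the nesting idea (illustrative only; MatQuant's actual recipe co-trains the precisions with learned scales): the most significant bits of an int8 weight code double as its int4 and int2 codes.

```python
import numpy as np

def slice_msbs(w_int8: np.ndarray, target_bits: int) -> np.ndarray:
    """Derive a lower-precision code by keeping the top `target_bits` bits.

    Assumes unsigned 8-bit codes in [0, 255]; e.g. 0b10110110 -> 0b1011
    for int4. This is the Matryoshka-style nesting, minus the training.
    """
    return w_int8 >> (8 - target_bits)

w8 = np.random.randint(0, 256, size=(4, 4), dtype=np.uint8)
w4 = slice_msbs(w8, 4)  # nested int4 weights
w2 = slice_msbs(w8, 2)  # nested int2 weights
print(w8[0], w4[0], w2[0])
```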
1. Model Distillation guide from @OpenAI
2. Knowledge Distillation tutorial by @PyTorch
3. Jetson Introduction to Knowledge Distillation by @nvidia
4. Tutorial on Knowledge Distillation with @kerasteam
5. @huggingface's guides:
- Knowledge Distillation
- Knowledge Distillation for Computer Vision
Save this list and check out the details on each resource below 👇
1. Model Distillation guide from @OpenAI
Explains the process step by step, including:
- storing outputs from a large model
- evaluating both the large and small models
- creating training data for the small model
- assessing the fine-tuned small model
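A minimal sketch of the first data-collection steps in that workflow (the prompts and file name are hypothetical; the guide itself also covers OpenAI's stored-completions feature):

```python
import json
from openai import OpenAI

client = OpenAI()
prompts = [
    "Summarize photosynthesis in one sentence.",  # hypothetical examples
    "Explain overfitting to a beginner.",
]

# Store outputs from the large "teacher" model as fine-tuning examples
with open("distillation_data.jsonl", "w") as f:
    for prompt in prompts:
        response = client.chat.completions.create(
            model="gpt-4o",  # teacher model
            messages=[{"role": "user", "content": prompt}],
        )
        answer = response.choices[0].message.content
        f.write(json.dumps({"messages": [
            {"role": "user", "content": prompt},
            {"role": "assistant", "content": answer},
        ]}) + "\n")

# The JSONL file can then be uploaded and used to fine-tune a smaller
# student model, e.g. via client.files.create(...) and
# client.fine_tuning.jobs.create(...)
```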
2. Knowledge Distillation tutorial by @PyTorch covers:
• Extracting hidden representations for further calculations
• Modifying PyTorch training loops to include additional losses
• Enhancing lightweight models using complex models as teachers
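The "additional losses" part usually boils down to a temperature-scaled soft-label term. A generic sketch (not the tutorial's exact code):

```python
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels,
                      temperature=2.0, alpha=0.5):
    """Blend hard-label cross-entropy with a soft-label KL term.

    The KL term pulls the student's softened distribution toward the
    teacher's; the T**2 factor rescales its gradients to match.
    """
    hard = F.cross_entropy(student_logits, labels)
    soft = F.kl_div(
        F.log_softmax(student_logits / temperature, dim=-1),
        F.softmax(teacher_logits / temperature, dim=-1),
        reduction="batchmean",
    ) * temperature ** 2
    return alpha * hard + (1 - alpha) * soft

# Toy usage: batch of 8 examples, 10 classes
student_logits = torch.randn(8, 10, requires_grad=True)
teacher_logits = torch.randn(8, 10)
labels = torch.randint(0, 10, (8,))
distillation_loss(student_logits, teacher_logits, labels).backward()
```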
Distillation involves using a large teacher model to train a smaller student one.
But can we predict a distilled model’s performance based on teacher quality, student size, data volume, etc.?
@Apple and @UniofOxford explored this and developed distillation scaling laws.
Here are the key takeaways👇
1. A good teacher doesn’t always mean a better student:
If a teacher is too strong, the student might struggle to learn from it, leading to worse performance.
This is called the capacity gap: the student isn't powerful enough to properly mimic the teacher.
2. The distillation scaling law predicts how well a student model will perform based on three key factors:
- Student model's size
- The number of training tokens
- The teacher’s size and quality
This law follows a power-law relationship: performance improves in a predictable way, but only up to a point, after which adding more resources yields diminishing returns.
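As a toy illustration of that saturating behavior (made-up functional form and coefficients, not the paper's fitted law):

```python
# Hypothetical power law: student loss decays with student size N and
# distillation tokens D, floored by an irreducible term L0.
def student_loss(N, D, a=400.0, alpha=0.34, b=410.0, beta=0.28, L0=1.7):
    return a * N ** (-alpha) + b * D ** (-beta) + L0

for N in [1e7, 1e8, 1e9, 1e10]:
    print(f"N={N:.0e}: predicted loss = {student_loss(N, D=1e10):.2f}")
# Each 10x increase in student size buys a smaller loss reduction:
# predictable improvement that flattens out, as described above.
```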
Sliding Tile Attention (STA) speeds up video generation by up to 3.53×.
It focuses only on small, relevant regions at a time and moves across the video in a sliding pattern.
STA processes larger chunks (tiles) at once, making it faster and more hardware-efficient.
Here's how it works:
First, what's wrong with current methods?
3D attention, which is generally used in Diffusion Transformers (DiTs), processes all video frames at once and treats every pixel separately. This consumes a huge amount of computing power, about 70% of the total effort.
The problem with traditional Sliding Window Attention (SWA) is that its per-token windows create "mixed blocks" that are only partially attended, which are inefficient for GPU block-sparse kernels.
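A toy sketch of the difference (mask construction only, not the STA kernel): tile-aligned windows yield attention blocks that are entirely kept or entirely skipped, which block-sparse GPU kernels handle efficiently.

```python
import numpy as np

def tile_sliding_mask(num_tokens, tile, window_tiles):
    """Sliding window at tile granularity: every (tile x tile) block is
    all-ones or all-zeros, so there are no mixed blocks."""
    n_tiles = num_tokens // tile
    mask = np.zeros((num_tokens, num_tokens), dtype=bool)
    for q in range(n_tiles):
        lo, hi = max(0, q - window_tiles), min(n_tiles, q + window_tiles + 1)
        mask[q*tile:(q+1)*tile, lo*tile:hi*tile] = True
    return mask

def token_sliding_mask(num_tokens, radius):
    """Per-token sliding window: window edges cut through blocks, creating
    the partially filled 'mixed blocks' that waste GPU work."""
    idx = np.arange(num_tokens)
    return np.abs(idx[:, None] - idx[None, :]) <= radius

print(tile_sliding_mask(16, tile=4, window_tiles=1).sum(),
      token_sliding_mask(16, radius=5).sum())
```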