Newsletter exploring AI & ML: AI 101, ML techniques, AI business insights, global dynamics, ML history. Led by @kseniase_. Save hours of research 👇🏼
Mar 25 • 20 tweets • 12 min read
The freshest AI/ML research of the week:
Our top 2
▪️ XAttention
▪️ Inside-Out: Hidden Factual Knowledge in LLMs
▪️ RWKV-7 "Goose"
▪️ ϕ-Decoding
▪️ Frac-connections
▪️ DAPO
▪️ Reinforcement learning for reasoning in small LLMs
▪️ MetaLadder
▪️ Measuring AI ability to complete long tasks
▪️ Why do multi-agent LLM systems fail?
▪️ Agents play thousands of 3D video games
▪️ GKG-LLM
▪️ Privacy, Synthetic Data, and Security
▪️ Scale-wise distillation of diffusion models
▪️ Multimodal chain-of-thought reasoning
▪️ Survey on evaluation of LLM-based agents
▪️ Stop overthinking: A survey on efficient reasoning
▪️ Aligning multimodal LLM with human preference
🧵 1. XAttention by @MIT, @Tsinghua_Uni, @sjtu1896 and @nvidia
Speeds up inference with block-sparse attention and antidiagonal scoring
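Here's a rough sketch of the antidiagonal idea: each (query-block, key-block) pair is scored by summing the attention logits that fall on strided antidiagonals inside the block, and only the highest-scoring key blocks are kept. For clarity the sketch computes the full logit matrix and uses a simple top-k selection; the paper evaluates only the strided entries and selects blocks against a cumulative threshold, so the names and parameters here are assumptions.

```python
import torch

def antidiagonal_block_scores(q, k, block=64, stride=8):
    """Score each (query-block, key-block) pair by summing the attention
    logits that fall on strided antidiagonals inside the block."""
    scores = q @ k.transpose(-1, -2) / q.shape[-1] ** 0.5              # full logits, for clarity only
    nb = scores.shape[0] // block
    blocks = scores[: nb * block, : nb * block]
    blocks = blocks.reshape(nb, block, nb, block).permute(0, 2, 1, 3)  # [q_block, k_block, block, block]
    i = torch.arange(block)
    anti = ((i[:, None] + i[None, :]) % stride == 0)                   # strided antidiagonal pattern
    return (blocks * anti).sum(dim=(-1, -2))                           # [nb, nb] block importance

def select_blocks(block_scores, keep_ratio=0.25):
    """Keep the highest-scoring key blocks for each query block (a top-k
    stand-in for the paper's cumulative-threshold selection)."""
    k_keep = max(1, int(block_scores.shape[-1] * keep_ratio))
    return block_scores.topk(k_keep, dim=-1).indices

q, k = torch.randn(256, 64), torch.randn(256, 64)
print(select_blocks(antidiagonal_block_scores(q, k)))                  # which key blocks each query block keeps
```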
There’s no single “right” answer for AI models in creative writing (like writing a story), and their open-ended thinking is a key part of creative intelligence.
Still, models often lack output diversity, so @midjourney dropped an interesting study on this 👇
▪️ Their idea is to add diversity directly into the training process:
They measured response deviation for the same prompt and used it to weight training with DPO and ORPO, producing the diversified DDPO and DORPO methods.
Here's how DDPO and DORPO work:
1. Diversified DPO (DDPO):
In the regular DPO method, the model learns by comparing a better response to a worse one.
In the diversified version, researchers add more weight to rare or unique winning responses, i.e. those with higher deviation.
This helps the model pay more attention to uncommon but high-quality examples during training.
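A minimal sketch of what deviation weighting could look like on top of a standard DPO loss. The `deviation` score and the `1 + deviation` weighting below are illustrative assumptions, not the study's exact formulation.

```python
import torch
import torch.nn.functional as F

def ddpo_loss(policy_chosen_logps, policy_rejected_logps,
              ref_chosen_logps, ref_rejected_logps,
              deviation, beta=0.1):
    """Standard DPO objective, re-weighted so rare (high-deviation) winners count more."""
    logits = beta * ((policy_chosen_logps - ref_chosen_logps)
                     - (policy_rejected_logps - ref_rejected_logps))
    per_pair = -F.logsigmoid(logits)        # plain DPO loss for each preference pair
    weights = 1.0 + deviation               # assumed weighting: higher deviation -> larger weight
    return (weights * per_pair).mean()
```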
Mar 18 • 10 tweets • 3 min read
DiLoCo (Distributed Low-Communication) method by @GoogleAI and @GoogleDeepMind changes how training of models happens:
Instead of constant syncing, multiple copies of the model are trained in parallel and sync only occasionally.
Scaling laws show how DiLoCo behaves as model size grows 🧵
At its core, DiLoCo follows a 2-level optimization process:
• Inner optimization: Each model replica (M) trains independently, making local updates.
• Outer optimization: Every H steps, replicas sync their updates to adjust a global model, which is then shared with all replicas, repeating the cycle.
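A minimal single-process sketch of that two-level loop, with deep copies standing in for distributed replicas. The real recipe uses AdamW inside and Nesterov momentum SGD outside; this sketch uses plain SGD for the outer step, and all hyperparameters are placeholders.

```python
import copy
import torch
import torch.nn.functional as F

def diloco_round(global_model, make_batch, workers=4, H=100, inner_lr=1e-4, outer_lr=0.7):
    deltas = []
    for _ in range(workers):
        replica = copy.deepcopy(global_model)                        # each replica starts from the global weights
        inner_opt = torch.optim.AdamW(replica.parameters(), lr=inner_lr)
        for _ in range(H):                                           # inner optimization: H independent local steps
            x, y = make_batch()
            loss = F.mse_loss(replica(x), y)
            inner_opt.zero_grad(); loss.backward(); inner_opt.step()
        deltas.append([gp.data - rp.data                             # "outer gradient": how far the replica moved
                       for gp, rp in zip(global_model.parameters(), replica.parameters())])
    for i, gp in enumerate(global_model.parameters()):               # outer optimization: apply the averaged delta
        avg_delta = torch.stack([d[i] for d in deltas]).mean(dim=0)
        gp.data -= outer_lr * avg_delta                              # plain SGD outer step for brevity
    return global_model
```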
Here are scaling laws for DiLoCo:
Mar 11 • 8 tweets • 4 min read
The latest AI/ML news of the week:
▪️ @perplexity_ai expands beyond the web
▪️ Manus: a Chinese high-performing AI agent
▪️ @Apple delayed Siri AI enhancements and unveiled the new M3 Ultra chip
▪️ @CorticalLabs' CL1 computer fuses human brain cells with silicon
▪️ @MistralAI OCR
▪️ Andrew Barto and @RichardSSutton take home the 2024 Turing Award!
Find the details below 🧵
1. @perplexity_ai expands beyond the web
It partners with hardware firms to integrate its AI into everyday devices. This year, Deutsche Telekom’s AI Phone launches with Perplexity’s assistant, hinting at future moves. Phones for now, then TVs? Where next?
Speculative Mixture-of-Experts (s-MoE) makes running MoE-based LLMs faster by reducing the communication overhead between GPUs.
S-MoE uses 2 techniques:
• Speculative Token Reshuffling (s-TS):
Predicts which experts tokens will use, rearranging tokens early to minimize token movement later.
• Speculative Expert Pre-grouping (s-EG):
Groups experts handling similar tokens together in advance to reduce communication.
s-MoE almost doubles performance over DeepSpeed-MoE and SGLang frameworks.
Here are the details:
Problem of MoE models
MoE inference efficiency is limited by Expert Parallelism (EP), as tokens are sent to specific experts located on different GPUs.
So tokens frequently move between GPUs, creating heavy communication overhead and slowing performance.
s-MoE can solve this👇
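An illustrative sketch of the token-reshuffling idea: guess each token's expert with a cheap proxy router and reorder tokens so that those headed for the same expert (and therefore the same GPU) are contiguous before the all-to-all dispatch. The proxy router and function names are assumptions, not the paper's implementation.

```python
import torch

def speculative_reshuffle(tokens, proxy_router, num_experts):
    """tokens: [n, d]; proxy_router: a cheap layer approximating the real gate."""
    with torch.no_grad():
        predicted_expert = proxy_router(tokens).argmax(dim=-1)          # speculative expert guess per token
    order = torch.argsort(predicted_expert)                             # group tokens by predicted expert
    grouped = tokens[order]
    counts = torch.bincount(predicted_expert, minlength=num_experts)    # per-expert send sizes for the all-to-all
    return grouped, order, counts

tokens = torch.randn(1024, 512)
proxy = torch.nn.Linear(512, 8)                                         # stand-in proxy router
grouped, order, counts = speculative_reshuffle(tokens, proxy, num_experts=8)
# Tokens predicted for the same expert are now contiguous; only mispredicted
# stragglers need to move again once the true gate runs.
```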
Mar 8 • 7 tweets • 3 min read
Contrastive Sparse Representation (CSR) by @XDUofChina is an effective alternative to Matryoshka Representation Learning (MRL) for creating embeddings.
MRL can vary embedding length, but it requires retraining the entire model and loses accuracy with short embeddings.
CSR solves this problem by using sparse coding: It keeps embeddings longer but activates only a few parts (neurons), making them "sparse."
This makes CSR a simple, fast and accurate method.
Here's how it works:
Working process:
CSR works differently from MRL because it starts with already-trained embeddings, converts them into sparse representations, and then activates only the most important features (TopK).
To ensure embeddings stay accurate and compact, it combines two losses👇
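A minimal sketch of that recipe under stated assumptions: a small autoencoder with a TopK activation sits on top of frozen embeddings and is trained with a reconstruction loss plus a contrastive (InfoNCE-style) loss. Dimensions and the exact loss mix are illustrative, not CSR's official implementation.

```python
import torch
import torch.nn.functional as F

class TopKSparseEncoder(torch.nn.Module):
    def __init__(self, dim=768, hidden=8192, k=32):
        super().__init__()
        self.k = k
        self.enc = torch.nn.Linear(dim, hidden)
        self.dec = torch.nn.Linear(hidden, dim)

    def forward(self, x):
        z = F.relu(self.enc(x))
        topk = torch.topk(z, self.k, dim=-1)                            # activate only the k strongest neurons
        sparse = torch.zeros_like(z).scatter_(-1, topk.indices, topk.values)
        return sparse, self.dec(sparse)

def csr_losses(model, emb, emb_pos, temperature=0.07):
    """emb / emb_pos: frozen dense embeddings of matched (e.g. paired) inputs."""
    z, recon = model(emb)
    z_pos, _ = model(emb_pos)
    recon_loss = F.mse_loss(recon, emb)                                  # sparse code must still describe the embedding
    logits = F.normalize(z, dim=-1) @ F.normalize(z_pos, dim=-1).T / temperature
    contrast_loss = F.cross_entropy(logits, torch.arange(len(emb)))      # matched pairs should score highest
    return recon_loss + contrast_loss
```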
Mar 4 • 12 tweets • 6 min read
The latest AI/ML news of the week:
▪️ @DeepSeek_ai: 6 extraordinary deliveries during #OpenSourceWeek
▪️ @AnthropicAI
- Claude 3.7 Sonnet
- Transparency Hub
- A fresh $3.5B Series E
▪️ @Google
- Gemini Code Assist free for all
- The AI co-scientist
▪️ @awscloud Center for Quantum Computing: Quantum error correction (QEC) scheme
Find the details below 🧵
1. @deepseek_ai delivered 6 major open-source AI optimizations
SWE-RL from @AIatMeta - the first reinforcement learning (RL) method to improve AI for real-world software engineering tasks.
SWE-RL trains models by:
- Studying software evolution data from GitHub pull requests (PRs)
- Using simple rules to reward the model
- Teaching reasoning before coding
It solved 41.0% of issues in SWE-bench Verified, the best performance so far for medium-sized LLMs, and it also improved general reasoning skills.
Here's how it works:
1. GitHub pull request (PR) data:
Researchers gathered 11M high-quality GitHub pull requests (the seed dataset) and linked them to real issues as training examples, structured to include issue descriptions, code context, and the correct fixes.
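A hedged sketch of what such a rule-based reward can look like: compare the generated patch against the ground-truth patch from the merged PR with a simple sequence-similarity rule and penalize malformed output. The exact rules in SWE-RL may differ.

```python
import difflib

def patch_reward(predicted_patch, oracle_patch):
    if not predicted_patch or not predicted_patch.strip():
        return -1.0                                    # malformed or empty patch gets a fixed penalty
    return difflib.SequenceMatcher(
        None, predicted_patch, oracle_patch).ratio()   # similarity in [0, 1] used as the reward

print(patch_reward("- old_line\n+ new_line", "- old_line\n+ fixed_line"))
```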
🧵
▪️ Classical AI planning (deliberative planning)
Agents find action sequences to reach a goal, using predefined models (like STRIPS, PDDL) and search algorithms, like depth-first search or A*. In LLM-based systems, classical planning adds structure and reliability.
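A toy sketch of that deliberative style: states are sets of facts, actions carry precondition/add/delete lists in the STRIPS spirit, and a best-first search (A* with a zero heuristic here, i.e. uniform-cost search) returns an action sequence. The domain below is made up purely for illustration.

```python
import heapq
from itertools import count

def plan(start, goal, actions, heuristic=lambda s: 0):
    """actions: list of (name, preconditions, add_effects, delete_effects) over sets of facts."""
    start, goal = frozenset(start), frozenset(goal)
    tie = count()
    frontier = [(heuristic(start), 0, next(tie), start, [])]
    seen = set()
    while frontier:
        _, cost, _, state, path = heapq.heappop(frontier)
        if goal <= state:                              # every goal fact holds
            return path
        if state in seen:
            continue
        seen.add(state)
        for name, pre, add, delete in actions:
            if pre <= state:                           # action is applicable in this state
                nxt = frozenset((state - delete) | add)
                heapq.heappush(frontier, (cost + 1 + heuristic(nxt), cost + 1, next(tie), nxt, path + [name]))
    return None

actions = [("pick_up", {"hand_empty", "on_table"}, {"holding"}, {"hand_empty", "on_table"}),
           ("stack",   {"holding"},                {"stacked", "hand_empty"}, {"holding"})]
print(plan({"hand_empty", "on_table"}, {"stacked"}, actions))   # -> ['pick_up', 'stack']
```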
Feb 27 • 11 tweets • 4 min read
Current LLM-serving systems treat each LLM call separately, causing delays in multi-step programs.
Autellix by @UCBerkeley, @GoogleDeepMind, and @sjtu1896 helps to fix it.
This new system "looks" at entire AI programs and schedules LLM requests based on the overall workflow.
• It makes AI programs run 4-15x faster, cutting both waiting and execution time.
• Allows the LLM engine to handle more requests at once.
• Maintains the same response speed.
Autellix achieves this through:
- smart scheduling
- efficient memory management
- better load balancing across multiple AI servers.
Here are the details:
1. Autellix improves scheduling in two ways to reduce delays:
• Program-aware prioritization: It tracks program history and prioritizes requests based on total execution time. Shorter programs get processed sooner.
• Preemptive scheduling: If a long request is slowing things down, it can be temporarily paused to allow shorter requests to go through first.
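An illustrative sketch of program-aware, preemptive scheduling: each LLM call carries its program's id, calls from programs with less accumulated service time are served first, and a preempted call is simply requeued. This is a simplification of Autellix's actual schedulers, and all names here are assumptions.

```python
import heapq
from collections import defaultdict
from itertools import count

class ProgramAwareScheduler:
    def __init__(self):
        self.service = defaultdict(float)    # cumulative execution time each program has received
        self.queue = []                      # (priority, tie-breaker, program_id, request)
        self.tie = count()

    def submit(self, program_id, request):
        prio = self.service[program_id]      # programs with less attained service are served first
        heapq.heappush(self.queue, (prio, next(self.tie), program_id, request))

    def next_request(self):
        _, _, program_id, request = heapq.heappop(self.queue)
        return program_id, request

    def record(self, program_id, elapsed, leftover=None):
        self.service[program_id] += elapsed  # program-aware bookkeeping across its many LLM calls
        if leftover is not None:             # preemption: an unfinished request simply goes back in the queue
            self.submit(program_id, leftover)
```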
Feb 26 • 6 tweets • 3 min read
MoBA, Mixture of Block Attention, from @Kimi_Moonshot improves handling of long-context tasks without fixed attention patterns.
Applying ideas from Mixture of Experts (MoE) to attention, MoBA lets the model dynamically decide where to focus.
This allows MoBA to be 6.5x faster than full attention for 1M tokens.
Here's how it works:
Here's the working process, step by step:
- Instead of looking at everything at once, MoBA divides the text into smaller sections (blocks).
- It scores the blocks, groups and organizes them, prioritizing the most relevant ones for each query.
- Only the top-scoring blocks are used for attention.
- MoBA ensures that attention is only on past and present words, keeping the process natural and logical.
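A minimal sketch of that flow under stated assumptions: keys are split into blocks, each query scores blocks by their mean key, attention runs only over the top-k past blocks plus the query's own block, and a token-level mask keeps everything causal. The per-token loop is for clarity; real kernels are batched.

```python
import torch
import torch.nn.functional as F

def moba_attention(q, k, v, block=128, topk=3):
    T, d = q.shape
    nb = T // block
    kb = k[: nb * block].reshape(nb, block, d).mean(dim=1)             # one summary key per block
    gate = q @ kb.T                                                    # [T, nb] block relevance per query
    q_block = torch.arange(T) // block
    gate = gate.masked_fill(torch.arange(nb)[None, :] > q_block[:, None], float("-inf"))  # no future blocks
    chosen = gate.topk(min(topk, nb), dim=-1).indices                  # top-k past blocks per query

    out = torch.zeros_like(q)
    for t in range(T):                                                 # per-token loop for clarity only
        blocks = torch.unique(torch.cat([chosen[t], q_block[t:t + 1]]))
        idx = torch.cat([torch.arange(b * block, min((b + 1) * block, T)) for b in blocks.tolist()])
        idx = idx[idx <= t]                                            # token-level causal mask in the current block
        attn = F.softmax(q[t] @ k[idx].T / d ** 0.5, dim=-1)
        out[t] = attn @ v[idx]
    return out

x = torch.randn(512, 64)
out = moba_attention(x, x, x)                                          # attends to at most topk+1 blocks per token
```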
Feb 25 • 26 tweets • 15 min read
The freshest AI/ML research of the week:
Our top 9
▪️ SigLIP 2
▪️ Intuitive Physics Understanding Emerges from Self-Supervised Pretraining on Natural Videos
▪️ Native Sparse Attention
▪️ OctoTools
▪️ ReLearn
▪️ On the Trustworthiness of Generative Foundation Models
▪️ S* Test Time Scaling for Code Generation
▪️ Autellix (Serving Engine for LLM Agents)
▪️ Is That Your Final Answer? Test-Time Scaling Improves Selective Question Answering
▪️ SurveyX
▪️ From RAG to Memory: Non-Parametric Continual Learning for LLMs
▪️ How Much Knowledge Can You Pack into a LoRA Adapter without Harming LLM?
▪️ Train Small, Infer Large
▪️ Eager Updates for Overlapped Communication and Computation in DiLoCo
▪️ S^2R: Teaching LLMs to Self-verify and Self-correct via RL
▪️ Logic-RL
▪️ Discovering Highly Efficient Low-Weight Quantum Error-Correcting Codes with RL
▪️ Armap
▪️ Thinking Preference Optimization
▪️ Rethinking Diverse Human Preference Learning through Principal Component Analysis
▪️ Craw4LLM
▪️ LLMs and Mathematical Reasoning Failures
▪️ Small Models Struggle to Learn from Strong Reasoners
▪️ Flow-of-Options: Diversified and Improved LLM Reasoning by Thinking Through Options
▪️ LM2:
- Uses a Transformer architecture with a memory module to improve long-context reasoning.
- Outperforms RMT by 37.1% and excels in multi-hop inference.
▪️ NatureLM:
- Is trained across scientific domains.
- Enhances tasks like SMILES-to-IUPAC translation and CRISPR RNA design for cross-domain applications.
▪️ Goedel-Prover:
- Advances formal proof generation
- Achieves 57.6% Pass@32 on miniF2F using expert iteration and statement formalizers.
Find the links below👇
1. LM2: Large Memory Models by Convergence Labs Ltd.
Our top 7
▪️ Matryoshka Quantization
▪️ LLM Pretraining with Continuous Concepts
▪️ LLMs can easily learn to reason from demonstrations
▪️ Forget what you know about LLMs evaluations – LLMs are like a chameleon
▪️ Exploring the Limit of Outcome Reward for Learning Mathematical Reasoning
▪️ Hephaestus
▪️ SynthDetoxM Dataset
▪️ The Curse of Depth in LLMs
▪️ InfiniteHiP
▪️ Distillation Scaling Laws
▪️ TransMLA: Multi-Head Latent Attention
▪️ Logical reasoning in LLMs: A survey
▪️ ReasonFlux
▪️ How Stanford’s s1 surpasses DeepSeek-R1
▪️ The Stochastic Parrot on LLM’s Shoulder
▪️ Training LMs for Social Deduction with Multi-Agent RL
▪️ Towards Internet-scale training for agents
▪️ WorldGUI
▪️ CoSER: Coordinating LLM-Based Persona Simulation
▪️ Scaling Pre-training to One Hundred Billion Data for VLMs
▪️ Adapting Language-Specific LLMs to Reasoning Models
🧵 1. Matryoshka Quantization from @GoogleDeepMind
Introduces MatQuant, a multi-scale quantization method that mixes int2, int4, and int8 layers for efficient model deployment
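A small sketch of the nested-integer idea, under the assumption that lower precisions are read off as the most significant bits of the int8 code; the training recipe that keeps every precision level accurate is the paper's actual contribution and is not shown here.

```python
import torch

def quantize_int8(w):
    scale = w.abs().max() / 127.0
    return torch.clamp((w / scale).round(), -128, 127).to(torch.int8), scale

def slice_bits(q_int8, bits):
    """Keep only the top `bits` bits of the int8 code: a coarser, nested quantization level."""
    step = 2 ** (8 - bits)
    return torch.div(q_int8.to(torch.int32), step, rounding_mode="floor") * step

w = torch.randn(4, 4)
q8, scale = quantize_int8(w)
w4 = slice_bits(q8, 4).float() * scale    # int4-resolution reconstruction nested inside the int8 one
w2 = slice_bits(q8, 2).float() * scale    # int2-resolution reconstruction from the same weights
```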
1. Model Distillation guide from @OpenAI
2. Knowledge Distillation tutorial by @PyTorch
3. Jetson Introduction to Knowledge Distillation by @nvidia
4. Tutorial on Knowledge Distillation with @kerasteam
5. @huggingface's guides:
- Knowledge Distillation
- Knowledge Distillation for Computer Vision
Save the link and check out the links below 👇
1. Model Distillation guide from @OpenAI
Explains this process step by step, including:
- storing outputs from a large model
- evaluating both large and small models
- creating training data for a small model
- assessing the fine-tuned small model
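A hedged sketch of that workflow with hypothetical placeholders (`teacher_generate`, `student_finetune`, and `evaluate` are not OpenAI API calls): store the large model's answers as prompt/completion pairs, fine-tune the small model on them, and score both models on a held-out set.

```python
import json

def build_distillation_set(prompts, teacher_generate, path="distill.jsonl"):
    with open(path, "w") as f:
        for p in prompts:
            completion = teacher_generate(p)                 # 1. store outputs from the large model
            f.write(json.dumps({"prompt": p, "completion": completion}) + "\n")
    return path

def distill(prompts, eval_set, teacher_generate, student_finetune, evaluate):
    data = build_distillation_set(prompts, teacher_generate)
    teacher_score = evaluate(teacher_generate, eval_set)     # 2. evaluate the large model as the reference
    student = student_finetune(data)                         # 3. fine-tune the small model on the stored pairs
    student_score = evaluate(student, eval_set)              # 4. assess the fine-tuned small model
    return student, {"teacher": teacher_score, "student": student_score}
```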
Distillation involves using a large teacher model to train a smaller student one.
But can we predict a distilled model’s performance based on teacher quality, student size, data volume, etc.?
@Apple and @UniofOxford explored this and developed distillation scaling laws.
Here are the key takeaways👇
1. A good teacher doesn’t always mean a better student:
If a teacher is too strong, the student might struggle to learn from it, leading to worse performance.
This is called the capacity gap — when the student isn’t powerful enough to properly mimic the teacher.
Feb 10 • 23 tweets • 13 min read
The freshest AI/ML research of the week:
Our top 4
▪️ AlphaGeometry2
▪️ ZebraLogic
▪️ Limo: Less is More for Reasoning
▪️ Great Models Think Alike and this Undermines AI Oversight
▪️ Activation-Informed Merging of LLMs
▪️ Content-Format Integrated Prompt Optimization (CFPO)
▪️ BOLT: Bootstrapping Long Chain-of-Thought
▪️ Token Assorted: Mixing Latent & Text Tokens
▪️ ScoreFlow
▪️ The Jumping Reasoning Curve?
▪️ Demystifying Long Chain-of-Thought Reasoning in LLMs
▪️ MAGA
▪️ ParetoQ: Scaling Laws in Extremely Low-Bit LLM Quantization
▪️ Analyze Feature Flow to Enhance Interpretation and Steering in LMs
▪️ PILAF
▪️ DuoGuard
▪️ Limitations of LLMs in Clinical Problem-Solving
▪️ AI and Legal Analysis
▪️ HackerRank-ASTRA
▪️ The Open-Source Advantage in LLMs
▪️ UltraIF: Advancing Instruction-Following
🧵 1. AlphaGeometry2 (Olympiad Geometry Solver) from @GoogleDeepMind
Enhances AlphaGeometry to solve IMO-level geometry problems with a broader formal language
Sliding Tile Attention (STA) speeds up video generation by up to 3.53x.
It focuses only on small, relevant regions at a time and moves across the video in a sliding pattern.
STA processes larger chunks (tiles) at once, making it faster and more hardware-efficient.
Here's how it works:
First, what's wrong with current methods?
3D attention, which is generally used in Diffusion Transformers (DiTs), processes all video frames at once and treats every pixel separately, which takes up a huge amount of computing power, about 70% of the total.
The problem with traditional Sliding Window Attention (SWA) is that it creates "mixed blocks," which are inefficient for GPUs.
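A sketch of how a tile-level mask avoids those mixed blocks: video tokens are grouped into (t, h, w) tiles and each query tile attends only to key tiles inside a local 3D window, so every kept block is fully dense. The tile-grid and window sizes below are assumptions, not the paper's settings.

```python
import torch

def tile_window_mask(grid=(8, 8, 8), window=(3, 3, 3)):
    """grid: number of tiles along (t, h, w); window: odd local window per axis, in tiles."""
    coords = torch.stack(torch.meshgrid(
        *[torch.arange(g) for g in grid], indexing="ij"), dim=-1).reshape(-1, 3)   # tile coordinates
    diff = (coords[:, None, :] - coords[None, :, :]).abs()                         # pairwise tile offsets
    radius = torch.tensor([w // 2 for w in window])
    return (diff <= radius).all(dim=-1)             # [n_tiles, n_tiles]; True = this tile pair attends

mask = tile_window_mask()
print(mask.shape, mask.float().mean())              # fraction of tile pairs kept, i.e. the attention sparsity
```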