Ksenia_TuringPost
Feb 25, 2025 · 26 tweets · 15 min read
The freshest AI/ML research of the week:

Our top 9
▪️ SigLIP 2
▪️ Intuitive Physics Understanding Emerges from Self-Supervised Pretraining on Natural Videos
▪️ Native Sparse Attention
▪️ OctoTools
▪️ ReLearn
▪️ On the Trustworthiness of Generative Foundation Models
▪️ S*: Test-Time Scaling for Code Generation
▪️ Autellix (Serving Engine for LLM Agents)
▪️ Is That Your Final Answer? Test-Time Scaling Improves Selective Question Answering

▪️ SurveyX
▪️ From RAG to Memory: Non-Parametric Continual Learning for LLMs
▪️ How Much Knowledge Can You Pack into a LoRA Adapter without Harming LLM?
▪️ Train Small, Infer Large
▪️ Eager Updates for Overlapped Communication and Computation in DiLoCo
▪️ S^2R: Teaching LLMs to Self-verify and Self-correct via RL
▪️ Logic-RL
▪️ Discovering Highly Efficient Low-Weight Quantum Error-Correcting Codes with RL
▪️ Armap
▪️ Thinking Preference Optimization
▪️ Rethinking Diverse Human Preference Learning through Principal Component Analysis
▪️ Craw4LLM
▪️ LLMs and Mathematical Reasoning Failures
▪️ Small Models Struggle to Learn from Strong Reasoners
▪️ Flow-of-Options: Diversified and Improved LLM Reasoning by Thinking Through Options

🧵
1. SigLIP 2: Multilingual Vision-Language Encoders with Improved Semantic Understanding, @GoogleDeepMind

Advances vision-language learning with multilingual training and improved zero-shot capabilities

huggingface.co/papers/2502.14…
Checkpoints: github.com/google-researc…
2. Intuitive Physics Understanding Emerges from Self-Supervised Pretraining on Natural Videos, @AIatMeta

Trains a model on video frame prediction to develop intuitive physics reasoning

huggingface.co/papers/2502.11…
Code and data: github.com/facebookresear…
3. Native Sparse Attention: Hardware-Aligned and Natively Trainable Sparse Attention, @deepseek_ai

Optimizes sparse attention for long-context models, significantly improving efficiency

huggingface.co/papers/2502.11…
4. OctoTools: An Agentic Framework with Extensible Tools for Complex Reasoning, @Stanford

Develops a tool-based system for multi-step decision-making and structured tool use

huggingface.co/papers/2502.11…
Project page: octotools.github.io
5. ReLearn: Unlearning via Learning for Large Language Models

Introduces a knowledge-unlearning method that removes sensitive knowledge without degrading fluency

huggingface.co/papers/2502.11…
Code: github.com/zjunlp/unlearn
6. On the Trustworthiness of Generative Foundation Models – Guideline, Assessment, and Perspective

Develops a framework for evaluating trustworthiness in generative AI models

huggingface.co/papers/2502.14…
7. S*: Test-Time Scaling for Code Generation, @UofCalifornia

Introduces a test-time scaling framework that improves LLM-based code generation through iterative debugging

huggingface.co/papers/2502.14…
Code: github.com/NovaSky-AI/Sky…
8. Autellix: An Efficient Serving Engine for LLM Agents as General Programs

Enhances LLM serving efficiency for agentic applications by optimizing request scheduling

huggingface.co/papers/2502.13…
9. Is That Your Final Answer? Test-Time Scaling Improves Selective Question Answering, @JohnsHopkins

Examines how inference scaling helps LLMs selectively answer questions with confidence

huggingface.co/papers/2502.13…
10. SurveyX: Academic Survey Automation via Large Language Models

Develops an automated system for generating high-quality academic surveys, improving citation precision and evaluation frameworks

huggingface.co/papers/2502.14…
11. From RAG to Memory: Non-Parametric Continual Learning for Large Language Models

Introduces HippoRAG 2, a retrieval-augmented generation method that enhances long-term memory and retrieval

huggingface.co/papers/2502.14…
Code and data: github.com/OSU-NLP-Group/…
12. How Much Knowledge Can You Pack into a LoRA Adapter without Harming LLM?

Examines the trade-offs in integrating new knowledge into LLMs using Low-Rank Adaptation (LoRA)

huggingface.co/papers/2502.14…
13. Train Small, Infer Large: Memory-Efficient LoRA Training for Large Language Models

Develops LORAM, a memory-efficient fine-tuning approach that enables large model training on low-resource hardware

huggingface.co/papers/2502.13…
Code: github.com/junzhang-zj/Lo…
14. Eager Updates for Overlapped Communication and Computation in DiLoCo, @GoogleDeepMind

Reduces communication bottlenecks in distributed LLM training by overlapping updates with computation

huggingface.co/papers/2502.12…
15. S^2R: Teaching LLMs to Self-verify and Self-correct via RL

Develops a framework to improve LLM reasoning by teaching self-verification and self-correction

huggingface.co/papers/2502.12…
Code and data: github.com/NineAbyss/S2R
16. Logic-RL: Unleashing LLM Reasoning with Rule-Based RL, @MSFTResearch Asia

Uses RL to enhance logical reasoning capabilities

huggingface.co/papers/2502.14…
17. Discovering Highly Efficient Low-Weight Quantum Error-Correcting Codes with RL

Optimizes quantum error-correcting codes using RL, reducing physical qubit overhead

huggingface.co/papers/2502.14…
18. Armap: Scaling Autonomous Agents via Automatic Reward Modeling and Planning

Introduces a decision-making framework that learns rewards automatically, improving agent-based reasoning

huggingface.co/papers/2502.12…
19. Thinking Preference Optimization

Enhances LLM reasoning by refining preference-based optimization of reasoning steps

huggingface.co/papers/2502.13…
Code: github.com/uservan/ThinkPO
20. Rethinking Diverse Human Preference Learning through Principal Component Analysis

Improves human preference modeling using principal component analysis (PCA) for better LLM alignment

huggingface.co/papers/2502.13…
21. Craw4LLM: Efficient Web Crawling for LLM Pretraining

Optimizes web crawling for LLM training by prioritizing the most impactful pages

huggingface.co/papers/2502.13…
Code: github.com/cxcscmu/Crawl4…
22. LLMs and Mathematical Reasoning Failures

Evaluates LLMs on newly designed math problems, exposing weaknesses in multi-step problem-solving

huggingface.co/papers/2502.11…
23. Small Models Struggle to Learn from Strong Reasoners

Identifies the limitations of small LLMs in benefiting from chain-of-thought distillation from larger models

huggingface.co/papers/2502.12…
Project page: small-model-gap.github.io
24. Flow-of-Options: Diversified and Improved LLM Reasoning by Thinking Through Options

Enhances LLM problem-solving by systematically exploring multiple solution paths

huggingface.co/papers/2502.12…
Find other important AI and ML news in our free weekly newsletter: huggingface.co/blog/Kseniase/…

More from @TheTuringPost

Jan 3
It’s become a tradition for DeepSeek to drop an outstanding paper around the New Year.

This time, it’s Manifold-Constrained Hyper-Connections (mHC), which fixes the instability in Hyper-Connections (HC).

mHC keeps residual connections stable and expressive via one simple rule:

➡️ Streams can share information, but without changing the overall signal strength.

Here’s how it works:
1. Residual connections have been around since ResNets and are still a big reason why Transformers and LLMs train as well as they do. They let information flow through the network unchanged, keeping deep models stable.
2. Hyper-Connections (HC) push this idea further by widening the residual path and mixing multiple streams. This extra freedom gives models more expressive power without increasing the compute per layer, but breaks stability and isn't very memory-friendly.
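The mHC rule can be sketched as a toy (my illustration, not DeepSeek's implementation): mix several residual streams through a row-stochastic matrix, so each output stream is a convex combination of the inputs. Streams share information, but the overall signal scale is preserved.

```python
import numpy as np

def mix_streams(streams, logits):
    """Mix residual streams with a row-stochastic matrix.

    streams: (n_streams, dim) array of residual states.
    logits:  (n_streams, n_streams) unconstrained mixing scores.
    A softmax over each row yields non-negative weights summing to 1,
    so each output stream is a convex combination of the inputs:
    information flows between streams without amplifying the signal.
    """
    w = np.exp(logits - logits.max(axis=1, keepdims=True))
    w /= w.sum(axis=1, keepdims=True)  # row-stochastic weights
    return w @ streams

rng = np.random.default_rng(0)
streams = rng.normal(size=(4, 8))
mixed = mix_streams(streams, rng.normal(size=(4, 4)))
# A convex combination can never exceed the largest input magnitude.
assert np.abs(mixed).max() <= np.abs(streams).max() + 1e-9
```

An unconstrained mixing matrix (plain HC) has no such bound, which is one way the residual signal can blow up or vanish in deep stacks.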
Dec 7, 2025
This Google paper presented at #NeurIPS2025 is a true gem.

In their search for a better backbone for sequence models, they:

• Reframe Transformers & RNNs as associative memory systems driven by attentional bias
• Reinterpret "forgetting" as retention regularization, not as erasure
• Combine these insights into Miras – a unified framework for designing next-gen sequence architectures

From this perspective, they introduce 3 new models, Moneta, Yaad, and Memora, that:

- Beat Transformers, Mamba2, DeltaNet, and hybrids across key benchmarks
- Scale better to long contexts
- Deliver state-of-the-art recall on needle-in-a-haystack tests

Here are the details (really worth exploring):
Transformers traditionally dominate because they scale well, but they become slow and expensive for long sequences since attention grows quadratically.

Google's key idea draws from human attentional bias – our natural habit of focusing more on certain things than others.
1. Associative memory view

Google researchers show that Transformers, Titans, and RNNs can all be seen as associative memories that learn key→value mappings guided by an internal objective (the attentional bias).

This objective decides:
- what kind of memory the model builds
- what it should prioritize

Learning these mappings becomes a form of meta-learning.
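In code, the associative-memory view amounts to something like this toy sketch (my illustration, not the paper's Miras framework): a memory matrix trained online to map keys to values, where the "attentional bias" is the objective being minimized and "forgetting" appears as a retention penalty (decay) rather than hard erasure.

```python
import numpy as np

def memory_step(M, k, v, lr=0.5, retention=0.05):
    """One online update of a linear associative memory.

    Attentional bias: minimize ||M @ k - v||^2, i.e. which key/value
    pairs the memory prioritizes. Retention regularization: decay the
    memory toward zero instead of erasing old associations outright.
    """
    err = M @ k - v            # prediction error on this pair
    grad = np.outer(err, k)    # gradient of the bias objective
    return (1 - retention) * M - lr * grad

dim = 4
M = np.zeros((dim, dim))
k = np.array([1.0, 0.0, 0.0, 0.0])
v = np.array([0.0, 1.0, 0.0, 0.0])
for _ in range(50):
    M = memory_step(M, k, v)
print(M @ k)  # a slightly decayed copy of v: the k -> v mapping is stored
```

Different choices of objective and retention term give different memories; in the paper's framing that design space is what separates Moneta, Yaad, and Memora.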
Nov 6, 2025
Supervised Fine-Tuning (SFT) + Reinforcement Learning with Verifiable Rewards (RLVR) = Supervised Reinforcement Learning (SRL)

Google Cloud AI Research introduced a new SRL training method that overcomes the issues of SFT and RLVR.

The main idea: it treats problem-solving as a sequence of logical actions.

Here is how it works:
What's the problem with common methods?

- Reinforcement Learning with Verifiable Rewards (RLVR) struggles when it can’t find correct examples to learn from.
- Supervised Fine-Tuning (SFT) tends to copy right answers too rigidly, token by token.

@googlecloud AI Research proposes SRL to fix both problems.
SRL trains the model to generate an internal reasoning monologue before deciding on each action. It also gives smoother feedback based on how closely each action matches expert examples from the SFT dataset.
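The action-matching feedback can be sketched as follows (an illustrative stand-in; the paper's exact reward may differ): score a generated action by its similarity to the expert's action at the same step, giving smooth partial credit instead of an all-or-nothing verdict.

```python
from difflib import SequenceMatcher

def action_reward(generated: str, expert: str) -> float:
    """Smooth reward in [0, 1]: how closely the model's action
    matches the expert action from the SFT dataset."""
    return SequenceMatcher(None, generated, expert).ratio()

expert = "factor the quadratic: x^2 - 5x + 6 = (x - 2)(x - 3)"
close = "factor the quadratic: x^2 - 5x + 6 = (x - 3)(x - 2)"
wrong = "guess x = 7 and check"

assert action_reward(expert, expert) == 1.0
# A near-miss earns far more credit than an unrelated action.
assert action_reward(close, expert) > action_reward(wrong, expert)
```

This is the contrast with RLVR (binary verifiable reward, often all zeros on hard problems) and with SFT (exact token-by-token imitation).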

Here's its step-by-step workflow:
Oct 17, 2025
.@nvidia introduced a new RL approach that’s both faster and lighter on compute.

QeRL's idea is to combine 2 things:

- Quantization (NVFP4)
- Low-Rank Adaptation (LoRA)

But a key innovation is Adaptive Quantization Noise (AQN): QeRL turns quantization noise into an exploration tool, adjusting it on the fly during RL.

Here are the details:
1. QeRL builds two RL algorithms for LLMs:

- GRPO: creates multiple answers for a prompt, scores them with rule-based rewards, and updates the model using average scores.
- DAPO (Decoupled Clip and Dynamic Sampling Policy Optimization): relaxes the clipping limits on how much the policy can change during training so that it can discover more diverse solutions.

On top of this, QeRL adds quantization.
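The GRPO scoring step above can be sketched like this (a simplified illustration; real implementations add clipping and a KL term): sample a group of answers per prompt, score each with a rule-based reward, and convert the scores into group-relative advantages, so no separate value network is needed.

```python
import statistics

def group_advantages(rewards):
    """Group-relative advantages as in GRPO: each sample is judged
    against the mean (and spread) of its own group's rewards."""
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards) or 1.0  # avoid divide-by-zero
    return [(r - mean) / std for r in rewards]

# Rule-based rewards for 4 sampled answers to one prompt (1 = correct).
rewards = [1.0, 0.0, 0.0, 1.0]
advs = group_advantages(rewards)

assert abs(sum(advs)) < 1e-9   # advantages are centered within the group
assert advs[0] > 0 > advs[1]   # correct answers pushed up, wrong ones down
```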
2. QeRL uses:

- Quantization (NVFP4) – makes model computations smaller and faster.
- Low-Rank Adaptation (LoRA) – fine-tuning without touching every parameter.

This cuts memory use and speeds up RL training, while reaching the same quality as full fine-tuning.
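The combination can be sketched as a forward pass where the frozen base weight is (fake-)quantized while the LoRA factors stay in full precision. Here a toy uniform quantizer stands in for NVFP4, which is really a 4-bit floating-point hardware format; all names are illustrative.

```python
import numpy as np

def fake_quantize(w, levels=16):
    """Uniform symmetric quantizer, a structural stand-in for NVFP4."""
    scale = np.abs(w).max() / (levels // 2 - 1)
    return np.round(w / scale) * scale

def qlora_forward(x, W, A, B):
    """y = Q(W) x + B A x: quantized frozen base plus a
    full-precision low-rank update, the only trainable part."""
    return fake_quantize(W) @ x + B @ (A @ x)

rng = np.random.default_rng(0)
d, r = 8, 2
W = rng.normal(size=(d, d))          # frozen base weight
A = rng.normal(size=(r, d)) * 0.01   # trainable LoRA factor
B = np.zeros((d, r))                 # B starts at zero: update is a no-op
x = rng.normal(size=d)
y = qlora_forward(x, W, A, B)
assert np.allclose(y, fake_quantize(W) @ x)  # with B = 0, pure quantized base
```

QeRL's twist (AQN) is to treat the rounding error of Q(W) not as pure noise but as a tunable source of exploration during RL.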
Oct 10, 2025
Tiny Recursive Model (TRM) is a simple, effective approach built on the idea: do more with less.

It uses just 1 small 2-layer network that recursively improves its own answers.

With only 7M parameters, TRM sets new records, beating LLMs 10,000× larger:

- Sudoku-Extreme: 55% → 87%
- Maze-Hard: 75% → 85%
- ARC-AGI-1: 40% → 45%
- ARC-AGI-2: 5% → 8%

Here is how it works:
1. TRM is built on the idea of the Hierarchical Reasoning Model (HRM).

HRM uses 2 small neural networks working together, each at its own rhythm, to successfully solve hard problems like Sudoku, mazes, and ARC-AGI puzzles, though it’s tiny (27 million parameters).

TRM is a simpler, smaller alternative to HRM.
2. No more complex math:

HRM depends on a mathematical “fixed-point” assumption to simplify gradients, assuming that its recursive loops converge to a stable state.

On the contrary, TRM just runs the full recursion several times and backpropagates through all steps.

This removes the need for theoretical constraints and gives a huge boost in generalization: 56.5% → 87.4% on Sudoku-Extreme.
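The recursive-refinement loop can be sketched with a toy numeric analogue (not the actual TRM network): one small function repeatedly maps (question, current answer, latent state) to an improved answer, and training would backpropagate through every step of the unrolled recursion. Here the "network" is Heron's method for square roots, standing in for TRM's learned 2-layer net.

```python
def refine(question, answer, state):
    """One step of a tiny 'network': update the latent state from the
    question and current answer, then propose an improved answer."""
    state = question / answer        # latent scratchpad
    answer = 0.5 * (answer + state)  # improved answer (Heron's method)
    return answer, state

def trm_solve(question, steps=8):
    answer, state = 1.0, 0.0
    for _ in range(steps):
        # The full recursion is unrolled; TRM backpropagates
        # through all steps instead of assuming a fixed point.
        answer, state = refine(question, answer, state)
    return answer

print(trm_solve(2.0))  # converges to sqrt(2)
```

The structural point: a single tiny step function, applied recursively with the question and its own previous output, can solve a problem no single application could.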
Oct 3, 2025
Retrieval-of-Thought (RoT) makes reasoning models faster by reusing earlier reasoning steps as templates.

These steps are stored in a “thought graph” that shows both their order and meaning.

As a result, RoT:

- reduces output tokens by up to 40%
- speeds up inference by 82%
- lowers cost by 59%

All without losing accuracy.

Here is how it works:
RoT works by:

- Storing reasoning steps as nodes in a “thought graph.”
- Retrieving relevant steps when a new problem comes in.
- Assembling a dynamic template from those steps to guide the model.

Let’s take it step by step:
1. Building the "thought graph"

Researchers collected a large set of reasoning templates (3.34k). Each step in these templates became a node in the graph, with metadata like topic tags: algebra, geometry, etc.

- Sequential edges connect steps in the natural order within a template.
- Semantic edges connect steps that mean similar things across different templates.

So this graph acts like a memory bank of reasoning fragments.
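A minimal sketch of such a graph (an illustrative data structure, not the paper's code): steps as tagged nodes, sequential edges within a template, and retrieval by tag overlap followed by walking the edges to assemble a template.

```python
# Nodes: step_id -> (text, topic tags). Sequential edges link
# consecutive steps of one template; retrieval scores nodes by tag
# overlap with the incoming problem, then follows the edges.
nodes = {
    "a1": ("isolate the variable", {"algebra"}),
    "a2": ("check the solution", {"algebra", "verification"}),
    "g1": ("draw a diagram", {"geometry"}),
}
sequential = {"a1": "a2"}  # a1 -> a2 within one template

def retrieve_template(problem_tags):
    """Pick the best-matching entry node, then walk sequential
    edges to assemble a dynamic reasoning template."""
    start = max(nodes, key=lambda n: len(nodes[n][1] & problem_tags))
    steps, cur = [], start
    while cur is not None:
        steps.append(nodes[cur][0])
        cur = sequential.get(cur)
    return steps

assert retrieve_template({"algebra"}) == [
    "isolate the variable", "check the solution"
]
```

The real system adds semantic edges across templates and uses embeddings rather than exact tag overlap, but the retrieve-then-assemble shape is the same.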
