Feb 25
The freshest AI/ML research of the week:

Our top 9
▪️ SigLIP 2
▪️ Intuitive Physics Understanding Emerges from Self-Supervised Pretraining on Natural Videos
▪️ Native Sparse Attention
▪️ OctoTools
▪️ ReLearn
▪️ On the Trustworthiness of Generative Foundation Models
▪️ S*: Test Time Scaling for Code Generation
▪️ Autellix (Serving Engine for LLM Agents)
▪️ Is That Your Final Answer? Test-Time Scaling Improves Selective Question Answering

▪️ SurveyX
▪️ From RAG to Memory: Non-Parametric Continual Learning for LLMs
▪️ How Much Knowledge Can You Pack into a LoRA Adapter without Harming LLM?
▪️ Train Small, Infer Large
▪️ Eager Updates for Overlapped Communication and Computation in DiLoCo
▪️ S^2R: Teaching LLMs to Self-verify and Self-correct via RL
▪️ Logic-RL
▪️ Discovering Highly Efficient Low-Weight Quantum Error-Correcting Codes with RL
▪️ Armap
▪️ Thinking Preference Optimization
▪️ Rethinking Diverse Human Preference Learning through Principal Component Analysis
▪️ Craw4LLM
▪️ LLMs and Mathematical Reasoning Failures
▪️ Small Models Struggle to Learn from Strong Reasoners
▪️ Flow-of-Options: Diversified and Improved LLM Reasoning by Thinking Through Options

🧵
1. SigLIP 2: Multilingual Vision-Language Encoders with Improved Semantic Understanding, @GoogleDeepMind

Advances vision-language learning with multilingual training and improved zero-shot capabilities

huggingface.co/papers/2502.14…
Checkpoints: github.com/google-researc…
2. Intuitive Physics Understanding Emerges from Self-Supervised Pretraining on Natural Videos, @AIatMeta

Trains a model on video frame prediction to develop intuitive physics reasoning

huggingface.co/papers/2502.11…
Code and data: github.com/facebookresear…
3. Native Sparse Attention: Hardware-Aligned and Natively Trainable Sparse Attention, @deepseek_ai

Optimizes sparse attention for long-context models, significantly improving efficiency

huggingface.co/papers/2502.11…
4. OctoTools: An Agentic Framework with Extensible Tools for Complex Reasoning, @Stanford

Develops a tool-based system for multi-step decision-making and structured tool use

huggingface.co/papers/2502.11…
Project page: octotools.github.io
5. ReLearn: Unlearning via Learning for Large Language Models

Introduces a knowledge-unlearning method that removes sensitive knowledge without degrading fluency

huggingface.co/papers/2502.11…
Code: github.com/zjunlp/unlearn
6. On the Trustworthiness of Generative Foundation Models – Guideline, Assessment, and Perspective

Develops a framework for evaluating trustworthiness in generative AI models

huggingface.co/papers/2502.14…
7. S*: Test Time Scaling for Code Generation, @UofCalifornia

Introduces a test-time scaling framework that improves LLM-based code generation through iterative debugging

huggingface.co/papers/2502.14…
Code: github.com/NovaSky-AI/Sky…
8. Autellix: An Efficient Serving Engine for LLM Agents as General Programs

Enhances LLM serving efficiency for agentic applications by optimizing request scheduling

huggingface.co/papers/2502.13…
9. Is That Your Final Answer? Test-Time Scaling Improves Selective Question Answering, @JohnsHopkins

Examines how inference scaling helps LLMs selectively answer questions with confidence

huggingface.co/papers/2502.13…
10. SurveyX: Academic Survey Automation via Large Language Models

Develops an automated system for generating high-quality academic surveys, improving citation precision and evaluation frameworks

huggingface.co/papers/2502.14…
11. From RAG to Memory: Non-Parametric Continual Learning for Large Language Models

Introduces HippoRAG 2, a retrieval-augmented generation method that enhances long-term memory and retrieval

huggingface.co/papers/2502.14…
Code and data: github.com/OSU-NLP-Group/…
12. How Much Knowledge Can You Pack into a LoRA Adapter without Harming LLM?

Examines the trade-offs in integrating new knowledge into LLMs using Low-Rank Adaptation (LoRA)

huggingface.co/papers/2502.14…
13. Train Small, Infer Large: Memory-Efficient LoRA Training for Large Language Models

Develops LORAM, a memory-efficient fine-tuning approach that enables large model training on low-resource hardware

huggingface.co/papers/2502.13…
Code: github.com/junzhang-zj/Lo…
14. Eager Updates for Overlapped Communication and Computation in DiLoCo, @GoogleDeepMind

Reduces communication bottlenecks in distributed LLM training by overlapping updates with computation

huggingface.co/papers/2502.12…
15. S^2R: Teaching LLMs to Self-verify and Self-correct via RL

Develops a framework to improve LLM reasoning by teaching self-verification and self-correction

huggingface.co/papers/2502.12…
Code and data: github.com/NineAbyss/S2R
16. Logic-RL: Unleashing LLM Reasoning with Rule-Based RL, @MSFTResearch Asia

Uses RL to enhance logical reasoning capabilities

huggingface.co/papers/2502.14…
17. Discovering Highly Efficient Low-Weight Quantum Error-Correcting Codes with RL

Optimizes quantum error-correcting codes using RL, reducing physical qubit overhead

huggingface.co/papers/2502.14…
18. Armap: Scaling Autonomous Agents via Automatic Reward Modeling and Planning

Introduces a decision-making framework that learns rewards automatically, improving agent-based reasoning

huggingface.co/papers/2502.12…
19. Thinking Preference Optimization

Enhances LLM reasoning by refining preference-based optimization of reasoning steps

huggingface.co/papers/2502.13…
Code: github.com/uservan/ThinkPO
20. Rethinking Diverse Human Preference Learning through Principal Component Analysis

Improves human preference modeling using principal component analysis (PCA) for better LLM alignment

huggingface.co/papers/2502.13…
21. Craw4LLM: Efficient Web Crawling for LLM Pretraining

Optimizes web crawling for LLM training by prioritizing the most impactful pages

huggingface.co/papers/2502.13…
Code: github.com/cxcscmu/Crawl4…
22. LLMs and Mathematical Reasoning Failures

Evaluates LLMs on newly designed math problems, exposing weaknesses in multi-step problem-solving

huggingface.co/papers/2502.11…
23. Small Models Struggle to Learn from Strong Reasoners

Identifies the limitations of small LLMs in benefiting from chain-of-thought distillation from larger models

huggingface.co/papers/2502.12…
Project page: small-model-gap.github.io
24. Flow-of-Options: Diversified and Improved LLM Reasoning by Thinking Through Options

Enhances LLM problem-solving by systematically exploring multiple solution paths

huggingface.co/papers/2502.12…
Find other important AI and ML news in our free weekly newsletter: huggingface.co/blog/Kseniase/…

More from @TheTuringPost

Feb 18
3 models to pay attention to:

▪️ LM2: Large Memory Models

- Uses a Transformer architecture with a memory module to improve long-context reasoning.
- Outperforms RMT by 37.1% and excels in multi-hop inference.

▪️ NatureLM:

- Is trained across scientific domains.
- Enhances tasks like SMILES-to-IUPAC translation and CRISPR RNA design for cross-domain applications.

▪️ Goedel-Prover:

- Advances formal proof generation
- Achieves 57.6% Pass@32 on miniF2F using expert iteration and statement formalizers.

Find the links below👇
1. LM2: Large Memory Models by Convergence Labs Ltd.

huggingface.co/papers/2502.06…
2. NatureLM: Deciphering the Language of Nature for Scientific Discovery from @MSFTResearch

huggingface.co/papers/2502.07…
Feb 18
The freshest AI/ML research of the week:

Our top 7
▪️ Matryoshka Quantization
▪️ LLM Pretraining with Continuous Concepts
▪️ LLMs can easily learn to reason from demonstrations
▪️ Forget what you know about LLMs evaluations – LLMs are like a chameleon
▪️ Exploring the Limit of Outcome Reward for Learning Mathematical Reasoning
▪️ Hephaestus
▪️ SynthDetoxM Dataset

▪️ The Curse of Depth in LLMs
▪️ InfiniteHiP
▪️ Distillation Scaling Laws
▪️ TransMLA: Multi-Head Latent Attention
▪️ Logical reasoning in LLMs: A survey
▪️ ReasonFlux
▪️ How Stanford’s s1 surpasses DeepSeek-R1
▪️ The Stochastic Parrot on LLM’s Shoulder
▪️ Training LMs for Social Deduction with Multi-Agent RL
▪️ Towards Internet-scale training for agents
▪️ WorldGUI
▪️ CoSER: Coordinating LLM-Based Persona Simulation
▪️ Scaling Pre-training to One Hundred Billion Data for VLMs
▪️ Adapting Language-Specific LLMs to Reasoning Models

🧵
1. Matryoshka Quantization from @GoogleDeepMind

Introduces MatQuant, a multi-scale quantization method that mixes int2, int4, and int8 layers for efficient model deployment

huggingface.co/papers/2502.06…
2. LLM Pretraining with Continuous Concepts from @AIatMeta

Presents CoCoMix, which mixes token embeddings with abstract concept representations to improve training efficiency.

huggingface.co/papers/2502.08…
Code: github.com/facebookresear…
Feb 16
Free useful guides on model distillations:

1. Model Distillation guide from @OpenAI
2. Knowledge Distillation tutorial by @PyTorch
3. Jetson Introduction to Knowledge Distillation by @nvidia
4. Tutorial on Knowledge Distillation with @kerasteam
5. @huggingface's guides:
- Knowledge Distillation
- Knowledge Distillation for Computer Vision

Save the link and check out the links below 👇
1. Model Distillation guide from @OpenAI

Explains this process step by step (a rough sketch follows below), including:
- storing outputs from a large model
- evaluating both large and small models
- creating training data for a small model
- assessing the fine-tuned small model

platform.openai.com/docs/guides/di…
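
To make those steps concrete, here is a minimal sketch of the "store teacher outputs, then build training data" part using the standard openai Python SDK. The model name, prompts, and file path are illustrative placeholders, not the guide's exact choices, and the evaluation steps are left out.

import json
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

teacher_model = "gpt-4o"  # placeholder name for the large "teacher" model
prompts = [
    "Summarize the transformer architecture in one sentence.",
    "Explain what a hash map is to a beginner.",
]

rows = []
for prompt in prompts:
    # Store an output from the large model for this prompt
    resp = client.chat.completions.create(
        model=teacher_model,
        messages=[{"role": "user", "content": prompt}],
    )
    answer = resp.choices[0].message.content
    # Turn the (prompt, teacher answer) pair into a training example
    rows.append({"messages": [
        {"role": "user", "content": prompt},
        {"role": "assistant", "content": answer},
    ]})

# Write JSONL in the chat fine-tuning format, ready to fine-tune a smaller student model
with open("distillation_train.jsonl", "w") as f:
    for row in rows:
        f.write(json.dumps(row) + "\n")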
2. Knowledge Distillation tutorial by @PyTorch covers:

• Extracting hidden representations for further calculations
• Modifying PyTorch training loops to include additional losses
• Enhancing lightweight models using complex models as teachers

pytorch.org/tutorials/begi…
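
As a companion to those bullets, here is a minimal PyTorch sketch of the extra loss such a training loop adds: a KL term on temperature-softened teacher and student logits combined with the usual cross-entropy. This is the generic soft-target formulation, not the tutorial's exact code, and the tensor names are placeholders.

import torch
import torch.nn.functional as F

def kd_loss(student_logits, teacher_logits, labels, T=2.0, alpha=0.5):
    # Soft-target term: the student matches the teacher's softened distribution
    soft = F.kl_div(
        F.log_softmax(student_logits / T, dim=-1),
        F.softmax(teacher_logits / T, dim=-1),
        reduction="batchmean",
    ) * (T * T)
    # Hard-target term: ordinary cross-entropy on the ground-truth labels
    hard = F.cross_entropy(student_logits, labels)
    return alpha * soft + (1 - alpha) * hard

# Inside the training loop, with the teacher frozen:
# with torch.no_grad():
#     teacher_logits = teacher(x)
# loss = kd_loss(student(x), teacher_logits, y)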
Read 8 tweets
Feb 15
Distillation uses a large teacher model to train a smaller student model.

But can we predict a distilled model’s performance based on teacher quality, student size, data volume, etc.?

@Apple and @UniofOxford explored this and developed distillation scaling laws.

Here are the key takeaways👇
1. A good teacher doesn’t always mean a better student:

If a teacher is too strong, the student might struggle to learn from it, leading to worse performance.
This is called the capacity gap — when the student isn’t powerful enough to properly mimic the teacher.
2. Distillation scaling law predicts how well a student model will perform based on three key factors:

- Student model's size
- The number of training tokens
- The teacher’s size and quality

This law follows a power-law relationship: performance improves in a predictable way, but only up to a point, beyond which adding more resources doesn't help.
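
As a toy numeric illustration of that shape (the functional form and coefficients below are invented for intuition, not the paper's fitted law): the student's loss falls as a power law in model size and distillation tokens but flattens toward a floor, so extra resources eventually stop paying off.

def toy_student_loss(n_params, n_tokens, loss_floor=1.0):
    # Power-law improvement in student size and distillation data...
    power_law_term = 40.0 * n_params ** -0.25 + 8.0 * n_tokens ** -0.1
    # ...on top of a floor the student cannot go below, so beyond some
    # point adding parameters or tokens barely changes the loss.
    return loss_floor + power_law_term

for params in (1e8, 1e9, 1e10, 1e11):
    print(f"{params:.0e} params -> toy loss {toy_student_loss(params, 1e11):.3f}")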
Feb 10
The freshest AI/ML research of the week:

Our top 4
▪️ AlphaGeometry2
▪️ ZebraLogic
▪️ Limo: Less is More for Reasoning
▪️ Great Models Think Alike and this Undermines AI Oversight

▪️ Activation-Informed Merging of LLMs
▪️ Content-Format Integrated Prompt Optimization (CFPO)
▪️ BOLT: Bootstrapping Long Chain-of-Thought
▪️ Token Assorted: Mixing Latent & Text Tokens
▪️ ScoreFlow
▪️ The Jumping Reasoning Curve?
▪️ Demystifying Long Chain-of-Thought Reasoning in LLMs
▪️ MAGA
▪️ ParetoQ: Scaling Laws in Extremely Low-Bit LLM Quantization
▪️ Analyze Feature Flow to Enhance Interpretation and Steering in LMs
▪️ PILAF
▪️ DuoGuard
▪️ Limitations of LLMs in Clinical Problem-Solving
▪️ AI and Legal Analysis
▪️ HackerRank-ASTRA
▪️ The Open-Source Advantage in LLMs
▪️ UltraIF: Advancing Instruction-Following

🧵
1. AlphaGeometry2 (Olympiad Geometry Solver) from @GoogleDeepMind

Enhances AlphaGeometry to solve IMO-level geometry problems with a broader formal language

huggingface.co/papers/2502.03…
2. ZebraLogic: On the Scaling Limits of LLMs for Logical Reasoning

Evaluates LLMs on logic grid puzzles, revealing how complexity diminishes accuracy despite enhanced inference strategies

huggingface.co/papers/2502.01…
Benchmark: huggingface.co/spaces/WildEva…
Feb 10
Sliding Tile Attention (STA) speeds up video generation by up to 3.53x.

It focuses only on small, relevant regions at a time and moves across the video in a sliding pattern.

STA processes larger chunks (tiles) at once, making it faster and more hardware-efficient.

Here's how it works:
Firstly, what's wrong with current methods?

3D attention, which is generally used in Diffusion Transformers (DiTs), processes all video frames at once and treats every pixel separately. This consumes a huge amount of compute, about 70% of the total cost.

The problem with traditional Sliding Window Attention (SWA) is that it creates "mixed blocks," which are inefficient for GPUs.

That's why the researchers proposed the Sliding Tile Attention (STA) method.
STA method:

STA organizes the video into structured "tiles" (small 3D blocks), ensuring that each tile interacts with only a few nearby tiles.

It works like a smart sliding window that moves across the video, focusing on the most relevant areas.

This results in a GPU-friendly design that works efficiently with existing tools like FlashAttention.
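
For intuition, here is a rough 1D simplification of that tile-local masking idea (illustrative only, not the paper's fused 3D kernel): tokens are grouped into tiles, and each tile attends only to tiles within a small window around it, so the attention pattern is block-structured rather than a ragged per-token window.

import torch

def tile_window_mask(num_tiles, tile_size, window=1):
    """(L, L) boolean mask, L = num_tiles * tile_size: token i may attend to
    token j only if their tiles are at most `window` tiles apart."""
    L = num_tiles * tile_size
    tile_id = torch.arange(L) // tile_size        # which tile each token belongs to
    return (tile_id[:, None] - tile_id[None, :]).abs() <= window

mask = tile_window_mask(num_tiles=8, tile_size=4)   # 32 tokens grouped into 8 tiles
scores = torch.randn(32, 32)                        # toy attention scores
scores = scores.masked_fill(~mask, float("-inf"))   # keep only tile-local pairs
attn = scores.softmax(dim=-1)                       # block-structured attention weights

In the actual method the tiles are small 3D blocks over the video, and the computation is organized to stay GPU-friendly with FlashAttention-style kernels, avoiding the mixed blocks that SWA produces.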