TuringPost · Feb 25
The freshest AI/ML research of the week:

Our top 9
▪️ SigLIP 2
▪️ Intuitive Physics Understanding Emerges from Self-Supervised Pretraining on Natural Videos
▪️ Native Sparse Attention
▪️ OctoTools
▪️ ReLearn
▪️ On the Trustworthiness of Generative Foundation Models
▪️ S*: Test Time Scaling for Code Generation
▪️ Autellix (Serving Engine for LLM Agents)
▪️ Is That Your Final Answer? Test-Time Scaling Improves Selective Question Answering

▪️ SurveyX
▪️ From RAG to Memory: Non-Parametric Continual Learning for LLMs
▪️ How Much Knowledge Can You Pack into a LoRA Adapter without Harming LLM?
▪️ Train Small, Infer Large
▪️ Eager Updates for Overlapped Communication and Computation in DiLoCo
▪️ S^2R: Teaching LLMs to Self-verify and Self-correct via RL
▪️ Logic-RL
▪️ Discovering Highly Efficient Low-Weight Quantum Error-Correcting Codes with RL
▪️ Armap
▪️ Thinking Preference Optimization
▪️ Rethinking Diverse Human Preference Learning through Principal Component Analysis
▪️ Craw4LLM
▪️ LLMs and Mathematical Reasoning Failures
▪️ Small Models Struggle to Learn from Strong Reasoners
▪️ Flow-of-Options: Diversified and Improved LLM Reasoning by Thinking Through Options

🧵
1. SigLIP 2: Multilingual Vision-Language Encoders with Improved Semantic Understanding, @GoogleDeepMind

Advances vision-language learning with multilingual training and improved zero-shot capabilities

huggingface.co/papers/2502.14…
Checkpoints: github.com/google-researc…
2. Intuitive Physics Understanding Emerges from Self-Supervised Pretraining on Natural Videos, @AIatMeta

Trains a model on video frame prediction to develop intuitive physics reasoning

huggingface.co/papers/2502.11…
Code and data: github.com/facebookresear…
3. Native Sparse Attention: Hardware-Aligned and Natively Trainable Sparse Attention, @deepseek_ai

Optimizes sparse attention for long-context models, significantly improving efficiency

huggingface.co/papers/2502.11…
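
The core move is easy to sketch: score key blocks coarsely, keep only the top-k, and run full attention on the survivors. Below is a generic blockwise-selection sketch in plain NumPy, not DeepSeek's hardware-aligned kernel; all names and sizes are illustrative.

```python
import numpy as np

def sparse_attention(q, K, V, block=16, top_k=2):
    """Attend only to the top_k key blocks whose mean is most similar to q."""
    blocks = K.shape[0] // block
    # Coarse pass: compare the query against each key block's mean.
    block_means = K[:blocks * block].reshape(blocks, block, -1).mean(axis=1)
    coarse = block_means @ q
    keep = np.argsort(coarse)[-top_k:]                # indices of selected blocks
    idx = np.concatenate([np.arange(b * block, (b + 1) * block) for b in keep])
    # Fine pass: ordinary softmax attention over the kept keys only.
    scores = (K[idx] @ q) / np.sqrt(q.shape[0])
    w = np.exp(scores - scores.max())
    w /= w.sum()
    return w @ V[idx]

q = np.random.randn(64)
K, V = np.random.randn(256, 64), np.random.randn(256, 64)
context = sparse_attention(q, K, V)                   # (64,) output vector
```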
4. OctoTools: An Agentic Framework with Extensible Tools for Complex Reasoning, @Stanford

Develops a tool-based system for multi-step decision-making and structured tool use

huggingface.co/papers/2502.11…
Project page: octotools.github.io
5. ReLearn: Unlearning via Learning for Large Language Models

Introduces a knowledge-unlearning method that removes sensitive knowledge without degrading fluency

huggingface.co/papers/2502.11…
Code: github.com/zjunlp/unlearn
6. On the Trustworthiness of Generative Foundation Models – Guideline, Assessment, and Perspective

Develops a framework for evaluating trustworthiness in generative AI models

huggingface.co/papers/2502.14…
7. S*: Test Time Scaling for Code Generation, @UofCalifornia

Introduces a test-time scaling framework that improves LLM-based code generation through iterative debugging

huggingface.co/papers/2502.14…
Code: github.com/NovaSky-AI/Sky…
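
The sample-then-debug loop behind this kind of test-time scaling can be sketched in a few lines. Here `llm_generate` is a hypothetical stand-in for the model call (S*'s actual pipeline differs in its details); the test runner is real Python, assuming candidate programs expose a `solve()` function.

```python
def run_tests(code, tests):
    """Count how many (input, expected) cases the candidate's solve() passes."""
    env = {}
    try:
        exec(code, env)
    except Exception as e:
        return 0, f"does not run: {e}"
    passed, feedback = 0, "all passed"
    for x, want in tests:
        try:
            got = env["solve"](x)
        except Exception as e:
            feedback = f"solve({x!r}) raised {e}"
            continue
        if got == want:
            passed += 1
        else:
            feedback = f"solve({x!r}) returned {got!r}, expected {want!r}"
    return passed, feedback

def generate_and_debug(problem, tests, llm_generate, n=4, rounds=2):
    candidates = [llm_generate(problem) for _ in range(n)]
    for _ in range(rounds):
        scored = [(run_tests(c, tests), c) for c in candidates]
        (best_passed, _), best = max(scored, key=lambda s: s[0][0])
        if best_passed == len(tests):
            return best                    # a candidate passes every public test
        # Otherwise feed each candidate's execution feedback back for repair.
        candidates = [llm_generate(problem, prior=c, feedback=fb)
                      for (_, fb), c in scored]
    return max(candidates, key=lambda c: run_tests(c, tests)[0])
```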
8. Autellix: An Efficient Serving Engine for LLM Agents as General Programs

Enhances LLM serving efficiency for agentic applications by optimizing request scheduling

huggingface.co/papers/2502.13…
9. Is That Your Final Answer? Test-Time Scaling Improves Selective Question Answering, @JohnsHopkins

Examines how inference scaling helps LLMs selectively answer questions with confidence

huggingface.co/papers/2502.13…
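
A minimal version of "answer only when confident" is self-consistency with an abstention threshold: sample several answers and respond only if enough of them agree. A hedged sketch, where `sample_answer` is a hypothetical model call and the threshold is illustrative:

```python
from collections import Counter

def answer_or_abstain(question, sample_answer, k=10, threshold=0.7):
    votes = Counter(sample_answer(question) for _ in range(k))
    answer, count = votes.most_common(1)[0]
    # More samples (test-time scaling) sharpen this confidence estimate.
    return answer if count / k >= threshold else None   # None means abstain
```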
10. SurveyX: Academic Survey Automation via Large Language Models

Develops an automated system for generating high-quality academic surveys, improving citation precision and evaluation frameworks

huggingface.co/papers/2502.14…
11. From RAG to Memory: Non-Parametric Continual Learning for Large Language Models

Introduces HippoRAG 2, a retrieval-augmented generation method that enhances long-term memory and retrieval

huggingface.co/papers/2502.14…
Code and data: github.com/OSU-NLP-Group/…
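
The "non-parametric" part is the key contrast with fine-tuning: new knowledge is appended to an external store and found by retrieval, never baked into weights. A bare-bones vector memory illustrates the principle; HippoRAG 2 itself layers graph-based retrieval on top, and `embed` here is a hypothetical sentence-embedding function.

```python
import numpy as np

class VectorMemory:
    """Toy non-parametric memory: add facts forever, retrieve by similarity."""
    def __init__(self, embed):
        self.embed, self.texts, self.vecs = embed, [], []

    def add(self, text):                 # continual learning = appending
        self.texts.append(text)
        self.vecs.append(self.embed(text))

    def retrieve(self, query, k=3):
        sims = np.stack(self.vecs) @ self.embed(query)
        return [self.texts[i] for i in np.argsort(sims)[-k:][::-1]]
```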
12. How Much Knowledge Can You Pack into a LoRA Adapter without Harming LLM?

Examines the trade-offs in integrating new knowledge into LLMs using Low-Rank Adaptation (LoRA)

huggingface.co/papers/2502.14…
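
For reference, everything a LoRA adapter "knows" lives in two low-rank matrices added to a frozen weight, which is exactly what bounds how much knowledge it can absorb. A minimal sketch with illustrative shapes:

```python
import numpy as np

class LoRALinear:
    """Frozen weight W plus a trainable rank-r update scale * (B @ A)."""
    def __init__(self, W, r=8, alpha=16):
        d_out, d_in = W.shape
        self.W = W                            # frozen pretrained weight
        self.A = np.random.randn(r, d_in) * 0.01
        self.B = np.zeros((d_out, r))         # zero init: update starts at zero
        self.scale = alpha / r

    def __call__(self, x):
        return self.W @ x + self.scale * (self.B @ (self.A @ x))

W = np.random.randn(32, 64)
layer = LoRALinear(W)
y = layer(np.random.randn(64))                # only A and B would be trained
```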
13. Train Small, Infer Large: Memory-Efficient LoRA Training for Large Language Models

Develops LoRAM, a memory-efficient approach that fine-tunes low-rank adapters on a pruned model and recovers them for inference with the original large model, enabling large-model tuning on low-resource hardware

huggingface.co/papers/2502.13…
Code: github.com/junzhang-zj/Lo…
14. Eager Updates for Overlapped Communication and Computation in DiLoCo, @GoogleDeepMind

Reduces communication bottlenecks in distributed LLM training by overlapping updates with computation

huggingface.co/papers/2502.12…
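
The overlap trick can be sketched with a background thread: start averaging the previous outer gradient across workers while the next batch of inner steps runs, and apply it once both finish. This is a toy single-process illustration of the scheduling idea, not DiLoCo's actual implementation; `allreduce` and the `model` methods are hypothetical.

```python
import threading

def overlapped_outer_step(model, allreduce, prev_outer_grad, inner_steps=50):
    result = {}
    done = threading.Event()

    def communicate():                   # slow cross-worker averaging
        result["avg_grad"] = allreduce(prev_outer_grad)
        done.set()

    threading.Thread(target=communicate, daemon=True).start()
    for _ in range(inner_steps):         # local compute proceeds meanwhile
        model.inner_update()
    done.wait()                          # communication hidden behind compute
    model.apply_outer_update(result["avg_grad"])
```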
15. S^2R: Teaching LLMs to Self-verify and Self-correct via RL

Develops a framework to improve LLM reasoning by teaching self-verification and self-correction

huggingface.co/papers/2502.12…
Code and data: github.com/NineAbyss/S2R
16. Logic-RL: Unleashing LLM Reasoning with Rule-Based RL, @MSFTResearch Asia

Uses rule-based reward RL on synthetic logic puzzles to strengthen logical reasoning

huggingface.co/papers/2502.14…
17. Discovering Highly Efficient Low-Weight Quantum Error-Correcting Codes with RL

Optimizes quantum error-correcting codes using RL, reducing physical qubit overhead

huggingface.co/papers/2502.14…
18. Armap: Scaling Autonomous Agents via Automatic Reward Modeling and Planning

Introduces a decision-making framework that learns rewards automatically, improving agent-based reasoning

huggingface.co/papers/2502.12…
19. Thinking Preference Optimization

Enhances LLM reasoning by refining preference-based optimization of reasoning steps

huggingface.co/papers/2502.13…
Code: github.com/uservan/ThinkPO
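
The underlying objective is a DPO-style preference loss applied to reasoning traces: push the preferred chain of thought up relative to the rejected one. A hedged sketch with scalar log-probabilities (a real setup would score full sequences under the policy and a frozen reference model):

```python
import math

def thinking_preference_loss(logp_chosen, logp_rejected,
                             ref_chosen, ref_rejected, beta=0.1):
    # Reward margin of the chosen trace over the rejected one,
    # measured relative to the reference model.
    margin = beta * ((logp_chosen - ref_chosen)
                     - (logp_rejected - ref_rejected))
    return -math.log(1.0 / (1.0 + math.exp(-margin)))  # -log sigmoid(margin)
```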
20. Rethinking Diverse Human Preference Learning through Principal Component Analysis

Improves human preference modeling using principal component analysis (PCA) for better LLM alignment

huggingface.co/papers/2502.13…
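
The core computation is small: embed chosen and rejected responses, take their differences, and run PCA so each principal component captures one axis of preference. A sketch with random placeholder embeddings:

```python
import numpy as np

def preference_components(chosen_embs, rejected_embs, k=4):
    diffs = chosen_embs - rejected_embs          # (n_pairs, dim)
    diffs = diffs - diffs.mean(axis=0)           # center before PCA
    _, _, vt = np.linalg.svd(diffs, full_matrices=False)
    return vt[:k]                                # top-k preference directions

pairs, dim = 200, 64
comps = preference_components(np.random.randn(pairs, dim),
                              np.random.randn(pairs, dim))
scores = np.random.randn(dim) @ comps.T          # project onto preference axes
```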
21. Craw4LLM: Efficient Web Crawling for LLM Pretraining

Optimizes web crawling for LLM training by prioritizing the most impactful pages

huggingface.co/papers/2502.13…
Code: github.com/cxcscmu/Crawl4…
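
Prioritized crawling reduces to a max-priority queue: always fetch the page with the highest estimated pretraining value next. In the sketch below, `score_page` and `fetch_links` are hypothetical stand-ins for the paper's influence scorer and a link extractor.

```python
import heapq

def crawl(seed_urls, score_page, fetch_links, budget=1000):
    frontier = [(-score_page(u), u) for u in seed_urls]  # max-heap via negation
    heapq.heapify(frontier)
    seen = set(u for _, u in frontier)
    corpus = []
    while frontier and len(corpus) < budget:
        _, url = heapq.heappop(frontier)     # most valuable page first
        corpus.append(url)
        for link in fetch_links(url):
            if link not in seen:
                seen.add(link)
                heapq.heappush(frontier, (-score_page(link), link))
    return corpus
```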
22. LLMs and Mathematical Reasoning Failures

Evaluates LLMs on newly designed math problems, exposing weaknesses in multi-step problem-solving

huggingface.co/papers/2502.11…
23. Small Models Struggle to Learn from Strong Reasoners

Identifies the limitations of small LLMs in benefiting from chain-of-thought distillation from larger models

huggingface.co/papers/2502.12…
Project page: small-model-gap.github.io
24. Flow-of-Options: Diversified and Improved LLM Reasoning by Thinking Through Options

Enhances LLM problem-solving by systematically exploring multiple solution paths

huggingface.co/papers/2502.12…
Find other important AI and ML news in our free weekly newsletter: huggingface.co/blog/Kseniase/…

