Ksenia_TuringPost
Newsletter exploring AI&ML - AI 101, Agentic Workflow, Business insights. From ML history to AI trends. Led by @kseniase_. Know what you are talking about👇🏼
Dec 7 9 tweets 3 min read
This Google paper presented at #NeurIPS2025 is a true gem.

In their search for a better backbone for sequence models, they:

• Reframe Transformers & RNNs as associative memory systems driven by attentional bias
• Reinterpret "forgetting" as retention regularization, not as erasure
• Combine these insights into Miras – a unified framework for designing next-gen sequence architectures

From this perspective, they introduce 3 new models, Moneta, Yaad, and Memora, that:

- Beat Transformers, Mamba2, DeltaNet, and hybrids across key benchmarks
- Scale better to long contexts
- Deliver state-of-the-art recall on needle-in-a-haystack tests

Here are the details (really worth exploring):

Transformers traditionally dominate because they scale well, but they become slow and expensive on long sequences because the cost of attention grows quadratically with sequence length.

Google's key idea draws from human attentional bias – our natural habit of focusing more on certain things than others.
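A minimal sketch of the associative-memory view, under our own simplifying assumptions: a matrix memory M maps keys to values and is updated online by one gradient step on an L2 attentional-bias loss, ½‖Mk − v‖², plus an L2 retention term that acts as "forgetting". The L2 choices and the parameters `eta` and `lam` are illustrative only; Moneta, Yaad, and Memora use other bias and retention objectives.

```python
from typing import List

Matrix = List[List[float]]
Vector = List[float]


def matvec(m: Matrix, x: Vector) -> Vector:
    """Dense matrix-vector product M @ x."""
    return [sum(mij * xj for mij, xj in zip(row, x)) for row in m]


def memory_step(m: Matrix, k: Vector, v: Vector, eta: float, lam: float) -> Matrix:
    """One online write of the pair (k, v) into associative memory M.

    Gradient step on  0.5 * ||M k - v||^2   (attentional bias)
                   +  0.5 * lam * ||M||^2   (retention regularization):
        M <- (1 - eta * lam) * M - eta * (M k - v) k^T
    The (1 - eta * lam) factor is the "forgetting": old content decays
    gradually instead of being erased.
    """
    err = [p - t for p, t in zip(matvec(m, k), v)]  # M k - v
    decay = 1.0 - eta * lam
    return [
        [decay * m[i][j] - eta * err[i] * k[j] for j in range(len(k))]
        for i in range(len(m))
    ]


def retrieve(m: Matrix, q: Vector) -> Vector:
    """Read the value associated with query q."""
    return matvec(m, q)


mem = [[0.0, 0.0], [0.0, 0.0]]
key, value = [1.0, 0.0], [0.5, -0.25]
for _ in range(200):
    mem = memory_step(mem, key, value, eta=0.5, lam=0.01)
print(retrieve(mem, key))  # close to [0.5, -0.25], slightly shrunk by retention
```

With `lam=0` there is no forgetting and the memory recovers the stored value (up to floating-point precision); a positive `lam` trades slightly shrunken recall for the ability to fade out old associations.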
Nov 6 11 tweets 3 min read
Supervised Fine-Tuning (SFT) + Reinforcement Learning with Verifiable Rewards (RLVR) = Supervised Reinforcement Learning (SRL)

Google Cloud AI Research introduced a new SRL training method that overcomes the issues of SFT and RLVR.

The main idea: it treats problem-solving as a sequence of logical actions.

Here is how it works:

What's the problem with common methods?

- Reinforcement Learning with Verifiable Rewards (RLVR) struggles when it can’t find correct examples to learn from.
- Supervised Fine-Tuning (SFT) tends to copy right answers too rigidly, token by token.

@googlecloud AI Research proposes SRL to fix both problems.
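The difference between the two reward signals can be sketched in a few lines. An RLVR-style reward only checks the final answer, so a rollout that is right until the last step still gets 0; a step-wise reward that compares each of the model's actions with an expert's action gives partial credit. This is a hypothetical sketch: the function names are ours, and Python's `difflib.SequenceMatcher` ratio stands in for whatever similarity SRL actually uses.

```python
from difflib import SequenceMatcher
from typing import List


def outcome_reward(final_answer: str, reference: str) -> float:
    """RLVR-style verifiable reward: 1 if the final answer matches, else 0."""
    return 1.0 if final_answer.strip() == reference.strip() else 0.0


def stepwise_reward(model_steps: List[str], expert_steps: List[str]) -> float:
    """Dense reward: mean text similarity between aligned model and expert actions.

    Missing model steps are compared against "", so skipping actions is penalized.
    """
    if not expert_steps:
        return 0.0
    total = 0.0
    for i, expert in enumerate(expert_steps):
        model = model_steps[i] if i < len(model_steps) else ""
        total += SequenceMatcher(None, model, expert).ratio()
    return total / len(expert_steps)


expert = ["let x = 3", "compute 2*x + 1", "answer: 7"]
model = ["let x = 3", "compute 2*x + 1", "answer: 6"]  # slips only at the end

print(outcome_reward(model[-1], expert[-1]))  # 0.0: no learning signal
print(stepwise_reward(model, expert))         # high partial credit for the correct steps
```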
Oct 17 9 tweets 3 min read
.@nvidia introduced a new RL approach that’s both faster and lighter on compute.

QeRL's idea is to combine 2 things:

- Quantization (NVFP4)
- Low-Rank Adaptation (LoRA)

But a key innovation is Adaptive Quantization Noise (AQN): QeRL turns quantization noise into an exploration tool, adjusting it on the fly during RL.
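One way to picture "adjusting noise on the fly" is a schedule that starts with relatively large noise (more exploration) and decays it over training (more exploitation). The sketch below decays a Gaussian noise scale exponentially between two hypothetical endpoints, `sigma_start` and `sigma_end`. It illustrates scheduled noise in general, not QeRL's exact AQN mechanism, which works with the noise that quantization already introduces.

```python
import random
from typing import List


def noise_sigma(stage: int, num_stages: int, sigma_start: float, sigma_end: float) -> float:
    """Exponentially interpolate the noise std from sigma_start (stage 0) to sigma_end (last stage)."""
    if num_stages <= 1:
        return sigma_end
    frac = stage / (num_stages - 1)
    return sigma_start * (sigma_end / sigma_start) ** frac


def add_noise(weights: List[float], sigma: float, rng: random.Random) -> List[float]:
    """Perturb weights with zero-mean Gaussian noise of std sigma."""
    return [w + rng.gauss(0.0, sigma) for w in weights]


rng = random.Random(0)
schedule = [noise_sigma(s, 5, 1e-2, 1e-4) for s in range(5)]
print([f"{s:.0e}" for s in schedule])  # decays geometrically from 1e-02 to 1e-04
noisy = add_noise([0.1, -0.2, 0.3], schedule[0], rng)
```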

Here are the details:

1. QeRL builds on two RL algorithms for LLMs:

- Group Relative Policy Optimization (GRPO): generates multiple answers for a prompt, scores them with rule-based rewards, and updates the model based on how each answer scores relative to the group average.
- Decoupled Clip and Dynamic Sampling Policy Optimization (DAPO): loosens the limits on how much the model can change during training so that it can discover more diverse solutions.
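GRPO's "average scores" step can be sketched as group-relative advantages: each sampled answer's reward is compared with the mean of its own group and scaled by the group's standard deviation. A minimal sketch; the population standard deviation and the `eps` guard are our choices, and implementations differ in such details.

```python
from statistics import mean, pstdev
from typing import List


def group_relative_advantages(rewards: List[float], eps: float = 1e-6) -> List[float]:
    """GRPO-style advantages: normalize each reward against its own group.

    All answers were sampled for the same prompt; above-average answers get
    positive advantages (reinforced), below-average ones get negative.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


# Four sampled answers to one prompt, scored by a rule-based checker (1 = correct).
rewards = [1.0, 0.0, 0.0, 1.0]
print([round(a, 3) for a in group_relative_advantages(rewards)])  # [1.0, -1.0, -1.0, 1.0]
```

When every answer in a group gets the same score, all advantages are zero and the group carries no learning signal; DAPO's dynamic sampling filters out such groups.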

On top of this, QeRL adds quantization.
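To get a feel for what 4-bit quantization does to weights, here is a simplified fake-quantization sketch in the spirit of NVFP4: values are grouped into blocks of 16, each block gets a scale so that its largest magnitude maps to 6 (the largest FP4 E2M1 value), and each scaled value is rounded to the nearest E2M1 grid point. Simplifying assumptions: the scale stays in full precision rather than FP8, there is no per-tensor scale, and ties go to the lower grid point. The rounding error it introduces is the kind of quantization noise discussed above.

```python
from typing import List

# Non-negative magnitudes representable by FP4 E2M1 (the sign is stored separately).
E2M1_GRID = [0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0]
BLOCK = 16  # elements that share one scale


def quantize_block(block: List[float]) -> List[float]:
    """Fake-quantize one block: scale, round to the E2M1 grid, scale back."""
    amax = max(abs(x) for x in block)
    if amax == 0.0:
        return [0.0] * len(block)
    scale = amax / E2M1_GRID[-1]  # the largest magnitude maps to 6.0
    out = []
    for x in block:
        mag = abs(x) / scale
        q = min(E2M1_GRID, key=lambda g: abs(g - mag))  # nearest grid point
        out.append((q if x >= 0 else -q) * scale)
    return out


def fake_quantize(weights: List[float]) -> List[float]:
    """Apply blockwise FP4-style fake quantization to a flat list of weights."""
    out: List[float] = []
    for start in range(0, len(weights), BLOCK):
        out.extend(quantize_block(weights[start:start + BLOCK]))
    return out


w = [0.03 * i - 0.2 for i in range(20)]
wq = fake_quantize(w)
err = max(abs(a - b) for a, b in zip(w, wq))
print(f"max abs quantization error: {err:.4f}")
```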
Oct 10 9 tweets 3 min read
Tiny Recursive Model (TRM) is a simple, effective approach built on the idea: do more with less.

It uses just 1 small 2-layer network that recursively improves its own answers.

With only 7M parameters, TRM sets new records, beating LLMs 10,000× larger. Compared with HRM (see below), it improves accuracy on:

- Sudoku-Extreme: 55% → 87%
- Maze-Hard: 75% → 85%
- ARC-AGI-1: 40% → 45%
- ARC-AGI-2: 5% → 8%

Here is how it works:

1. TRM is built on the idea of the Hierarchical Reasoning Model (HRM).

HRM uses 2 small neural networks working together, each at its own rhythm, to solve hard problems like Sudoku, mazes, and ARC-AGI puzzles, even though it's tiny (27 million parameters).

TRM is a simpler, smaller alternative to HRM.
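The "recursively improves its own answers" loop can be sketched as two nested loops: refine a latent scratchpad z several times from the question x, the current answer y, and z itself, then use z to update y, and repeat. In TRM both updates come from the same tiny learned network; in this hypothetical sketch that network is replaced by hand-written update functions (a Newton-style square-root refinement) purely so the loop runs, and the loop counts `n_latent` and `n_improve` are illustrative.

```python
from typing import Callable

LatentFn = Callable[[float, float, float], float]
AnswerFn = Callable[[float, float], float]


def recursive_refine(
    x: float,
    y: float,
    z: float,
    update_latent: LatentFn,
    update_answer: AnswerFn,
    n_latent: int = 3,
    n_improve: int = 6,
) -> float:
    """TRM-style loop: think (update z) several times, then revise the answer y."""
    for _ in range(n_improve):
        for _ in range(n_latent):
            z = update_latent(x, y, z)  # reason about the question and the current answer
        y = update_answer(y, z)         # improve the answer using the latent state
    return y


# Hypothetical stand-in for the learned network: the latent tracks the
# residual x - y^2, and the answer takes a Newton step toward sqrt(x).
def toy_latent(x: float, y: float, z: float) -> float:
    return x - y * y


def toy_answer(y: float, z: float) -> float:
    return y + z / (2 * y)


print(recursive_refine(2.0, 1.0, 0.0, toy_latent, toy_answer))  # ≈ 1.41421356
```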
Oct 3 9 tweets 3 min read
Retrieval-of-Thought (RoT) makes reasoning models faster by reusing earlier reasoning steps as templates.

These steps are stored in a “thought graph” that shows both their order and meaning.

As a result, RoT:

- reduces output tokens by up to 40%
- cuts inference latency by 82%
- lowers cost by 59%

All without losing accuracy.

Here is how it works:

RoT works by:

- Storing reasoning steps as nodes in a “thought graph.”
- Retrieving relevant steps when a new problem comes in.
- Assembling a dynamic template from those steps to guide the model.

Let’s take it step by step
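A toy version of the thought graph: each stored reasoning step is a node, edges record which step followed which (order), and a similarity score over the step text stands in for meaning. Retrieval picks the most similar step and follows the edges to assemble a template. Everything here is an assumption for illustration: the bag-of-words "embedding", the hypothetical `ThoughtGraph` class, and the one-chain-per-solution structure; RoT's real graph and template assembly are richer.

```python
import math
from collections import Counter
from dataclasses import dataclass, field
from typing import Dict, List, Optional


def embed(text: str) -> Counter:
    """Stand-in 'meaning' vector: bag of lowercase words (a real system would use a neural encoder)."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(count * b[word] for word, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


@dataclass
class ThoughtGraph:
    steps: List[str] = field(default_factory=list)           # nodes: stored reasoning steps
    next_step: Dict[int, int] = field(default_factory=dict)  # edges: step order

    def add_solution(self, solution_steps: List[str]) -> None:
        """Store one solved problem's reasoning as a chain of nodes."""
        prev: Optional[int] = None
        for text in solution_steps:
            idx = len(self.steps)
            self.steps.append(text)
            if prev is not None:
                self.next_step[prev] = idx
            prev = idx

    def template(self, problem: str, k: int = 1) -> List[str]:
        """Retrieve the k most similar steps and follow their edges to build a template."""
        q = embed(problem)
        ranked = sorted(range(len(self.steps)), key=lambda i: -cosine(q, embed(self.steps[i])))
        out: List[str] = []
        for start in ranked[:k]:
            node: Optional[int] = start
            while node is not None and self.steps[node] not in out:
                out.append(self.steps[node])
                node = self.next_step.get(node)
        return out


graph = ThoughtGraph()
graph.add_solution(["identify the quadratic equation", "apply the quadratic formula", "check both roots"])
graph.add_solution(["list the prime factors", "multiply the common factors"])
print(graph.template("solve this quadratic equation"))
# ['identify the quadratic equation', 'apply the quadratic formula', 'check both roots']
```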
Aug 27 10 tweets 6 min read
7 Notable models of the week

Open-source
▪️ Intern-s1
▪️ Nemotron Nano 2 by @NVIDIA
▪️ DeepSeek V3.1
▪️ Ovis2.5 by @AlibabaGroup
▪️ Matrix-game 2.0 by @Skywork_ai

▪️ Command A Reasoning by @Cohere
▪️ Dinov3 from @AIatMeta

Details 🧵

Also check out the most important weekly AI news here ->
turingpost.com/p/fod115

1. Intern-s1: A scientific multimodal foundation model by Shanghai AI Lab (open-source)

This is a 241B-parameter multimodal Mixture-of-Experts model with 28B active parameters, optimized for scientific reasoning:

- Trained on 5T tokens (2.5T scientific)
- Supports text, images, molecular structures, and time-series data.
- Has a dynamic tokenizer and a Mixture-of-Rewards RL framework
- Outperforms both open- and closed-source models on MatBench, ChemBench, etc.

arxiv.org/abs/2508.15763
Aug 12 12 tweets 7 min read
The freshest AI/ML research of the week

Our top 9
▪️ Sotopia-RL: Reward Design for Social Intelligence
▪️ Agent Lightning: Train ANY AI Agents with RL
▪️ Exploitation Is All You Need... for Exploration
▪️ Learning to Reason for Factuality
▪️ VeOmni
▪️ Is Chain-of-Thought Reasoning of LLMs a Mirage?
▪️ Cognitive Loop via In-Situ Optimization
▪️ Sculptor
▪️ CoAct-1

▪️ Tool-integrated Reinforcement Learning for Repo Deep Search
▪️ RL-PLUS
▪️ SEAgent
▪️ CRINN
▪️ Training Long-Context, Multi-Turn Software Engineering Agents with RL
▪️ Beyond the Trade-off: Self-Supervised RL for Reasoning Models' Instruction Following
▪️ CompassVerifier
▪️ Are We on the Right Way for Assessing Document Retrieval-Augmented Generation?
▪️ Are Today's LLMs Ready to Explain Well-Being Concepts?
▪️ VeriGUI
▪️ Trainable Dynamic Mask Sparse Attention
▪️ LeanK
▪️ Don't Overthink It: A Survey of Efficient R1-style Large Reasoning Models
▪️ On the Generalization of SFT
▪️ SitEmb-v1.5
▪️ AttnTrace
▪️ LaTCoder
▪️ ChartCap

🧵

1. Sotopia-RL: Reward Design for Social Intelligence

Trains socially intelligent agents with utterance-level, multi-dimensional rewards to capture nuanced social behaviors

arxiv.org/abs/2508.03905
Project page: rl.sotopia.world
Jul 11 5 tweets 2 min read
SingLoRA is a new, simpler version of LoRA (Low-Rank Adaptation) by Technion that uses only one small matrix instead of the usual two.

It multiplies that matrix by its own transpose (A × Aᵀ).

What does it buy you?

- No scale mismatch between different matrices
- Uses ~half the parameters of LoRA
- More stable training and better learning

Here's how it works:

1. Workflow of SingLoRA:

• The original weights of the model (W₀) are frozen.
• The system adds a small adapter - a learnable piece that updates the model for your specific task.
In SingLoRA, it's A × Aᵀ, where:
- A is a small trainable matrix of size n × r, where r ≪ n
- Aᵀ is its transpose
• The original model and the adapter are combined additively: W = W₀ + A × Aᵀ (up to a scaling factor).
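A minimal sketch of the SingLoRA adapter, assuming a square n × n weight matrix and a LoRA-style scaling factor alpha / r (the scaling and all function names here are our assumptions; SingLoRA's exact formula may differ). It also shows where the "about half the parameters" figure comes from for a square layer.

```python
from typing import Dict, List

Matrix = List[List[float]]


def singlora_update(a: Matrix, alpha: float, rank: int) -> Matrix:
    """Symmetric low-rank update (alpha / r) * A @ A^T for A of shape n x r."""
    n = len(a)
    s = alpha / rank
    return [[s * sum(a[i][k] * a[j][k] for k in range(rank)) for j in range(n)] for i in range(n)]


def merged_weight(w0: Matrix, a: Matrix, alpha: float, rank: int) -> Matrix:
    """Effective weight W = W0 + (alpha / r) * A A^T; W0 stays frozen, only A is trained."""
    delta = singlora_update(a, alpha, rank)
    return [[w0[i][j] + delta[i][j] for j in range(len(w0))] for i in range(len(w0))]


def trainable_params(n: int, r: int) -> Dict[str, int]:
    """Adapter size for an n x n layer: LoRA trains two n x r-sized matrices, SingLoRA trains one."""
    return {"lora": 2 * n * r, "singlora": n * r}


w0 = [[1.0 if i == j else 0.0 for j in range(4)] for i in range(4)]  # frozen identity weights
a = [[1.0], [0.0], [2.0], [0.0]]                                   # n = 4, r = 1
print(merged_weight(w0, a, alpha=1.0, rank=1))
print(trainable_params(4096, 8))  # {'lora': 65536, 'singlora': 32768}
```

Because there is only one matrix, there is no second factor whose scale can drift relative to the first. Since A × Aᵀ is square and symmetric by construction, this sketch covers only square weight matrices.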
Jul 1 19 tweets 12 min read
The freshest AI/ML research papers of the week

Our top 7:

▪️ OctoThinker
▪️ Performance Prediction for Large Systems via Text-to-Text Regression
▪️ Radial Attention
▪️ MADrive
▪️ Mind2Web 2
▪️ Chain-of-Experts
▪️ Ark

▪️ Where to find Grokking
▪️ Skywork-SWE
▪️ BlenderFusion
▪️ OmniGen2
▪️ LLaVA-Scissor
▪️ MMSearch-R1
▪️ LongWriter-Zero
▪️ Steering Conceptual Bias
▪️ WorldVLA

🧵

1. OctoThinker

Improves reinforcement learning alignment via mid-training strategies and math-intensive corpora

arxiv.org/abs/2506.20512
GitHub: github.com/GAIR-NLP/OctoT…
Jun 28 17 tweets 3 min read
30 days, 15 AI Coding Agents, one prompt — and the results will surprise you!

Will Schenk of TheFocusAI tested, specially for Turing Post, which coding tool could best build a Dockerized idea app with voting, notes, and file attachments.

You would not believe what he discovered about Cursor, v0, Copilot, and 12 others 🧵

1. Aider @aider_chat
This free, open-source CLI cranks out solid code faster than GitHub’s $20/month Copilot.

Grab the full June 2025 Coding Agent Report for details on code quality, testing, and other surprising, useful findings that help you decide which agent to hire -> github.com/The-Focus-AI/j…
Jun 27 6 tweets 3 min read
Chain-of-Experts (CoE) - a new kind of model architecture.

It builds on the Mixture-of-Experts (MoE) idea that a model can choose a different expert each round.

➡️ As a new addition, experts work in a sequence, one after the other, within a layer.

CoE keeps the number of active experts the same as before, but:

- Uses up to 42% less memory
- Unlocks over 800× more effective expert combinations
- Improves performance

Here's how it works:

1. In CoE:

- The model picks a small group of experts.
- Each expert transforms the current hidden state of a token.
- The outputs are combined using gating weights.
- A residual connection helps keep the information stable.

So the final result is the token after it has been processed by C rounds of experts, with each round building on the previous round's output.
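The four steps above, repeated for C rounds, can be sketched for a single token. Each round re-scores the experts on the current hidden state, keeps the top-k, mixes their outputs with softmax gating weights, and adds a residual. The toy experts, the single shared router, and the parameter names `top_k` and `rounds` (which plays the role of C) are hypothetical; real CoE layers use learned FFN experts and may give each round its own router.

```python
import math
from typing import Callable, List

Vector = List[float]
Expert = Callable[[Vector], Vector]


def softmax(xs: List[float]) -> List[float]:
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def chain_of_experts(h: Vector, experts: List[Expert], router: List[Vector], top_k: int, rounds: int) -> Vector:
    """Process one token's hidden state with `rounds` sequential rounds of experts.

    Each round re-routes on the *current* hidden state, so later rounds can pick
    different experts, and each round builds on the previous round's output.
    """
    for _ in range(rounds):
        scores = [sum(w * x for w, x in zip(r, h)) for r in router]  # router logits
        chosen = sorted(range(len(experts)), key=lambda i: -scores[i])[:top_k]
        gates = softmax([scores[i] for i in chosen])                 # gating weights
        mixed = [0.0] * len(h)
        for g, i in zip(gates, chosen):
            out = experts[i](h)
            mixed = [m + g * o for m, o in zip(mixed, out)]
        h = [x + m for x, m in zip(h, mixed)]                        # residual connection
    return h


# Hypothetical toy setup: expert 0 writes into dimension 1, expert 1 does nothing.
experts = [lambda v: [0.0, 3.0], lambda v: [0.0, 0.0]]
router = [[1.0, 0.0], [0.0, 1.0]]
print(chain_of_experts([1.0, 0.0], experts, router, top_k=1, rounds=2))  # [1.0, 3.0]
```

In the example, round 1 routes to expert 0 (dimension 0 dominates), which changes the hidden state so that round 2 routes to expert 1: the second round's choice depends on what the first round produced.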
Jun 26 7 tweets 4 min read
Models, datasets and benchmarks to pay attention to:

▪️ Gemini 2.5 Flash and Pro, plus Gemini 2.5 Flash-Lite
▪️ MiniMax-M1
▪️ Kimi-Dev-72B

▪️ SHADE-Arena benchmark
▪️ ESSENTIAL-WEB V1.0 dataset

🧵

1. @Google introduced Gemini 2.5 Flash and Pro as stable and production-ready, and launched Gemini 2.5 Flash-Lite in preview, the fastest and most cost-efficient of the three.

Flash-Lite outperforms 2.0 Flash-Lite on coding, math, science, reasoning, and multimodal benchmarks. It has lower latency, supports a 1 million-token context and multimodal input, and connects to tools like Google Search and code execution.

storage.googleapis.com/deepmind-media…
Jun 19 12 tweets 8 min read
Models and datasets to pay attention to:

▪️ Institutional Books 1.0 - a 242B token dataset
▪️ o3-pro from @OpenAI
▪️ FGN from @GoogleDeepMind
▪️ Magistral by @MistralAI
▪️ Resa: Transparent Reasoning Models via SAEs
▪️ Multiverse (Carnegie Mellon + NVIDIA)
▪️ Ming-Omni
▪️ Seedance 1.0 by ByteDance
▪️ Sentinel

🧵

1. Institutional Books 1.0: A 242B token dataset from Harvard Library's collections, refined for accuracy and usability

Sourced from 1,075,899 scanned books across 250+ languages via the Google Books project, the dataset includes both raw and post-processed text and detailed metadata.

arxiv.org/abs/2506.08300
Jun 18 8 tweets 4 min read
The latest AI/ML news of the week:

▪️ @HuggingFace helps you find the best model based on size
▪️ NVIDIA's Jensen Huang and @ylecun disagree with the predictions of Anthropic CEO Dario Amodei
▪️ @AIatMeta’s Superintelligence Gambit
▪️ @Google adds a voice to Search
▪️ Mattel and @OpenAI: brains to Barbie
▪️ Projects in ChatGPT

Details 🧵

1. Hugging Face insists, “Bigger isn’t better”
Jun 10 19 tweets 12 min read
The freshest research papers:

▪️ Self-Challenging Language Model Agents
▪️ Reflect, Retry, Reward
▪️ ProRL
▪️ Beyond the 80/20 Rule
▪️ REASONING GYM
▪️ AlphaOne
▪️ Unleashing the Reasoning Potential...Critique Fine-Tuning
▪️ ARIA
▪️ Incentivizing Reasoning...Instruction Following
▪️ OThink-R1

▪️ Reasoning Like an Economist
▪️ A Controllable Examination for Long-Context LLMs
▪️ SuperWriter

▪️ Protocol Models
▪️ AReaL
▪️ StreamBP
▪️ Taming LLMs by Scaling Learning Rates

▪️ Diagonal Batching
▪️ Inference-Time Hyper-Scaling with KV Cache Compression
▪️ Unified Scaling Laws for Compressed Representations

▪️ GUI-Actor
▪️ Surfer-H Meets Holo1

▪️ Qwen3 Embedding
▪️ Aligning Latent Spaces with Flow Priors
▪️ Large Language Models are Locally Linear Mappings

▪️ Establishing Trustworthy LLM Evaluation
▪️ Evaluation is All You Need
▪️ Datasheets Aren't Enough

🧵

1. Self-Challenging Language Model Agents by @AIatMeta, @UCBerkeley

Trains agents to create and solve their own tool-use tasks using code-based problem generation and RL

arxiv.org/abs/2506.01716
Jun 7 10 tweets 3 min read
Log-linear attention is a new type of attention proposed by @MIT. It is:

- as fast and efficient as linear attention
- as expressive as softmax attention

It uses a small but growing number of memory slots that increases logarithmically with the sequence length.

Here's how it works:

1. Input:

At each time step t, you have:

- Query vector (Q): what the model is asking
- Key vector (K): what the model remembers
- Value vector (V): what the model retrieves

They are computed from the input using learned linear projections.
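The logarithmically growing memory can be sketched with a Fenwick-tree-style split of the prefix: the first n tokens are cut into power-of-two buckets following the binary digits of n, each bucket keeps its own linear-attention memory, and the query reads from every bucket with a per-bucket weight. In this sketch the weights `lambdas` are fixed numbers passed in by hand (an assumption; a real model would compute them from the input), and there is no normalization. With all weights equal to 1 it reduces to plain linear attention, which makes the relationship easy to check.

```python
from typing import List, Tuple

Vector = List[float]


def fenwick_buckets(n: int) -> List[Tuple[int, int]]:
    """Split the prefix [0, n) into power-of-two segments following n's binary digits.

    Older tokens land in large segments, recent tokens in small ones; the number
    of segments equals the number of 1-bits in n, at most floor(log2 n) + 1.
    """
    buckets: List[Tuple[int, int]] = []
    start = 0
    bit = 1 << max(n.bit_length() - 1, 0)
    while bit:
        if n & bit:
            buckets.append((start, start + bit))
            start += bit
        bit >>= 1
    return buckets


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def log_linear_readout(q: Vector, keys: List[Vector], values: List[Vector], lambdas: List[float]) -> Vector:
    """Sum one linear-attention readout per bucket, each scaled by its weight.

    A bucket's memory is sum_s v_s k_s^T, so reading it with q gives sum_s (q . k_s) v_s.
    Expects one weight per bucket.
    """
    out = [0.0] * len(values[0])
    for lam, (s0, s1) in zip(lambdas, fenwick_buckets(len(keys))):
        for s in range(s0, s1):
            w = lam * dot(q, keys[s])
            out = [o + w * v for o, v in zip(out, values[s])]
    return out


keys = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
values = [[1.0], [2.0], [3.0]]
q = [1.0, 1.0]
print(fenwick_buckets(13))                                     # [(0, 8), (8, 12), (12, 13)]
print(log_linear_readout(q, keys, values, lambdas=[1.0, 0.5]))  # [6.0]
```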
Jun 6 16 tweets 3 min read
.@JeffDean's interview at @Sequoia's AI Ascent is a must-watch. He gives a realistic look at where AI is headed and what's actually happening in the field, sharing insights on:

• Specialized hardware
• Evolution of models
• Future of computing infrastructure
• AI's role in science and more

Here are the key takeaways:

1. Where is AI going these days?

Models are improving fast and solving more problems each year. Hardware, training algorithms, and RL techniques have brought us here — and multimodal is a big focus for what’s next.
May 29 6 tweets 2 min read
Latent reasoning lets the model do more of its "thinking" internally.

This internal information is continuous, unlike the discrete output text.

To efficiently mix the two, researchers from @UofIllinois proposed HRPO (Hybrid Reasoning Policy Optimization), an RL-based hybrid latent reasoning framework.

Here's how it works:

1. HRPO uses reinforcement learning (RL) to train LLMs to reason internally without needing CoT training data.

It integrates hidden states into token sampling using a learnable gating mechanism.
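A learnable gate of this kind can be sketched as a per-dimension sigmoid that blends two candidate inputs for the next step: the embedding of the token just sampled (discrete) and the model's last hidden state (continuous). This is a generic gated mix assumed for illustration; HRPO's actual gating parameterization may differ, and names like `gate_logits` are hypothetical.

```python
import math
from typing import List

Vector = List[float]


def sigmoid(x: float) -> float:
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def hybrid_input(token_emb: Vector, hidden: Vector, gate_logits: Vector) -> Vector:
    """Blend the sampled token's embedding with the previous hidden state.

    g = sigmoid(gate_logit) is learned per dimension: g -> 0 keeps the discrete
    token embedding, g -> 1 passes through the continuous hidden state.
    """
    out = []
    for e, h, logit in zip(token_emb, hidden, gate_logits):
        g = sigmoid(logit)
        out.append((1.0 - g) * e + g * h)
    return out


emb, hid = [1.0, 0.0], [0.0, 1.0]
print(hybrid_input(emb, hid, [0.0, 0.0]))      # [0.5, 0.5]: equal mix
print(hybrid_input(emb, hid, [-20.0, -20.0]))  # ≈ [1.0, 0.0]: mostly the token embedding
```

Because the gate is differentiable, RL updates can learn how much latent (continuous) information to let through at each dimension.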
May 26 7 tweets 3 min read
A new recipe for training multimodal models

👉 Mix various data types together: text next to images, video frames after captions, then webpages, etc. This way the model learns to connect what it reads with what it sees.

ByteDance proposed and implemented this idea in their BAGEL, a new open-source multimodal model.

Here's how it works:

Architecture:

BAGEL is one giant Transformer with two separate experts inside:

- Understanding expert handles text and ViT image tokens.
- Generation expert handles the VAE image-creation tokens.

These experts are placed side-by-side in every layer and "look" at the same sequence, but each focuses on its own job.
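The "same sequence, separate experts" idea can be sketched as one layer in which attention runs over all tokens together and each token is then transformed by the expert that matches its type (text and ViT tokens go to the understanding expert, VAE tokens to the generation expert). The toy causal averaging that stands in for attention, the expert functions, and the routing table are assumptions for illustration; in BAGEL, the set of expert-specific parameters may go beyond the single per-token transform shown here.

```python
from typing import Callable, Dict, List, Tuple

Vector = List[float]
Token = Tuple[str, Vector]  # (kind, hidden state); kind is "text", "vit", or "vae"

ROUTE = {"text": "understanding", "vit": "understanding", "vae": "generation"}


def shared_attention(hs: List[Vector]) -> List[Vector]:
    """Toy causal 'attention': each position averages itself and all earlier positions.

    Every token sees the same sequence, regardless of which expert will process it.
    """
    out, running = [], [0.0] * len(hs[0])
    for t, h in enumerate(hs, start=1):
        running = [r + x for r, x in zip(running, h)]
        out.append([r / t for r in running])
    return out


def mot_layer(tokens: List[Token], experts: Dict[str, Callable[[Vector], Vector]]) -> List[Token]:
    """One layer: shared attention over all tokens, then a per-token expert chosen by modality."""
    attended = shared_attention([h for _, h in tokens])
    return [(kind, experts[ROUTE[kind]](h)) for (kind, _), h in zip(tokens, attended)]


# Hypothetical toy experts.
experts = {
    "understanding": lambda v: [2.0 * x for x in v],
    "generation": lambda v: [x + 1.0 for x in v],
}
print(mot_layer([("text", [1.0]), ("vae", [3.0])], experts))  # [('text', [2.0]), ('vae', [3.0])]
```

In the example, the VAE token's input to the generation expert already mixes in the text token through the shared attention step, which is how the two experts "look" at the same sequence while each keeps its own job.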
May 24 14 tweets 3 min read
.@sama's interview at @sequoia AI Ascent offers a lot of insights on:

- How OpenAI came to ChatGPT
- Its aim to be the “core AI subscription”
- AI as an operating system
- What the ideal smart model is
- Main future goals

Here is an outline of his talk with the key ideas:

1. Past milestones and directions

- The first consumer product was the DALL·E API
- OpenAI also tried building a robot hand
- One person, and then a team, became excited about building LLMs with unsupervised learning, which led to GPT-1 and GPT-2. Then GPT-3 showed something cool.
May 20 9 tweets 2 min read
What is the Agentic Web?

8 important updates from #MSBuild

1. Agents as first-class business & M365 entities.

2. Microsoft Entra Agent ID for knowing your agents.

3. NLWeb, MCP, Open Protocols as the foundation layer for an open agent ecosystem.

4. Agentic DevOps revolutionizes software development with GitHub Copilot’s new coding agent.

5. Azure AI Foundry with 1,900+ models & Copilot Studio

6. Collaboration: Human-Agent & Agent-Agent with Teams as a “multiplayer” agent hub.

7. Windows AI Foundry, Foundry Local (for macOS) and open-sourced WSL, NLWeb, and Copilot in VS Code

8. Microsoft Discovery — AI for science

Read more about these updates in our free weekly newsletter: turingpost.com/p/fod101

1. Agents as first-class business & M365 entities:

The new Microsoft 365 Copilot unifies chat, search, notebooks, and tools like “Researcher” and “Analyst.” With Copilot Tuning, businesses can tailor agents to their own knowledge, language, and brand voice.