TuringPost
Newsletter exploring AI&ML - AI 101, Agentic Workflow, Business insights. From ML history to AI trends. Led by @kseniase_. Know what you are talking about 👇🏼
Jun 27 6 tweets 3 min read
Chain-of-Experts (CoE) - a new kind of model architecture.

It builds on the Mixture-of-Experts (MoE) idea that a model can choose a different expert each round.

➡️ The new addition: experts work in a sequence, one after the other, within a layer.

CoE keeps the number of active experts the same as before, but:

- Uses up to 42% less memory
- Unlocks over 800× more effective expert combinations
- Improves performance

Here's how it works:
1. In CoE:

- The model picks a small group of experts.
- Each expert transforms the current hidden state of a token.
- The outputs are combined using gating weights.
- A residual connection helps keep the information stable.

So, the final result is the token after it's been processed by C rounds of experts, with each round learning from the last.
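To make the round-by-round flow concrete, here's a minimal PyTorch sketch of the idea (the sizes, router, and expert MLPs are illustrative assumptions, not the paper's exact configuration):

```python
import torch
import torch.nn as nn

class ChainOfExperts(nn.Module):
    """Toy CoE layer: C sequential rounds of top-k expert mixing,
    with a residual connection after every round."""
    def __init__(self, d_model=256, n_experts=8, top_k=2, rounds=2):
        super().__init__()
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, 4 * d_model), nn.GELU(),
                          nn.Linear(4 * d_model, d_model))
            for _ in range(n_experts))
        self.router = nn.Linear(d_model, n_experts)
        self.top_k, self.rounds = top_k, rounds

    def forward(self, h):                          # h: (n_tokens, d_model)
        for _ in range(self.rounds):               # rounds run in sequence
            gates = self.router(h).softmax(-1)     # (n_tokens, n_experts)
            w, idx = gates.topk(self.top_k, dim=-1)
            w = w / w.sum(-1, keepdim=True)        # renormalize gate weights
            out = torch.zeros_like(h)
            for slot in range(self.top_k):
                for e, expert in enumerate(self.experts):
                    mask = idx[:, slot] == e       # tokens routed to expert e
                    if mask.any():
                        out[mask] += w[mask, slot].unsqueeze(-1) * expert(h[mask])
            h = h + out                            # residual keeps info stable
        return h

print(ChainOfExperts()(torch.randn(4, 256)).shape)  # torch.Size([4, 256])
```

Because each round can select a different expert subset, sequencing multiplies the number of possible expert paths per token – the source of the claimed 800× more effective combinations.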
Jun 26 7 tweets 4 min read
Models, datasets and benchmarks to pay attention to:

▪️ Gemini 2.5 Flash and Pro, plus Gemini 2.5 Flash-Lite
▪️ MiniMax-M1
▪️ Kimi-Dev-72B

▪️ SHADE-Arena benchmark
▪️ ESSENTIAL-WEB V1.0 dataset

🧵
1. @Google introduced Gemini 2.5 Flash and Pro as stable and production-ready, and launched Gemini 2.5 Flash-Lite in preview – the fastest and most cost-efficient of the family.

Flash-Lite outperforms 2.0 Flash-Lite in coding, math, science, reasoning, and multimodal benchmarks. It features lower latency, supports a 1 million-token context and multimodal input, and connects to tools like Google Search and code execution.

storage.googleapis.com/deepmind-media…
Jun 19 12 tweets 8 min read
Models and datasets to pay attention to:

▪️ Institutional Books 1.0 - a 242B token dataset
▪️ o3-pro from @OpenAI
▪️ FGN from @GoogleDeepMind
▪️ Magistral by @MistralAI
▪️ Resa: Transparent Reasoning Models via SAEs
▪️ Multiverse (Carnegie+NVIDIA)
▪️ Ming-Omni
▪️ Seedance 1.0 by ByteDance
▪️ Sentinel

🧵
1. Institutional Books 1.0: A 242B token dataset from Harvard Library's collections, refined for accuracy and usability

Sourced from 1,075,899 scanned books across 250+ languages via the Google Books project, the dataset includes both raw and post-processed text and detailed metadata.

arxiv.org/abs/2506.08300
Jun 18 8 tweets 4 min read
The latest AI/ML news of the week:

▪️ @HuggingFace helps to find the best model based on size
▪️ NVIDIA’s Jensen Huang and @ylecun disagree with predictions from Anthropic’s Dario Amodei
▪️ @AIatMeta’s Superintelligence Gambit
▪️ @Google adds a voice to Search
▪️ Mattel and @OpenAI: brains to Barbie
▪️ Projects in ChatGPT

Details 🧵
1. Hugging Face insists, “Bigger isn’t better”
Jun 10 19 tweets 12 min read
The freshest research papers:

▪️ Self-Challenging Language Model Agents
▪️ Reflect, Retry, Reward
▪️ ProRL
▪️ Beyond the 80/20 Rule
▪️ REASONING GYM
▪️ AlphaOne
▪️ Unleashing the Reasoning Potential...Critique Fine-Tuning
▪️ ARIA
▪️ Incentivizing Reasoning...Instruction Following
▪️ OThink-R1

▪️ Reasoning Like an Economist
▪️ A Controllable Examination for Long-Context LLMs
▪️ SuperWriter

▪️ Protocol Models
▪️ AReaL
▪️ StreamBP
▪️ Taming LLMs by Scaling Learning Rates

▪️ Diagonal Batching
▪️ Inference-Time Hyper-Scaling with KV Cache Compression
▪️ Unified Scaling Laws for Compressed Representations

▪️ GUI-Actor
▪️ Surfer-H Meets Holo1

▪️ Qwen3 Embedding
▪️ Aligning Latent Spaces with Flow Priors
▪️ Large Language Models are Locally Linear Mappings

▪️ Establishing Trustworthy LLM Evaluation
▪️ Evaluation is All You Need
▪️ Datasheets Aren't Enough

🧵
1. Self-Challenging Language Model Agents by @AIatMeta, @UCBerkeley

Trains agents to create and solve their own tool-use tasks using code-based problem generation and RL

arxiv.org/abs/2506.01716
Jun 7 10 tweets 3 min read
Log-linear attention — a new type of attention proposed by @MIT which is:

- as fast and efficient as linear attention
- as expressive as softmax attention

It uses a small but growing set of memory slots whose number increases logarithmically with the sequence length.

Here's how it works:
1. Input:

At each time step t, you have:

- Query vector (Q): what the model is asking
- Key vector (K): what the model remembers
- Value vector (V): what the model retrieves

They are computed from the input using learned linear projections.
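As a rough illustration, here's what those projections (and the log-growing memory) might look like in PyTorch – the sizes and names are assumptions for the sketch, not the paper's:

```python
import math
import torch
import torch.nn as nn

d_model, d_head, T = 512, 64, 1024
x = torch.randn(T, d_model)              # T time steps of input

# Learned linear projections (illustrative dimensions)
W_q = nn.Linear(d_model, d_head, bias=False)
W_k = nn.Linear(d_model, d_head, bias=False)
W_v = nn.Linear(d_model, d_head, bias=False)

Q, K, V = W_q(x), W_k(x), W_v(x)         # per-step query / key / value vectors

# The log-linear idea: memory grows as O(log T) slots, not O(T) cached keys
num_slots = math.ceil(math.log2(T))
print(num_slots)                         # 10 slots for 1,024 steps
```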
Jun 6 16 tweets 3 min read
.@JeffDean's interview at @Sequoia’s AI Ascent is a must-watch. He gives a real look at where AI is headed and what’s actually happening in the field, sharing insights on:

• Specialized hardware
• Evolution of models
• Future of computing infrastructure
• AI's role in science and more

Here are the key takeaways:
1. Where is AI going these days?

Models are improving fast and solving more problems each year. Hardware, training algorithms, and RL techniques have brought us here — and multimodal is a big focus for what’s next.
May 29 6 tweets 2 min read
Latent reasoning lets the model do more of its "thinking" internally.

This internal information is continuous, in contrast to the discrete output text.

To efficiently mix this info, researchers from @UofIllinois proposed HRPO (Hybrid Reasoning Policy Optimization) – an RL-based hybrid latent reasoning framework.

Here's how it works:
1. HRPO uses reinforcement learning (RL) to train LLMs to reason internally without needing CoT training data.

It integrates hidden states into token sampling using a learnable gating mechanism.
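A minimal sketch of that gating step, assuming a simple concat-then-sigmoid gate (the gate's exact form, names, and sizes are illustrative, not taken from the paper):

```python
import torch
import torch.nn as nn

d = 768
gate_proj = nn.Linear(2 * d, d)       # learnable gate (illustrative)

def hybrid_mix(hidden_state, token_embedding):
    # Per-dimension gate in [0, 1] blends the continuous hidden state
    # with the sampled token's embedding before the next step.
    g = torch.sigmoid(gate_proj(torch.cat([hidden_state, token_embedding], dim=-1)))
    return g * hidden_state + (1 - g) * token_embedding

h = torch.randn(1, d)                 # latent "thought"
e = torch.randn(1, d)                 # embedding of the sampled token
print(hybrid_mix(h, e).shape)         # torch.Size([1, 768])
```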
May 26 7 tweets 3 min read
A new recipe for training multimodal models

👉 Mix various data types together: text next to images, video frames after captions, then webpages, etc. This way the model learns to connect what it reads with what it sees.

ByteDance proposed and implemented this idea in BAGEL, their new open-source multimodal model.

Here's how it works:
Architecture:

BAGEL is one giant Transformer with two separate experts inside:

- Understanding expert handles text and ViT image tokens.
- Generation expert handles the VAE image-creation tokens.

These experts are placed side-by-side in every layer and "look" at the same sequence, but each focuses on its own job.
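Here's a hedged PyTorch sketch of that layout: a toy layer with shared attention and two modality-routed FFN experts (all sizes and names are illustrative, not BAGEL's actual code):

```python
import torch
import torch.nn as nn

class TwoExpertLayer(nn.Module):
    """Toy version of the side-by-side experts: shared attention over the
    whole sequence, then each token goes through the FFN matching its job."""
    def __init__(self, d=512, n_heads=8):
        super().__init__()
        self.attn = nn.MultiheadAttention(d, n_heads, batch_first=True)
        self.und_ffn = nn.Sequential(nn.Linear(d, 4 * d), nn.GELU(), nn.Linear(4 * d, d))
        self.gen_ffn = nn.Sequential(nn.Linear(d, 4 * d), nn.GELU(), nn.Linear(4 * d, d))

    def forward(self, x, is_gen):          # is_gen: (batch, seq) bool mask
        h, _ = self.attn(x, x, x)          # both experts "look" at the same sequence
        x = x + h
        out = torch.where(is_gen.unsqueeze(-1), self.gen_ffn(x), self.und_ffn(x))
        return x + out

x = torch.randn(1, 10, 512)                # text + image tokens in one sequence
is_gen = torch.zeros(1, 10, dtype=torch.bool)
is_gen[:, 6:] = True                       # pretend the last 4 are VAE image tokens
print(TwoExpertLayer()(x, is_gen).shape)   # torch.Size([1, 10, 512])
```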
May 24 14 tweets 3 min read
.@sama's interview at @sequoia AI Ascent offers a lot of insights on:

- How OpenAI came to ChatGPT
- Its aim to be the “core AI subscription”
- AI as an operating system
- What the ideal smart model is
- Main future goals

Here is an outline of his talk with the key ideas:
1. Past milestones and directions

- The first consumer product was the DALL·E API
- OpenAI also tried building a robot hand
- One person, and then a team, became excited about building LLMs with unsupervised learning, which started with GPT-1 and GPT-2. Then GPT-3 showed something cool.
May 20 9 tweets 2 min read
What is the Agentic Web?

8 important updates from #MSBuild

1. Agents as first-class business & M365 entities.

2. Microsoft Entra Agent ID for knowing your agents.

3. NLWeb, MCP, Open Protocols as the foundation layer for an open agent ecosystem.

4. Agentic DevOps revolutionizes software development with GitHub Copilot’s new coding agent.

5. Azure AI Foundry with 1,900+ models & Copilot Studio

6. Collaboration: Human-Agent & Agent-Agent with Teams as a “multiplayer” agent hub.

7. Windows AI Foundry, Foundry Local (for macOS) and open-sourced WSL, NLWeb, and Copilot in VS Code

8. Microsoft Discovery — AI for science

Read more about these updates in our free weekly newsletter: turingpost.com/p/fod101
1. Agents as first-class business & M365 entities:

The new Microsoft 365 Copilot unifies chat, search, notebooks, and tools like “Researcher” and “Analyst.” With Copilot Tuning, businesses can tailor agents to their own knowledge, language, and brand voice.
May 20 20 tweets 13 min read
The freshest research of the week:

Our top 9:
▪️ Beyond 'Aha!'
▪️ J1: Incentivizing Thinking in LLM-as-a-Judge via Reinforcement Learning
▪️ The CoT Encyclopedia
▪️ System Prompt Optimization with Meta-Learning
▪️ Parallel Scaling Law for LMs
▪️ Insights into DeepSeek-V3
▪️ QuXAI: Explainers for Hybrid Quantum Machine Learning Models
▪️ AttentionInfluence
▪️ MLE-Dojo

▪️ Learning from Peers in Reasoning Models
▪️ WorldPM
▪️ Reinforced Internal-External Knowledge Synergistic Reasoning for Efficient Adaptive Search Agent
▪️ Learning Dynamics in Continual Pre-Training for LLMs
▪️ Memorization-Compression Cycles Improve Generalization
▪️ DanceGRPO
▪️ Unified Continuous Generative Model
▪️ Depth Anything with Any Prior
▪️ MetaUAS

🧵
1. Beyond 'Aha!': Toward Systematic Meta-Abilities Alignment in Large Reasoning Models

Proposes aligning models with meta-reasoning abilities (deduction, induction, abduction) to improve reasoning reliability and performance

arxiv.org/abs/2505.10554
Code: github.com/zhiyuanhubj/Me…
May 20 9 tweets 3 min read
Designing models and hardware together — is this the new shift toward the most cost-efficient models?

This idea is used in DeepSeek-V3, which was trained on just 2,048 powerful NVIDIA H800 GPUs.

New research from @deepseek_ai clarifies how DeepSeek-V3 works through its key innovations:

- Multi-head Latent Attention (MLA)
- Mixture of Experts (MoE)
- FP8 mixed-precision training
- Multi-Plane Network Topology

🧵
1. Multi-head Latent Attention (MLA)

MLA compresses the KV cache down to 70 KB per token, while other models like LLaMA-3.1 and Qwen2.5 need 7x more.

Thanks to this, DeepSeek-V3:
- Handles long conversations
- Runs on limited hardware
- Makes inference cheaper and more scalable
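The core trick, in a hedged sketch (illustrative sizes, not DeepSeek's actual configuration): store one small latent per token in the cache and re-expand it into keys and values only when attention runs:

```python
import torch
import torch.nn as nn

d_model, d_latent = 4096, 512   # illustrative sizes

# Compress the hidden state to a small latent; only the latent is cached.
W_down = nn.Linear(d_model, d_latent, bias=False)
# Re-expand to full keys/values at attention time.
W_up_k = nn.Linear(d_latent, d_model, bias=False)
W_up_v = nn.Linear(d_latent, d_model, bias=False)

x = torch.randn(1, d_model)     # one token's hidden state
latent = W_down(x)              # d_latent floats per token in the KV cache
k, v = W_up_k(latent), W_up_v(latent)
print(latent.shape, k.shape)    # torch.Size([1, 512]) torch.Size([1, 4096])
```

Caching d_latent numbers per token instead of full per-head keys and values is what shrinks the cache several-fold.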
May 15 10 tweets 4 min read
The latest AI/ML news of the week:

▪️ OpenAI:
- Reinforcement fine-tuning for o4-mini
- ChatGPT’s deep research reads GitHub repos
- Stargate supercomputing project
- "OpenAI for Countries" global initiative
▪️ Microsoft adopted Google’s A2A to Azure AI Foundry and Copilot Studio
▪️ Google's Gemini features implicit caching
▪️ Gemma - 150 million downloads and 70k+ variants on Hugging Face
▪️ AWS’s new Generative AI Adoption Index

Details 🧵
1. Reinforcement fine-tuning is live for o4-mini. By combining CoT reasoning with graded task performance, RFT gives domain-specific models serious IQ boosts.

Meanwhile, supervised fine-tuning is now available for GPT-4.1 nano, letting you sculpt OpenAI’s speediest, thriftiest model to your liking.

Plus RFT use cases -> platform.openai.com/docs/guides/rf…
May 14 12 tweets 8 min read
The freshest research papers of the week:

Our top 10:
▪️ Flow-GRPO
▪️ Unified Multimodal Chain-of-Thought Reward Model through Reinforcement Fine-Tuning
▪️ RM-R1
▪️ Scalable Chain of Thoughts via Elastic Reasoning
▪️ X-Reasoner
▪️ Practical Efficiency of Muon for Pretraining
▪️ Grokking in the Wild
▪️ Teaching Models to Understand (but not Generate) High-risk Data
▪️ LLM-Independent Adaptive RAG
▪️ Perception, Reason, Think, and Plan: A Survey on Large Multimodal Reasoning Models

Link to the full list of research at the end of the 🧵
1. Flow-GRPO: Training Flow Matching Models via Online RL

Combines flow matching and reinforcement learning for improved text-to-image generation, significantly boosting composition accuracy and human preference alignment

arxiv.org/abs/2505.05470
Code: github.com/yifan123/flow_…
Apr 22 12 tweets 6 min read
The latest AI/ML news of the week:

It's mostly about @OpenAI this time:

▪️ Models:
- GPT‑4.1 in full, mini, and nano versions
- Codex CLI
- o3 (+ opinions)
- o4-mini
▪️ "A Practical Guide to Building Agents"

Also:
▪️ @hwchase17's blog post "How to think about agent frameworks"
▪️ @AnthropicAI published "Claude Code: Best practices for agentic coding"

Details below 🧵
1. @OpenAI dropped GPT‑4.1 in full, mini, and nano flavors – cheaper, faster, and catching up with Google’s million‑token context window.

The models are available via API but curiously absent from ChatGPT – a move that slightly backpedals on Sam Altman’s earlier promise of enhanced reasoning.

openai.com/api/
Apr 21 7 tweets 3 min read
.@GoogleAI has dropped a very interesting study

They introduced new types of attentional bias strategies in LLMs and reimagined the "forgetting" process, replacing it with "retention."

All of this is wrapped up in Miras – their new framework for designing efficient AI architectures using 4 building blocks:

• Memory architecture – how the memory is built
• Attentional bias – how the model focuses
• Retention gate – how it forgets or keeps information
• Memory learning algorithm – how it’s trained

Details 🧵
1. Forgetting? No, it's “retention”

Instead of saying the model forgets, Google researchers use the idea of retention. So the term "forget gate" turns into "retention gate."

The model doesn’t erase past memory—it just decides not to hold on to some things as tightly.
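As a toy sketch of that framing (illustrative, not the Miras implementation), the memory update weights the old state by a retention factor instead of applying a separate "forget" step:

```python
import torch

def retention_update(memory, new_info, retention):
    # retention in [0, 1]: how tightly the old memory is held.
    # retention=1.0 keeps everything; lower values loosen the grip.
    return retention * memory + (1 - retention) * new_info

m = torch.zeros(4)
m = retention_update(m, torch.ones(4), retention=0.9)
print(m)  # tensor([0.1000, 0.1000, 0.1000, 0.1000])
```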
Apr 16 20 tweets 13 min read
The freshest AI/ML research of the week

Our top 8:

▪️ The AI Scientist v2
▪️ Debug-gym
▪️ OLMoTrace
▪️ Scaling Laws for Native Multimodal Models
▪️ MegaScale-Infer
▪️ Hogwild! Inference
▪️ Self-Steering Language Models
▪️ VAPO: Efficient and Reliable Reinforcement Learning for Advanced Reasoning Tasks

▪️ Are You Getting What You Pay For?
▪️ MM-IFEngine
▪️ HybriMoE
▪️ C3PO
▪️ Quantization Hurts Reasoning?
▪️ Efficient Reinforcement Finetuning via Adaptive Curriculum Learning
▪️ Concise Reasoning via RL
▪️ Missing Premise exacerbates Overthinking
▪️ DDT
▪️ Adaptive Weighted Rejection Sampling

🧵
1. The AI Scientist v2 by @SakanaAILabs, @UBC, @VectorInst, and @UniofOxford

It's an autonomous LLM-based agent that formulates hypotheses, runs experiments, analyzes data, and writes papers. It uses agentic tree search and VLM feedback for iterative refinement, removing human-authored code templates. Of three papers submitted to ICLR 2025 workshops, one passed peer review with a 6.33 score.

pub.sakana.ai/ai-scientist-v…
Code: github.com/SakanaAI/AI-Sc…
Apr 15 14 tweets 7 min read
The latest AI/ML news of the week:

▪️ @huggingface and AI robotics

▪️ @Google Cloud Next 2025:
- TPU v7 “Ironwood” AI chip
- Gemini 2.5 Pro and Flash models
- Firebase Studio
- Agent-to-Agent Protocol (A2A)

▪️ @OpenAI:
- ChatGPT gets a better memory – yours
- EU Economic Blueprint
- OpenAI’s Pioneers Program
- BrowseComp: a benchmark for browsing agents

▪️ @Microsoft: Copilot+ gets a memory upgrade

Details below 🧵
1. Congrats to our friends at @HuggingFace! Robotics is one of the most interesting areas for AI in the next few years.
Apr 9 5 tweets 2 min read
How can we tell when and how an AI model reflects on its reasoning?

Researchers from @essential_ai built a full framework to track reflection throughout the model’s pre-training.

They tested 2 types of reflection:

• Situational reflection: The model reviews someone else’s reasoning (like another AI's).
• Self-reflection: The model reviews its own reasoning.

▪️ The key finding? Models start to reflect much earlier than we thought.

Here are the details:
To test reflection, researchers created 6 datasets in different areas: math, coding, logic, and general knowledge.

They gave the model confusing examples, like problems with small mistakes in logic or math, and watched whether it could spot and fix them.
Apr 8 8 tweets 4 min read
The latest AI/ML news of the week:

▪️ CORLEO from Kawasaki
▪️ Demis Hassabis's @IsomorphicLabs raised $600 million in its first external round
▪️ @genspark_ai Super Agent
▪️ @OpenAI's PaperBench
▪️ @GoogleDeepMind’s Dreamer RL agent
▪️ @AnthropicAI Claude for Education

Details below 🧵
1. CORLEO - a robotic horse from Kawasaki

Just take a look ->