Newsletter exploring AI&ML - AI 101, Agentic Workflow, Business insights. From ML history to AI trends. Led by @kseniase_ Know what you are talking about👇🏼
May 29 • 6 tweets • 2 min read
Latent reasoning lets the model do more of its "thinking" internally.
This internal information is continuous, unlike the discrete text the model outputs.
To efficiently mix this info, researchers from @UofIllinois proposed HRPO (Hybrid Reasoning Policy Optimization) – an RL-based hybrid latent reasoning framework.
Here's how it works: 1. HRPO uses reinforcement learning (RL) to train LLMs to reason internally without needing CoT training data.
It integrates hidden states into token sampling using a learnable gating mechanism.
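As a rough illustration, the gating idea can be pictured as a convex blend of the sampled token's embedding and the carried-over hidden state. The function names and the scalar gate below are hypothetical, not HRPO's actual implementation (the paper's gate is learned jointly with the policy):

```python
# Hypothetical sketch of HRPO-style gating (names and the scalar gate
# are illustrative; the real gate is a learned mechanism).
def hybrid_input(token_embedding, hidden_state, gate):
    """Blend the sampled token's embedding with the prior hidden state."""
    # gate -> 1.0: input is mostly the discrete token
    # gate -> 0.0: input is mostly the continuous latent state
    return [gate * e + (1.0 - gate) * h
            for e, h in zip(token_embedding, hidden_state)]

emb = [1.0, 0.0, 2.0]  # embedding of the sampled output token
hid = [0.5, 0.5, 0.5]  # hidden state carried over internally
print(hybrid_input(emb, hid, 1.0))  # pure token input
print(hybrid_input(emb, hid, 0.5))  # even mix of both sources
```

Training can then shift the gate gradually, so the model starts from ordinary token inputs and learns how much latent signal to mix in.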
May 26 • 7 tweets • 3 min read
A new recipe for training multimodal models
👉 Mix various data types together: text next to images, video frames after captions, then webpages, etc. This way the model learns to connect what it reads with what it sees.
ByteDance proposed and implemented this idea in their BAGEL, a new open-source multimodal model.
Here's how it works:
Architecture:
BAGEL is one giant Transformer with two separate experts inside:
- Understanding expert handles text and ViT image tokens.
- Generation expert handles the VAE image-creation tokens.
These experts are placed side-by-side in every layer and "look" at the same sequence, but each focuses on its own job.
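A toy sketch of the two-expert routing, under the assumption that each token carries a modality tag. The tags, stand-in expert functions, and routing below are illustrative, not ByteDance's code, and the shared attention over the full sequence is not modeled here:

```python
# Toy sketch of BAGEL's two-expert layout (tags and stand-in experts
# are illustrative; shared attention is omitted for brevity).
def bagel_layer(tokens, understanding_ffn, generation_ffn):
    """Route each token to the expert matching its modality tag."""
    out = []
    for tag, vec in tokens:
        # VAE image-creation tokens go to the generation expert;
        # text and ViT tokens go to the understanding expert.
        ffn = generation_ffn if tag == "vae" else understanding_ffn
        out.append((tag, ffn(vec)))
    return out

tokens = [("text", 1.0), ("vit", 2.0), ("vae", 3.0)]
double = lambda x: 2 * x  # stand-in for the understanding expert
negate = lambda x: -x     # stand-in for the generation expert
print(bagel_layer(tokens, double, negate))
```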
May 24 • 14 tweets • 3 min read
.@sama's interview at @sequoia AI Ascent offers a lot of insights on:
- How OpenAI came to ChatGPT
- Its aim to be the “core AI subscription”
- AI as an operating system
- What the ideal smart model is
- Main future goals
Here is an outline of his talk with the key ideas: 1. Past milestones and directions
- The first consumer product was the DALL·E API
- OpenAI also tried building a robot hand
- One person and then a team became excited about building LLMs with unsupervised learning, which started with GPT-1, GPT-2. Then GPT-3 showed something cool.
May 20 • 9 tweets • 2 min read
What is the Agentic Web?
8 important updates from #MSBuild
1. Agents as first-class business & M365 entities.
2. Microsoft Entra Agent ID for knowing your agents.
3. NLWeb, MCP, Open Protocols as the foundation layer for an open agent ecosystem.
4. Agentic DevOps revolutionizes software development with GitHub Copilot’s new coding agent.
5. Azure AI Foundry with 1,900+ models & Copilot Studio
6. Collaboration: Human-Agent & Agent-Agent with Teams as a “multiplayer” agent hub.
7. Windows AI Foundry, Foundry Local (for macOS) and open-sourced WSL, NLWeb, and Copilot in VS Code
8. Microsoft Discovery — AI for science
Read more about these updates in our free weekly newsletter: turingpost.com/p/fod101
1. Agents as first-class business & M365 entities:
The new Microsoft 365 Copilot unifies chat, search, notebooks, and tools like “Researcher” and “Analyst.” With Copilot Tuning, businesses can tailor agents to their own knowledge, language, and brand voice.
May 20 • 20 tweets • 13 min read
The freshest research of the week:
Our top 9:
▪️ Beyond 'Aha!'
▪️ J1: Incentivizing Thinking in LLM-as-a-Judge via Reinforcement Learning
▪️ The CoT Encyclopedia
▪️ System Prompt Optimization with Meta-Learning
▪️ Parallel Scaling Law for LMs
▪️ Insights into DeepSeek-V3
▪️ QuXAI: Explainers for Hybrid Quantum Machine Learning Models
▪️ AttentionInfluence
▪️ MLE-Dojo
▪️ Learning from Peers in Reasoning Models
▪️ WorldPM
▪️ Reinforced Internal-External Knowledge Synergistic Reasoning for Efficient Adaptive Search Agent
▪️ Learning Dynamics in Continual Pre-Training for LLMs
▪️ Memorization-Compression Cycles Improve Generalization
▪️ DanceGRPO
▪️ Unified Continuous Generative Model
▪️ Depth Anything with Any Prior
▪️ MetaUAS
🧵 1. Beyond 'Aha!': Toward Systematic Meta-Abilities Alignment in Large Reasoning Models
Proposes aligning models with meta-reasoning abilities (deduction, induction, abduction) to improve reasoning reliability and performance
Designing models and hardware together: is this a new path to the most cost-efficient models?
This idea is used in DeepSeek-V3, which was trained on just 2,048 powerful NVIDIA H800 GPUs.
New research from @deepseek_ai clarifies how DeepSeek-V3 works through its key innovations:
- Multi-head Latent Attention (MLA)
- Mixture of Experts (MoE)
- FP8 mixed-precision training
- Multi-Plane Network Topology
🧵1. Multi-head Latent Attention (MLA)
MLA compresses the KV cache down to 70 KB per token, while other models like LLaMA-3.1 and Qwen2.5 need 7x more.
Thanks to this, DeepSeek-V3:
- Handles long conversations
- Runs on limited hardware
- Makes inference cheaper and more scalable
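A back-of-envelope sketch of why caching a small latent beats caching full K/V. The head counts, dims, and layer numbers below are assumptions chosen for illustration, not DeepSeek-V3's exact config:

```python
# Back-of-envelope sketch of MLA's KV-cache saving (all config numbers
# here are illustrative assumptions, not DeepSeek-V3's actual values).
def kv_bytes_per_token(n_heads, head_dim, n_layers, bytes_per_val=2):
    """Standard attention caches full K and V for every head and layer."""
    return 2 * n_heads * head_dim * n_layers * bytes_per_val

def mla_bytes_per_token(latent_dim, n_layers, bytes_per_val=2):
    """MLA caches one shared low-rank latent per layer instead of K/V."""
    return latent_dim * n_layers * bytes_per_val

full = kv_bytes_per_token(n_heads=32, head_dim=128, n_layers=60)
mla = mla_bytes_per_token(latent_dim=576, n_layers=60)
print(full // 1024, "KB vs", mla // 1024, "KB per token")
```

Even with these made-up numbers, compressing K/V into a shared latent cuts per-token cache memory by an order of magnitude, which is what makes long contexts affordable.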
May 15 • 10 tweets • 4 min read
The latest AI/ML news of the week:
▪️ OpenAI:
- Reinforcement fine-tuning for o4-mini
- ChatGPT’s deep research reads GitHub repos
- Stargate supercomputing project
- "OpenAI for Countries" global initiative
▪️ Microsoft adopted Google’s A2A to Azure AI Foundry and Copilot Studio
▪️ Google's Gemini features implicit caching
▪️ Gemma - 150 million downloads and 70k+ variants on Hugging Face
▪️ AWS’s new Generative AI Adoption Index
Details 🧵1. Reinforcement fine-tuning is live for o4-mini. By combining CoT reasoning with graded task performance, RFT gives domain-specific models serious IQ boosts.
Meanwhile, supervised fine-tuning is now available for GPT-4.1 nano, letting you sculpt OpenAI’s speediest, thriftiest model to your liking.
Our top 10:
▪️ Flow-GRPO
▪️ Unified Multimodal Chain-of-Thought Reward Model through Reinforcement Fine-Tuning
▪️ RM-R1
▪️ Scalable Chain of Thoughts via Elastic Reasoning
▪️ X-Reasoner
▪️ Practical Efficiency of Muon for Pretraining
▪️ Grokking in the Wild
▪️ Teaching Models to Understand (but not Generate) High-risk Data
▪️ LLM-Independent Adaptive RAG
▪️ Perception, Reason, Think, and Plan: A Survey on Large Multimodal Reasoning Models
Link to the full list of research at the end of the 🧵 1. Flow-GRPO: Training Flow Matching Models via Online RL
Combines flow matching and reinforcement learning for improved text-to-image generation, significantly boosting composition accuracy and human preference alignment
▪️ Models:
- GPT‑4.1 in full, mini, and nano versions
- Codex CLI
- o3 (+ opinions)
- o4-mini
▪️ "A Practical Guide to Building Agents"
Also:
▪️ @hwchase17's blog post "How to think about agent frameworks"
▪️ @AnthropicAI published "Claude Code: Best practices for agentic coding"
Details below 🧵 1. @OpenAI dropped GPT‑4.1 in full, mini, and nano flavors – cheaper, faster, and catching up with Google’s million‑token context window.
Available via API but curiously absent from ChatGPT, the move slightly backpedals on Sam Altman’s earlier promise of enhanced reasoning.
Google researchers introduced new types of attentional bias strategies in LLMs and reimagined the "forgetting" process, replacing it with "retention."
All of this is wrapped up in Miras – their new framework for designing efficient AI architectures using 4 building blocks:
• Memory architecture – how the memory is built
• Attentional bias – how the model focuses
• Retention gate – how it forgets or keeps information
• Memory learning algorithm – how it’s trained
Details 🧵1. Forgetting? No, it's “retention”
Instead of saying the model forgets, Google researchers use the idea of retention. So the term "forget gate" turns into "retention gate."
The model doesn’t erase past memory—it just decides not to hold on to some things as tightly.
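A minimal sketch of what a retention gate can look like (illustrative, not Google's Miras implementation): a retention coefficient in [0, 1] decides how tightly old memory is held, rather than how much is erased.

```python
# Minimal sketch of a retention gate in the Miras sense (illustrative,
# not Google's code). Memory is not erased; a retention coefficient
# r in [0, 1] decides how tightly old content is held.
def update_memory(memory, new_info, retention):
    """memory_t = r * memory_{t-1} + (1 - r) * new_info, elementwise."""
    return [retention * m + (1.0 - retention) * x
            for m, x in zip(memory, new_info)]

mem = [1.0, 1.0]
mem = update_memory(mem, [0.0, 2.0], retention=0.9)  # hold on tightly
print(mem)  # old values dominate; new info only nudges them
```

With high retention the old state dominates; lowering it lets new information take over faster, which is the same dial a "forget gate" turns, just framed positively.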
Apr 16 • 20 tweets • 13 min read
The freshest AI/ML research of the week
Our top 8:
▪️ The AI Scientist v2
▪️ Debug-gym
▪️ OLMoTrace
▪️ Scaling Laws for Native Multimodal Models
▪️ MegaScale-Infer
▪️ Hogwild! Inference
▪️ Self-Steering Language Models
▪️ VAPO: Efficient and Reliable Reinforcement Learning for Advanced Reasoning Tasks
▪️ Are You Getting What You Pay For?
▪️ MM-IFEngine
▪️ HybriMoE
▪️ C3PO
▪️ Quantization Hurts Reasoning?
▪️ Efficient Reinforcement Finetuning via Adaptive Curriculum Learning
▪️ Concise Reasoning via RL
▪️ Missing Premise exacerbates Overthinking
▪️ DDT
▪️ Adaptive Weighted Rejection Sampling
🧵 1. The AI Scientist v2 by @SakanaAILabs, @UBC, @VectorInst, and @UniofOxford
It's an autonomous LLM-based agent that formulates hypotheses, runs experiments, analyzes data, and writes papers. It uses agentic tree search and VLM feedback for iterative refinement, removing human-authored code templates. Of three papers submitted to ICLR 2025 workshops, one passed peer review with a 6.33 score.
How can we tell when and how an AI model reflects on its reasoning?
Researchers from @essential_ai built a full framework to track reflection throughout the model’s pre-training.
They tested 2 types of reflection:
• Situational reflection: The model reviews someone else’s reasoning (like another AI's).
• Self-reflection: The model reviews its own reasoning.
▪️ The key finding? Models start to reflect much earlier than we thought.
Here are the details:
To test reflection, researchers created 6 datasets in different areas: math, coding, logic, and general knowledge.
They gave the model confusing examples, like problems with small mistakes in logic or math, and watched whether it could spot and fix them.
Apr 8 • 8 tweets • 4 min read
The latest AI/ML news of the week:
▪️ CORLEO from Kawasaki
▪️ Demis Hassabis's @IsomorphicLabs raised $600 million in its first external round
▪️ @genspark_ai Super Agent
▪️ @OpenAI's PaperBench
▪️ @GoogleDeepMind’s Dreamer RL agent
▪️ @AnthropicAI Claude for Education
Details below 🧵 1. CORLEO – a robotic horse from Kawasaki
Just take a look ->
Apr 1 • 9 tweets • 5 min read
The latest AI/ML news of the week:
▪️ @GoogleDeepMind's Gemini Robotics
▪️ @Google's free for all Gemini 2.5 Pro
▪️ @OpenAI:
- OpenAI Academy
- Images in ChatGPT
- Adopting @AnthropicAI’s Model Context Protocol
▪️ @elonmusk fuses X and xAI
▪️ @TheMidasProj's "AI Safety Watchtower" monitors policy changes
Details below 🧵 1. @GoogleDeepMind’s Gemini Robotics powers robots with a Vision-Language-Action model that grasps, points, packs, and even folds origami.
Built on Gemini 2.0, it uses zero- and few-shot learning to adapt to new tasks and robot bodies on the fly – no retraining required.
Our top 2:
▪️ Xattention
▪️ Inside-Out: Hidden Factual Knowledge in LLMs
▪️ RWKV-7 "Goose"
▪️ ϕ-Decoding
▪️ Frac-connections
▪️ DAPO
▪️ Reinforcement learning for reasoning in small LLMs
▪️ MetaLadder
▪️ Measuring AI ability to complete long tasks
▪️ Why do multi-agent LLM systems fail?
▪️ Agents play thousands of 3D video games
▪️ GKG-LLM
▪️ Privacy, Synthetic Data, and Security
▪️ Scale-wise distillation of diffusion models
▪️ Multimodal chain-of-thought reasoning
▪️ Survey on evaluation of LLM-based agents
▪️ Stop overthinking: A survey on efficient reasoning
▪️ Aligning multimodal LLM with human preference
🧵 1. Xattention by @MIT, @Tsinghua_Uni, @sjtu1896 and @nvidia
Speeds up inference with block-sparse attention and antidiagonal scoring
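Antidiagonal scoring can be sketched as follows: each square attention block is scored by summing its antidiagonal, and only the top-scoring blocks are kept. The selection rule here is a simplified assumption; XAttention's real kernel operates on GPU block tiles:

```python
# Simplified sketch of antidiagonal block scoring (selection details
# are assumptions; the real method works on GPU attention tiles).
def antidiagonal_score(block):
    """Score a square attention block by summing its antidiagonal."""
    n = len(block)
    return sum(block[i][n - 1 - i] for i in range(n))

def select_blocks(blocks, keep):
    """Keep only the `keep` highest-scoring blocks (block sparsity)."""
    ranked = sorted(range(len(blocks)),
                    key=lambda i: antidiagonal_score(blocks[i]),
                    reverse=True)
    return sorted(ranked[:keep])

b0 = [[0.1, 0.9], [0.8, 0.1]]  # mass on the antidiagonal -> high score
b1 = [[0.9, 0.0], [0.0, 0.9]]  # mass on the main diagonal -> low score
print(select_blocks([b0, b1], keep=1))
```

The antidiagonal acts as a cheap probe of the whole block: computing it touches every row and column once, so important blocks are found without materializing full attention.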
There’s no single “right” answer for AI models in creative writing (like writing a fairy tale), and their open-ended thinking is a key part of creative intelligence.
Still, models often lack output diversity, so @midjourney dropped an interesting study on this 👇
▪️ Their idea is to add diversity directly into the training process:
They measured response deviation for the same prompt and used it to train with DPO and ORPO, leading to more diversified DDPO and DORPO methods.
Here's how DDPO and DORPO work: 1. Diversified DPO (DDPO):
In the regular DPO method, the model learns by comparing a better response to a worse one.
In the diversified version, researchers add more weight to rare or unique winning responses: those with higher deviation.
This helps the model pay more attention to uncommon but high-quality examples during training.
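One way to picture the DDPO idea is the standard DPO loss scaled by a deviation weight, so high-deviation winners contribute more to training. The exact weighting below is an assumption for illustration, not the paper's formula:

```python
import math

# Hedged sketch of deviation-weighted DPO (the weighting form is an
# illustrative assumption, not the paper's exact objective).
def dpo_loss(margin, beta=0.1):
    """Standard DPO: -log sigmoid(beta * preference margin)."""
    return -math.log(1.0 / (1.0 + math.exp(-beta * margin)))

def ddpo_loss(margin, deviation, beta=0.1):
    """Scale the DPO loss by how unusual the winning response is."""
    return deviation * dpo_loss(margin, beta)

common = ddpo_loss(margin=2.0, deviation=0.2)  # typical winner
rare = ddpo_loss(margin=2.0, deviation=0.9)    # high-deviation winner
print(rare > common)  # rare winners carry more training weight
```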
Mar 18 • 10 tweets • 3 min read
DiLoCo (Distributed Low-Communication) method by @GoogleAI and @GoogleDeepMind changes how training of models happens:
Instead of constant syncing, multiple copies of the model are trained in parallel and sync only occasionally.
Scaling laws show how DiLoCo behaves as model size grows 🧵
At its core, DiLoCo follows a 2-level optimization process:
• Inner optimization: Each model replica (M) trains independently, making local updates.
• Outer optimization: Every H steps, replicas sync their updates to adjust a global model, which is then shared with all replicas, repeating the cycle.
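The two-level loop above can be sketched with a toy scalar model. SGD-style inner steps and plain averaging as the outer step are simplifications of DiLoCo's real inner/outer optimizers:

```python
# Toy sketch of DiLoCo's two-level loop on a scalar model w minimizing
# w^2 (SGD inner steps and plain averaging are simplifications of the
# actual inner/outer optimizers).
def diloco(global_param, n_replicas=4, outer_rounds=3, inner_steps_h=5,
           inner_lr=0.1):
    grad = lambda w: 2 * w  # gradient of w^2
    for _ in range(outer_rounds):
        replicas = []
        for _ in range(n_replicas):
            w = global_param              # each replica starts from global
            for _ in range(inner_steps_h):
                w -= inner_lr * grad(w)   # local updates, no communication
            replicas.append(w)
        # outer step: sync once every H steps by aggregating replicas
        global_param = sum(replicas) / n_replicas
    return global_param

print(diloco(10.0))  # approaches the minimum of w^2 at 0
```

Communication happens only once per outer round instead of every step, which is the whole point: H local steps between syncs cut bandwidth needs by roughly a factor of H.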
Here are scaling laws for DiLoCo:
Mar 11 • 8 tweets • 4 min read
The latest AI/ML news of the week:
▪️ @perplexity_ai expands beyond the web
▪️ Manus: a Chinese high-performing AI agent
▪️ @Apple delayed Siri AI enhancements and new M3 Ultra chip
▪️ @CorticalLabs' CL1 computer fuses human brain cells with silicon
▪️ @MistralAI OCR
▪️ Andrew Barto and @RichardSSutton take home the 2024 Turing Award!
Find the details below 🧵 1. @perplexity_ai expands beyond the web
It partners with hardware firms to integrate its AI into everyday devices. This year, Deutsche Telekom’s AI Phone launches with Perplexity’s assistant, hinting at future moves. Phones for now, then TVs? Where next?