Apr 16 • 20 tweets • 13 min read
The freshest AI/ML research of the week
Our top 8:
▪️ The AI Scientist v2
▪️ Debug-gym
▪️ OLMoTrace
▪️ Scaling Laws for Native Multimodal Models
▪️ MegaScale-Infer
▪️ Hogwild! Inference
▪️ Self-Steering Language Models
▪️ VAPO: Efficient and Reliable Reinforcement Learning for Advanced Reasoning Tasks
▪️ Are You Getting What You Pay For?
▪️ MM-IFEngine
▪️ HybriMoE
▪️ C3PO
▪️ Quantization Hurts Reasoning?
▪️ Efficient Reinforcement Finetuning via Adaptive Curriculum Learning
▪️ Concise Reasoning via RL
▪️ Missing Premise exacerbates Overthinking
▪️ DDT
▪️ Adaptive Weighted Rejection Sampling
🧵
1. The AI Scientist v2 by @SakanaAILabs, @UBC, @VectorInst, and @UniofOxford
It's an autonomous LLM-based agent that formulates hypotheses, runs experiments, analyzes data, and writes papers. It uses agentic tree search and VLM feedback for iterative refinement, removing human-authored code templates. Of three papers submitted to ICLR 2025 workshops, one passed peer review with a 6.33 score.
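To make the agentic tree search concrete, here is a minimal best-first sketch of the loop; `expand` and `score` are hypothetical stand-ins (e.g., an LLM proposing revised experiments and a VLM judging the resulting plots), not Sakana's actual code:

```python
import heapq
import itertools

def agentic_tree_search(root, expand, score, budget=50):
    """Best-first search over experiment nodes.
    expand(node) -> child nodes (revised hypotheses/code);
    score(node)  -> float, e.g., VLM/LLM feedback on results."""
    tie = itertools.count()                      # break score ties FIFO
    best_node, best_score = root, score(root)
    frontier = [(-best_score, next(tie), root)]
    for _ in range(budget):
        if not frontier:
            break
        neg_score, _, node = heapq.heappop(frontier)
        if -neg_score > best_score:
            best_node, best_score = node, -neg_score
        for child in expand(node):               # run experiment, collect feedback
            heapq.heappush(frontier, (-score(child), next(tie), child))
    return best_node
```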
How can we tell when and how an AI model reflects on its own reasoning?
Researchers from @essential_ai built a full framework to track reflection throughout the model’s pre-training.
They tested 2 types of reflection:
• Situational reflection: The model reviews someone else’s reasoning (like another AI's).
• Self-reflection: The model reviews its own reasoning.
▪️ The key finding? Models start to reflect much earlier than we thought.
Here are the details:
To test reflection, researchers created 6 datasets in different areas: math, coding, logic, and general knowledge.
They gave the model confusing examples, like problems with small mistakes in logic or math, and watched whether it could spot and fix them.
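A hedged sketch of what such a probe can look like (helper names are illustrative, not Essential AI's harness): plant a small arithmetic slip in a chain of thought and check whether the model's continuation catches and fixes it.

```python
def reflection_probe(generate):
    """generate(prompt) -> model completion (assumed text-in/text-out API)."""
    flawed_cot = (
        "Q: What is 23 + 19?\n"
        "Reasoning so far: 23 + 19 = 23 + 20 + 1 = 44.\n"  # planted slip: should be 23 + 20 - 1 = 42
        "Continue the reasoning and give the final answer.\n"
    )
    completion = generate(flawed_cot).lower()
    return {
        "corrected": "42" in completion,             # did it reach the right answer?
        "explicit_reflection": any(                  # did it flag the mistake?
            cue in completion for cue in ("wait", "mistake", "actually", "incorrect")
        ),
    }
```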
Apr 8 • 8 tweets • 4 min read
The latest AI/ML news of the week:
▪️ CORLEO from Kawasaki
▪️ Demis Hassabis's @IsomorphicLabs raised $600 million in its first external round
▪️ @genspark_ai Super Agent
▪️ @OpenAI's PaperBench
▪️ @GoogleDeepMind’s Dreamer RL agent
▪️ @AnthropicAI Claude for Education
Details below 🧵
1. CORLEO - a robotic horse from Kawasaki
Just take a look ->
Apr 1 • 9 tweets • 5 min read
The latest AI/ML news of the week:
▪️ @GoogleDeepMind's Gemini Robotics
▪️ @Google's Gemini 2.5 Pro, free for all users
▪️ @OpenAI:
- OpenAI Academy
- Images in ChatGPT
- Adopting @AnthropicAI’s Model Context Protocol
▪️ @elonmusk fuses X and xAI
▪️ @TheMidasProj's "AI Safety Watchtower" monitors policy changes
Details below 🧵
1. @GoogleDeepMind’s Gemini Robotics powers robots with a Vision-Language-Action model that grasps, points, packs, and even folds origami.
Built on Gemini 2.0, with zero- and few-shot learning, it adapts to new tasks and robot bodies on the fly – no retraining required.
Our top 2:
▪️ XAttention
▪️ Inside-Out: Hidden Factual Knowledge in LLMs
▪️ RWKV-7 "Goose"
▪️ ϕ-Decoding
▪️ Frac-Connections
▪️ DAPO
▪️ Reinforcement learning for reasoning in small LLMs
▪️ MetaLadder
▪️ Measuring AI ability to complete long tasks
▪️ Why do multi-agent LLM systems fail?
▪️ Agents play thousands of 3D video games
▪️ GKG-LLM
▪️ Privacy, Synthetic Data, and Security
▪️ Scale-wise distillation of diffusion models
▪️ Multimodal chain-of-thought reasoning
▪️ Survey on evaluation of LLM-based agents
▪️ Stop overthinking: A survey on efficient reasoning
▪️ Aligning multimodal LLM with human preference
🧵
1. XAttention by @MIT, @Tsinghua_Uni, @sjtu1896 and @nvidia
Speeds up inference with block-sparse attention and antidiagonal scoring
There’s no single “right” answer for AI models in creative writing (like writing a fairy tale), and their open-ended thinking is a key part of creative intelligence.
Still, models often lack output diversity, so @midjourney dropped an interesting study on this 👇
▪️ Their idea is to add diversity directly into the training process:
They measured response deviation for the same prompt and used it to train with DPO and ORPO, leading to more diversified DDPO and DORPO methods.
Here's how DDPO and DORPO work:
1. Diversified DPO (DDPO):
In the regular DPO method, the model learns by comparing a better response to a worse one.
In the diversified version, researchers add more weight to rare or unique winning responses - those with higher deviation.
This helps the model pay more attention to uncommon but high-quality examples during training.
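A minimal sketch of how such a deviation weight can enter the DPO objective (a PyTorch illustration of the idea, not Midjourney's code; the exact weighting function is an assumption):

```python
import torch
import torch.nn.functional as F

def ddpo_loss(logp_w, logp_l, ref_logp_w, ref_logp_l, deviation_w, beta=0.1):
    """Standard DPO preference loss, upweighted for winning responses that
    deviate more from the typical response to the same prompt.
    deviation_w: per-example deviation of the chosen response, assumed
    normalized to [0, 1] upstream."""
    margin = beta * ((logp_w - ref_logp_w) - (logp_l - ref_logp_l))
    per_example = -F.logsigmoid(margin)
    weights = 1.0 + deviation_w        # rare, high-deviation winners count more
    return (weights * per_example).mean()
```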
Mar 18 • 10 tweets • 3 min read
DiLoCo (Distributed Low-Communication), a method by @GoogleAI and @GoogleDeepMind, changes how model training happens:
Instead of constant syncing, multiple copies of the model are trained in parallel and sync only occasionally.
Scaling laws show how DiLoCo behaves as model size grows 🧵
At its core, DiLoCo follows a 2-level optimization process:
• Inner optimization: Each model replica (M) trains independently, making local updates.
• Outer optimization: Every H steps, replicas sync their updates to adjust a global model, which is then shared with all replicas, repeating the cycle.
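A condensed sketch of one DiLoCo round (PyTorch, illustrative defaults; the paper uses AdamW for the inner steps and Nesterov-momentum SGD for the outer step, simplified to plain SGD here):

```python
import torch

def diloco_round(global_model, replicas, data_streams, H=500, outer_lr=0.7):
    """One outer step: each replica trains locally for H steps with no
    cross-replica communication, then only parameter deltas are synced."""
    start = [p.detach().clone() for p in global_model.parameters()]
    avg_delta = [torch.zeros_like(p) for p in start]
    for model, stream in zip(replicas, data_streams):
        model.load_state_dict(global_model.state_dict())
        opt = torch.optim.AdamW(model.parameters(), lr=1e-4)
        for _ in range(H):                         # inner optimization
            loss = model(next(stream)).loss        # assumes HF-style batch/output
            opt.zero_grad()
            loss.backward()
            opt.step()
        for d, p, s in zip(avg_delta, model.parameters(), start):
            d += (p.detach() - s) / len(replicas)  # this replica's contribution
    with torch.no_grad():                          # outer optimization (plain SGD here)
        for p, d in zip(global_model.parameters(), avg_delta):
            p.add_(d, alpha=outer_lr)
    return global_model
```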
Here are scaling laws for DiLoCo:
Mar 11 • 8 tweets • 4 min read
The latest AI/ML news of the week:
▪️ @perplexity_ai expands beyond the web
▪️ Manus: a Chinese high-performing AI agent
▪️ @Apple delayed Siri AI enhancements and new M3 Ultra chip
▪️ @CorticalLabs' CL1 computer fuses human brain cells with silicon
▪️ @MistralAI OCR
▪️ Andrew Barto and @RichardSSutton take home the 2024 Turing Award!
Find the details below 🧵
1. @perplexity_ai expands beyond the web
It partners with hardware firms to integrate its AI into everyday devices. This year, Deutsche Telekom’s AI Phone launches with Perplexity’s assistant, hinting at future moves. Phones for now, then TVs? Where next?
Speculative Mixture-of-Experts (s-MoE) makes running MoE-based LLMs faster by reducing the communication overhead between GPUs.
s-MoE uses 2 techniques:
• Speculative Token Reshuffling (s-TS):
Predicts which experts each token will be routed to and rearranges tokens early to minimize token movement later.
• Speculative Expert Pre-grouping (s-EG):
Groups experts handling similar tokens together in advance to reduce communication.
s-MoE almost doubles performance over DeepSpeed-MoE and SGLang frameworks.
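A toy sketch of the s-TS idea (names are illustrative, not the paper's implementation): a cheap predictor guesses each token's expert before the real router runs, so tokens heading to the same GPU can be bucketed into one contiguous transfer.

```python
import torch

def speculative_reshuffle(tokens, expert_predictor, num_experts):
    """tokens: (n, d); expert_predictor: cheap module -> (n, num_experts) logits."""
    pred = expert_predictor(tokens).argmax(dim=-1)   # speculative expert per token
    order = torch.argsort(pred)                      # bucket tokens by predicted expert
    counts = torch.bincount(pred, minlength=num_experts)
    # tokens[order] can now be scattered to expert GPUs in contiguous chunks;
    # mispredicted tokens take a (rare) correction round-trip after true routing.
    return tokens[order], order, counts
```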
Here are the details:
Problem of MoE models
MoE inference efficiency is limited by Expert Parallelism (EP), as tokens are sent to specific experts located on different GPUs.
So tokens frequently move between GPUs, creating heavy communication overhead and slowing performance.
s-MoE can solve this👇
Mar 8 • 7 tweets • 3 min read
Contrastive Sparse Representation (CSR) by @XDUofChina is an effective alternative to Matryoshka Representation Learning (MRL) for creating embeddings.
MRL can change embedding lengths but needs retraining the entire model and loses accuracy with short embeddings.
CSR solves this problem by using sparse coding: It keeps embeddings longer but activates only a few parts (neurons), making them "sparse."
This makes CSR a simple, fast and accurate method.
Here's how it works:
Working process:
CSR works differently from MRL because it starts with already-trained embeddings, converts them into sparse representations, and then activates only the most important features (TopK).
To ensure embeddings stay accurate and compact, it combines two losses👇
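A compact sketch of this recipe (layer widths, k, and the contrastive pairing below - dense view vs. sparse view of the same input - are assumptions for illustration, not necessarily CSR's exact formulation):

```python
import torch
import torch.nn.functional as F

class TopKSparseEncoder(torch.nn.Module):
    """Lift a frozen, pre-trained embedding into a wider space and keep
    only the top-k activations (sizes here are illustrative)."""
    def __init__(self, dim=768, width=8192, k=32):
        super().__init__()
        self.enc = torch.nn.Linear(dim, width)
        self.dec = torch.nn.Linear(width, dim)   # for the reconstruction loss
        self.proj = torch.nn.Linear(dim, width)  # dense "view" for the contrastive loss
        self.k = k

    def forward(self, x):
        z = F.relu(self.enc(x))
        top = torch.topk(z, self.k, dim=-1)
        sparse = torch.zeros_like(z).scatter_(-1, top.indices, top.values)
        return sparse, self.dec(sparse)

def csr_loss(model, x, temperature=0.07):
    sparse, recon = model(x)
    recon_loss = F.mse_loss(recon, x)              # keep the information
    a = F.normalize(model.proj(x), dim=-1)         # dense view
    b = F.normalize(sparse, dim=-1)                # sparse view
    logits = a @ b.T / temperature
    labels = torch.arange(len(x), device=x.device)
    contrastive = F.cross_entropy(logits, labels)  # in-batch InfoNCE
    return recon_loss + contrastive
```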
Mar 4 • 12 tweets • 6 min read
The latest AI/ML news of the week:
▪️ @DeepSeek_ai: 6 extraordinary deliveries during #OpenSourceWeek
▪️ @AnthropicAI
- Claude 3.7 Sonnet
- Transparency Hub
- A fresh $3.5B Series E
▪️ @Google
- Gemini Code Assist free for all
- The AI co-scientist
▪️ @awscloud Center for Quantum Computing: Quantum error correction (QEC) scheme
Find the details below 🧵
1. @deepseek_ai delivered 6 major open-source AI optimizations
SWE-RL from @AIatMeta - the first reinforcement learning (RL) method to improve AI for real-world software engineering tasks.
SWE-RL trains models by:
- Studying software evolution data from GitHub pull requests (PRs)
- Using simple rules to reward the model
- Teaching reasoning before coding
It solved 41.0% of issues on SWE-bench Verified - the best performance yet for medium-sized LLMs - and improved general reasoning skills.
Here's how it works:
1. GitHub pull request (PR) data:
Researchers gathered a seed dataset of 11M high-quality GitHub PRs and linked them to real issues as training examples. Each example is structured to include the issue description, code context, and the correct fix.
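The reward side is simple enough to sketch directly: a malformed patch is penalized, and an extracted patch is scored by its textual similarity to the ground-truth fix. The helpers below (including the `extract_patch` parser and its `<patch>` tag format) are a hedged illustration, not Meta's code:

```python
import difflib
import re

def extract_patch(model_output: str):
    """Hypothetical parser: pull a patch out of the model's response."""
    m = re.search(r"<patch>(.*?)</patch>", model_output, re.S)
    return m.group(1) if m else None

def swe_rl_reward(model_output: str, oracle_patch: str) -> float:
    """Rule-based reward in the spirit of SWE-RL: penalize unparseable
    outputs, otherwise score textual similarity to the merged PR's fix."""
    predicted = extract_patch(model_output)
    if predicted is None:                       # wrong format -> penalty
        return -1.0
    return difflib.SequenceMatcher(None, predicted, oracle_patch).ratio()
```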
🧵
▪️ Classical AI planning (deliberative planning)
Agents find action sequences to reach a goal, using predefined models (like STRIPS or PDDL) and search algorithms such as depth-first search or A*. In LLM-based systems, classical planning adds structure and reliability.
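For reference, the search half of this is textbook; a minimal A* over any state space looks like this (generic sketch, nothing LLM-specific assumed):

```python
import heapq

def a_star(start, goal, neighbors, heuristic):
    """Textbook A*: expand states in order of cost-so-far + admissible
    heuristic. neighbors(s) -> iterable of (next_state, step_cost)."""
    frontier = [(heuristic(start), 0, start, [start])]
    best_g = {}
    while frontier:
        f, g, state, path = heapq.heappop(frontier)
        if state == goal:
            return path
        if best_g.get(state, float("inf")) <= g:   # already reached more cheaply
            continue
        best_g[state] = g
        for nxt, cost in neighbors(state):
            heapq.heappush(frontier,
                           (g + cost + heuristic(nxt), g + cost, nxt, path + [nxt]))
    return None   # goal unreachable
```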
Feb 27 • 11 tweets • 4 min read
Current LLM-serving systems treat each LLM call separately, causing delays in multi-step programs.
Autellix by @UCBerkeley, @GoogleDeepMind, and @sjtu1896 helps to fix it.
This new system "looks" at entire AI programs and schedules LLM requests based on the overall workflow.
• It makes AI programs run 4-15x faster by reducing their waiting and execution time.
• Allows the LLM engine to handle more requests at once.
• Maintains the same response speed.
Autellix achieves this through:
- smart scheduling
- efficient memory management
- better load balancing across multiple AI servers.
Here are the details:
1. Autellix improves scheduling in two ways to reduce delays:
• Program-aware prioritization: It tracks program history and prioritizes requests based on total execution time. Shorter programs get processed sooner.
• Preemptive scheduling: If a long request is slowing things down, it can be temporarily paused to allow shorter requests to go through first.
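A skeletal version of such a program-aware queue (illustrative, not the authors' code): priority is the program's total service time so far - a least-attained-service discipline - so calls from short programs jump ahead, and long calls can be preempted between decode slices.

```python
import heapq
import itertools

class ProgramAwareScheduler:
    """Requests are prioritized by their *program's* cumulative service time."""
    def __init__(self):
        self.served = {}                      # program_id -> cumulative seconds
        self.queue = []
        self._tick = itertools.count()        # FIFO tie-breaker

    def submit(self, program_id, request):
        prio = self.served.get(program_id, 0.0)
        heapq.heappush(self.queue, (prio, next(self._tick), program_id, request))

    def next_request(self):
        if not self.queue:
            return None
        _, _, pid, req = heapq.heappop(self.queue)
        return pid, req

    def record(self, program_id, seconds):    # update after each decode slice;
        self.served[program_id] = (            # a preempted call simply re-submits
            self.served.get(program_id, 0.0) + seconds)
```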
Feb 26 • 6 tweets • 3 min read
MoBA, Mixture of Block Attention, from @Kimi_Moonshot improves handling long-context tasks with no fixed attention patterns.
Applying ideas from Mixture of Experts (MoE) to attention, MoBA lets the model dynamically decide where to focus.
This allows MoBA to be 6.5x faster than full attention for 1M tokens.
Here's how it works:
Working process with everything in order:
- Instead of looking at everything at once, MoBA divides the text into smaller sections (blocks).
- It scores each block, then groups and ranks them, prioritizing the most relevant ones for each query.
- Only the top-scoring blocks are used for attention.
- MoBA ensures that attention is only on past and present words, keeping the process natural and logical.
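A single-head toy sketch of the block selection (shapes and mean-pooling are assumptions for illustration; the real MoBA is batched and fused):

```python
import torch
import torch.nn.functional as F

def moba_attention(q, k, v, block_size=64, top_k=3):
    """q: (1, d) query at the current position; k, v: (t, d) past context,
    so causality holds by construction. Each query attends only inside
    its top-k highest-scoring blocks."""
    t, d = k.shape
    n_blocks = (t + block_size - 1) // block_size
    scores = torch.empty(n_blocks)
    for b in range(n_blocks):                    # score = query . mean-pooled block keys
        blk = k[b * block_size:(b + 1) * block_size]
        scores[b] = (q @ blk.mean(dim=0)).item()
    chosen = torch.topk(scores, min(top_k, n_blocks)).indices.tolist()
    idx = torch.cat([torch.arange(b * block_size, min((b + 1) * block_size, t))
                     for b in chosen])           # token indices of selected blocks
    attn = F.softmax((q @ k[idx].T) / d ** 0.5, dim=-1)
    return attn @ v[idx]                         # (1, d) output
```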
Feb 25 • 26 tweets • 15 min read
The freshest AI/ML research of the week:
Our top 9:
▪️ SigLIP 2
▪️ Intuitive Physics Understanding Emerges from Self-Supervised Pretraining on Natural Videos
▪️ Native Sparse Attention
▪️ OctoTools
▪️ ReLearn
▪️ On the Trustworthiness of Generative Foundation Models
▪️ S* Test Time Scaling for Code Generation
▪️ Autellix (Serving Engine for LLM Agents)
▪️ Is That Your Final Answer? Test-Time Scaling Improves Selective Question Answering
▪️ SurveyX
▪️ From RAG to Memory: Non-Parametric Continual Learning for LLMs
▪️ How Much Knowledge Can You Pack into a LoRA Adapter without Harming LLM?
▪️ Train Small, Infer Large
▪️ Eager Updates for Overlapped Communication and Computation in DiLoCo
▪️ S^2R: Teaching LLMs to Self-verify and Self-correct via RL
▪️ Logic-RL
▪️ Discovering Highly Efficient Low-Weight Quantum Error-Correcting Codes with RL
▪️ Armap
▪️ Thinking Preference Optimization
▪️ Rethinking Diverse Human Preference Learning through Principal Component Analysis
▪️ Craw4LLM
▪️ LLMs and Mathematical Reasoning Failures
▪️ Small Models Struggle to Learn from Strong Reasoners
▪️ Flow-of-Options: Diversified and Improved LLM Reasoning by Thinking Through Options
▪️ LM2:
- Uses a Transformer architecture with a memory module to improve long-context reasoning.
- Outperforms RMT by 37.1% and excels in multi-hop inference.
▪️ NatureLM:
- Is trained across scientific domains.
- Enhances tasks like SMILES-to-IUPAC translation and CRISPR RNA design for cross-domain applications.
▪️ Goedel-Prover:
- Advances formal proof generation
- Achieves 57.6% Pass@32 on miniF2F using expert iteration and statement formalizers.
Find the links below👇
1. LM2: Large Memory Models by Convergence Labs Ltd.