Grigory Sapunov
PhD in AI | GDE in AI/ML | CTO Intento | Author "Deep Learning with JAX" 📝 ML insights: https://t.co/ySSOXJKL7H 🤖 Daily AI paper reviews: https://t.co/yQNYyqTbBR
May 29 11 tweets 4 min read
1/
Can we have a natural language conversation with a frozen, non-neural biological system?

Not by rewriting its DNA, but by treating its raw physics as a reinforcement learning agent. Here is how we translate human prompts into biological actions. 🧵

2/
The paper "Language Game: Talking to Non-Human Systems" by Yanbo Zhang and Michael Levin (@drmichaellevin) bypasses bottom-up micromanagement.

Instead of editing gene networks, they wrap frozen dynamical systems in trained linear interfaces to communicate.
May 23 11 tweets 3 min read
1/
Looped transformers offer extreme parameter efficiency, but their quadratic self-attention kills long-context scalability.

What if you swapped attention for subquadratic mixers?

It turns out looping doesn't just save parameters; it actively multiplies linear-time expressivity. 🧵

2/
Introducing LT2: Linear-Time Looped Transformers by Chunyuan Deng, Yizhe Zhang, @eugene_ng, Hanjie Chen, et al.

They replace heavy softmax attention with subquadratic primitives, breaking the KV-cache bottleneck while keeping the reasoning benefits of deep weight recurrence.
Apr 14 11 tweets 3 min read
1/
Forcing LLMs to reason in English tokens is a massive structural bottleneck.

Next-gen models won't "think" in text at all. They will reason natively in continuous latent space. 🧵

2/
Yu et al. just dropped "The Latent Space", a massive survey formalizing the shift from discrete token decoding to machine-native continuous computation.

It maps the architectures making this possible.
Mar 21 11 tweets 3 min read
1/
Video models understand motion but hallucinate geometry. Image models nail geometry but are blind to motion.

We have accepted this tradeoff for years. Meta FAIR just proved it is purely an architectural bug, not a theoretical limit. 🧵

2/
V-JEPA 2.1 by Mur-Labadia, Muckley, and the FAIR team fixes the global-local representation bottleneck.

It unifies image and video representation learning into a single encoder. This is a massive step for embodied AI world models.
Mar 5 11 tweets 3 min read
1/
LLMs spontaneously form perfect geometric manifolds: circles for months, spirals for timelines.

We usually assume this requires deep, complex learning dynamics. A new paper proves it is actually just basic data statistics forcing the math. 🧵

2/
The paper "Symmetry in language statistics shapes the geometry of model representations" by Karkada et al. solves a major interpretability puzzle.

It links the shape of the neural code directly to translation symmetry in the training corpus.
Feb 20 11 tweets 4 min read
1/
Standard scaling laws might be inefficient.

New research demonstrates matching GPT-2/Pythia baselines with 37% fewer parameters or 24% fewer training tokens.

The secret? Stop predicting just the next token. Predict the "Next Concept" first. 🧵

2/
Paper: Next Concept Prediction in Discrete Latent Space Leads to Stronger Language Models
Authors: Liu et al. (LUMIA Lab)

The premise: Standard Transformers waste compute managing long-range dependencies at the syntax level. ConceptLM adds a latent planning layer.
Feb 17 10 tweets 3 min read
1/
Transformers don't count like computers. We assume they have hidden "registers" to track variables. We were wrong.

New research by @AnthropicAI reverse-engineered Claude 3.5 Haiku and found it works with 6D helical manifolds.

It's geometry, not math. 🧵

2/
Paper: "When Models Manipulate Manifolds"
Context: How does a model receiving *token IDs* track *character lengths* for line-wrapping?

The tokenizer abstracts characters away. To solve this, the model must reconstruct length, accumulate it, and compare against a limit.
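
For intuition, here is what that three-step task looks like when a plain program does it with an explicit counter. This is only a sketch of the task, not of the model's internal mechanism: the function name `wrap_decisions`, the example tokens, and the rule that a token moves to a new line when it would exceed the limit are all assumptions made for illustration.

```python
from typing import Iterable, List, Tuple


def wrap_decisions(tokens: Iterable[str], line_limit: int) -> List[Tuple[str, int, bool]]:
    """Return (token, chars_on_line_after_token, broke_before_token) for each token.

    Hypothetical helper: an explicit-register version of the line-wrapping task.
    """
    decisions = []
    used = 0  # characters already placed on the current line
    for tok in tokens:
        tok_len = len(tok)  # step 1: reconstruct the character length of the token
        # step 3: compare the would-be line length against the limit
        breaks = used > 0 and used + tok_len > line_limit
        if breaks:
            used = 0  # start a fresh line
        used += tok_len  # step 2: accumulate the running length
        decisions.append((tok, used, breaks))
    return decisions


if __name__ == "__main__":
    for tok, used, breaks in wrap_decisions(["The", " quick", " brown", " fox"], 10):
        print(repr(tok), used, "break before" if breaks else "same line")
```

A program gets `len(tok)` for free. The model only sees token IDs, so it has to recover that length from learned representations before it can accumulate and compare.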