Thread by @che_shr_cat on Thread Reader App

1/
Backprop is the engine of deep learning, but neuroscientists have insisted for decades that the brain can't do it. There are no dedicated "error" neurons or backward wiring.

What if the brain doesn't compute error in space, but in time? 🧵

2/
In "This is how the Neocortex Learns," Randall C. O'Reilly presents a unified theory showing how the mammalian brain approximates backpropagation.

It uses temporal differences across a 200 ms cycle to bypass the need for explicit error-representing cells.

3/
The math relies on an implicit error state:
Error ≈ Activation(plus) - Activation(minus)

Instead of separate error neurons, the same cortical cells represent predictions (minus phase) and outcomes (plus phase) at different moments, driven by bidirectional pathways.
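
To make the implicit error concrete, here is a minimal sketch of a two-phase weight update. It is a simplified delta-style rule in the spirit of the idea above, not O'Reilly's actual implementation; the function name `two_phase_update`, the learning rate, and the toy activity values are all assumptions for illustration.

```python
import numpy as np

def two_phase_update(w, x, y_minus, y_plus, lr=0.1):
    """Return updated weights using the implicit error (plus - minus).

    w:       (n_in, n_out) weight matrix
    x:       (n_in,) sending-side activity
    y_minus: (n_out,) receiving activity in the prediction (minus) phase
    y_plus:  (n_out,) receiving activity in the outcome (plus) phase
    """
    # The same cells are read at two moments; no separate error units.
    error = y_plus - y_minus
    # Local update: each weight only sees its own pre and post activity.
    return w + lr * np.outer(x, error)

# Hypothetical toy values.
w = np.zeros((2, 1))
x = np.array([1.0, 0.5])
y_minus = np.array([0.2])   # what the network predicted
y_plus = np.array([0.8])    # what actually happened
w_new = two_phase_update(w, x, y_minus, y_plus)
w_same = two_phase_update(w, x, y_plus, y_plus)  # perfect prediction
```

When prediction and outcome match, the error term is zero and the weights stay put, which is the behavior you would expect from an error-driven rule.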

4/
This temporal phasing is coordinated by the corticothalamic loop over a 200 ms theta cycle.

Phase 1 (100 ms): Top-down layer 6 predictions settle.
Phase 2 (100 ms): Strong, focal layer 5b driver inputs override predictions with the actual sensory outcome.
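
The two phases can be sketched as a single leaky unit that first settles toward a prediction, then gets overridden by the outcome. This is a toy simulation under stated assumptions: the function `run_cycle`, the 20 ms time constant, the 1 ms step, and the input values are hypothetical, and real layer 6 / layer 5b dynamics are far richer.

```python
def run_cycle(prediction, outcome, phase_ms=100, tau_ms=20.0, dt_ms=1.0):
    """Simulate one 200 ms cycle for a single leaky unit.

    Returns (act_minus, act_plus): activity at the end of each phase.
    """
    act = 0.0
    steps = int(phase_ms / dt_ms)

    # Phase 1 (minus): the unit settles toward the top-down prediction.
    for _ in range(steps):
        act += (dt_ms / tau_ms) * (prediction - act)
    act_minus = act

    # Phase 2 (plus): the driver input overrides the prediction.
    for _ in range(steps):
        act += (dt_ms / tau_ms) * (outcome - act)
    act_plus = act

    return act_minus, act_plus

act_minus, act_plus = run_cycle(prediction=0.3, outcome=0.9)
error = act_plus - act_minus  # implicit error, computed across time
```

With these toy settings each phase is long enough for the unit to settle, so the end-of-phase difference lands close to outcome minus prediction.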

5/
How does a physical synapse compute this?

Through a competitive, double-kinase pathway (CaMKII vs DAPK1) that integrates post-synaptic calcium.

If calcium influx rises over time (positive temporal derivative), CaMKII dominates, driving LTP.
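
One way to picture a competition that reads out a temporal derivative is two leaky integrators with different time constants watching the same calcium signal. This is a toy sketch only: treating the fast trace as CaMKII-like and the slow trace as DAPK1-like is an illustrative assumption, and the function `kinase_competition` and its time constants are hypothetical, not taken from the paper.

```python
def kinase_competition(calcium, tau_fast=10.0, tau_slow=50.0, dt=1.0):
    """Integrate a calcium trace with a fast and a slow leaky integrator.

    Returns (fast - slow) at the end of the trace. If calcium has been
    rising, the fast trace leads and the result is positive; if calcium
    has been steady, the two traces agree and the result is zero.
    """
    fast = slow = calcium[0]  # start both traces at the baseline level
    for ca in calcium:
        fast += (dt / tau_fast) * (ca - fast)   # assumed CaMKII-like trace
        slow += (dt / tau_slow) * (ca - slow)   # assumed DAPK1-like trace
    return fast - slow

steady = [1.0] * 200                  # constant calcium level
rising = [0.5] * 100 + [1.0] * 100    # calcium steps up halfway through

steady_signal = kinase_competition(steady)
rising_signal = kinase_competition(rising)
```

In this toy, the sign of the fast-minus-slow difference tracks the sign of the recent change in calcium, which is the property the thread attributes to the kinase competition.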

6/
Recent in vitro tests support this over classical Hebbian learning.

A flat 50→50 Hz stimulation profile yields zero net plasticity. But a 25→50 Hz transition triggers robust LTP.

The synapse computes the derivative of activity, not just raw co-activity.
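
The contrast with a classical Hebbian rule can be sketched with two tiny functions. This assumes the two numbers in each protocol are firing rates in the early and late halves of stimulation; the function names, the learning rate, and the rate-based simplification are all hypothetical and only meant to show the qualitative difference.

```python
def hebbian_dw(pre_hz, post_hz, lr=1e-4):
    """Classical co-activity rule: change scales with pre * post."""
    return lr * pre_hz * post_hz

def derivative_dw(pre_hz, post_early_hz, post_late_hz, lr=1e-4):
    """Derivative-style rule: change scales with pre * (late - early)."""
    return lr * pre_hz * (post_late_hz - post_early_hz)

# Flat protocol: activity stays at 50 Hz throughout.
flat_derivative = derivative_dw(50.0, 50.0, 50.0)
flat_hebbian = hebbian_dw(50.0, 50.0)

# Transition protocol: activity steps from 25 Hz up to 50 Hz.
step_derivative = derivative_dw(50.0, 25.0, 50.0)
```

Under these assumptions the Hebbian rule predicts potentiation for the flat protocol, while the derivative-style rule predicts none there and a positive change only for the transition.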

7/
The bottlenecks?

We still haven't fully mapped the exact driving targets of deep layer 5 cortical output neurons.

More importantly, while this runs in WebGPU-based spiking networks, we haven't seen it scale to massive, modern deep learning benchmarks yet.

8/
For neuromorphic hardware, this is a goldmine.

It offers a mathematically rigorous, local learning rule that completely eliminates the memory-heavy global backward pass.

We can build ultra-low-power, on-chip continuous learning systems using physical silicon.

9/
I think this work bridges the gap between biological plausibility and deep learning performance. It suggests gradient descent isn't just an artificial trick: it's likely how the brain actually optimizes its representations.

10/
Read my full breakdown of O'Reilly's paper:
arxiviq.substack.com/p/this-is-how-…

Original paper here:
arxiv.org/abs/2606.08720

How do you think biological learning rules will impact future AI hardware? Let's discuss below.

11/
Visual summary of the corticothalamic temporal loop mechanism:
