Timo Flesch

Mar 23, 2022, 17 tweets

1/ Preprint alert: Humans can learn continuously, while standard deep neural networks need interleaved data. Here, @dvgnagy, @SaxeLab, @summerfieldlab and I propose a neural network model of human continual learning. #ContinualLearning #CatastrophicInterference #DeepLearning

2/ In 2018, we showed that humans perform better after blocked compared to interleaved training on multiple categorisation tasks pnas.org/doi/epdf/10.10….

3/ Back then, participants classified trees from a bivariate stimulus space according to a single dimension per task.

4/ They performed worse after interleaved compared to blocked training. We argued that interleaved training biases participants towards a single category boundary (a “linear” solution), while blocked training promotes a “factorised” solution.
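The two solutions can be sketched as decision rules. This is an illustrative toy, not the paper's analysis code: the function names and the toy stimulus are our own, and the stimulus dimensions stand in for the tree features.

```python
import numpy as np

# Stimuli vary on two dimensions; task A rewards dimension 0, task B rewards
# dimension 1. A "factorised" learner switches the relevant dimension per task;
# a "linear" learner applies one shared boundary regardless of task.

def factorised(stim, task):
    """Blocked-training solution: classify on the task-relevant dimension."""
    return np.sign(stim[task])

def linear(stim, task):
    """Interleaved bias: a single boundary on the sum of both dimensions."""
    return np.sign(stim.sum())

stim = np.array([0.9, -0.4])   # high on dimension 0, low on dimension 1
print(factorised(stim, task=0), factorised(stim, task=1))  # 1.0 -1.0
print(linear(stim, task=0), linear(stim, task=1))          # 1.0 1.0: task-blind
```

The linear learner gives the same answer for both tasks, which is exactly the error pattern attributed to interleaved training.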

5/ Neural networks can’t do this: They suffer from catastrophic forgetting under blocked curricula, but not after interleaved training. Bad news if you want to use them as models of human learning!

6/ Inspired by work on #cognitivecontrol, we introduce two algorithmic motifs to model human continual learning.

7/ We rarely switch contexts in everyday life (for example when leaving the office to go home). The brain should capitalise on this. We introduce “sluggish” task signals during training, where information is maintained over several trials.
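A "sluggish" task signal can be sketched as an exponential moving average over one-hot task cues. This is an assumed formulation for illustration; the parameter name `alpha` and the exact update are ours, not necessarily the paper's.

```python
import numpy as np

def sluggish_task_signal(task_cues, alpha=0.8):
    """Blend one-hot task cues across trials: s_t = alpha*s_{t-1} + (1-alpha)*c_t.

    Higher alpha = more sluggish: the signal carries over from previous trials
    instead of switching cleanly at task boundaries.
    """
    signal = np.zeros(task_cues.shape[1])
    out = []
    for cue in task_cues:
        signal = alpha * signal + (1 - alpha) * cue
        out.append(signal.copy())
    return np.array(out)

# Interleaved curriculum: tasks alternate every trial (A, B, A, B, A, B)
cues = np.tile(np.eye(2), (3, 1))
sluggish = sluggish_task_signal(cues, alpha=0.8)
print(sluggish[-1])  # both entries nonzero: the cue mixes the two tasks
```

Under a blocked curriculum the same signal stays nearly one-hot within a block, so sluggishness mainly blurs the task cue when tasks alternate rapidly.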

8/ Under interleaved training, sluggishness controls task accuracy, whether a factorised or linear solution is learned, and how many hidden layer units are allocated to each task.

9/ With interleaved data, networks learn to allocate different tasks to different task-specific units, forming orthogonal representations. In cell.com/neuron/fulltex… we showed similar results in the brain after blocked training.

10/ Here, we replicate this, and demonstrate that a standard neural network ignores the task signal and only represents the most recently learned task under blocked training (MDS on hidden layer):

11/ However, if we hand-craft this gating, the network no longer suffers from catastrophic forgetting under blocked training:

12/ ... but we don’t have to do this by hand: Our 2nd proposal is a Hebbian learning step, applied to the task-signalling weights, which is alternated with standard SGD and strengthens connections between task units and relevant hidden units.
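A hedged sketch of such a Hebbian step on the context-to-hidden ("task-signalling") weights: after an SGD step, strengthen connections between the active task unit and the hidden units it currently drives. The learning rate `eta_hebb` and the row normalisation are assumptions for illustration, not the paper's exact recipe.

```python
import numpy as np

def hebbian_step(W_task, task_cue, hidden_act, eta_hebb=0.1):
    """Outer-product Hebbian update dW_ij ∝ cue_i * h_j, then renormalise rows."""
    W_task = W_task + eta_hebb * np.outer(task_cue, hidden_act)
    norms = np.linalg.norm(W_task, axis=1, keepdims=True)
    return W_task / np.maximum(norms, 1e-8)  # keep each task's weight row bounded

W = np.abs(np.random.default_rng(1).normal(size=(2, 6)))
cue = np.array([1.0, 0.0])                      # task A is active
h = np.array([1.0, 1.0, 0.0, 0.0, 0.0, 0.0])   # units 0-1 respond to task A
for _ in range(50):
    W = hebbian_step(W, cue, h)
print(W[0].round(2))  # task A's weights concentrate on units 0 and 1
```

Repeated updates pull task A's weight row towards the units that fire for task A, yielding the same task-specific gating as the hand-crafted version, but learned.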

13/ This Hebbian step is sufficient to protect against forgetting, and enables the network to learn a “factorised” representation under blocked training:

14/ How does it compare to human performance? We reanalysed our data from Flesch et al. (2018) and compared it to a baseline model and to our sluggish Hebbian model (plus decision noise in the output). Our network recreated the performance differences:

15/ Decomposing this into different sources of errors revealed that our model, like human participants, tended towards a linear solution under interleaved training:

16/ We first presented the results at #Cosyne2021, time flies! We also acknowledge other awesome work that has used gating for continual learning, and variants of sluggishness to model switch costs (Masse et al., 2018; Russin et al., 2022).

17/ Thanks for reading the tweeprint! Here’s a link to the paper, feedback most welcome!
arxiv.org/abs/2203.11560
