Valentin Wyart @valentinwyart

16 tweets, 8 min read

Learn more about our new study, “Computational noise in reward-guided learning drives behavioral variability in volatile environments” (goo.gl/BtNAXR), with @FindlingCharles, @vasilisa_skv and @StePalminteri! #tweetstorm #preprint 1/16

When tracking the value of actions in volatile environments, humans make seemingly irrational decisions that fail to maximize expected value. Existing theories attribute these ‘non-greedy’ decisions to information seeking, a.k.a. the exploration-exploitation trade-off. 2/16
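
To make ‘non-greedy’ concrete: a decision is non-greedy when the chosen action is not the one with the highest expected value at that moment. Below is a minimal sketch of how such decisions might be counted, assuming that per-trial expected values are available (for instance from a noise-free RL model). The function name and input layout are hypothetical; ties count as greedy.

```python
def fraction_non_greedy(choices, values):
    """Fraction of trials on which the chosen option did not have the highest expected value.

    choices: index of the chosen option on each trial (hypothetical layout).
    values: list of per-option expected values on each trial.
    """
    if not choices:
        raise ValueError("need at least one trial")
    non_greedy = sum(1 for c, v in zip(choices, values) if v[c] < max(v))
    return non_greedy / len(choices)


# Made-up values for two options over three trials; only trial 2 is non-greedy.
print(fraction_non_greedy([0, 1, 1], [[0.7, 0.3], [0.6, 0.4], [0.2, 0.8]]))  # → 0.3333333333333333
```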

Based on my recent work with Jan Drugowitsch on the origin of behavioral variability in probabilistic reasoning (goo.gl/hqtUf3), @StePalminteri and I reasoned that non-greedy decisions may be caused by computational noise in the learning of action values. 3/16

Using reinforcement learning (RL) models developed by @FindlingCharles and multimodal neurophysiological data analyzed by @vasilisa_skv, we show that the majority of non-greedy decisions stems from learning noise rather than from active information seeking (exploration). 4/16
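
A minimal sketch contrasting a noise-free learner with a noisy one, assuming a standard delta rule (Rescorla-Wagner) for a single action value. We assume here that Gaussian noise is added to each learning step, with a standard deviation proportional to the size of that step. The names `alpha` (learning rate) and `zeta` (noise scaling) are illustrative; the exact model is described in the preprint.

```python
import random


def noise_free_update(q: float, reward: float, alpha: float) -> float:
    """Delta rule: move q toward the reward by a fraction alpha of the prediction error."""
    prediction_error = reward - q
    return q + alpha * prediction_error


def noisy_update(q: float, reward: float, alpha: float, zeta: float, rng: random.Random) -> float:
    """Same learning step, corrupted by Gaussian noise.

    Assumption: the noise SD is zeta times the magnitude of the learning step,
    so larger prediction errors produce more variable updates.
    """
    step = alpha * (reward - q)
    noise_sd = zeta * abs(step)
    return q + step + rng.gauss(0.0, noise_sd)


rng = random.Random(0)
print(noise_free_update(0.5, 1.0, alpha=0.2))  # → 0.6
print(noisy_update(0.5, 1.0, alpha=0.2, zeta=0.5, rng=rng))  # 0.6 plus noise with SD 0.05
```

With `zeta=0` the two rules coincide; with `zeta > 0` the same reward history can yield different value estimates, and hence different decisions, without any change in the choice rule.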

By contrasting learning conditions in which exploration is either useful or useless, we found large adjustments of the exploration-exploitation trade-off but a constant amount of learning noise, as quantified by our noisy RL model. 5/16

Then, by measuring the consistency of human decisions across repeated sequences of rewards, we validated that most of the learning noise is due to random variance rather than to systematic biases, which would instead indicate a misfit of our RL model. 6/16
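
The logic of this test can be sketched as follows: any deterministic model, even a systematically biased one, makes exactly the same choices when the same reward sequence is presented twice, whereas random learning noise does not. The simulation below is hypothetical in every detail: two options, both outcomes shown on every trial, greedy choices from the current value estimates, noise SD proportional to the learning step, and uniformly random rewards.

```python
import random


def simulate_choices(rewards, alpha, zeta, seed):
    """Greedy choices of a learner tracking two action values with a noisy delta rule.

    rewards: list of (reward_a, reward_b) pairs; both outcomes are assumed to be shown.
    """
    rng = random.Random(seed)
    q = [0.5, 0.5]
    choices = []
    for outcome in rewards:
        # Greedy choice: all choice variability comes from noise in the learned values.
        choices.append(0 if q[0] >= q[1] else 1)
        for i in (0, 1):
            step = alpha * (outcome[i] - q[i])
            q[i] += step + rng.gauss(0.0, zeta * abs(step))
    return choices


def consistency(rewards, alpha, zeta, seed_1, seed_2):
    """Fraction of trials with the same choice across two presentations of one sequence."""
    c1 = simulate_choices(rewards, alpha, zeta, seed_1)
    c2 = simulate_choices(rewards, alpha, zeta, seed_2)
    return sum(a == b for a, b in zip(c1, c2)) / len(c1)


gen = random.Random(42)
rewards = [(gen.random(), gen.random()) for _ in range(200)]  # hypothetical sequence
print(consistency(rewards, alpha=0.3, zeta=0.0, seed_1=1, seed_2=2))  # → 1.0
print(consistency(rewards, alpha=0.3, zeta=1.0, seed_1=1, seed_2=2))  # expected to fall below 1.0
```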

Learning noise offers a parsimonious account of decision effects previously attributed to the choice process: choice hysteresis and choice adaptation to surprise. Both effects fall naturally out of the statistical properties of learning noise, without further assumptions. 7/16
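
One intuition for the hysteresis effect, sketched under the assumption of a plain delta rule: a random error added to a value on one trial is not erased on the next trial. It shrinks by a factor (1 - alpha) per update, whatever rewards follow, so it persists over several trials and can push consecutive choices in the same direction, which can look like a tendency to repeat choices. This illustrates the mechanism only; it is not the analysis reported in the preprint, and the function name is hypothetical.

```python
def value_error_trace(epsilon, alpha, rewards):
    """Gap between a perturbed and an unperturbed delta-rule learner, trial by trial."""
    q_clean, q_perturbed = 0.5, 0.5 + epsilon
    trace = []
    for r in rewards:
        # Both learners see the same reward and apply the same noise-free update,
        # so their gap is multiplied by (1 - alpha) on every trial.
        q_clean += alpha * (r - q_clean)
        q_perturbed += alpha * (r - q_perturbed)
        trace.append(q_perturbed - q_clean)
    return trace


print(value_error_trace(0.1, 0.25, [1, 0, 1, 1]))  # each entry is 0.1 * 0.75**k, up to rounding
```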

At the neural level, BOLD responses in the dorsal anterior cingulate cortex (dACC) reflect both the mean and variability of learning steps predicted by our RL model. This is over and above other variables (incl. conflict and surprise) known to correlate with dACC activity. 8/16

This holds even in occasional trials where participants were instructed which of the two reward sources to select, and thus did not choose between them. This confirms that learning noise reflects variability in RL steps rather than variability in choice. 9/16

Brain-behavior analyses supported a positive relationship between dACC activity and learning noise: trial-to-trial fluctuations in dACC activity predict the sensitivity of decisions to action values computed by noise-free RL, even when exploration was useless by design. 10/16
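
‘Sensitivity to action values’ can be read as the slope of a logistic choice function fitted to the value difference computed by a noise-free learner: more learning noise means choices track those values less closely, hence a lower slope. A minimal sketch, assuming a one-parameter logistic model fitted by Newton's method and hypothetical ‘low-signal’ and ‘high-signal’ trials (for instance split by dACC activity) simulated with different true slopes. This is not the preprint's analysis pipeline.

```python
import math
import random


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_sensitivity(value_diffs, choices, n_iter=100, tol=1e-10):
    """Maximum-likelihood slope beta of P(choose A) = sigmoid(beta * value_diff)."""
    beta = 0.0
    for _ in range(n_iter):
        p = [sigmoid(beta * x) for x in value_diffs]
        grad = sum((y - pi) * x for x, y, pi in zip(value_diffs, choices, p))
        curvature = sum(pi * (1.0 - pi) * x * x for x, pi in zip(value_diffs, p))
        if curvature == 0.0:
            break
        step = grad / curvature  # Newton step on the concave log-likelihood
        beta += step
        if abs(step) < tol:
            break
    return beta


def simulate(beta, n, rng):
    """Hypothetical value differences and choices generated with a known slope."""
    xs = [rng.uniform(-1.0, 1.0) for _ in range(n)]
    ys = [1 if rng.random() < sigmoid(beta * x) else 0 for x in xs]
    return xs, ys


gen = random.Random(0)
low_x, low_y = simulate(3.0, 5000, gen)   # 'low-signal' trials: less learning noise
high_x, high_y = simulate(1.5, 5000, gen)  # 'high-signal' trials: more learning noise
print(fit_sensitivity(low_x, low_y), fit_sensitivity(high_x, high_y))  # should be near 3.0 and 1.5
```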

We hypothesized that neuromodulatory fluctuations in neural gain driven by the locus coeruleus-norepinephrine (LC-NE) system may mediate the relationship between dACC activity and learning noise. To test this hypothesis, we took advantage of the strong correlation between LC-NE activity and pupil dilation. 11/16

We found that, like dACC activity, trial-to-trial fluctuations in pupil dilation reflect learning noise and predict sensitivity to action values. Unlike dACC activity, pupil dilation also predicts local adjustments in the exploration-exploitation trade-off when exploration is useful. 12/16

These findings reveal that most behavioral variability, rather than reflecting human exploration, is due to the limited precision of reward-guided learning. This is consistent with recent theories and observations of learning-specific variability triggered by the dACC. 13/16

The ‘metaplastic’ synapses hypothesized by @ali_r_soltani et al. in the dACC (goo.gl/eDHHKz) produce behavioral variability with the same statistical signatures as learning noise in our RL model. 14/16

The pooling/sampling of prediction errors based on multiple learning rates reported in dACC activity by Matthew Rushworth, @behrenstimb et al. (goo.gl/peH2W3) would also produce behavioral variability of the same nature. 15/16
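
A minimal sketch of why sampling among several learning rates can produce learning-noise-like variability, under a simplifying assumption of ours (not necessarily the cited model's): on each trial, one learning rate is drawn at random from a fixed pool. The learning step alpha * delta then varies across trials with standard deviation |delta| * SD(alpha), so its variability grows in proportion to the magnitude of the prediction error. The pool of rates and the function names are hypothetical.

```python
import random
import statistics


def sampled_rate_update(q, reward, rates, rng):
    """Delta-rule update with a learning rate drawn at random from a pool (assumption)."""
    alpha = rng.choice(rates)
    return q + alpha * (reward - q)


def step_sd(prediction_error, rates, n_samples, seed):
    """SD of the learning step for a fixed prediction error (q = 0, reward = prediction_error)."""
    rng = random.Random(seed)
    steps = [sampled_rate_update(0.0, prediction_error, rates, rng) for _ in range(n_samples)]
    return statistics.pstdev(steps)


rates = [0.1, 0.3, 0.5]  # hypothetical pool of learning rates
for pe in (0.25, 0.5, 1.0):
    # The SD grows in proportion to |prediction error| (about 0.163 * |pe| here).
    print(pe, round(step_sd(pe, rates, 20000, seed=0), 3))
```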

That’s all folks. You’ll find many more details in the bioRxiv preprint. This work was supported by grants from @ERC_Research and @AgenceRecherche, and by synergistic teamwork with @FindlingCharles, @vasilisa_skv and @StePalminteri. 16/16