NEW AND IMPROVED!!!

Happy to finally share the published version of our paper on hierarchical learning and confidence, out now in @PLOSCompBiol

journals.plos.org/ploscompbiol/a…

TL;DR? Let me unpack it in a bunch of tweet-sized bites.

0/n
We were interested in hierarchical models of learning in changing environments.

These are models that not only (at the 1st level) track environmental statistics, but also (at a higher level) monitor changes in those statistics, and adapt their learning accordingly

1/n
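For intuition, here is a minimal Python sketch of what "hierarchical" means in this context. It is an illustration, not the paper's actual ideal observer: level 1 tracks a binary statistic with Beta pseudo-counts, and level 2 monitors whether that statistic has just changed, assuming a fixed hazard rate h; when a change looks likely, the level-1 evidence is partially reset. The hazard rate, the flat-prior change likelihood and the soft-reset rule are all illustrative assumptions.

```python
# Illustrative two-level (hierarchical) learner -- not the paper's model.
class HierarchicalTracker:
    def __init__(self, h=0.05):
        self.h = h                 # level 2: assumed per-trial change probability
        self.a, self.b = 1.0, 1.0  # level 1: Beta pseudo-counts for the statistic

    def update(self, x):
        """x = 1 if the tracked event occurred on this trial, else 0."""
        p_est = self.a / (self.a + self.b)
        lik_stay = p_est if x else 1 - p_est   # likelihood if nothing changed
        lik_change = 0.5                       # likelihood under a fresh, flat prior
        # Level 2: posterior probability that a change just occurred.
        p_change = (self.h * lik_change) / (self.h * lik_change + (1 - self.h) * lik_stay)
        # Level 1: softly reset accumulated evidence in proportion to p_change,
        # then incorporate the new observation.
        self.a = (1 - p_change) * self.a + p_change * 1.0 + x
        self.b = (1 - p_change) * self.b + p_change * 1.0 + (1 - x)
        return p_est, p_change
```

The key point is the second level: a run of unexpected observations drives p_change up, which discounts the old evidence and speeds up learning.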
These models have become wildly popular over the last few years.

You may know them from beautiful papers, e.g. by @beckyneuro or by Powers (both with Mathys) where the authors link specific psychiatric traits to specific parameters in such a hierarchical learning model.

2/n
But hierarchical models are not the only game in town!

Non-hierarchical (flat) models are computationally cheap and can learn very effectively, even in changing environments.

So we ask: how can we test whether learners actually use a hierarchical model?
Importantly, we want to rely not just on model fitting, but also on a unique signature or hallmark: a learning pattern that can *only* be explained by a hierarchical model

Why is that important? See this nice paper by @StePalminteri et al.
doi.org/10.1016/j.tics…

4/n
Earlier papers (since Behrens et al '07) had already proposed a hallmark: that learners adjust their apparent learning rate to environmental volatility.

However, we show a counter-example: a flat model that shows exactly such adjustments, without tracking volatility/changes!
5/n
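For contrast, a flat learner in the sense meant here can be as simple as a leaky event counter with a fixed forgetting factor. Again, this is an illustrative sketch, not the specific flat ideal observer used in the paper, and `leak` is an assumed free parameter. Nothing in it monitors change points or volatility, yet, as we argue, models of this family can still look as if they adapt their learning rate when analysed post hoc.

```python
# Illustrative flat (non-hierarchical) learner: a leaky counter with a
# fixed forgetting factor. There is no second level monitoring change.
class FlatLeakyTracker:
    def __init__(self, leak=0.95):
        self.leak = leak           # fixed forgetting factor (assumed value)
        self.a, self.b = 1.0, 1.0  # leaky Beta pseudo-counts

    def update(self, x):
        """x = 1 if the tracked event occurred on this trial, else 0."""
        self.a = self.leak * self.a + x
        self.b = self.leak * self.b + (1 - x)
        return self.a / (self.a + self.b)  # current estimate of the statistic
```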
So how to find a truly unique hallmark?

We reasoned: in earlier studies, learners track just 1 statistic. As such, the graph is a linear chain & hierarchy is confounded with volatility

This creates few opportunities for the type of learning that makes hierarchical models unique
We used a more complex task in which *two* regularities are learned, both of which undergo *global* changes

In such a task, only a hierarchical model can generalise between the statistics

This type of generalisation, or “sharing statistical strength”, is what we set out to measure

7/n
Learners observed long sequences of two stimuli (A & B), the occurrence of which was governed by two probabilities: p(A|A) and p(B|B) — or the probabilities of repetition of A and B.

The values of these probabilities were independent, but occasionally both changed simultaneously

8/n
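To make the task structure concrete, here is an illustrative generator for a sequence of this kind. The trial count, hazard rate and probability range are assumptions for illustration, not the experiment's actual parameters; what matters is that p(A|A) and p(B|B) are drawn independently but are redrawn together at global change points.

```python
import numpy as np

def generate_sequence(n_trials=500, hazard=0.01, seed=None):
    """Illustrative task generator: two independent repetition probabilities,
    jointly resampled at occasional global change points."""
    rng = np.random.default_rng(seed)
    p_rep = rng.uniform(0.1, 0.9, size=2)        # [p(A|A), p(B|B)]
    seq, change_points = [int(rng.integers(2))], []
    for t in range(1, n_trials):
        if rng.random() < hazard:                # global change: resample both
            p_rep = rng.uniform(0.1, 0.9, size=2)
            change_points.append(t)
        prev = seq[-1]
        repeat = rng.random() < p_rep[prev]      # repeat prev stimulus with its own prob
        seq.append(prev if repeat else 1 - prev)
    return seq, change_points                    # stimuli coded 0 = A, 1 = B
```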
Every few trials, participants reported their estimate of p(A|A) or p(B|B), and their confidence in this estimate.

Participants did this very well: both probability reports and confidence aligned closely with an optimal (hierarchical) model

9/n
… but they also aligned well (just *slightly* less well) with a non-hierarchical, flat model.

So how can we test more definitively if learners used a hierarchical model?

10/n
To do this, we target specific, critical trials after long repetitions — or ‘streaks’.

We distinguish between two types.

First, 'suspicious streaks' are long repetitions that seem unlikely in context and might imply a change.

11/n
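One simple way to see what makes a streak "suspicious" (an illustrative definition, not necessarily the exact criterion used to construct the sessions): the same streak length is highly surprising when the current estimate of the repetition probability is low, and unremarkable when it is high.

```python
import math

def streak_surprise(k, p_rep_est):
    """Surprise (-log probability) of k consecutive repetitions under the
    learner's current estimate of the repetition probability."""
    return -k * math.log(p_rep_est)

streak_surprise(8, 0.3)  # ~9.6: long streak while p(A|A) is believed low -> suspicious
streak_surprise(8, 0.8)  # ~1.8: same length while p(A|A) is believed high -> unremarkable
```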
After a suspicious streak, a hierarchical model should revise not only its estimate of the observed regularity (here, p(A|A)) but also of the other regularity (here, p(B|B), shown in the figure)

... despite having received no direct evidence about this probability

12/n
By contrast, the flat model (which doesn't monitor change points) should not revise its estimate of p(B|B) at all.

So **only** in a hierarchical model do we see a global decrease in confidence (a reset) after suspicious streaks

13/n
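To spell out the contrast with a toy calculation (all numbers are made up for illustration; the paper's models are richer): suppose the learner has accumulated evidence about p(B|B) when a suspicious streak on A implies a global change with high posterior probability. A hierarchical learner that treats changes as global discards much of that accumulated evidence about p(B|B) too; a flat learner keeps it, since no new B-related observations arrived.

```python
def confidence(a, b):
    # Rough stand-in for confidence: total retained Beta pseudo-counts
    # (more retained evidence = a more precise, more confident estimate).
    return a + b

# Evidence about p(B|B) accumulated before the streak (illustrative values).
a_B, b_B = 12.0, 8.0
conf_before = confidence(a_B, b_B)                 # 20.0

# Suppose the suspicious streak on A implies a global change with
# posterior probability 0.9 (a made-up number).
p_global_change = 0.9

# Hierarchical model: the unseen statistic is partially reset as well.
a_B_hier = (1 - p_global_change) * a_B + p_global_change * 1.0
b_B_hier = (1 - p_global_change) * b_B + p_global_change * 1.0

# Flat model: no change-point monitoring, so p(B|B) is left untouched.
a_B_flat, b_B_flat = a_B, b_B

print(conf_before,                                 # 20.0
      confidence(a_B_hier, b_B_hier),              # 3.8  -> sharp confidence drop
      confidence(a_B_flat, b_B_flat))              # 20.0 -> no drop
```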
For comparison, we also target unsuspicious streaks, where a long repetition is not unlikely in context. Here there should be no difference between the models: both predict little change to p(B|B).
Each subject's session was carefully constructed to contain an equal number of both types of streaks.

... but participants didn't know this: they were simply instructed to track the regularities, and thought everything was random.
Sure enough, participants spontaneously showed the effect that *only* a hierarchical learner would show:

Confidence (for the unseen probability) decreased strongly after suspicious, but not unsuspicious, streaks

(and this interaction was significant)
This work demonstrates that learners automatically resort to hierarchical computations, even when flat computations would be much cheaper and just as effective

This dovetails nicely with ideas that hierarchical probabilistic inference is central and comes naturally to the brain..
.. and supports the hierarchical learning framework which is currently so popular in neuroscience.

/fin
Let me just thank my supervisor, Florent Meyniel, from whom I learned so much during my year in Paris.

And a shout-out to Florent and @maheump, who made most of the ideal observer models we used for an earlier paper, found here:

journals.plos.org/ploscompbiol/a…
(There's much more in the paper, but this is the key result, and this thread is already so long that I lost track of the count 🙃)