Clare Lyle
Jul 21 · 7 tweets · 3 min read
At #ICML today: why is generalization so hard in value-based RL? We show that the TD targets used in value-based RL evolve in a structured way, and that this encourages neural networks to ‘memorize’ the value function.
📺 icml.cc/virtual/2022/p…
📜 proceedings.mlr.press/v162/lyle22a.h…
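For reference, the TD target in question is the standard one-step bootstrapped target; here's a minimal sketch (the textbook Q-learning form, not code from the paper):

```python
import torch

def td_target(reward, next_q, done, gamma=0.99):
    # One-step TD target: y = r + gamma * max_a' Q(s', a').
    # `next_q` holds Q-values at the next state, shape [batch, actions];
    # `done` (0/1) masks out bootstrapping at episode ends.
    return reward + gamma * (1.0 - done) * next_q.max(dim=-1).values
```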
TL;DR: reward functions in most benchmark MDPs don’t look much like the actual value function — in particular, the smooth* components of the value function tend to be missing!
*smooth ~= doesn't change much between adjacent states, e.g. a constant function.
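To make this concrete, here's a toy illustration (mine, not from the paper): a chain MDP with a single rewarding state. The reward is a spike, but the value function it induces decays geometrically between adjacent states, i.e. it's smooth.

```python
import numpy as np

# Chain MDP: N states, deterministic 'move right' policy,
# reward only at the final state.
N, gamma = 20, 0.9
reward = np.zeros(N)
reward[-1] = 1.0

# Exact values under the fixed policy: V(s) = gamma^(N-1-s).
v = np.zeros(N)
v[-1] = reward[-1]
for s in reversed(range(N - 1)):
    v[s] = reward[s] + gamma * v[s + 1]

print(np.round(reward, 3))  # a single spike: non-smooth
print(np.round(v, 3))       # geometric decay: smooth
```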
Early TD targets tend to resemble the reward, and it can take many updates for reward information to propagate (see attached figure). Meanwhile, the deep RL agent is training its neural network to fit these non-smooth prediction targets, building in a bias towards *memorization*.
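A sketch of that propagation on the same toy chain (again my illustration): starting from V = 0, each application of the Bellman operator pushes reward information back one more state, so the k-th round of targets is zero everywhere more than k steps from the reward.

```python
import numpy as np

N, gamma = 20, 0.9
reward = np.zeros(N); reward[-1] = 1.0
P = np.eye(N, k=1)  # transitions: state s -> s+1; final row is zero (terminal)

v = np.zeros(N)
for k in range(1, 6):
    v = reward + gamma * (P @ v)  # one Bellman backup = one round of TD targets
    print(k, np.round(v, 2))      # nonzero only within k states of the reward
```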
Later on in training, even if the smooth components of the value function are present in the targets, the network maintains this bias: an update to its prediction for one state exerts little influence on other randomly sampled states from its replay buffer.
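The probe behind that claim looks something like this (an assumed setup, not the paper's exact code): take a gradient step on one state's TD error and measure how much the predictions move at other states.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
net = nn.Sequential(nn.Linear(4, 64), nn.ReLU(), nn.Linear(64, 1))
opt = torch.optim.SGD(net.parameters(), lr=1e-2)

s_update = torch.randn(1, 4)    # the state we update on
s_probe = torch.randn(128, 4)   # stand-in for states sampled from a replay buffer
target = torch.tensor([[1.0]])  # a TD target for s_update

before = net(s_probe).detach()
loss = (net(s_update) - target).pow(2).mean()
opt.zero_grad(); loss.backward(); opt.step()
after = net(s_probe).detach()

# Small values mean the update barely generalizes beyond s_update.
print((after - before).abs().mean().item())
```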
Randomly initializing a new network and distilling the trained agent’s value function into it mitigates this bias, suggesting that the bias is a result of training on TD targets. Networks trained only with policy gradient losses also extrapolate more between states.
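A minimal version of that distillation experiment (assumed setup; `teacher` stands in for the trained agent's value network):

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
teacher = nn.Sequential(nn.Linear(4, 64), nn.ReLU(), nn.Linear(64, 1))  # trained value net
student = nn.Sequential(nn.Linear(4, 64), nn.ReLU(), nn.Linear(64, 1))  # fresh init
opt = torch.optim.Adam(student.parameters(), lr=1e-3)

for _ in range(1000):
    s = torch.randn(256, 4)  # stand-in for replay-buffer states
    loss = (student(s) - teacher(s).detach()).pow(2).mean()
    opt.zero_grad(); loss.backward(); opt.step()
# The student regresses onto the same function with a plain supervised
# loss and (per the thread) does not inherit the memorization bias.
```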
Overall, it’s not clear whether this bias towards memorization is necessarily bad, as it might help stabilize learning; however, it does clearly reduce the extent to which an agent can generalize its learned policy to states it hasn’t seen yet.
Want to learn more? Visit our poster today at Hall E #1018. Thanks to co-authors Mark Rowland (presenting the poster), along with @wwdabney, @yaringal, and Marta Kwiatkowska.
