All of RL theory is built on objects like V^π, Q*, π*, T (the Bellman update operator), etc... until you realize that this foundation is quite shaky. arxiv.org/abs/1905.13341 Spoiler: no big deal (yet), but thinking thru this is super useful for resolving some confusions. (1/x)
Example: I recently heard an argument that residual minimization is not good (and TD is better) because Bellman err is unlearnable (see the new ed of RL textbook; I tweeted about this b4). If u r a believer of this, my paper has a (rather implicit) refutation. (2/x)
Overall it took me 1+ yrs to fully understand what's going on behind the paradoxes. The turning point was when I met Erik Talvitie and he claimed that all RL env are deterministic---and he is totally right!!! Ever heard of PEGASUS? (3/x)
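One way to read the "all envs are deterministic" claim, in the spirit of PEGASUS as I understand it: draw all the random numbers up front and treat them as part of the initial state, so the transition function itself is deterministic. The sketch below is only an illustration under that reading. The toy dynamics and the names `step`, `rollout`, and `always_right` are made up for this example and are not from the paper.

```python
import random

def step(state, action, noise):
    """Deterministic transition: all randomness enters through `noise` in [0, 1)."""
    # Hypothetical toy dynamics: the action "succeeds" when noise < 0.8,
    # otherwise the agent moves in the opposite direction.
    move = action if noise < 0.8 else -action
    return state + move

def rollout(policy, horizon, seed):
    """Roll out `policy` using a noise sequence fixed in advance by `seed`."""
    rng = random.Random(seed)
    noises = [rng.random() for _ in range(horizon)]  # drawn before the episode starts
    state = 0
    for u in noises:
        state = step(state, policy(state), u)
    return state

def always_right(state):
    """Toy policy that always picks action +1."""
    return 1

# Same seed, same policy: the "stochastic" environment gives the same outcome.
print(rollout(always_right, horizon=10, seed=3) == rollout(always_right, horizon=10, seed=3))
```

With the seed folded into the initial state, comparing two policies on the same seeds is a comparison on the same deterministic environments.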
Then I started to wonder why the annoying variance term in the empirical estimate of Bellman error wouldn't just go away automatically... and I realized that there is something VERY wrong about how I (and perhaps u as well) reason about func approx. (4/x)
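For readers who haven't run into the variance term: I'm assuming it refers to the usual double-sampling issue, where the expected squared TD error equals the squared Bellman error plus γ² times the conditional variance of the next-state value. Here is a minimal numeric sketch for a single (s, a) pair. All numbers and function names are hypothetical, chosen so that the true Bellman error is exactly zero.

```python
import random

def bellman_error_sq(q_sa, reward, gamma, next_values, probs):
    """True squared Bellman error (Q(s,a) - (TQ)(s,a))^2 for one (s, a)."""
    backup = reward + gamma * sum(p * v for p, v in zip(probs, next_values))
    return (q_sa - backup) ** 2

def mean_sq_td_error(q_sa, reward, gamma, next_values, probs, n, seed=0):
    """Naive estimate: average of squared TD errors over n sampled next states."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n):
        v_next = rng.choices(next_values, weights=probs)[0]
        total += (q_sa - reward - gamma * v_next) ** 2
    return total / n

def backup_variance(gamma, next_values, probs):
    """gamma^2 * Var[V(s') | s, a], the extra term in the naive estimate."""
    mean = sum(p * v for p, v in zip(probs, next_values))
    return gamma ** 2 * sum(p * (v - mean) ** 2 for p, v in zip(probs, next_values))

# Toy setup (assumed): reward 0, gamma 1, next-state value is 0 or 2 with prob 1/2.
Q_SA, REWARD, GAMMA = 1.0, 0.0, 1.0
NEXT_VALUES, PROBS = [0.0, 2.0], [0.5, 0.5]

true_err = bellman_error_sq(Q_SA, REWARD, GAMMA, NEXT_VALUES, PROBS)
naive = mean_sq_td_error(Q_SA, REWARD, GAMMA, NEXT_VALUES, PROBS, n=1000)
extra = backup_variance(GAMMA, NEXT_VALUES, PROBS)
print(true_err, naive, extra)
```

In this toy case every sampled squared TD error is 1, so the naive estimate stays at the variance term no matter how many samples are drawn, even though Q satisfies the Bellman equation at this (s, a).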
Pretty funny that a few weeks ago I was telling my CS598 students this: "Remember 'state is a sufficient stat. of history' and you are immune to any confusion related to partial observability and function approximation." Apparently this is not true 😝 (5/x)
In some cases we probably need to ask whether individual states are physically meaningful at all. This totally shook the basic understanding of RL I've had since I was a grad student. If u had similar confusions, read the paper and let me know what u think (like, at ICML)! (6/end)
Thread by Nan Jiang