Nan Jiang
machine learning researcher, with focus on reinforcement learning. asst prof @ uiuc cs. Course on RL theory (w/ videos): https://t.co/cEjTizHdnB
Jun 4, 2019 6 tweets 2 min read
The entire RL theory is built on objects like V^π, Q*, π*, T (the Bellman update operator), etc... until you realize that this foundation is quite shaky. arxiv.org/abs/1905.13341 Spoiler: no big deal (yet), but thinking this through is super useful for resolving some confusions. (1/x)

Example: I recently heard an argument that residual minimization is not good (and TD is better) because the Bellman error is unlearnable (see the new edition of the RL textbook; I tweeted about this before). If you are a believer of this, my paper has a (rather implicit) refutation. (2/x)
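To make the objects in the thread concrete, here is a minimal sketch (not from the paper; the toy 2-state, 2-action MDP and all numbers are made up for illustration) of the Bellman optimality operator T, the fixed point Q*, the greedy π*, and the Bellman error (residual) that the residual-minimization vs. TD debate is about:

```python
import numpy as np

# Hypothetical toy MDP: 2 states, 2 actions (illustrative numbers only).
# P[s, a, s'] = transition probability; R[s, a] = expected immediate reward.
P = np.array([[[0.9, 0.1], [0.2, 0.8]],
              [[0.0, 1.0], [0.5, 0.5]]])
R = np.array([[1.0, 0.0],
              [0.0, 2.0]])
gamma = 0.9  # discount factor

def T(Q):
    """Bellman optimality operator: (TQ)(s,a) = R(s,a) + γ E_{s'}[max_{a'} Q(s',a')]."""
    V = Q.max(axis=1)          # V(s') = max_{a'} Q(s', a')
    return R + gamma * (P @ V)  # batched matrix-vector product over (s, a)

def bellman_error(Q):
    """Sup-norm Bellman residual ||Q - TQ||_inf; zero exactly at Q = Q*."""
    return np.max(np.abs(Q - T(Q)))

# Q* is the unique fixed point of T (T is a γ-contraction), so repeated
# application of T (value iteration) converges to it.
Q = np.zeros_like(R)
for _ in range(1000):
    Q = T(Q)

Q_star = Q
V_star = Q_star.max(axis=1)      # optimal state values V*
pi_star = Q_star.argmax(axis=1)  # greedy (optimal) policy π*
```

The learnability issue the thread refers to is about estimating this residual from sampled transitions (where the inner expectation is unavailable), not about computing it with the true model as done here.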