Nan Jiang
machine learning researcher, with focus on reinforcement learning. asst prof @ uiuc cs. Course on RL theory (w/ videos): https://t.co/cEjTizHdnB
Jun 4, 2019 6 tweets 2 min read
The entire RL theory is built on objects like V^π, Q*, π*, T (the Bellman update operator), etc... until you realize that this foundation is quite shaky. arxiv.org/abs/1905.13341 Spoiler: no big deal (yet), but thinking this through is super useful for resolving some confusions. (1/x)

Example: I recently heard an argument that residual minimization is not good (and TD is better) because the Bellman error is unlearnable (see the new edition of the RL textbook; I tweeted about this before). If you are a believer of this, my paper has a (rather implicit) refutation. (2/x)
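To make the objects in the thread concrete, here is a minimal sketch (not from the paper; the toy 2-state, 2-action MDP and all numbers are made up for illustration) of the Bellman optimality operator T, the fixed point Q*, the greedy π*, and the Bellman error (residual) that the residual-minimization vs. TD debate is about:

```python
import numpy as np

# Hypothetical toy MDP: 2 states, 2 actions (illustrative numbers only).
# P[s, a, s'] = transition probability; R[s, a] = expected immediate reward.
P = np.array([[[0.9, 0.1], [0.2, 0.8]],
              [[0.0, 1.0], [0.5, 0.5]]])
R = np.array([[1.0, 0.0],
              [0.0, 2.0]])
gamma = 0.9  # discount factor

def T(Q):
    """Bellman optimality operator: (TQ)(s,a) = R(s,a) + γ E_{s'}[max_{a'} Q(s',a')]."""
    V = Q.max(axis=1)          # V(s') = max_{a'} Q(s', a')
    return R + gamma * (P @ V)  # batched matrix-vector product over (s, a)

def bellman_error(Q):
    """Sup-norm Bellman residual ||Q - TQ||_inf; zero exactly at Q = Q*."""
    return np.max(np.abs(Q - T(Q)))

# Q* is the unique fixed point of T (T is a γ-contraction), so repeated
# application of T (value iteration) converges to it.
Q = np.zeros_like(R)
for _ in range(1000):
    Q = T(Q)

Q_star = Q
V_star = Q_star.max(axis=1)      # optimal state values V*
pi_star = Q_star.argmax(axis=1)  # greedy (optimal) policy π*
```

The learnability issue the thread refers to is about estimating this residual from sampled transitions (where the inner expectation is unavailable), not about computing it with the true model as done here.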