How does exploration vs exploitation affect reward estimation? Excited to share our #AISTATS2022 paper that constructs optimal reward estimators by leveraging the demonstrator's behavior en route to optimality.

🧵 1/7
Exploration vs exploitation is key to any no-regret learner in bandits. We derive an information-theoretic lower bound that applies to any demonstrator and quantifies the tradeoff between exploration and reward estimation. 2/
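A toy sketch of this tension (illustrative numbers, not from the paper): pulling a suboptimal arm N times costs regret proportional to N, while the error of its empirical mean only shrinks like 1/√N, so a no-regret learner must cap how well its data can pin down that arm's reward.

```python
import numpy as np

# Illustrative only: regret grows linearly in the pulls of a suboptimal
# arm, while the estimation error of its mean shrinks like ~1/sqrt(N).
rng = np.random.default_rng(0)
mean, gap = 0.5, 0.3  # hypothetical suboptimal arm: mean reward and gap
for n in [10, 100, 1000, 10000]:
    est = rng.binomial(1, mean, size=n).mean()  # empirical mean from n pulls
    print(f"N={n:6d}  regret~{gap * n:7.1f}  |error|={abs(est - mean):.4f}")
```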
Such a tradeoff immediately implies that estimation is impossible in the absence of exploration (e.g., when the demonstrator executes only an optimal policy), which is precisely the well-known identifiability issue in inverse RL. 3/
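A minimal sketch of this identifiability failure, with a hypothetical exploit-only demonstrator (not code from the paper):

```python
import numpy as np

def exploit_only(means, horizon):
    """Demonstrator that already knows and always pulls the best arm."""
    best = int(np.argmax(means))
    return [best] * horizon

# Two very different reward vectors with the same optimal arm...
actions_a = exploit_only([0.9, 0.1, 0.5], horizon=100)
actions_b = exploit_only([0.9, 0.8, 0.2], horizon=100)

# ...produce identical demonstrations: the suboptimal arms' rewards
# leave no trace in the data, so they cannot be identified.
assert actions_a == actions_b
```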
Can we construct optimal estimators that achieve this lower bound? We show that the answer is YES for two popular families of learning algorithms: successive arm elimination (SAE), which is non-adaptive, and upper confidence bound (UCB), which is adaptive. 4/
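For concreteness, here is a textbook UCB1 demonstrator on Bernoulli arms (a standard sketch, not the paper's exact setup); its pull counts are what a count-based estimator like the one sketched below would consume:

```python
import numpy as np

def ucb_demonstrator(means, horizon, rng):
    """Textbook UCB1 on Bernoulli arms; returns actions and pull counts."""
    k = len(means)
    counts, sums = np.zeros(k), np.zeros(k)
    actions = []
    for t in range(horizon):
        if t < k:
            arm = t  # initialization: pull each arm once
        else:
            ucb = sums / counts + np.sqrt(2 * np.log(t) / counts)
            arm = int(np.argmax(ucb))
        counts[arm] += 1
        sums[arm] += rng.binomial(1, means[arm])
        actions.append(arm)
    return actions, counts

actions, counts = ucb_demonstrator([0.9, 0.6, 0.5], 20000,
                                   np.random.default_rng(1))
print(counts)  # pulls concentrate on the best arm, but not entirely
```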
Our estimators are simple and based directly on the sequence of actions performed by the learner. They show that for either type of demonstrator, exploration can be optimally leveraged for reward estimation, even though the exploration schedule takes a different form in each case. 5/
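To illustrate the idea (a rough sketch with illustrative constants, not the paper's exact estimator): under UCB the pull counts encode the suboptimality gaps, so one can invert the standard count-gap relation and never look at rewards at all.

```python
import numpy as np

# Hypothetical pull counts from a UCB demonstrator over T = 20000 rounds,
# e.g. the counts produced by the UCB sketch above.
T = 20000
counts = np.array([18500.0, 900.0, 600.0])

# Under UCB, a suboptimal arm i is pulled roughly N_i ~ 2 log T / gap_i^2
# times, so the gaps can be read off from the action sequence alone.
gaps = np.sqrt(2 * np.log(T) / counts)
gaps[np.argmax(counts)] = 0.0  # treat the most-pulled arm as optimal
print(gaps)  # crude gap estimates, roughly [0, 0.15, 0.18]
```

Note the design point this sketch shares with the thread's claim: the estimator consumes only the action sequence, with no access to the demonstrator's observed rewards.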
We test these estimators extensively on synthetic simulations as well as simulators for science domains (e.g., battery charging and gene expression), confirming these results: more observations & more exploration → better reward estimation. 6/
Full details: arxiv.org/abs/2106.14866
Code: github.com/wenshuoguo/inv…
A wonderful collaboration, rooted at the Simons Institute, with Wenshuo Guo, Kumar Krishna Agrawal, Vidya Muthukumar, and Ashwin Pananjady! 7/7
