Assistant Professor of CS @UCLA. Previously: PhD @StanfordAILab, bachelors @IITDelhi. AI, ML, Climate.
Mar 9, 2022 • 8 tweets • 3 min read
How does exploration vs exploitation affect reward estimation? Excited to share our #AISTATS2022 paper that constructs optimal reward estimators by leveraging the demonstrator's behavior en route to optimality.
Exploration vs exploitation is key to any no-regret learner in bandits. We derive an information-theoretic lower bound that applies to any demonstrator and shows a quantitative tradeoff between exploration and reward estimation. 2/
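To make the tradeoff concrete, here is a toy sketch, not the paper's estimator or its lower bound. It assumes a demonstrator running standard UCB1 on three Bernoulli arms; the function name `run_ucb1`, the arm means, and the horizon are all hypothetical choices for illustration. The point is only that a no-regret learner pulls suboptimal arms rarely, so an observer has fewer samples from which to estimate those arms' rewards.

```python
import math
import random


def run_ucb1(means, horizon, seed=0):
    """Run UCB1 on Bernoulli arms; return per-arm pull counts and empirical means."""
    rng = random.Random(seed)
    k = len(means)
    counts = [0] * k
    sums = [0.0] * k
    for t in range(1, horizon + 1):
        if t <= k:
            arm = t - 1  # pull each arm once to initialize
        else:
            # Pick the arm with the largest upper confidence bound.
            arm = max(
                range(k),
                key=lambda a: sums[a] / counts[a]
                + math.sqrt(2 * math.log(t) / counts[a]),
            )
        reward = 1.0 if rng.random() < means[arm] else 0.0
        counts[arm] += 1
        sums[arm] += reward
    estimates = [sums[a] / counts[a] for a in range(k)]
    return counts, estimates


# Hypothetical arm means and horizon, chosen only for illustration.
counts, estimates = run_ucb1(means=[0.2, 0.5, 0.8], horizon=5000)
for arm, (n, est) in enumerate(zip(counts, estimates)):
    # 1/sqrt(n) is a rough scale for the estimate's noise: fewer pulls, noisier estimate.
    print(f"arm {arm}: pulls={n}, estimate={est:.3f}, 1/sqrt(n)={1 / math.sqrt(n):.3f}")
```

In a run like this, the best arm receives most of the pulls, and the `1/sqrt(n)` column is largest for the arms the learner explored least, which is the intuition behind trading exploration against reward estimation.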