Paper alert!
We will be presenting our paper "Conformal Off-Policy Prediction" at #NeurIPS2022. arxiv.org/abs/2206.04405
We present COPP, a novel methodology for quantifying uncertainty in off-policy outcomes. 1/n
Given an untested policy and past observational data, how do you find the most likely outcome(s) under this policy without deploying it in the real world? We solve this problem for contextual bandits using Conformal Prediction, which comes with strong theoretical guarantees. 2/n
Existing off-policy evaluation (OPE) methods estimate the *average* reward under the new policy, which says nothing about the distribution of the reward itself. To the best of our knowledge, COPP is the first work to quantify this uncertainty. 3/n
Additionally, COPP's results are context-dependent: it provides the most likely outcome(s) for a given context X if we were to choose actions according to the new policy. In doing so, COPP gives granular information about the policy's performance. 4/n
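To make the idea concrete, here is a minimal, purely illustrative sketch, not the exact COPP algorithm from the paper: it uses toy logged bandit data, importance weights between the new policy and the logging policy, and a weighted split-conformal calibration to produce context-dependent reward intervals. The data, the `pi_target` policy, and the `copp_set` helper are all hypothetical.

```python
# Purely illustrative sketch, NOT the exact COPP algorithm from the paper.
# All names (pi_target, copp_set, the toy data) are hypothetical.
import numpy as np

rng = np.random.default_rng(0)

# Toy logged data (X, A, Y) collected under a known behaviour policy.
n, n_actions = 2000, 2
X = rng.normal(size=n)                                   # contexts
p_behaviour = np.full((n, n_actions), 1.0 / n_actions)   # uniform logging policy
A = rng.integers(0, n_actions, size=n)                   # logged actions
Y = X * (2 * A - 1) + rng.normal(scale=0.5, size=n)      # logged rewards

def pi_target(x):
    """Action probabilities under the new (untested) policy, given context x."""
    p1 = 1.0 / (1.0 + np.exp(-3.0 * np.asarray(x, dtype=float)))
    return np.stack([1.0 - p1, p1], axis=-1)

# Importance weights: target vs. behaviour probability of the logged action.
w_all = pi_target(X)[np.arange(n), A] / p_behaviour[np.arange(n), A]

# What standard OPE gives: a single point estimate of the *average* reward.
print("IPS estimate of the average reward:", np.mean(w_all * Y))

# What a conformal approach targets instead: a prediction set for the reward
# itself.  Split the data, fit a crude plug-in reward model, then calibrate.
fit, cal = np.arange(n) < n // 2, np.arange(n) >= n // 2
mu = np.array([Y[fit][A[fit] == a].mean() for a in range(n_actions)])
scores = np.abs(Y[cal] - mu[A[cal]])                     # nonconformity scores
w = w_all[cal]                                           # calibration weights

def copp_set(x, alpha=0.1):
    """Approximate (1 - alpha) interval for the reward at context x if actions
    were drawn from pi_target.  Weighted-quantile sketch; the normalisation and
    the handling of action randomness are simplified relative to the paper."""
    order = np.argsort(scores)
    cum_w = np.cumsum(w[order]) / (w.sum() + 1.0)        # crude normalisation
    idx = min(np.searchsorted(cum_w, 1.0 - alpha), len(scores) - 1)
    q = scores[order][idx]
    a_star = int(pi_target(x).argmax())                  # most likely action (simplification)
    return mu[a_star] - q, mu[a_star] + q

# Context-dependent output: the prediction set changes with X.
for x in (-1.0, 0.0, 1.0):
    lo, hi = copp_set(x)
    print(f"x = {x:+.1f}: reward likely in [{lo:.2f}, {hi:.2f}]")
```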
COPP is a first step towards off-policy assessment using the uncertainty in the reward itself, and opens up interesting possibilities such as robust policy optimisation by optimising worst-case outcomes. 5/n
I would like to thank my co-authors @jeanfrancois287 (equal contribution), Rob Cornish, @yeewhye, @ArnaudDoucet1, without whom this work would not have been possible. If you're interested in more details, come say hi to us at NeurIPS. 6/n
@jeanfrancois287 @yeewhye @ArnaudDoucet1 Also, check out a high-level summary of our work at .