arxiv.org/abs/1803.04585
Essential for understanding a class of failures in machine learning systems.
Goodhart's law: "when a measure becomes a target, it ceases to be a good measure."
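A toy illustration (my own, not from the paper) of the pattern: a proxy that tracks the true objective under mild optimisation stops tracking it once it becomes the target.

```python
# Toy illustration of Goodhart's law; the functions and numbers are made up.

def true_objective(x):
    # What the designer actually cares about: peaks at x = 1, falls off after.
    return -(x - 1.0) ** 2

def proxy(x):
    # Measured stand-in: agrees with the true objective for x < 1,
    # but keeps rewarding larger x forever.
    return x

# Light optimisation: pick the best of a few modest candidates.
mild = max([0.2, 0.5, 0.9], key=proxy)
# Hard optimisation: the proxy is now the target, so push x as far as it goes.
hard = max([0.9, 5.0, 50.0], key=proxy)

print(f"mild optimisation: x={mild}, true objective={true_objective(mild):.2f}")
print(f"hard optimisation: x={hard}, true objective={true_objective(hard):.2f}")
# The proxy ranks `hard` highest while the true objective has collapsed.
```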
arxiv.org/abs/1803.03453
Multiple examples of evolutionary algorithms "hacking" their simulators or maximising proxies in ways that are not useful to their designers.
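A minimal sketch of this failure mode with a made-up buggy simulator (not one of the paper's examples): the evolutionary search ends up selecting for the bug rather than for locomotion.

```python
import random

def simulate(genome):
    """Distance travelled according to the simulator. Bug: any force above 10
    makes the integrator blow up and teleport the body far forward."""
    pos = 0.0
    for force in genome:
        if abs(force) > 10.0:              # the exploitable bug
            pos += 1000.0
        else:
            pos += max(0.0, force) * 0.1   # "honest" locomotion
    return pos

def mutate(genome):
    return [g + random.gauss(0, 1.0) for g in genome]

random.seed(0)
population = [[random.uniform(0, 1) for _ in range(5)] for _ in range(20)]
for _ in range(200):
    population.sort(key=simulate, reverse=True)
    parents = population[:5]                      # keep the elites
    population = parents + [mutate(random.choice(parents)) for _ in range(15)]

best = max(population, key=simulate)
print("best genome:", [round(g, 1) for g in best])
print("simulated distance:", round(simulate(best), 1))
# The best genome ends up with forces above 10: the search found the
# simulator bug, not a gait.
```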
arxiv.org/abs/1706.03741
page 9: "poor performance of offline reward predictor training"
When the proxy (the predicted reward) is not updated online, it fails to teach the desired behaviour.
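A toy sketch of the mechanism (my construction, not the paper's setup): a reward predictor fit once on mild behaviour extrapolates badly, and a policy optimising against the frozen predictor exploits that; labelling the off-distribution sample and refitting removes the exploit.

```python
# Made-up example: linear reward predictor, true reward standing in for labels.

def true_reward(x):
    # What the human actually wants: best at x = 1, worse beyond.
    return x if x <= 1.0 else 2.0 - x

def fit_linear(xs, ys):
    # Ordinary least squares for y ~ a*x + b.
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    a = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
    return a, my - a * mx

def predict(model, x):
    a, b = model
    return a * x + b

candidates = [0.5, 1.0, 5.0, 10.0]

# Offline: reward predictor fit once, on mild behaviour only.
xs = [0.0, 0.5, 1.0]
ys = [true_reward(x) for x in xs]
model = fit_linear(xs, ys)
chosen = max(candidates, key=lambda x: predict(model, x))
print("frozen predictor picks x =", chosen, "| true reward =", true_reward(chosen))

# Online: the new, off-distribution sample gets labelled and the model refits.
xs.append(chosen); ys.append(true_reward(chosen))
model = fit_linear(xs, ys)
chosen = max(candidates, key=lambda x: predict(model, x))
print("updated predictor picks x =", chosen, "| true reward =", true_reward(chosen))
```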
arxiv.org/abs/1803.10122 @hardmaru
Section 4.5, Cheating: when training in imagination, the agent steers towards a buggy region of the simulation where rewards are easy to reap. The proxy (imagined consequences) doesn't match reality.
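A toy sketch (mine, not the paper's actual experiment): an agent that plans purely against a learned model steers to wherever the model hallucinates reward.

```python
def real_reward(state):
    # The actual environment: reward only at state 3.
    return 1.0 if state == 3 else 0.0

def imagined_reward(state):
    # The learned model matches reality except in a "buggy" region (state 7),
    # where it hallucinates a jackpot.
    return 10.0 if state == 7 else real_reward(state)

states = range(10)

# Train/plan "in imagination": pick whatever the model says is best.
plan = max(states, key=imagined_reward)
print("agent steers to state", plan)
print("imagined reward:", imagined_reward(plan), "| real reward:", real_reward(plan))
```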