https://twitter.com/percyliang/status/1600383429463355392

Past RLHF theory assumes that the human and the AI both fully observe the environment. With enough data, the correct return function, and thus the optimal policy, can be inferred.
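The following is a minimal sketch of what "with enough data, the correct return function can be inferred" means under full observability. It rests on an assumption the thread does not state: the human is taken to compare two fully observed trajectories according to a Bradley-Terry (Boltzmann-rational) model, preferring trajectory i over j with probability sigmoid(R_i - R_j). The function names `simulate_comparisons` and `fit_returns`, and the toy returns, are hypothetical. Fitting returns by maximum likelihood recovers them up to an additive constant. That constant does not change which policy is optimal.

```python
import math
import random


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def simulate_comparisons(true_returns, n, seed=0):
    """Hypothetical human labeler (assumed Bradley-Terry model): given two
    fully observed trajectories i and j, prefers i with probability
    sigmoid(R_i - R_j). Returns a list of (winner, loser) index pairs."""
    rng = random.Random(seed)
    k = len(true_returns)
    data = []
    for _ in range(n):
        i, j = rng.sample(range(k), 2)
        p_i = sigmoid(true_returns[i] - true_returns[j])
        data.append((i, j) if rng.random() < p_i else (j, i))
    return data


def fit_returns(num_trajectories, data, lr=1.0, steps=2000):
    """Maximum-likelihood return estimates via full-batch gradient ascent on
    the Bradley-Terry log-likelihood. Returns are identified only up to an
    additive constant; starting at zero and applying zero-sum updates keeps
    the estimates mean-zero."""
    k = num_trajectories
    # Aggregate comparisons: wins[a][b] = how often a was preferred over b.
    wins = [[0] * k for _ in range(k)]
    for winner, loser in data:
        wins[winner][loser] += 1
    n = len(data)
    r = [0.0] * k
    for _ in range(steps):
        grad = [0.0] * k
        for a in range(k):
            for b in range(k):
                if wins[a][b]:
                    # d/dr_a of log sigmoid(r_a - r_b) is 1 - sigmoid(r_a - r_b).
                    g = wins[a][b] * (1.0 - sigmoid(r[a] - r[b]))
                    grad[a] += g
                    grad[b] -= g
        r = [ri + lr * gi / n for ri, gi in zip(r, grad)]
    return r


if __name__ == "__main__":
    true_returns = [0.0, 1.0, 2.0, 3.0]  # toy values, purely illustrative
    data = simulate_comparisons(true_returns, n=20000)
    estimates = fit_returns(len(true_returns), data)
    mean = sum(true_returns) / len(true_returns)
    print("estimated:", [round(e, 2) for e in estimates])
    print("true (centered):", [t - mean for t in true_returns])
```

Because the labeler sees everything the return depends on, more comparisons only sharpen the estimates. Under partial observability, the labeler's preferences would no longer be a function of the true returns alone, and this identification argument no longer holds as stated.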