How can agents infer what people want from what they say?
In our new paper at #acl2022nlp w/ @dan_fried, Dan Klein, and @ancadianadragan, we learn preferences from language by reasoning about how people communicate in context.
We’d like AI agents that not only follow our instructions (“book this flight”) but also generalize to new contexts (know which flights I prefer from our past interactions and book on my behalf): that is, agents that learn *rewards* from language. [2/n]
The challenge is that language reveals only partial, context-dependent information about our goals and preferences. When I tell a flight booking agent I want “the jetblue flight,” I don’t mean I always want a JetBlue flight, just in this particular case! [3/n]
On the other hand, inverse reinforcement learning gives us lots of techniques for going from actions -> underlying rewards, but these methods miss the fact that language naturally communicates *why* people want those actions. [4/n]
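(To make the contrast concrete, here’s a minimal sketch of Bayesian reward inference from actions alone. This is my toy illustration, not the paper’s implementation: the flight features, weight grid, and Boltzmann choice model are all assumptions for the example.)

```python
import numpy as np

# Toy flight options as feature vectors: [is_jetblue, is_nonstop, price_low].
flights = np.array([
    [1.0, 0.0, 0.0],
    [0.0, 1.0, 1.0],
    [0.0, 1.0, 0.0],
])

# Hypothesis space over reward weights: a coarse grid with entries in {-1, 0, 1}.
weight_grid = np.array(list(np.ndindex(3, 3, 3))) - 1.0
posterior = np.full(len(weight_grid), 1.0 / len(weight_grid))

def choice_likelihood(chosen_idx, weights, beta=2.0):
    """P(user picks flight chosen_idx | reward weights), Boltzmann-rational."""
    utilities = flights @ weights
    probs = np.exp(beta * (utilities - utilities.max()))  # stable softmax
    probs /= probs.sum()
    return probs[chosen_idx]

def update(posterior, chosen_idx):
    """One Bayesian update of the posterior after observing one choice."""
    likelihoods = np.array([choice_likelihood(chosen_idx, w) for w in weight_grid])
    new_post = posterior * likelihoods
    return new_post / new_post.sum()

posterior = update(posterior, chosen_idx=0)  # user picked the JetBlue flight
# Many weight hypotheses stay plausible: the action alone never says *why*.
for i in posterior.argsort()[::-1][:5]:
    print(weight_grid[i], round(posterior[i], 3))
```

Notice that after one observed choice the posterior is still spread over many weight vectors consistent with that action; this is exactly the ambiguity that language can resolve by stating *why*.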
To study this, we collect a dataset of natural language in a new task, FlightPref, where one player (the “assistant”) has to infer the preferences of the other (the “user”) while they book flights together. Lots of rich, interesting phenomena in the data (to be released!). [5/n]
We build a pragmatic model that reasons about how language communicates what agents should do, and how the *way* people describe what to do reveals the features they care about. Both enable agents to make more accurate inferences. [6/n]
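(Here’s a rough RSA-style sketch of that second idea, again my own illustration rather than the paper’s model: the utterances, literal semantics, and the speaker’s bonus for mentioning the feature they care about are all invented for the example.)

```python
import numpy as np

# Same toy setup: flights as features [is_jetblue, is_nonstop, price_low].
flights = np.array([
    [1.0, 0.0, 0.0],   # the JetBlue flight
    [0.0, 1.0, 1.0],   # a cheap nonstop on another airline
])
utterances = ["jetblue", "nonstop", "cheap"]
# Literal semantics: which flights each utterance is true of.
literal = np.array([
    [1.0, 0.0],   # "jetblue" picks out flight 0
    [0.0, 1.0],   # "nonstop" picks out flight 1
    [0.0, 1.0],   # "cheap"   picks out flight 1
])
# Candidate reward weights (uniform prior): each cares about one feature.
rewards = np.array([
    [1.0, 0.0, 0.0],
    [0.0, 1.0, 0.0],
    [0.0, 0.0, 1.0],
])

def speaker(reward_w, alpha=3.0, gamma=1.0):
    """P(utterance | reward): the speaker points at the flight their reward
    prefers and (softly) mentions the feature they actually care about.
    Utterance i names feature i in this toy grammar."""
    desired = (flights @ reward_w).argmax()
    scores = alpha * literal[:, desired] + gamma * reward_w
    probs = np.exp(scores - scores.max())
    return probs / probs.sum()

def pragmatic_listener(utt_idx):
    """P(reward | utterance): Bayes over the speaker model."""
    likelihoods = np.array([speaker(w)[utt_idx] for w in rewards])
    return likelihoods / likelihoods.sum()

for u, name in enumerate(utterances):
    print(name, pragmatic_listener(u).round(2))
```

In this toy, “nonstop” and “cheap” pick out the same flight, yet they shift the listener’s posterior toward different reward weights: the choice of description itself carries information.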
There are many directions to take FlightPref / reward learning from language further: building agents that learn to ask questions based on uncertainty, studying adaptation to different humans, and seeing how these ideas extend to inferring real preferences in the wild! [7/n]
More broadly, it’s an exciting time to be working on language + action/RL! A lot of work has focused on e.g. language for generalization, but our work hints at how the language humans use to *communicate* presents distinct challenges for grounded agents (pragmatics, etc.!). [8/8]