The core idea of Quixote was to teach reinforcement learning systems to follow social behavioral conventions when performing tasks. We introduced “Learning from Stories” as an alternative to “Learning from Demonstrations”.
The work roughly falls into the class of AI alignment.
Quixote had two learning systems. First, it learned a high-level graph, a partial plan (with branches and alternatives), from a crowdsourced corpus of natural language stories.
It learned what the “actions were” and the likely orderings...
The 2nd learning system grounded the high-level actions in the controls of an autonomous (virtual) robot so that it could then learn a RL policy that filled in the missing details and was executable in the environment. B/c natural language stories are not directly executable.
Quixote was highly successful but also really hard to publish. I’m glad it has gotten some recognition.
It launched a number of related research efforts...
I would normally never send anyone to Lesswrong. com, but someone posted about Sam Altman remarks about OpenAI’s plans for GPT-4, and I have thoughts lesswrong.com/posts/aihztgJr… 1/7
GPT-4 will focus on coding (ala Codex). It will not be much bigger than GPT-3. The focus will instead be "line of sight" planning. Which is not really planning, it just means bigger context windows and output windows. 2/7
A long enough output window looks like lookahead, but is still just applying historical patterns to new problems. Dare I say "case based reasoning"? 3/7
In about 3 weeks universities will be in session again. Many universities (like my own) want to pretend that things will be back to normal. The buildings and classrooms and quads will all be there and look the same. The routines of commuting to classes will be the same… 1/7
But WE will not be the same. We may still be suffering from mental fatigue. We may have developed new life routines and work habits that are suddenly incompatible with on-campus life. 2/7
2nd year students will be expected to act like 2nd year students even though they are navigating the new social norms of being away from home for the first time—something students normally learn in their first years. 3/7
A new way to OBJECTIVELY measure the coherence of story generation systems. Grounded in narratology and validated in controlled studies arxiv.org/abs/2104.07472
Me: that fails to converge until the programmer cleans up millions of lines of labeled data? Right! Very suspenseful! Never know if that’s going to work.
Hollywood: you’re fired
Hollywood: the AI has a robot body that—
Me: can’t pick up any objects unless they a placed in a very specific way on a table at just the right height. The struggle is real.