This project has been a nice and long effort, but I’m excited to share a new paper: **When Can Models Learn From Explanations? A Formal Framework for Understanding the Roles of Explanation Data**
There are datasets where people explain why data point x gets label y, and the explanations look very helpful for solving the task. But what if models already know the relevant facts, or can infer what they need from the task input alone?
To test this question, we first design a synthetic task where we vary the number of distinct hidden tasks in the data (we also test on existing datasets later). Our “explanations” of each point reveal which hidden task it belongs to & provide helpful info for predicting its label.
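To make the setup concrete, here is a minimal Python sketch of a synthetic task in this spirit. Everything in it is hypothetical (the names `make_synthetic_data` and `Example`, the index-plus-features input, and the one-threshold rule per task); it is not the paper's exact task. Each point carries a visible index saying which hidden task it belongs to, but not what that task's rule is, while the explanation states the rule outright.

```python
import random
from dataclasses import dataclass
from typing import List


@dataclass
class Example:
    index: int            # visible task index (says WHICH task, not its rule)
    features: List[int]   # visible input features
    explanation: str      # states the hidden rule for this point's task
    label: int            # binary label


def make_synthetic_data(num_tasks: int, n: int, num_features: int = 4, seed: int = 0) -> List[Example]:
    """Generate n points spread over num_tasks hidden tasks (hypothetical design)."""
    rng = random.Random(seed)
    # One hidden rule per task: label is 1 iff features[pos] > thr.
    rules = [(rng.randrange(num_features), rng.randint(1, 8)) for _ in range(num_tasks)]
    data = []
    for _ in range(n):
        task = rng.randrange(num_tasks)
        pos, thr = rules[task]
        feats = [rng.randint(0, 9) for _ in range(num_features)]
        label = int(feats[pos] > thr)
        # The explanation reveals the rule, so it fully determines the label.
        explanation = f"label is 1 iff feature {pos} > {thr}"
        data.append(Example(task, feats, explanation, label))
    return data
```

Raising `num_tasks` while holding `n` fixed leaves fewer examples per hidden rule, which is one plausible reason accuracy drops as the number of tasks grows when a model sees inputs alone.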
We find that when there are a small number of tasks, models do well (98% acc), but performance quickly falls off as the number of tasks increases. If we could condition on explanations here, performance would jump back to 98%.
But we shouldn’t expect to have explanations at test time: test points are new, unexplained data whose labels we need to predict. So we learn to retrieve explanations from the training data, which lets us solve the task again (98% acc). Explanation retrieval even beats a baseline trained on 10x more data.
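A minimal sketch of the retrieval step, under loud assumptions: inputs are already embedded as fixed vectors, similarity is plain cosine, and retrieved explanations are simply joined in front of the new input. The thread's method *learns* the retriever; this sketch swaps in a fixed cosine similarity purely for illustration, and the function names and `[SEP]` format are hypothetical.

```python
import math
from typing import List, Sequence


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def retrieve_explanations(query_vec: Sequence[float],
                          train_vecs: List[Sequence[float]],
                          train_expls: List[str],
                          k: int = 3) -> List[str]:
    """Return explanations of the k training points most similar to the query."""
    # Rank training points by similarity to the new, unexplained input.
    ranked = sorted(range(len(train_vecs)),
                    key=lambda i: cosine(query_vec, train_vecs[i]),
                    reverse=True)
    return [train_expls[i] for i in ranked[:k]]


def build_conditioned_input(query_text: str, retrieved: List[str]) -> str:
    """Prepend retrieved explanations to the test input (hypothetical format)."""
    return " [SEP] ".join(retrieved + [query_text])


# Usage
train_vecs = [[1.0, 0.0], [0.0, 1.0], [0.9, 0.1]]
train_expls = ["explanation A", "explanation B", "explanation C"]
top = retrieve_explanations([1.0, 0.0], train_vecs, train_expls, k=2)
print(build_conditioned_input("new input", top))
# prints: explanation A [SEP] explanation C [SEP] new input
```

Using k > 1 lets the downstream model draw on several retrieved explanations at once; in a learned version, `cosine` over fixed vectors would be replaced by a trainable encoder.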
We argue that this approach is suitable for explanation data, since large LMs can extract relevant facts from inputs or infer task representations from them. We contrast it with other graphical models in the paper.
We investigate several other questions, including whether models can combine info across explanations (yes!), as well as whether explanations can help point models toward using strong/causal features rather than weak/spurious ones (depends on their exact correlation, actually!)
After investigating 7 total research questions with our synthetic task, we try this model on 3 existing datasets. Accuracy usually goes up a little across conditions, but we do not find any stat-sig improvements. This holds across several training set sizes.
Ultimately, we claim that for explanations to be helpful, the model must be able to better infer relevant latent info given the explanation and input together, relative to using the input alone. We also identify 5 conditions for learning retrieval to be feasible in practice. n/n