This project has been a long but rewarding effort, and I’m excited to share a new paper: **When Can Models Learn From Explanations? A Formal Framework for Understanding the Roles of Explanation Data**

Work done with @mohitban47

arXiv: arxiv.org/abs/2102.02201
Thread below 1/n
There are datasets where people explain why data point x gets label y, and the explanations look very helpful for solving the task. But what if models already know the relevant facts, or can infer what they need from the task input alone?
To test this question, we first design a synthetic task where we vary the number of distinct hidden tasks in the data (we also test with existing datasets later). Our “explanations” of each point reveal which hidden task it belongs to & provide helpful info for predicting its label
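For concreteness, here is a minimal, hypothetical sketch of what such a synthetic setup could look like. This is my illustration, not the paper’s exact construction: `make_dataset`, the feature-equality rules, and all names are made up.

```python
# Hypothetical sketch of a synthetic task with hidden subtasks (not the paper's
# exact construction). Each point belongs to one of `num_tasks` hidden rules; its
# "explanation" reveals which rule applies and the info needed to compute the label.
import random

def make_point(seed_rules):
    task_id = random.randrange(len(seed_rules))    # hidden task this point belongs to
    x = [random.randint(0, 9) for _ in range(8)]   # observable input features
    idx, target = seed_rules[task_id]              # rule: "is feature idx equal to target?"
    y = int(x[idx] == target)                      # label determined by the hidden rule
    explanation = f"task {task_id}: label is 1 iff feature {idx} equals {target}"
    return x, y, explanation

def make_dataset(n, num_tasks):
    # each hidden task gets its own (feature index, target value) rule
    seed_rules = [(random.randrange(8), random.randint(0, 9)) for _ in range(num_tasks)]
    return [make_point(seed_rules) for _ in range(n)]

train = make_dataset(5000, num_tasks=50)  # more hidden tasks -> harder to learn from (x, y) alone
```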
We find that when there are a small number of tasks, models do well (98% acc), but performance quickly falls off as the number of tasks increases. If we could condition on explanations here, performance would jump back to 98%.
But we shouldn’t expect to have explanations at test time -- that’s new unexplained data we need to predict. So we learn to retrieve past explanations from training data, & we can solve the task again (98% acc). Explanation retrieval even beats a baseline with 10x training data.
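A rough sketch of the retrieval idea, continuing the toy setup above. This is illustrative only: the actual approach learns what to retrieve, whereas this toy version uses a fixed similarity function.

```python
# Toy illustration of explanation retrieval: at test time x_new has no explanation,
# so we retrieve explanations attached to similar *training* inputs and let the
# classifier condition on them alongside x_new.
from collections import Counter
from math import sqrt

def similarity(a, b):
    # cosine similarity over bag-of-feature counts, a stand-in for a learned retriever
    ca, cb = Counter(a), Counter(b)
    dot = sum(ca[k] * cb[k] for k in ca)
    norm = sqrt(sum(v * v for v in ca.values())) * sqrt(sum(v * v for v in cb.values()))
    return dot / (norm + 1e-9)

def retrieve_explanations(x_new, train, k=3):
    # return the explanations of the k training inputs most similar to x_new
    scored = sorted(train, key=lambda ex: similarity(x_new, ex[0]), reverse=True)
    return [expl for _, _, expl in scored[:k]]

# The classifier would then read [x_new ; retrieved explanations] and predict the label.
```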
We argue that this approach is well suited to explanation data, since large LMs can extract relevant facts from inputs or infer task representations from them. We contrast it with alternative graphical models in the paper.
We investigate several other questions, including whether models can combine info across explanations (yes!), as well as whether explanations can help point models toward using strong/causal features rather than weak/spurious ones (depends on their exact correlation, actually!)
After investigating 7 total research questions with our synthetic task, we try this model on 3 existing datasets. Accuracy usually goes up slightly across conditions, but we do not find any statistically significant improvements. This holds across several training set sizes.
Ultimately, we claim that for explanations to be helpful, the model must be able to better infer the relevant latent information from the explanation and input together than from the input alone. We also identify 5 conditions for learned explanation retrieval to be feasible in practice. n/n
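One informal way to write this condition (my notation, not the paper’s exact formalism, which gives a precise graphical-model treatment):

```latex
% Let $z$ be the latent task information needed to predict $y$ from input $x$,
% and let $e$ be an explanation. Explanations can help when
\[
  H(z \mid x, e) < H(z \mid x)
  \quad\Longleftrightarrow\quad
  I(z; e \mid x) > 0 ,
\]
% i.e. the input and explanation together pin down $z$ better than the input alone.
```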
