Peter Hase
PhD student at @uncnlp, advised by Mohit Bansal. Interested in interpretable ML, natural language processing, AI Safety, and Effective Altruism.
Apr 9, 2021
Interested in interpretable and explainable machine learning? Check out our new blog post with opinions on the field and 70 summaries of recent papers, by @__Owen___ and me!

Link: alignmentforum.org/posts/GEPX7jgL…

Topics include Theory, Evaluation, Feature Importance, Interpreting Representations, Generating Counterfactuals, Finding Influential Data, Natural Language Explanations, Adversarial/Robust Explanations, Unit Testing, Explaining RL Agents, and others (note: not a formal taxonomy)
Feb 4, 2021
This project has been a long but rewarding effort, and I’m excited to share a new paper: **When Can Models Learn From Explanations? A Formal Framework for Understanding the Roles of Explanation Data**

Work done with @mohitban47

Arxiv: arxiv.org/abs/2102.02201
Thread below 1/n

There are datasets where people explain why data point x gets label y, and the explanations look very helpful for solving the task. But what if models already know the relevant facts, or can infer what they need from the task input alone?