Interested in interpretable and explainable machine learning? Check out our new blog post with opinions on the field and 70 summaries of recent papers, by @__Owen___ and me!
Topics include Theory, Evaluation, Feature Importance, Interpreting Representations, Generating Counterfactuals, Finding Influential Data, Natural Language Explanations, Adversarial/Robust Explanations, Unit Testing, Explaining RL Agents, and others (note: not a formal taxonomy)
We're excited to highlight the wide array of research in interpretability/transparency/explainability. We hope this work can help others identify common threads across research areas and get up to speed on the latest work in different subareas.
Please feel free to leave any comments here :) and thanks to @_robertkirk and @mohitban47 for helpful feedback on the post!
This project has been a long but rewarding effort, and I’m excited to share a new paper: **When Can Models Learn From Explanations? A Formal Framework for Understanding the Roles of Explanation Data**
There are datasets where people explain why data point x gets label y, and the explanations look very helpful for solving the task. But what if models already know the relevant facts, or can infer what they need from the task input alone?
To test this question, we first design a synthetic task where we vary the number of distinct hidden tasks in the data (we also test with existing datasets later). Our “explanations” of each point reveal which hidden task it belongs to & provide helpful info for predicting its label
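To give a rough picture of this kind of setup (a minimal sketch only, not the paper's actual construction: the function name, the binary-feature inputs, and the per-task label rule here are all hypothetical assumptions), a toy generator might look like:

```python
import random

def make_synthetic_dataset(num_hidden_tasks=4, num_points=1000, input_dim=8, seed=0):
    """Toy sketch: each point belongs to one hidden task; its label comes from a
    task-specific rule, and the 'explanation' names that hidden task."""
    rng = random.Random(seed)
    # Hypothetical rule per hidden task: the label copies one task-specific feature.
    task_feature = [rng.randrange(input_dim) for _ in range(num_hidden_tasks)]
    data = []
    for _ in range(num_points):
        task_id = rng.randrange(num_hidden_tasks)
        x = [rng.randint(0, 1) for _ in range(input_dim)]
        y = x[task_feature[task_id]]  # label depends on the hidden task's rule
        explanation = (
            f"This point belongs to hidden task {task_id}; "
            f"its label equals feature {task_feature[task_id]}."
        )
        data.append({"input": x, "label": y, "explanation": explanation})
    return data
```

The point of a setup like this is that the explanation carries information (which hidden task applies, and what that task's rule is) that is hard to recover from the input alone, so we can measure whether models actually use it.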