Understanding data quality is crucial for reliable ML. In our #ICML2022 paper, @NabeelSeedat01, @JonathanICrabbe & @MihaelaVDS present a Data-Centric framework for the understudied problem of identifying incongruous examples of in-distribution data.
🧵1/10
TL;DR:
* Do you want to know which examples will be reliably predicted, independent of the downstream predictive model?
* Do you want to get insights into your data to understand possible limitations?
If so, Data-SUITE, our new #DataCentricAI framework, is for you!
2/10
There has been a significant focus on out-of-distribution (OOD) data for reliable ML.
However, in Data-SUITE we tackle an equally important yet understudied problem.
How do we assess in-distribution data with feature-space heterogeneity?
3/10
Data-SUITE is a paradigm shift from current model-centric methods of uncertainty estimation, which assess predictive uncertainty.
Data-SUITE models uncertainty in the data itself.
i.e., it is data-centric.
This allows us to flag instances in a model-independent manner.
4/10
Our new #DataCentricAI framework, Data-SUITE, takes a pipeline approach to construct feature-wise confidence interval estimators, leveraging:
(1) Copula modeling,
(2) Representation Learning and
(3) Conformal Prediction.
5/10
The feature-wise conformal predictor allows us to produce adaptive intervals that help us flag incongruous instances.
At the same time, with conformal prediction, we get rigorous theoretical guarantees on coverage 🚀⭐️💡
6/10
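To make the idea of feature-wise conformal intervals concrete, here is a minimal sketch in Python. It is not the Data-SUITE implementation: it skips the copula and representation-learning stages, uses a plain least-squares regressor per feature, and applies standard split conformal prediction with absolute residuals, so the intervals have constant width rather than being adaptive. The helper names (`conformal_quantile`, `featurewise_flags`) and the synthetic data are assumptions made for illustration. Under exchangeability, split conformal intervals of this kind cover the true value with probability at least 1 - alpha on average.

```python
import numpy as np


def conformal_quantile(scores, alpha):
    """Finite-sample-corrected (1 - alpha) quantile of calibration scores."""
    n = len(scores)
    k = int(np.ceil((n + 1) * (1 - alpha)))  # rank of the score to use
    if k > n:
        return np.inf  # too few calibration points for this alpha
    return np.sort(scores)[k - 1]


def _design(X, cols):
    """Design matrix with an intercept column and the selected features."""
    return np.column_stack([np.ones(len(X)), X[:, cols]])


def featurewise_flags(X_train, X_cal, X_test, alpha=0.1):
    """Flag test entries that fall outside a split conformal interval.

    Hypothetical helper: each feature is predicted from the remaining
    features, and an entry is flagged when its observed value lies
    outside [prediction - q, prediction + q].
    """
    n_features = X_train.shape[1]
    flags = np.zeros(X_test.shape, dtype=bool)
    for j in range(n_features):
        others = [c for c in range(n_features) if c != j]
        # Fit a least-squares model for feature j on the training split.
        coef, *_ = np.linalg.lstsq(
            _design(X_train, others), X_train[:, j], rcond=None
        )
        # Nonconformity scores: absolute residuals on the calibration split.
        cal_scores = np.abs(X_cal[:, j] - _design(X_cal, others) @ coef)
        q = conformal_quantile(cal_scores, alpha)
        # Flag test values whose residual exceeds the calibrated threshold.
        test_pred = _design(X_test, others) @ coef
        flags[:, j] = np.abs(X_test[:, j] - test_pred) > q
    return flags


# Synthetic example (assumed data): three correlated features.
rng = np.random.default_rng(0)
z = rng.normal(size=600)
X = np.column_stack([z, 2 * z, -z]) + 0.1 * rng.normal(size=(600, 3))
X_train, X_cal, X_test = X[:300], X[300:500], X[500:].copy()
X_test[0, 1] += 10.0  # inject one incongruous value

flags = featurewise_flags(X_train, X_cal, X_test, alpha=0.1)
print(flags[0])          # flags for the corrupted row
print(flags[1:].mean())  # flag rate on the untouched rows
```

Note that no downstream predictive model appears anywhere in the sketch: the flags depend only on the data, which is the sense in which the approach is model-independent.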
Data-SUITE's brand of data-centric uncertainty outperforms model-centric counterparts on multiple real-world tabular datasets, with different types of incongruence.
We show utility for 2 practical problems:
1. Reliable model deployment
2. Insightful data exploration
7/10
* Reliable model deployment.
Data-SUITE consistently identifies the most impactful data instances for a diverse class of downstream predictive models.
8/10
* Insightful data exploration
Data-SUITE can help data owners to understand potential data limitations.
9/10
📄 Paper:
arxiv.org/abs/2202.08836
🎬 Video:
Catch our spotlight at ICML tomorrow (Thursday)
icml.cc/virtual/2022/p…
#DataCentricAI #MachineLearning #DataQuality #ReliableML #ai #datascience #ConformalPrediction #ICML2022
10/10