Nabeel Seedat
Jul 20, 2022 · 10 tweets
Understanding data quality is crucial for reliable ML. In our #ICML2022 paper, @NabeelSeedat01, @JonathanICrabbe & @MihaelaVDS present a Data-Centric framework for the understudied problem of identifying incongruous examples of in-distribution data.

🧵1/10
TL;DR
* Do you want to know which examples will be reliably predicted, independent of the downstream predictive model?

* Do you want to get insights into your data to understand possible limitations?

If so, Data-SUITE, our new #DataCentricAI framework, is for you!

2/10
There has been a significant focus on out-of-distribution (OOD) data for reliable ML.

However, in Data-SUITE we tackle an equally important yet understudied problem.

How do we assess in-distribution data that exhibits feature-space heterogeneity?

3/10
Data-SUITE is a paradigm shift from current model-centric methods of uncertainty estimation, which assess predictive uncertainty.

Data-SUITE models uncertainty in the data itself.
i.e., data-centric.

This allows us to flag instances in a model-independent manner.

4/10
Our new #DataCentricAI framework called Data-SUITE takes a pipeline approach to construct feature-wise confidence interval estimators leveraging:

(1) Copula modeling,
(2) Representation Learning and
(3) Conformal Prediction.
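
To make step (1) a little more concrete, here is a minimal sketch of copula modeling on two features. The thread does not say which copula Data-SUITE uses or how its samples feed the later steps, so the Gaussian copula, the two-feature restriction, and every function name below are assumptions made purely for illustration.

```python
import math
import random
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal, used for CDF and inverse CDF


def normal_scores(values):
    """Rank-transform values to (0, 1), then map them to standard normal scores."""
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])
    scores = [0.0] * n
    for rank, i in enumerate(order, start=1):
        scores[i] = STD_NORMAL.inv_cdf(rank / (n + 1))
    return scores


def fit_gaussian_copula_2d(xs, ys):
    """Estimate the copula correlation as the correlation of the normal scores."""
    zx, zy = normal_scores(xs), normal_scores(ys)
    # Both score lists are permutations of the same zero-mean values,
    # so the Pearson correlation reduces to a ratio of sums.
    return sum(a * b for a, b in zip(zx, zy)) / sum(a * a for a in zx)


def sample_gaussian_copula_2d(xs, ys, rho, n_samples, rng):
    """Draw synthetic (x, y) pairs: copula dependence plus empirical marginals."""
    sorted_x, sorted_y = sorted(xs), sorted(ys)

    def empirical_quantile(sorted_vals, u):
        # Map a uniform u in [0, 1] back to an observed value of the feature.
        idx = min(int(u * len(sorted_vals)), len(sorted_vals) - 1)
        return sorted_vals[idx]

    residual_scale = math.sqrt(max(0.0, 1.0 - rho * rho))
    samples = []
    for _ in range(n_samples):
        z1 = rng.gauss(0.0, 1.0)
        z2 = rho * z1 + residual_scale * rng.gauss(0.0, 1.0)  # correlated normals
        u1, u2 = STD_NORMAL.cdf(z1), STD_NORMAL.cdf(z2)       # uniforms on [0, 1]
        samples.append((empirical_quantile(sorted_x, u1),
                        empirical_quantile(sorted_y, u2)))
    return samples
```

The design choice worth noting is that the copula separates the dependence between features (here a single correlation) from each feature's marginal distribution, which is why the sampler can reuse the empirical quantiles of the observed data.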

5/10
The feature-wise conformal predictor allows us to produce adaptive intervals that help us flag incongruous instances.

At the same time, with conformal prediction, we get rigorous theoretical guarantees on coverage 🚀⭐️💡
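
As a rough illustration of where the coverage guarantee comes from, here is a minimal sketch of split conformal prediction for a single feature. It is not the Data-SUITE implementation: the per-instance `scale` (a hypothetical difficulty estimate that makes intervals adaptive), the function names, and the rule of flagging values that fall outside their interval are all assumptions. What is standard is the guarantee itself: if calibration and test points are exchangeable, the interval contains the true value with probability at least 1 - alpha.

```python
import math


def conformal_quantile(scores, alpha):
    """Finite-sample-corrected (1 - alpha) quantile of calibration scores."""
    n = len(scores)
    k = math.ceil((n + 1) * (1 - alpha))
    if k > n:
        # Too few calibration points for this alpha: only an infinite
        # interval can carry the guarantee.
        return math.inf
    return sorted(scores)[k - 1]


def nonconformity(observed, predicted, scale=1.0):
    """Absolute error, normalized by a per-instance scale (assumed > 0)."""
    return abs(observed - predicted) / scale


def conformal_interval(predicted, q, scale=1.0):
    """Interval around the prediction; a larger scale gives a wider interval."""
    return predicted - q * scale, predicted + q * scale


def is_incongruous(observed, interval):
    """Illustrative flagging rule: the observed value lies outside its interval."""
    lower, upper = interval
    return not (lower <= observed <= upper)
```

In use, `predicted` would come from some estimator of one feature given the others, `conformal_quantile` would be computed once on held-out calibration scores, and each new instance would then get its own interval. With `scale=1.0` everywhere this reduces to the constant-width variant.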

6/10
Data-SUITE's brand of data-centric uncertainty outperforms model-centric counterparts on multiple real-world tabular datasets, with different types of incongruence.

We show utility for two practical problems:
1. Reliable model deployment
2. Insightful data exploration

7/10
* Reliable model deployment

Data-SUITE consistently identifies the most impactful data instances for a diverse class of downstream predictive models.

8/10
* Insightful data exploration

Data-SUITE can help data owners to understand potential data limitations.

9/10
