François Chollet
Co-founder @ndea. Co-founder @arcprize. Creator of Keras and ARC-AGI. Author of 'Deep Learning with Python'.

Sep 20, 2020, 5 tweets

Saying that bias in AI applications is "just because of the datasets" is like saying the 2008 crisis was "just because of subprime mortgages".

Technically, it's true. But it's singling out the last link in the causality chain while ignoring the entire system around it.

Scenario: you've shipped an automated image editing feature, and your users are reporting that it treats faces very differently based on skin color. What went wrong? The dataset?

1. Why was the dataset biased in the first place? Did the bias come from your product itself? From data collection or labeling?

2. If your dataset was biased, why did you end up using it as-is? What are your processes for screening data for bias and correcting it? What biases are you watching out for?

3. If you do end up training a model on a biased dataset: will QA catch the model's biases before the model makes it into production? Does your QA process even take ML bias into account?
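As a concrete illustration of the kind of QA gate point 3 alludes to, here is a minimal sketch (not Chollet's own tooling): evaluate the model separately on each demographic slice of a held-out QA set and fail the release if per-group performance diverges too much. The `predict_fn` interface, the group labels, and the 0.05 gap threshold are illustrative assumptions.

```python
import numpy as np

def per_group_accuracy(predict_fn, inputs, labels, groups):
    """Compute accuracy separately for each subgroup present in `groups`."""
    preds = predict_fn(inputs)
    results = {}
    for g in np.unique(groups):
        mask = groups == g
        results[g] = float(np.mean(preds[mask] == labels[mask]))
    return results

def bias_gate(predict_fn, inputs, labels, groups, max_gap=0.05):
    """Return (passed, per-group accuracies); fail if the accuracy spread exceeds max_gap."""
    acc = per_group_accuracy(predict_fn, inputs, labels, groups)
    gap = max(acc.values()) - min(acc.values())
    return gap <= max_gap, acc

# Toy usage with a dummy "model" so the sketch runs as-is:
if __name__ == "__main__":
    rng = np.random.default_rng(0)
    inputs = rng.normal(size=(200, 4))
    labels = rng.integers(0, 2, size=200)
    groups = rng.choice(["group_a", "group_b"], size=200)  # e.g. skin-tone buckets in the real case
    dummy_predict = lambda x: rng.integers(0, 2, size=len(x))
    passed, per_group = bias_gate(dummy_predict, inputs, labels, groups)
    print(passed, per_group)
```

Whether such a check exists at all, and who owns it, is exactly the organizational question the thread is raising.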

These are not data problems -- these are organizational and cultural problems. The fact that a biased dataset caused an issue is actually the outcome of the entire system.

Team diversity will help with these things organically, but having formal processes in place is also necessary.
