Machine learning algorithms show bias because their goal is to learn ALL the patterns in the data, including the biases. The "bias" is really the gap between what the data scientist THINKS is being learned and what's actually being learned. 🧵
An interesting feature of this bias is that it's subjective. It depends on what the data scientist INTENDED to learn from the data. For all we know, the data scientist intended to learn all the patterns in the data, racism and all. In which case, there is no bias.
Generally, machine learning does not require us to be specific about what patterns we are trying to learn. It just vaguely picks up all of them. This means we often have no clue what was learned, or whether it's what we intended to learn.
Traditional statistics isn't like this. In statistics, the first step is specifying what patterns you want to detect. This requires you to have some kind of theory about the structure of the data. Most importantly, this allows you to check if your theory is wrong.
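To make the contrast concrete, here's a minimal sketch of that statistical workflow, using synthetic data (the numbers and the linear model are purely illustrative, not from the thread): specify the pattern you expect up front, fit it, then check whether your specification looks wrong.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

# Synthetic data whose true structure we happen to know: y is linear in x.
x = rng.uniform(0, 10, size=200)
y = 2.0 * x + 1.0 + rng.normal(0, 1, size=200)

# Step 1: specify the pattern we are looking for (a straight line).
fit = stats.linregress(x, y)

# Step 2: check the theory. If the linear specification is right,
# the residuals should look like unstructured noise centered on zero;
# visible structure in them would tell us our theory is wrong.
residuals = y - (fit.intercept + fit.slope * x)
print(f"slope={fit.slope:.2f}, intercept={fit.intercept:.2f}")
print(f"mean residual={residuals.mean():.3f}")
```

The key point is step 2: because the model was specified explicitly, there is a concrete way to discover that the specification is wrong, which a "learn everything" approach doesn't give you for free.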
This issue is a huge weakness of the machine learning approach. The vagueness about what is being learned means that we have to do a lot of work after we fit the model to understand the properties of the model itself. In practice, this work is often not done.
We need to do that work because we can't rely on theory to tell us what the model learned, so we must measure it. That means looking at how the model behaves to see whether it's racist, sexist, or has other biases we might care about.
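Here's one minimal sketch of what "measuring it" can look like. Everything below is hypothetical: the scores stand in for the output of some already-fitted model, the group labels for a sensitive attribute, and the check is a simple selection-rate comparison (one of many possible bias metrics).

```python
import numpy as np

rng = np.random.default_rng(42)

# Pretend "score" is the output of an already-fitted model, and
# "group" (0 or 1) is a sensitive attribute we care about.
group = rng.integers(0, 2, size=1000)
# Deliberately inject a bias for illustration: group 1 scores higher.
score = rng.normal(0, 1, size=1000) + 0.8 * group

predictions = score > 0.5  # the model's accept/reject decisions

# Audit: how often does the model say "accept" for each group?
rates = {g: predictions[group == g].mean() for g in (0, 1)}
gap = abs(rates[1] - rates[0])
print(f"acceptance rates: {rates}, gap: {gap:.2f}")
```

No amount of intuition about the training data would reveal this gap; only actually measuring the model's behavior does.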
As we see with the many examples of racist algorithms, many of the people using machine learning mistakenly think they can rely on their intuitions to guess what kinds of patterns are in their dataset and what kinds their algorithms are learning. This is naive.
I think the solution to racism in algorithms (and other biases of this kind) is to be more hands-on about understanding the processes that created the data your model uses and more proactive and explicit about checking that your models have the properties you think they have. 🧵

Thread by 🔥Kareem Carr | Statistician 🔥

