Kareem Carr, Statistics Person
Sep 30, 2020 · 8 tweets · 2 min read
The reason machine learning algorithms show bias is that the goal of these algorithms is to learn ALL the patterns in the data, including the biases. The "bias" is actually the gap between what the data scientist THINKS is being learned and what's actually being learned. 🧵
An interesting feature of this bias is that it's subjective. It depends on what the data scientist INTENDED to learn from the data. For all we know, the data scientist intended to learn all the patterns in the data, racism and all. In which case, there is no bias.
Generally, machine learning does not require us to be specific about what patterns we are trying to learn. It just vaguely picks up all of them. This means we often have no clue what was learned, or whether it's what we intended to learn.
Traditional statistics isn't like this. In statistics, the first step is specifying what patterns you want to detect. This requires you to have some kind of theory about the structure of the data. Most importantly, this allows you to check if your theory is wrong.
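To make the contrast concrete, here's a toy sketch of my own (not code from the thread): a statistical workflow states its theory up front ("y is linear in x"), fits exactly that model, and then checks whether the theory is wrong. All names and numbers are invented for illustration.

```python
import random

# Toy illustration: specify the pattern first, then check the theory.
random.seed(0)
xs = [i / 10 for i in range(1, 101)]
ys = [0.5 * x * x + random.gauss(0, 0.1) for x in xs]  # reality is quadratic

# Least-squares fit of the SPECIFIED model y = a + b*x (closed form).
n = len(xs)
mx, my = sum(xs) / n, sum(ys) / n
b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
a = my - b * mx
residuals = [y - (a + b * x) for x, y in zip(xs, ys)]

# Model check: under the linear theory, residuals should hover around zero
# everywhere. A U-shape (negative in the middle, positive at the ends)
# means the specified theory is wrong.
mid = [r for x, r in zip(xs, residuals) if 3.3 < x < 6.7]
ends = [r for x, r in zip(xs, residuals) if x <= 3.3 or x >= 6.7]
mid_mean = sum(mid) / len(mid)
end_mean = sum(ends) / len(ends)
if mid_mean < end_mean - 1:
    print("check failed: the linear theory is misspecified")
else:
    print("no evidence against the linear theory")
```

Because the data are actually quadratic, the residual check flags the linear theory as wrong. That falsifiability is the feature the thread is pointing at.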
This is a huge weakness of the machine learning approach. The vagueness about what is being learned means we have to do a lot of work after fitting the model just to understand the properties of the model itself. In practice, this work is often not done.
The reason we need to do this work is that we can't rely on theory to tell us what the model learned, so we must measure it. This means looking at how the model behaves in order to see if it's racist, sexist, or has other biases we might care about.
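A minimal sketch of what "measuring the model's behavior" can look like, using invented group names and made-up decisions (nothing here comes from the thread): compare a model's decisions across groups directly, since theory alone won't tell us what was learned.

```python
# Pretend these are a trained model's approval decisions on held-out
# applicants, tagged with a (hypothetical) group label.
decisions = [
    ("group_a", 1), ("group_a", 1), ("group_a", 0), ("group_a", 1),
    ("group_b", 0), ("group_b", 0), ("group_b", 1), ("group_b", 0),
]

# Tally (count, approvals) per group.
rates = {}
for group, approved in decisions:
    count, approvals = rates.get(group, (0, 0))
    rates[group] = (count + 1, approvals + approved)

for group, (count, approvals) in sorted(rates.items()):
    print(group, approvals / count)  # selection rate per group

# A large gap between group selection rates is exactly the kind of
# property the thread says we must measure rather than assume away.
gap = abs(rates["group_a"][1] / rates["group_a"][0]
          - rates["group_b"][1] / rates["group_b"][0])
print("selection-rate gap:", gap)
```

Real audits use richer metrics (error rates by group, calibration, and so on), but the point stands: these are empirical measurements on the fitted model, not things you get for free.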
As we see from the many examples of racist algorithms, many people using machine learning mistakenly think they can rely on their intuitions to guess what kinds of patterns are in their datasets and what kinds of patterns their algorithms are learning. This is naive.
I think the solution to racism in algorithms (and other biases of this kind) is to be more hands-on about understanding the processes that created the data your model uses, and more proactive and explicit about checking that your models have the properties you think they have.


More from @kareem_carr

Mar 8
In honor of International Women's Day, let's take a moment to remember the most famous statistician in history.

You've definitely heard of her, but you probably have no idea she was a statistician.

It's Florence Nightingale.
Nightingale was the first female member of the Royal Statistical Society and a pioneer in using statistical analysis to guide medical decisions and public health policy.
Florence Nightingale's most famous statistical analysis was her investigation into the mortality rates of soldiers during the Crimean War. She demonstrated that the majority of deaths among soldiers were due to preventable diseases rather than battlefield injuries!
Feb 18
Took one for the team and made a histogram of the Elon social security data. Not sure why his data scientists are just giving him raw tables like that.
It’s also weird that they keep tweeting out these extremely strong claims without taking a few days to do some basic follow up work.
It doesn’t come off like they even:
- plotted the data
- talked to any of the data collectors
- considered any alternative explanations
Feb 8
Here's my solution to teaching this kid probability 🧵
Let's just take his system of assigning probability at face value. What's the probability of getting a six when I roll a die?

Well either it happens or it doesn't happen. So, the chances of getting a 6 are 50%.
What's the probability of it being a one? Also 50%. What's the probability of it being a two? Also 50%.

That all adds up to 300% across all scenarios. No problem though. There's a solution.
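The arithmetic above can be checked in a couple of lines. This is my own illustration (not the solution the thread goes on to give): assigning "either it happens or it doesn't" = 50% to every face of a die breaks the rule that probabilities over all outcomes must sum to 1.

```python
faces = [1, 2, 3, 4, 5, 6]

# The kid's system: every face gets 50%.
naive = {f: 0.5 for f in faces}
print(sum(naive.values()))                # 3.0, i.e. 300% across all scenarios

# The standard assignment for a fair die: 1/6 per face.
uniform = {f: 1 / 6 for f in faces}
print(round(sum(uniform.values()), 10))   # 1.0
```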
Feb 6
Nate Silver's latest book reads to me like a roadmap of the current moment. It's about a kind of chaotic, aggressive quantitative thinker who's usually wrong, but in calculated ways that lead to massive wins when things break their way.
These would include venture capitalists, crypto bros, tech evangelists, AI boosters and even a few influencers. They also seem to be among the most powerful members of MAGA.
Their constant wrongness tempts the rest of society to see them as idiots. That's a mistake. They're often making calculated bets on rare events with massive payoffs.
Jan 23
This is a resource thread about the Datasaurus Dozen data and how to get it.

The Datasaurus Dozen is a collection of extremely different datasets with near-identical summary statistics.

It’s a reminder to all of us to ALWAYS plot our data.
Here’s what all the datasets look like.
It’s available through R using the following code. Technically, all you need is the library call:

library("datasauRus")

and then you can access the datasaurus_dozen variable containing the datasets. The rest is just for plotting.
Jan 20
Nassim Taleb has written a devastatingly strong critique of IQ, but since he writes at such a technical level, his most powerful insights are being missed.

Let me explain just one of them. 🧵
Taleb raises an intriguing question: what if IQ isn't measuring intelligence at all, but instead merely detecting the many ways in which things can go wrong with a brain?
Imagine a situation like this, where there's no real difference between having an IQ of 100-160 in terms of real-world outcomes, but an IQ of 40-100 suggests something has gone seriously wrong in a person's life: anything from lead poisoning to severe poverty.
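A hedged toy simulation of that picture (my own construction, not Taleb's model, with invented numbers): suppose outcomes are flat above an IQ of 100 and only degrade below it. Then a test score mostly "detects" that something went wrong rather than measuring ability at the top.

```python
import random

random.seed(1)

def outcome(iq):
    # Assumed toy mapping: flat from 100 up; degrades linearly below 100.
    return 1.0 if iq >= 100 else iq / 100

high = [outcome(random.uniform(100, 160)) for _ in range(10_000)]
low = [outcome(random.uniform(40, 100)) for _ in range(10_000)]

print(sum(high) / len(high))            # identical outcomes everywhere in 100-160
print(round(sum(low) / len(low), 2))    # systematically worse below 100
```

In this toy world, scores from 100 to 160 carry no outcome information at all, while low scores flag that something is wrong, which is the asymmetry being described.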
