Want to know what kinds of bias are fixable with statistics and how?
Read on... 🧵
This is a simple mental map of how different biases affect the process of using algorithms to make changes to the physical world. The way we can fix each bias is as follows...
- Data selection bias: you need an accurate mathematical model of the data creation process
- Statistical bias: you need good statistics
- Bias due to generalization: you need an accurate mathematical model of the observations in the data and in the target population 2/7
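To make the first bullet concrete, here is a minimal sketch of one standard fix for data selection bias, inverse probability weighting. It is an illustration under stated assumptions, not the method from the thread: the two groups, their outcome means, and the selection probabilities in `select_prob` are all hypothetical, and the sketch assumes we know the data creation process exactly.

```python
import random


def simulate(n=100_000, seed=0):
    """Compare a naive sample mean with an inverse-probability-weighted mean.

    Everything here is a made-up toy setup: two groups, A and B, each half
    of the population, with different outcome means and different chances
    of ending up in the data set.
    """
    rng = random.Random(seed)

    # Assumed model of the data creation process (hypothetical numbers):
    # group A is recorded 90% of the time, group B only 30% of the time.
    select_prob = {"A": 0.9, "B": 0.3}
    outcome_mean = {"A": 1.0, "B": 3.0}

    population = []  # every outcome, whether or not it was recorded
    sample = []      # (group, outcome) pairs that made it into the data

    for _ in range(n):
        group = "A" if rng.random() < 0.5 else "B"
        y = rng.gauss(outcome_mean[group], 1.0)
        population.append(y)
        if rng.random() < select_prob[group]:
            sample.append((group, y))

    true_mean = sum(population) / len(population)

    # Naive estimate: ignores that group B is under-recorded.
    naive_mean = sum(y for _, y in sample) / len(sample)

    # Weighted estimate: each recorded point counts for 1 / P(recorded),
    # so under-recorded groups are scaled back up to their true share.
    weights = [1.0 / select_prob[g] for g, _ in sample]
    weighted_mean = sum(w * y for w, (_, y) in zip(weights, sample)) / sum(weights)

    return true_mean, naive_mean, weighted_mean


if __name__ == "__main__":
    true_mean, naive_mean, weighted_mean = simulate()
    print(f"true mean:     {true_mean:.3f}")
    print(f"naive mean:    {naive_mean:.3f}")
    print(f"weighted mean: {weighted_mean:.3f}")
```

The correction only works because `select_prob` matches how the data was actually created. If that model is wrong, the weighted estimate inherits the error, which is why the bullet asks for an accurate model of the data creation process.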
To fix the "bias due to causal assumptions", we need to fix all 3 smaller biases. At that point, if your model fits the data well then it should be a very close match to the world. In this case, correlation IS causation and we can say the inputs CAUSE the outputs. 3/7
Because we understand causation in the model, we can investigate the degree to which different variables cause different outcomes. This opens the door to a theoretical investigation of the causal effects of gender and race on the model outcomes. 4/7
We can ask how much they affect the behavior of the model and we can even modify the model to factor in race and gender to the exact degree that we deem necessary. There is one *big* caveat to this... 5/7
To use gender and race in a mathematical model, we have to quantify them. That is to say we have to capture these traits using numbers. Unfortunately, this process is complicated by the fact that quantification is partially subjective. (This requires a thread of its own!) 6/7
There are limits to what statistics can do. To use statistics, we must first translate reality into numbers. After applying our algorithms, we must then translate the numbers back. Unfortunately, statistics can only say so much about what's been lost in translation. 7/7
I hope you found this thread informative. If you would like to support this kind of content, follow me and also like and retweet the thread.
⚠️ In the thread, I use the term "statistics" to cover both "causal inference" (the study of mathematical causation) and "statistics". I think these fields are so co-dependent that they're basically the same field and will eventually merge in a few decades but others disagree!
• • •
As a black man, I'm concerned about the tendency for algorithms to exhibit what looks like racial bias. As a statistician, I'm naturally drawn to investigate why this happens. But what is "bias"? Surprisingly, the answer depends on what you think it means to be "rational". 1/7
We can think of bias as a type of irrational behavior. So broadly speaking, there are two ways one could define bias in algorithms and this arises from the two major definitions of rationality. These are epistemic rationality and instrumental rationality. 2/7
Epistemic rationality is defined as the part of rationality which involves achieving accurate beliefs about the world. Instrumental rationality is the art of choosing and implementing actions that steer the future toward outcomes that you want. 3/7
Jon (@jonst0kes) wrote a thoughtful article about this weekend's events. I don't think he's a fan of "woke" politics but he's pretty good about not making his views the main focus of the piece. "On Saturday, March 27, Kareem Carr stepped on a...landmine" doxa.substack.com/p/understandin…
I don't know what I think of Jon's sociological analysis but I also don't have a better explanation for why people who I've been friendly with and supportive of for most of my time on Twitter suddenly turned on me. I don't think it's because I was "wrong" because I wasn't.
Jon argues that I was attacked because I'm proposing a solutions-oriented approach. I can definitely find tweets where my critics were saying one of the "dangerous" myths I was promoting was that there were fixes for bias in algorithms.
FOUR things to know about race and gender bias in algorithms:
1. The bias starts in the data
2. The algorithms don't create the bias but they do transmit it
3. There are a huge number of other biases. Race and gender bias are just the most obvious
4. It's fixable! 🧵
By race and gender bias in algorithms, I mean the tendency for heavily data-driven AI algorithms to do things like reproduce negative stereotypes about women and people of color and to center white male subjects as normal or baseline. 2/9
While race and gender bias in algorithms *is fixable*, the current fixes aren't easy. They require us to understand and then mathematically model the processes that generate the biases in the data in the first place. 3/9
Many of the biggest tech trends in data analysis can be seen as increasingly sophisticated answers to the question, "How do we monetize data?" 🧵
The first answer to this question was the buzzword BIG DATA. People thought all you needed was a lot of data, didn't matter what kind, and it would basically monetize itself. Unfortunately, this was incorrect. So the next question became "How do we monetize lots of data?" 2/9
The answer to this question turned out to be the next buzzword. DATA SCIENCE. At this point, people still thought data was inherently easy to monetize so they figured anybody could do it. This turned out to be wrong as well. So the new question became... 3/9
Someone on Twitter just shared this very interesting essay. "Does A=A? I'm not so sure" by James Lindsay
It's a postmodernish musing on the truth of arithmetic statements! I read it so you don't have to.
It disappeared while I was reading so this tweet is now the only copy!
I know it sounds like I'm making this up but this essay is gone like it never existed! The only reference I could find to the page on the internet is this comment on goodreads. goodreads.com/author_blog_po…
It may (or may not) surprise you to know that this man, James Lindsay, has mocked me mercilessly with all kinds of mean-spirited memes and sneering tweets for my philosophical musings about arithmetic, portraying me as juvenile and dangerous.
Are you interested in learning statistics or data analysis?
I think learning how to analyze data is tricky because it's actually 3 independent skills.
- Coding
- Applied Knowledge
- Probability Theory 🧵
When I first started learning data analysis, it was frustrating for me to realize that being good at one of these skills didn't mean I was good at all of the others. So, if you've ever felt that way, you're not alone. 2/8
Coding: Being good at coding allows you to implement your ideas. While it's possible to get by using software, it will limit you as a data analyst. 3/8