FOUR things to know about race and gender bias in algorithms:
1. The bias starts in the data
2. The algorithms don't create the bias, but they do transmit it
3. There are a huge number of other biases. Race and gender bias are just the most obvious
4. It's fixable! 🧵👇
By race and gender bias in algorithms, I mean the tendency for heavily data-driven AI algorithms to do things like reproduce negative stereotypes about women and people of color and center white male subjects as normal or baseline. 2/9
While race and gender bias in algorithms *is fixable*, the current fixes aren't easy. They require us to understand and then mathematically model the processes that generate the biases in the data in the first place. 3/9
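To make "mathematically model the processes that generate the biases" concrete, here's a minimal sketch in Python. Everything in it is hypothetical (the two groups, their means, the sampling rates are all made up); it just illustrates the general pattern: if you can model how data collection under-sampled a group, you can invert that model to correct the estimate.

```python
# A minimal sketch of the idea, not anyone's production fix.
# Hypothetical setup: one group is under-sampled in the data, so a
# naive average reproduces the sampling bias. Modeling the process
# that generated the data (here, the sampling rates) lets us correct
# for it with inverse-probability weights.
import numpy as np

rng = np.random.default_rng(0)

# Population: two groups with different outcome means (values made up).
pop_a = rng.normal(50, 5, 10_000)   # group A
pop_b = rng.normal(60, 5, 10_000)   # group B

# Biased data collection: group B is sampled at a tenth of A's rate.
p_a, p_b = 0.30, 0.03
sample_a = pop_a[rng.random(pop_a.size) < p_a]
sample_b = pop_b[rng.random(pop_b.size) < p_b]
sample = np.concatenate([sample_a, sample_b])

# The naive estimate inherits the bias in the data.
print("naive mean:", sample.mean())  # pulled toward group A

# Weight each observation by the inverse of its sampling probability.
weights = np.concatenate([np.full(sample_a.size, 1 / p_a),
                          np.full(sample_b.size, 1 / p_b)])
print("reweighted mean:", np.average(sample, weights=weights))
print("true population mean:", np.concatenate([pop_a, pop_b]).mean())
```

The hard part in practice is the modeling step itself: in real applications you rarely know the sampling rates, and figuring them out requires exactly the deep understanding of the data that the rest of this thread is about.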
Traditionally, statistics culture says any bias in the model is the responsibility of the modeler. Modelers are expected to address potential bias to the best of their ability. This isn't specific to race or gender bias; statisticians care a lot about bias of all kinds. 4/9
There doesn't seem to be a consensus in the computer science and machine learning communities regarding who is responsible for misbehaving algorithms, although many AI ethics researchers are pushing for these communities to take more responsibility. 5/9
Many companies make massive profits from mathematical models that contain gender and race bias, so one could argue that they have some ethical responsibility to make good-faith attempts to address those biases. 6/9
There are many techniques in statistics for addressing bias in data, but they require you to understand the data on a deeper level than is common in machine learning applications, so they are hard for machine learning researchers to adopt. 7/9
Machine learning focuses a lot on the parts of statistics that can be automated, and it's hard to automate a deep sociological understanding of race and gender and of the cultural processes that generate bias in the data. 8/9
While fixing bias in machine learning algorithms is a solvable problem, the cost and complexity of the fixes probably mean that such biases will remain an issue for some time. 9/9
I hope you found this thread informative. If you would like to support this kind of content, follow me and also like and retweet the thread.
Some caveats on "algorithms don't create the bias":
1. I'm assuming people aren't biased in how they built the algorithm. I think this type of bias falls under the typical structural biases we see in society and isn't tech-specific, so I didn't address it in this thread.
2. Algorithms can technically create bias by just being crappy algorithms or through mistaken assumptions about the data. I didn't address this situation in my thread, since making mistakes in data analysis isn't tech-specific either.
One thing I want to make clear is that I don't think fixing the biases in the data will address all the issues with gender and race discrimination in AI algorithms. Some people naively think it will, and that's almost certainly wrong.
Oof. This thread is doing numbers. There are a few comments disagreeing with this thread. One thing to keep in mind as you read them: as far as I can tell, they are misinterpreting what I said because they're using a different definition of bias.
They see an algorithm as "biased" if it has a tendency to create unequal outcomes. I'm using bias in the statistical sense. Basically, I'm talking about how to create a colorblind algorithm that doesn't use race in its decision making.
I think their concerns about fairness are valid, but it's not at all what I was talking about.
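For anyone unfamiliar with the statistical definition being used here: an estimator is biased if its expected value differs from the true quantity it estimates, independent of any question about groups or outcomes. A minimal illustration in Python, using the textbook variance example (all numbers are arbitrary):

```python
# "Bias" in the statistical sense: E[estimate] != true parameter.
import numpy as np

rng = np.random.default_rng(1)
true_var = 4.0  # variance of N(0, 2^2)

# 100,000 small samples of size 5; averaging each estimator across
# samples approximates its expected value.
x = rng.normal(0, 2, size=(100_000, 5))
biased = x.var(axis=1, ddof=0)    # divides by n     -> E = (n-1)/n * sigma^2
unbiased = x.var(axis=1, ddof=1)  # divides by n - 1 -> E = sigma^2

print("true variance:          ", true_var)
print("E[biased estimator]   ~ ", biased.mean())    # ~3.2, systematically low
print("E[unbiased estimator] ~ ", unbiased.mean())  # ~4.0
```

The first estimator is biased in the statistical sense even though nothing about it involves race, gender, or unequal treatment of anyone, which is exactly the distinction the tweet above is drawing.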
Many of the biggest tech trends in data analysis can be seen as increasingly sophisticated answers to the question, "How do we monetize data?" 🧵👇
The first answer to this question was the buzzword BIG DATA. People thought all you needed was a lot of data, didn't matter what kind, and it would basically monetize itself. Unfortunately, this was incorrect. So the next question became "How do we monetize lots of data?" 2/9
The answer to this question turned out to be the next buzzword: DATA SCIENCE. At this point, people still thought data was inherently easy to monetize so they figured anybody could do it. This turned out to be wrong as well. So the new question became... 3/9
Someone on Twitter just shared this very interesting essay: "Does A=A? I'm not so sure" by James Lindsay.
It's a postmodernish musing on the truth of arithmetic statements! I read it so you don't have to.
It disappeared while I was reading, so this tweet is now the only copy!
I know it sounds like I'm making this up, but this essay is gone like it never existed! The only reference I could find to the page on the internet is this comment on goodreads. goodreads.com/author_blog_po…
It may (or may not) surprise you to know that this man, James Lindsay, has mocked me mercilessly with all kinds of mean-spirited memes and sneering tweets for my philosophical musings about arithmetic, portraying me as juvenile and dangerous.
Are you interested in learning statistics or data analysis?
I think learning how to analyze data is tricky because it's actually 3 independent skills.
- Coding
- Applied Knowledge
- Probability Theory 🧵👇
When I first started learning data analysis, it was frustrating for me to realize that being good at one of these skills didn't mean I was good at the others. So, if you've ever felt that way, you're not alone. 2/8
Coding: Being good at coding allows you to implement your ideas. While it's possible to get by using software, relying on it alone will limit you as a data analyst. 3/8
I gave up on talking about race on Twitter because I was having the same argument over and over again. In this thread, let me explain THE ANATOMY OF A TWITTER RACE ARGUMENT.
Whenever someone says "X is white supremacy" on Twitter, where X is perfectionism or individualism or math worship, there is a constellation of reactions, many of them predictable.
If X is a genuine point of division, it will often be the case that most white Americans tend to do and like X while most black Americans tend to dislike and not do X. This cultural difference may or may not be problematic.
I've been in science for a while now, and as far as I can tell, there are two types of people in this line of work: those who think we should give everything to science and those who don't.
These two mindsets produce two types of work environments. I'll call them the results-first workplace and the people-first workplace.
In the people-first environment, healthy work habits and relationships are the priority; science is a critical piece of a whole and healthy life. In the results-first environment, all that matters is the outcome: people get the job done, whatever the cost.