Jon (@jonst0kes) wrote a thoughtful article about this weekend's events. I don't think he's a fan of "woke" politics but he's pretty good about not making his views the main focus of the piece. "On Saturday, March 27, Kareem Carr stepped on a...landmine" doxa.substack.com/p/understandin…
I don't know what I think of Jon's sociological analysis but I also don't have a better explanation for why people who I've been friendly with and supportive of for most of my time on Twitter suddenly turned on me. I don't think it's because I was "wrong" because I wasn't.
Jon argues that I was attacked because I'm proposing a solutions-oriented approach. I can definitely find tweets where my critics were saying one of the "dangerous" myths I was promoting was that there were fixes for bias in algorithms.
As a statistician, I was saying we could fix *statistical bias* in algorithms, but I think many were worried anti-woke people would characterize me as saying there were technical fixes for sociological biases and, perhaps because I'm black, it seemed urgent to counter me.
Many insisted on "whitesplaining" both math and racism to me, a black person with an MSc in pure mathematics from one of the best math schools in the US. (This happens to me a lot on here, by the way.)
I was told "just stick to statistics", which had the same vibe to me as when folks tell black athletes to "shut up and dribble". To that person's credit, when I told them that tweet crossed the line, they agreed and deleted it.
Many people said to me they were afraid @ylecun and others would misuse my tweets. Well, Yann has been supportive during this ordeal and I probably wouldn't have had any contact with him otherwise. I don't know if that's the outcome that those people wanted but it is what it is.
I was a little shocked. This is just not how academic discussions work in statistics. We try to be clear about our terms but we aren't trying to define things in ways that counter perceived political adversaries*. Maybe this is normal in other sciences(???) but I was blindsided.
*Okay well sometimes folks try to define probability in ways that make Bayesianism or Frequentism sound like nonsense but that's it! 😅 Nobody would ever tell me to not try to solve a problem using Bayesian methods because the Bayesians might misuse it as support for Bayesianism.
As a black man, I'm concerned about the tendency for algorithms to exhibit what looks like racial bias. As a statistician, I'm naturally drawn to investigate why this happens. But what is "bias"? Surprisingly, the answer depends on what you think it means to be "rational". 1/7
We can think of bias as a type of irrational behavior. So broadly speaking, there are two ways one could define bias in algorithms and this arises from the two major definitions of rationality. These are epistemic rationality and instrumental rationality. 2/7
Epistemic rationality is defined as the part of rationality which involves achieving accurate beliefs about the world. Instrumental rationality is the art of choosing and implementing actions that steer the future toward outcomes that you want. 3/7
Want to know what kinds of bias are fixable with statistics and how?
Read on... 🧵👇
This is a simple mental map of how different biases affect the process of using algorithms to make changes to the physical world. The way we can fix each bias is as follows...
- Data selection bias: you need an accurate mathematical model of the data creation process
- Statistical bias: you need good statistics
- Bias due to generalization: you need an accurate mathematical model of the observations in the data and in the target population 2/7
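To make the first fix concrete, here is a minimal sketch of correcting data selection bias when the data creation process is known. Everything in it is an illustrative assumption, not something from the thread: the two groups, their means, the selection probabilities 0.9 and 0.1, and the helper names `naive_mean`, `ipw_mean`, and `simulate`. The correction used is inverse probability weighting, which is one common technique and not necessarily the one the thread has in mind.

```python
import random


def naive_mean(values):
    """Plain average; biased if some units are more likely to be observed."""
    return sum(values) / len(values)


def ipw_mean(values, selection_probs):
    """Inverse-probability-weighted average.

    Each observed value is weighted by 1 / P(selected), so units that were
    unlikely to make it into the data count for more.
    """
    weights = [1.0 / p for p in selection_probs]
    return sum(w * v for w, v in zip(weights, values)) / sum(weights)


def simulate(n=100_000, seed=0):
    """Hypothetical data creation process (all numbers are assumptions).

    Two equally sized groups with different average outcomes. Group A is
    observed with probability 0.9, group B with probability 0.1.
    """
    rng = random.Random(seed)
    population, sample_values, sample_probs = [], [], []
    for _ in range(n):
        in_group_a = rng.random() < 0.5
        value = rng.gauss(10.0 if in_group_a else 20.0, 1.0)
        p_select = 0.9 if in_group_a else 0.1
        population.append(value)
        if rng.random() < p_select:
            sample_values.append(value)
            sample_probs.append(p_select)
    return population, sample_values, sample_probs


population, sample_values, sample_probs = simulate()
true_mean = naive_mean(population)        # what we would see with no selection
biased = naive_mean(sample_values)        # ignores how the data were selected
corrected = ipw_mean(sample_values, sample_probs)  # uses the selection model

print(f"population mean:      {true_mean:.2f}")
print(f"naive sample mean:    {biased:.2f}")
print(f"weighted sample mean: {corrected:.2f}")
```

The weighted estimate only lands near the population mean because the selection probabilities fed into `ipw_mean` match the process that generated the sample. With a wrong model of the data creation process, the same code would give a confidently wrong answer.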
To fix the "bias due to causal assumptions", we need to fix all 3 smaller biases. At that point, if your model fits the data well then it should be a very close match to the world. In this case, correlation IS causation and we can say the inputs CAUSE the outputs. 3/7
FOUR things to know about race and gender bias in algorithms:
1. The bias starts in the data
2. The algorithms don't create the bias but they do transmit it
3. There are a huge number of other biases. Race and gender bias are just the most obvious
4. It's fixable! 🧵👇
By race and gender bias in algorithms, I mean the tendency for heavily data-driven AI algorithms to do things like reproduce negative stereotypes about women and people of color and to center white male subjects as normal or baseline. 2/9
While race and gender bias in algorithms *is fixable*, the current fixes aren't easy. They require us to understand and then mathematically model the processes that generate the biases in the data in the first place. 3/9
Many of the biggest tech trends in data analysis can be seen as increasingly sophisticated answers to the question, "How do we monetize data?" 🧵👇
The first answer to this question was the buzzword BIG DATA. People thought all you needed was a lot of data, didn't matter what kind, and it would basically monetize itself. Unfortunately, this was incorrect. So the next question became "How do we monetize lots of data?" 2/9
The answer to this question turned out to be the next buzzword. DATA SCIENCE. At this point, people still thought data was inherently easy to monetize so they figured anybody could do it. This turned out to be wrong as well. So the new question became... 3/9
Someone on Twitter just shared this very interesting essay. "Does A=A? I'm not so sure" by James Lindsay
It's a postmodernish musing on the truth of arithmetic statements! 😂 I read it so you don't have to.
It disappeared while I was reading so this tweet is now the only copy!
I know it sounds like I'm making this up but this essay is gone like it never existed! The only reference I could find to the page on the internet is this comment on goodreads. goodreads.com/author_blog_po…
It may (or may not) surprise you to know that this man, James Lindsay, has mocked me mercilessly with all kinds of mean-spirited memes and sneering tweets for my philosophical musings about arithmetic. Portraying me as juvenile and dangerous.
Are you interested in learning statistics or data analysis?
I think learning how to analyze data is tricky because it's actually 3 independent skills.
- Coding
- Applied Knowledge
- Probability Theory 🧵👇
When I first started learning data analysis, it was frustrating for me to realize that being good at one of these skills didn't mean I was good at all of the others. So, if you've ever felt that way, you're not alone. 2/8
Coding: Being good at coding allows you to implement your ideas. While it's possible to get by using software, it will limit you as a data analyst. 3/8