One of our key pieces of advice is to be careful of confirmation bias.
There's a thread going around about how the crop below is what happens when Twitter's use of eye-tracking technology to crop images is fed with data from a misogynistic society. I almost retweeted it. But…
…that story fits my pre-existing commitments about how machine learning picks up on the worst of societal biases. So I thought it was worth checking out.
The fault here is not with Twitter at all. It's a careless bit of code on @techreview's part that blindly crops the middle out of any image assigned to a Twitter card.
This is not to say that machine learning doesn't introduce all kinds of biases, nor that Twitter doesn't frustrate us by not allowing users to set their own crop.
It is a good lesson, though, in digging deeper before jumping to conclusions that match one's priors.
On the other hand, if you haven't seen it, this seems to be a legit case of algorithmic bias in Twitter's cropping algorithm.
"Sophisticated algorithms based on machine learning may discover very delicate and elusive nuances in facial characteristics and structures that correlate to innate personal traits and yet hide below the cognitive threshold of most untrained nonexperts."
Campbell's Law states that "The more any quantitative social indicator is used for social decision-making, the more subject it will be to corruption pressures and the more apt it will be to distort and corrupt the social processes it is intended to monitor."
Even the most elite institutions succumb.
Here's a beautiful example of how one tracks down and debunks quantitative malfeasance.
"Columbia is a great university and, based on its legitimate merits, should attract students comparable to the best anywhere. By obsessively pursuing a ranking, however, it demeans itself. The sooner it changes course, the better."
Some of you may have seen this before, but if you haven't: This slide from Moderna compares their flu mRNA vaccine to the one from Sanofi.
Can you spot the misleading dataviz trick?
This was a fascinating exercise. People pointed out a lot of important issues.
The different age groups, while not a dataviz trick per se, do smack of the sort of apples-to-oranges comparison we worry about.
Using a log scale for bar charts is questionable territory, though a log scale is quite appropriate here for this kind of data and I've been guilty of the same. We wrote a bit about this special case: callingbullshit.org/tools/logarith…
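To see why log-scaled bars can mislead: on a log axis, a bar's length is proportional to the logarithm of its value, so ratios between values get compressed. A minimal sketch of the effect (the titers below are made-up numbers for illustration, not Moderna's or Sanofi's data):

```python
import math

def bar_length(value, baseline=1.0):
    """Apparent bar length on a log10 axis that starts at `baseline`."""
    return math.log10(value / baseline)

# Two hypothetical antibody titers that differ tenfold:
a, b = 100.0, 1000.0

print(b / a)                          # true ratio of the values: 10.0
print(bar_length(b) / bar_length(a))  # ratio of the bars a reader sees: 1.5
```

A tenfold difference in the data shows up as bars only 1.5 times as long, which is exactly why log-scaled bar charts need careful labeling even when, as here, a log scale suits the data.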
In our course, we spend a lot of time talking about selection bias and related phenomena. These issues can be extremely subtle. Example:
The question is whether you are better protected against COVID if you've first vaccinated then reinfected, or first infected then vaccinated.
To answer that, you might look at data such as those in a recent medRxiv paper by Goldberg et al.
Comparing infection rates, it *appears* you are better off 6-8 months after being infected then vaccinated (RtV) than you are 6 months after being vaccinated then infected (VtR).
(Here I'm setting aside issues of significance, multiple comparisons, etc. — this is intended as a teaching example.)
But there's a problem with that inference, grounded in the fact that we are looking at observational data: the groups caught COVID under different circumstances.
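Here's a toy numerical sketch of how that confounding can work. The numbers are entirely made up: suppose the two groups have identical true protection, but their follow-up windows happen to overlap different phases of the epidemic.

```python
def rate(cases, person_days):
    """Crude incidence rate: cases per person-day of follow-up."""
    return cases / person_days

# Hypothetical numbers. Both groups are equally protected, but the
# VtR group's follow-up window coincided with a high-prevalence wave,
# while the RtV group was mostly followed during a lull.
vtr = rate(cases=60, person_days=100_000)  # followed during a wave
rtv = rate(cases=20, person_days=100_000)  # followed during a lull

print(vtr > rtv)  # True: RtV *looks* better protected, purely from timing
```

The naive comparison attributes the difference to the order of infection and vaccination, when in this toy example it comes entirely from when each group was exposed.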
The first thing to notice is that this graph shows annual *change* in murder rate. Showing changes is fine, when there's a good reason to—and there may be one here.
But notice the consequence. The much larger decrease, spread over many years from the late 90s, is backgrounded.
Here are the absolute numbers over the same time period. To the credit of the @nytimes, this graph is shown in the article as well.
But of course that's not the one that takes off on Twitter, Facebook, etc.
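The change-versus-level distinction is easy to demonstrate with made-up numbers (these are illustrative, not the NYT's data): after a long decline, even a sharp one-year jump in the *change* series can leave the *level* well below where it started.

```python
# Hypothetical murder rates per 100k over eight years (made-up numbers):
rates = [9.0, 8.0, 7.0, 6.0, 5.0, 4.5, 4.4, 5.7]

# Year-over-year percent change, the quantity the viral graph plots:
changes = [100 * (b - a) / a for a, b in zip(rates, rates[1:])]

print(round(changes[-1]))    # ~30: a dramatic-looking spike
print(rates[-1] < rates[0])  # True: the level remains far below the start
```

Both graphs are "true," but the change series foregrounds the one-year spike while backgrounding the much larger multi-year decline.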
The authors explore how three different groups—Stanford students, professional academic historians, and fact checkers—evaluate the reliability of online information.