White by Default?
The viral posts were right.
We scraped 5.5 million criminal records and 1.5 million mugshots from 39 states.
29% of Hispanics are being misclassified as White in official Department of Corrections databases.
Even when Hispanic is an explicit category 🧵
Everyone's seen these collages claiming non-whites get classified as White in criminal databases.
The problem? Anecdotal. Cherry-picked. No way to verify.
We had 1.5 million mugshots, names, and official racial classifications.
Time to test it systematically:
We trained a multinomial logistic regression on 18 features:
• DeepFace racial probabilities from mugshots
• Census name demographics
• First and last name racial statistics
92.76% accuracy distinguishing Black, White and Hispanic.
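For readers who want to see the mechanics, here is a minimal sketch of a multinomial (softmax) logistic regression in plain Python. It is not the thread's pipeline: the two features and the cluster centers are synthetic stand-ins for the 18 real features, the function names are made up for illustration, and the accuracy is measured on the training data itself.

```python
import math
import random

def softmax(scores):
    # Subtract the max for numerical stability before exponentiating.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def predict_proba(W, x):
    # Append a constant 1.0 so the last weight in each row acts as a bias.
    row = list(x) + [1.0]
    return softmax([sum(w * v for w, v in zip(Wk, row)) for Wk in W])

def train(X, y, n_classes, lr=0.2, epochs=500):
    # Full-batch gradient descent on the multinomial cross-entropy loss.
    n_cols = len(X[0]) + 1
    W = [[0.0] * n_cols for _ in range(n_classes)]
    for _ in range(epochs):
        grad = [[0.0] * n_cols for _ in range(n_classes)]
        for x, label in zip(X, y):
            row = list(x) + [1.0]
            probs = predict_proba(W, x)
            for k in range(n_classes):
                err = probs[k] - (1.0 if k == label else 0.0)
                for j, v in enumerate(row):
                    grad[k][j] += err * v
        for k in range(n_classes):
            for j in range(n_cols):
                W[k][j] -= lr * grad[k][j] / len(X)
    return W

# Synthetic data (an assumption): three well-separated 2-D clusters.
random.seed(0)
CLASSES = ["Black", "White", "Hispanic"]
centers = [(0.0, 0.0), (3.0, 0.0), (0.0, 3.0)]
X, y = [], []
for label, (cx, cy) in enumerate(centers):
    for _ in range(30):
        X.append((random.gauss(cx, 0.5), random.gauss(cy, 0.5)))
        y.append(label)

W = train(X, y, n_classes=len(CLASSES))
probs = [predict_proba(W, x) for x in X]
preds = [p.index(max(p)) for p in probs]
confidence = [max(p) for p in probs]  # the model's confidence per record
accuracy = sum(p == t for p, t in zip(preds, y)) / len(y)
print(f"training accuracy: {accuracy:.3f}")
```

The `confidence` list is the per-record maximum class probability, which is the kind of quantity the confidence thresholds later in the thread refer to.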
The key insight: A sufficiently accurate linear model trained on biased data learns the TRUE signal, not the bias.
Systematic deviations between predictions and official labels indicate mislabeling by authorities, not model error.
Here's what we found...
29% of predicted Hispanics were officially classified as White.
Even at 95-100% model confidence, 22.4% of predicted Hispanics were still assigned White.
Median confidence for these cases? 91.7%.
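A rough sketch of how numbers like these can be computed: take everyone the model predicts as Hispanic, optionally keep only those above a confidence cutoff, and count how many carry an official White label. The records and the function name below are invented for illustration and do not reproduce the thread's figures.

```python
from statistics import median

# Hypothetical records: (predicted_race, model_confidence, official_race)
records = [
    ("Hispanic", 0.98, "White"),
    ("Hispanic", 0.97, "Hispanic"),
    ("Hispanic", 0.96, "Hispanic"),
    ("Hispanic", 0.99, "Hispanic"),
    ("Hispanic", 0.72, "White"),
    ("Hispanic", 0.65, "Hispanic"),
    ("White", 0.91, "White"),
    ("Black", 0.88, "Black"),
]

def white_share_among_predicted_hispanic(records, min_conf=0.0):
    # Pool: predicted Hispanic at or above the confidence cutoff.
    pool = [r for r in records if r[0] == "Hispanic" and r[1] >= min_conf]
    # Flagged: members of the pool whose official label is White.
    flagged = [r for r in pool if r[2] == "White"]
    share = len(flagged) / len(pool)
    return share, [conf for _, conf, _ in flagged]

overall_share, flagged_conf = white_share_among_predicted_hispanic(records)
high_share, _ = white_share_among_predicted_hispanic(records, min_conf=0.95)
median_conf = median(flagged_conf)

print(f"all predicted Hispanics: {overall_share:.1%} officially White")
print(f"confidence >= 0.95:      {high_share:.1%} officially White")
print(f"median confidence of flagged cases: {median_conf:.2f}")
```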
Visual inspection confirmed it.
These are people classified as "White" in official records. Look at those names!
Furthermore, PC (principal component) mapping showed that many "Whites" fell in Hispanic zones of the feature space, but not the other way around. Measured by Euclidean distance between centroids, Whites were just as distinguishable from Hispanics as Blacks were from Whites.
To further check the method by visual inspection, we contrasted low- and high-confidence classifications. In high-confidence disagreements, the person almost always looked like the predicted race rather than the assigned race.
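The centroid comparison can be sketched like this: average each group's PC scores to get a centroid, then take the Euclidean distance between centroids. The 2-D scores below are made up so the arithmetic is easy to follow, and they do not reproduce the thread's result.

```python
import math

def centroid(points):
    # Component-wise mean of a list of equal-length coordinate tuples.
    n = len(points)
    dims = len(points[0])
    return tuple(sum(p[i] for p in points) / n for i in range(dims))

# Hypothetical PC scores per officially assigned group (illustrative only).
pc_scores = {
    "White": [(0.0, 0.0), (2.0, 0.0)],
    "Hispanic": [(3.0, 4.0), (5.0, 4.0)],
    "Black": [(-5.0, 12.0), (-3.0, 12.0)],
}

centroids = {group: centroid(pts) for group, pts in pc_scores.items()}

# math.dist computes the Euclidean distance between two points.
white_hispanic = math.dist(centroids["White"], centroids["Hispanic"])
white_black = math.dist(centroids["White"], centroids["Black"])

print(f"White-Hispanic centroid distance: {white_hispanic:.2f}")
print(f"White-Black centroid distance:    {white_black:.2f}")
```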
We corrected for misclassification:
Hispanic criminal record rates increase 20-31%
White rates decrease 4-6%
Black rates decrease 1%
The lower bound counts only high-confidence reassignments (>90% confidence); the upper bound assumes every prediction is the actual race.
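One way to read the two bounds, sketched on toy data: the lower bound swaps in the predicted race only when confidence exceeds 0.90, while the upper bound swaps it in for every record. The records and helper names are hypothetical. With a fixed population denominator, the percent change in counts equals the percent change in the rate.

```python
from collections import Counter

# Hypothetical records: (official_race, predicted_race, model_confidence)
records = [
    ("White", "White", 0.99),
    ("White", "White", 0.97),
    ("White", "Hispanic", 0.96),
    ("White", "Hispanic", 0.70),
    ("Hispanic", "Hispanic", 0.98),
    ("Hispanic", "Hispanic", 0.95),
    ("Black", "Black", 0.99),
    ("Black", "Black", 0.93),
]

def corrected_counts(records, threshold):
    # Use the prediction when confidence exceeds the threshold,
    # otherwise keep the official label.
    counts = Counter()
    for official, predicted, conf in records:
        counts[predicted if conf > threshold else official] += 1
    return counts

def pct_change(new, old):
    return (new - old) / old * 100.0

official = Counter(r[0] for r in records)
lower = corrected_counts(records, threshold=0.90)  # high-confidence only
upper = corrected_counts(records, threshold=-1.0)  # every prediction accepted

for race in ("Hispanic", "White", "Black"):
    lo = pct_change(lower[race], official[race])
    hi = pct_change(upper[race], official[race])
    print(f"{race}: {lo:+.0f}% (lower bound) to {hi:+.0f}% (upper bound)")
```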
State-level analysis showed massive variation.
Florida: 60%+ of Hispanics misclassified as White (Cubans tend to self-id as White?)
But no correlation with political ideology (r = 0.21, p = 0.472).
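For reference, the r reported here is a Pearson correlation. A minimal sketch of that calculation follows, with made-up state-level numbers and assumed variable names. The p-value additionally requires a t distribution with n - 2 degrees of freedom, which this sketch leaves out.

```python
import math

def pearson_r(xs, ys):
    # Pearson correlation: covariance divided by the product of the
    # standard deviations (the 1/n factors cancel).
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

# Hypothetical per-state values: an ideology score and a misclassification rate.
ideology = [1.0, 2.0, 3.0]
misclassified = [1.0, 3.0, 2.0]

r = pearson_r(ideology, misclassified)
print(f"r = {r:.2f}")
```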
For the full analysis, FULL REPLICATION, code, data, and GitHub repo, check out my blog post:
White by Default: Systematic Bias in U.S. Criminal Racial Assignment
uncorrelated.xyz/p/white-by-def…
