Sasha Gusev
Feb 21, 20 tweets, 9 min read
I've written about race, genetic ancestry, analyses of large biobanks, and human history.

I'll summarize the key points here 🧵: gusevlab.org/projects/hsq/#…
Let's define some terms. Race is a social categorization of people into groups, typically based on physical attributes. Genetic ancestry is a quantification of genetic similarity to a reference population. While correlated, they have fundamentally different causes and consequences.
We should care about causes, and race is a poor causal model of human evolution. In truth, genetic variation follows a "nested subsets" model, where all people eventually share ancestors, which is fundamentally different from race (see for yourself here: james-kitchens.com/blog/visualizi…).
Formally, race-like models fit the observed genetic distances poorly, even among highly geographically distinct populations with minimal admixture [Long et al 2009] (and as we'll see, mixture is very common). Trying to make a racial model work produces nonsense.
Consistent with nested subsets, a pair of individuals from Africa has more genetic differences than an Africa/France pair, and the majority of those differences are *common* across all populations [Biddanda 2020]. Population-private common variants are very infrequent.
Notably, before genetic data, race advocates predicted that most variation would be homozygous within racial groups and highly divergent. The truth is the opposite even for ancestry: if you condition out population labels you are still left with 85% of the genetic variation.
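To make "conditioning out population labels" concrete, here is a minimal sketch of one standard way to apportion variation at a single biallelic site: compare the average within-population heterozygosity to the heterozygosity of the pooled population. The function names and the allele frequencies are hypothetical, chosen only for illustration, and equal population weights are assumed; this is not the analysis behind the 85% figure.

```python
def heterozygosity(p):
    """Expected heterozygosity 2p(1-p) at a biallelic site with frequency p."""
    return 2 * p * (1 - p)


def fraction_within(freqs):
    """Share of total heterozygosity that remains within populations.

    Equivalent to 1 - F_ST at one site, assuming equally weighted populations.
    """
    h_within = sum(heterozygosity(p) for p in freqs) / len(freqs)
    p_pooled = sum(freqs) / len(freqs)
    h_total = heterozygosity(p_pooled)
    return h_within / h_total


# Hypothetical allele frequencies in three populations (illustration only).
example_freqs = [0.3, 0.5, 0.7]
print(round(fraction_within(example_freqs), 3))
```

Even with frequencies that differ this much between groups, most of the heterozygosity at the site is still found within each group.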
But what about ancestry? Let's go through three core methods for analyzing ancestry in genetic data: dimensionality reduction (PCA), model-based clustering (STRUCTURE), and parametric models (admixture graphs). Because ancestry is relative, each approach has limitations.
1: PCA estimates eigenvectors of the sample relatedness matrix which, under simple ancestry models, are expected to recover population labels. Matrix theory shows that PCA is extremely sensitive with enough data, able to detect relationships down to a handful of generations.
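A minimal sketch of that idea, under stated assumptions: genotypes are simulated for two populations that drifted from shared ancestral frequencies (a Balding-Nichols style model, which is my choice for the example, as are all function names and parameter values), and the leading eigenvector of the sample relatedness matrix is compared with the population labels.

```python
import numpy as np


def simulate_genotypes(n_per_pop, n_snps, fst, rng):
    """Simulate 0/1/2 genotypes for populations drifted from shared
    ancestral allele frequencies (an assumption of this sketch)."""
    p_anc = rng.uniform(0.1, 0.9, size=n_snps)
    a = p_anc * (1 - fst) / fst
    b = (1 - p_anc) * (1 - fst) / fst
    blocks, labels = [], []
    for pop, n in enumerate(n_per_pop):
        p_pop = rng.beta(a, b)  # population-specific allele frequencies
        blocks.append(rng.binomial(2, p_pop, size=(n, n_snps)))
        labels += [pop] * n
    return np.vstack(blocks).astype(float), np.array(labels)


def top_pc(genotypes):
    """Leading eigenvector of the sample-by-sample relatedness matrix."""
    centered = genotypes - genotypes.mean(axis=0)  # center each SNP
    relatedness = centered @ centered.T / genotypes.shape[1]
    _, eigvecs = np.linalg.eigh(relatedness)  # eigenvalues in ascending order
    return eigvecs[:, -1]


rng = np.random.default_rng(0)
G, labels = simulate_genotypes([50, 50], n_snps=2000, fst=0.05, rng=rng)
pc1 = top_pc(G)
print("correlation of PC1 with population label:",
      np.corrcoef(pc1, labels)[0, 1])
```

The sign of an eigenvector is arbitrary, so only the magnitude of the correlation is meaningful here.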
But PCA is easily distorted by the sampling process: bigger populations will warp the PCA locations, and even individuals within a single family can look like different populations. PCA also produces unusual artifacts when there is simple spatial locality in the data.
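One simple mechanism behind the sample-size warping can be sketched as follows. PCA centers on the sample mean, so PC1 scores sum to zero across individuals; with uneven sampling the larger group is pulled toward the origin and the smaller group is pushed away from it. The allele frequencies, sample sizes, and names below are all hypothetical, and this covers only one of the distortions mentioned above.

```python
import numpy as np

rng = np.random.default_rng(1)
n_snps = 2000
# Hypothetical allele frequencies for two moderately differentiated groups.
p_a = rng.uniform(0.2, 0.8, size=n_snps)
p_b = np.clip(p_a + rng.normal(0, 0.15, size=n_snps), 0.01, 0.99)


def pc1_group_means(n_a, n_b):
    """Mean PC1 coordinate of each group for a given sampling scheme."""
    G = np.vstack([
        rng.binomial(2, p_a, size=(n_a, n_snps)),
        rng.binomial(2, p_b, size=(n_b, n_snps)),
    ]).astype(float)
    centered = G - G.mean(axis=0)  # centering uses the pooled sample mean
    _, eigvecs = np.linalg.eigh(centered @ centered.T / n_snps)
    pc1 = eigvecs[:, -1]
    return pc1[:n_a].mean(), pc1[n_a:].mean()


balanced = pc1_group_means(50, 50)   # groups sit symmetrically around zero
skewed = pc1_group_means(180, 20)    # large group sits near zero
print("balanced:", balanced)
print("skewed:  ", skewed)
```

The same two groups land in different places on PC1 purely because of how many individuals were sampled from each.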

2: STRUCTURE clusters individuals as mixtures of a fixed set of (k) populations. It shares strengths with PCA (sensitive, interpretable) but also limitations (distorted by sampling and parameter choice).
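The mixture model underneath this kind of clustering can be sketched with two populations: an individual's expected allele frequency at each site is a weighted average of the population frequencies, with the weight being the admixture proportion. The sketch below is not the STRUCTURE algorithm itself (which also has to infer the population frequencies); it assumes the frequencies are known and simply grid-searches the likelihood for one simulated individual. All names and values are hypothetical.

```python
import math
import random


def log_likelihood(q, genotypes, p1, p2):
    """Log-likelihood of admixture proportion q (ancestry share from
    population 1), given 0/1/2 genotypes and known allele frequencies."""
    total = 0.0
    for g, a, b in zip(genotypes, p1, p2):
        f = q * a + (1 - q) * b  # individual's expected allele frequency
        total += (math.log(math.comb(2, g))
                  + g * math.log(f) + (2 - g) * math.log(1 - f))
    return total


random.seed(0)
n_loci = 5000
# Hypothetical population allele frequencies (illustration only).
p1 = [random.uniform(0.05, 0.95) for _ in range(n_loci)]
p2 = [random.uniform(0.05, 0.95) for _ in range(n_loci)]

# Simulate one individual with 30% ancestry from population 1.
true_q = 0.3
genotypes = []
for a, b in zip(p1, p2):
    f = true_q * a + (1 - true_q) * b
    genotypes.append(sum(random.random() < f for _ in range(2)))

# Grid search for the maximum-likelihood admixture proportion.
grid = [i / 100 for i in range(101)]
q_hat = max(grid, key=lambda q: log_likelihood(q, genotypes, p1, p2))
print("estimated admixture proportion:", q_hat)
```

Note that the estimate is only as good as the assumed population frequencies, which is where the choice of k and the sampling of reference groups enter.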
[Lawson 2018] provide many examples where STRUCTURE misses known, uh, structure; merges divergent populations together; finds false admixture; or models ancient genomes as mixtures of modern ones.

3: Admixture Graphs use allele-sharing statistics to fit population drift, splits, and mixtures. A very powerful approach but identifiability is a challenge: is the graph you found significantly better than all other possible graphs? [Maier 2023] show it often is not.
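As an illustration of what an allele-sharing statistic looks like, here is a naive f3 statistic, one member of the f-statistics family (my choice of example; the thread does not name a specific statistic). It averages the product of frequency differences between a target population and two candidate sources, and a negative value is a signal that the target is a mixture of them. The frequencies are hypothetical, the mixed population is constructed as an exact 50/50 blend with no subsequent drift, and no finite-sample correction is applied.

```python
def f3(target, source_a, source_b):
    """Naive f3(target; a, b): mean of (t - a) * (t - b) across sites.

    Works directly on population allele frequencies and applies no
    finite-sample correction.
    """
    products = [(t - a) * (t - b)
                for t, a, b in zip(target, source_a, source_b)]
    return sum(products) / len(products)


# Hypothetical allele frequencies at four sites (illustration only).
pop_a = [0.10, 0.40, 0.80, 0.55]
pop_b = [0.30, 0.20, 0.60, 0.95]
mixed = [0.5 * a + 0.5 * b for a, b in zip(pop_a, pop_b)]

print("f3(mixed; a, b):", f3(mixed, pop_a, pop_b))  # negative: mixture signal
print("f3(a; mixed, b):", f3(pop_a, mixed, pop_b))  # positive: no such signal
```

A graph-fitting method compares many such statistics against the values a proposed graph predicts, which is why different graphs can end up fitting about equally well.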
Got all that? Now let's look at some real data from biobanks. Again we see that race is a very poor model of human populations, which are continuous mixtures of multiple data sources with no clear boundaries or mapping to any folk racial constructs.
As predicted, PCA is sensitive. When we zoom in on populations we see clines all the way down: county-level correlations among self-reported British whites, down to neighborhood-level correlations in Chinese and Japanese biobanks. Continuous relationships at all scales.
I sometimes see the argument that, even if race is flawed, genetic ancestry can tell us the "true" races. But this is clearly wrong. These methods depend on a sampling process we cannot know, and real data is full of the mixtures and continuous relationships we just saw.
This is even more apparent when looking at populations in history. Dynamic admixture, migration, and continuous structure are historically common and sometimes quite rapid. Geography-ancestry relationships have been rewritten countless times.

Yet again, historic models motivated in part by racial thinking presumed that populations largely evolved through "serial founder" events and developed in isolation. Genetic data shows us this is clearly not the case and our history is much more complicated.
Finally, in Africa, we see the limits of even our sophisticated computational models. Highly complex structure and gene flow can fit models of deep separation or continuous migration equally well, even when including ancient DNA. Our genetic history in Africa remains a mystery!
In sum, we use models to understand the causal processes in our world and race is a very poor causal model for genetics. But even models of genetic ancestry have fundamental limitations in light of our complex and dynamic, nested human history. Much more work to be done!

/fin