Some thoughts on the ability to distinguish populations with genetic variation, why that means little for trait differences, and why there are other good reasons to collect diverse data. 🧵
I was pleasantly surprised to see no one mount a strong defense of "biological race" in this thread. Even the people throwing this term around seem to realize it's not supported by data. Instead the conversation shifts to population "distinguishability".
For example, a random twitterer (left) and a professor (right) emphasizing that genetic variation can be used to "distinguish" populations. And it's true, one can aggregate small per-variant differences into genetic ancestry estimates that often correlate highly with geography.
Moreover, it's true at essentially all scales: ancestry inference in self-reported whites in the US correlates w/ European country references; ancestry inference in self-reported "White British" in the UK correlates with latitude/longitude and counties in the UK; and so on.
In fact we know, given enough sites, a method like PCA can identify correlations down to a handful of generations (or even pick up families). Of course, no one argues counties/zipcodes are biological units, so "distinguishability" alone is not meaningful. What is meaningful?
We might be interested in meaningful distinguishability of genetically driven traits. But unlike genetic ancestry, a neutral trait does NOT become more differentiated as you aggregate more variants. So distinguishable ancestry NEED NOT translate into trait differences.
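A rough simulation sketch of the contrast above. Everything here is my own illustrative setup, not from any cited study: two populations drift apart under a Balding-Nichols model with an assumed Fst of 0.01, the ancestry score uses the true frequency differences as weights (standing in for a large reference panel), and the "neutral trait" is a sum of random effects that are independent of those differences.

```python
import numpy as np

def standardized_gap(x1, x2):
    """Difference in group means, in units of the pooled within-group SD."""
    pooled_sd = np.sqrt((x1.var(ddof=1) + x2.var(ddof=1)) / 2)
    return (x2.mean() - x1.mean()) / pooled_sd

def simulate(n_variants, n_per_pop=200, fst=0.01, seed=0):
    """Return (ancestry gap, neutral trait gap) for two simulated populations.

    All parameter values are illustrative assumptions.
    """
    rng = np.random.default_rng(seed)
    # Ancestral allele frequencies, then independent drift in each population
    # (Balding-Nichols: Beta-distributed around the ancestral frequency).
    p_anc = rng.uniform(0.1, 0.9, n_variants)
    a = p_anc * (1 - fst) / fst
    b = (1 - p_anc) * (1 - fst) / fst
    p1, p2 = rng.beta(a, b), rng.beta(a, b)
    # Diploid genotypes (0/1/2 copies), individuals x variants.
    g1 = rng.binomial(2, p1, size=(n_per_pop, n_variants))
    g2 = rng.binomial(2, p2, size=(n_per_pop, n_variants))
    # Ancestry score: weight each variant by its frequency difference.
    w = p2 - p1
    ancestry_gap = standardized_gap(g1 @ w, g2 @ w)
    # Neutral trait: effect sizes drawn independently of the frequency differences.
    beta = rng.normal(size=n_variants)
    trait_gap = standardized_gap(g1 @ beta, g2 @ beta)
    return ancestry_gap, trait_gap

if __name__ == "__main__":
    for m in (100, 1_000, 10_000):
        anc, trait = simulate(m)
        print(f"{m:>6} variants: ancestry gap = {anc:6.2f} SD, trait gap = {trait:+.2f} SD")
```

In this setup the ancestry gap should grow roughly with the square root of the number of variants, while the trait gap stays small and lands on either side of zero depending on the seed. Note the trait here is the genetic value only; adding environmental noise would shrink the gap further.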
We even have bounds: the expected between-population neutral trait variance is Fst * heritability. For human populations and traits this is very low (1-8%) even if we take genetic ancestry extremes, and of course these differences are centered at zero and go in either direction.
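To make the arithmetic concrete, a tiny sketch of the Fst * heritability product. The inputs are my own illustrative picks (Fst = 0.10, heritability from 0.1 to 0.8), chosen only because they span the range quoted above; the thread does not say which values it had in mind.

```python
def expected_between_pop_share(fst, h2):
    """Expected share of trait variance between populations for a neutral trait,
    using the Fst * heritability approximation quoted above."""
    return fst * h2

if __name__ == "__main__":
    fst = 0.10  # assumed value, for illustration only
    for h2 in (0.1, 0.5, 0.8):  # assumed heritabilities
        share = expected_between_pop_share(fst, h2)
        print(f"Fst={fst:.2f}, h2={h2:.1f}: {share:.1%} of trait variance between populations")
```

The rest of the variance sits within populations, and the expected direction of any difference is a coin flip.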
We might be interested in individual large-effect variants with big frequency differences due to bottlenecks (like BRCA) or selective sweeps (like pigment or lactase). In the early genome days there was great speculation that "divergent genes" would explain trait disparities.
Such studies have been run and, as it turns out, "hard sweeps" are very infrequent. This is broadly appreciated in the field but draws intense backlash on twitter, so I'll just quote some sources [web.stanford.edu/group/pritchar…, nap.nationalacademies.org/catalog/26902/…] and save the details for later.
Lastly, perhaps the causal effects of common variants differ substantially between populations (for example due to interactions). Though more work is needed, studies using local ancestry show this does not generally appear to be the case. Details here:
In short, "distinguishable" ancestry in PCA tells us nothing about traits, whether neutral trait means, hard locus-specific selection, or genome-wide effect sizes. So why do we collect diverse data? IMO three good reasons and none of them have to do with trait divergence:
1: Diverse populations are likely to have more diverse *environments*, which (we hope) is useful for understanding the relationships between genetic variation and context [ex: ncbi.nlm.nih.gov/pmc/articles/P…], as well as enriching for more environmental risk factors.
2: Association studies estimate effects from "tag" SNPs + noise due to LD and frequency. The noise is further amplified across populations leading to poor prediction. Diverse data can improve prediction and increase sensitivity by cleaning up this noise.
3: Diverse data picks up a few more rare variants (especially non-singletons). These contribute very little to group-specific trait differences, but they can improve imputation, identify novel biology, & be important to their carriers (ex: drug reactions).
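A toy sketch of reason 2, under simplifying assumptions of my own: one causal variant, one tag SNP, everything standardized, and only LD (not frequency) differing between two hypothetical populations A and B. In that setting the marginal effect seen at the tag is the causal effect scaled by the tag-causal correlation r, so the variance the tag explains scales with r squared. The r values below are made up for illustration.

```python
def tag_variance_explained(beta_causal, r_tag_causal):
    """Variance explained by a tag SNP, assuming standardized genotypes and phenotype
    and a single causal variant in LD (correlation r) with the tag."""
    marginal_tag_effect = beta_causal * r_tag_causal
    return marginal_tag_effect ** 2

if __name__ == "__main__":
    beta = 0.1              # hypothetical causal effect, identical in both populations
    r_a, r_b = 0.9, 0.5     # hypothetical tag-causal LD in populations A and B
    v_a = tag_variance_explained(beta, r_a)
    v_b = tag_variance_explained(beta, r_b)
    print(f"tag explains {v_a:.4f} of variance in A and {v_b:.4f} in B (ratio {v_b / v_a:.2f})")
```

The causal effect is the same in both populations here; the drop in B comes entirely from weaker tagging, which is the kind of noise that more diverse data can help clean up.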
TLDR: "distinguishability" is mostly a matter of having enough data points in the analysis. We collect diverse populations not because we expect much trait divergence, but to capture environments, better tag SNPs/LD, and variants in a more useful frequency range. /fin
Something I don't want to get lost is that the field is much better now at studying, visualizing, and discussing complex populations than it has ever been, and there are many resources to help do this effectively. A few suggestions below:
The NASEM report and interactive on using population descriptors, and Coop on genetic similarity.
Let's define some terms. Race is a social categorization of people into groups, typically based on physical attributes. Genetic ancestry is a quantification of genetic similarity to a reference population. While correlated, they have fundamentally different causes & consequences.
We should care about causes, and race is a poor causal model of human evolution. In truth, genetic variation follows a "nested subsets" model, where all people eventually share ancestors, which is fundamentally different from race (see for yourself here: james-kitchens.com/blog/visualizi…).
I’ve seen quotes from David Reich’s “Who We Are and How We Got Here” passed around with the insinuation that it is secretly supportive of racist and hereditarian theories, even though it directly criticizes such views. It's worth looking at what Reich actually wrote: 🧵
Reich writes at length about Nick Wade's book 'A Troublesome Inheritance', a distillation of the hereditarian position. He makes clear that Wade misleads "naive readers" into a position that has "no merit": that genetic differences correspond to traditional racial stereotypes.
Reich calls out an essay by Cochran, Hardy, and Harpending that claims Jewish intelligence is the product of natural selection, which is contradicted by evidence that disease-causing mutations in Ashkenazi Jews are simply a consequence of population bottlenecks and bad luck.
So this is pretty typical of the low-information content you get from the genetic racists. The majority of this post is just blather but there is one (1) specific claim about genetics: that the molecular genetic contribution to IQ keeps going up every year. This is false. A 🧵:
The first study in 2011 into the heritability of IQ using molecular genetic methods found moderately high estimates of 40-51%. But this approach was flawed technically (estimator bounds and population structure) and conceptually (environmental confounding).
Fast forward to 2023, using hundreds of thousands of people from the UK Biobank, Williams et al. [pubmed.ncbi.nlm.nih.gov/36378351/] ran a battery of analyses to refine a high-quality IQ estimate. The heritability ... 0.20 (with very precise error).
The racists in Stancil's replies have started appealing to "scientific consensus". So let's look at what the consensus of *high-quality evidence* is on genetic racism. A 🧵:
On genetics/race/behavior, over a hundred population geneticists denounced Nick Wade's A Troublesome Inheritance (a sort of genetic racism catechism). Their conclusion: "there is no support from the field of population genetics for Wade’s conjectures"
David Reich, a preeminent population geneticist, went on to write an entire book on the topic of genetic ancestry. His conclusion: "the ancient DNA revolution ... is fueling a critique of race ... Mixture is fundamental to who we are"
Let me expand on this since I think it's a useful lens through which to think about heritability estimates. When we talk about "dominance" we're really talking about genetic effects that deviate from additivity: an effect only kicks in when you have both/neither allele. A 🧵:
Most common traits in humans are driven by tens/hundreds of thousands of genetic variants of small effect, so we are interested in dominance heritability i.e. the contribution of *all* of these non-additive effects together, which we can contrast with the additive contribution.
There's a long-standing debate over the extent and causes of dominance effects in human traits, summarized well in a recent study by Palmer et al. [science.org/doi/10.1126/sc…]. Certainly we see plenty of non-additivity at the biological level, but what about genetic effects?
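For intuition on additive vs dominance contributions, a single-locus sketch using the standard textbook (Falconer) parameterization; this is my illustration, not the method of the study cited above. Genotypic values are -a, d, +a for 0, 1, 2 copies of an allele with frequency p, and Hardy-Weinberg proportions are assumed.

```python
def variance_components(p, a, d):
    """Single-locus additive and dominance variance (Falconer parameterization).

    p: frequency of the allele whose homozygote has value +a
    a: half the difference between the two homozygotes
    d: heterozygote deviation from the homozygote midpoint
    Assumes Hardy-Weinberg proportions.
    """
    q = 1 - p
    alpha = a + d * (q - p)             # average effect of an allele substitution
    v_additive = 2 * p * q * alpha ** 2
    v_dominance = (2 * p * q * d) ** 2
    return v_additive, v_dominance

if __name__ == "__main__":
    # Complete dominance (d == a) at an intermediate frequency.
    v_a, v_d = variance_components(p=0.5, a=1.0, d=1.0)
    print(f"V_A = {v_a:.2f}, V_D = {v_d:.2f}, additive share = {v_a / (v_a + v_d):.2f}")
```

Even with complete dominance at this one locus, most of the genetic variance is captured by the additive term, so non-additivity at the biological level need not show up as a large dominance variance.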