Let's define some terms. Race is a social categorization of people into groups, typically based on physical attributes. Genetic ancestry is a quantification of genetic similarity to a reference population. While correlated, they have fundamentally different causes & consequences.
We should care about causes, and race is a poor causal model of human evolution. In truth, genetic variation follows a "nested subsets" model, where all people eventually share ancestors, which is fundamentally different from race (see for yourself here: james-kitchens.com/blog/visualizi…).
Formally, race-like models fit the observed genetic distances poorly, even in highly geographically distinct populations with minimal admixture [Long et al 2009] (and as we'll see, mixture is very common). Trying to make a racial model work produces nonsense.
Consistent with nested subsets, two individuals from Africa have more genetic differences between them than one individual from Africa and one from France, and the majority of those differences are *common* across all populations [Biddanda 2020]. Population-private common variants are very infrequent.
Notably, before genetic data, race advocates predicted that most variation would be homozygous within racial groups and highly divergent. The truth is the opposite even for ancestry: if you condition out population labels you are still left with 85% of the genetic variation.
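To make the "85% within populations" idea concrete, here is a minimal sketch of how a within-population share of variation can be computed at a single biallelic locus from expected heterozygosity. The allele frequencies are hypothetical, chosen only for illustration (they are not taken from any cited study), the populations are assumed to be equal-sized, and real estimates average over many loci.

```python
def heterozygosity(p):
    """Expected heterozygosity 2p(1-p) at a biallelic locus with allele frequency p."""
    return 2.0 * p * (1.0 - p)


def within_population_share(freqs):
    """Share of total heterozygosity found within populations (H_S / H_T).

    Assumes equal-sized populations and a locus that is variable overall,
    i.e. the mean frequency is strictly between 0 and 1.
    """
    p_bar = sum(freqs) / len(freqs)
    h_total = heterozygosity(p_bar)  # heterozygosity if labels are ignored
    h_within = sum(heterozygosity(p) for p in freqs) / len(freqs)
    return h_within / h_total


# Hypothetical allele frequencies in three populations (illustration only).
hypothetical_freqs = [0.3, 0.5, 0.7]
share = within_population_share(hypothetical_freqs)
print(f"within-population share of variation: {share:.3f}")
```

Even with frequencies that differ this much between the hypothetical populations, most of the variation remains within them.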
But what about ancestry? Let's go through three core methods for analyzing ancestry in genetic data: dimensionality reduction (PCA), model-based clustering (STRUCTURE), and parametric models (admixture graphs). Because ancestry is relative, each approach has limitations.
1: PCA estimates eigenvectors of the sample relatedness matrix which, under simple ancestry models, are expected to recover population labels. Matrix theory shows that PCA is extremely sensitive with enough data, able to detect relationships down to a handful of generations.
But PCA is easily distorted by the sampling process: bigger populations will warp the PCA locations, and even individuals within a single family can look like different populations. PCA also produces unusual artifacts when there is simple spatial locality in the data.
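The eigenvector step described above can be sketched in a few lines. This is a toy illustration under stated assumptions, not a production pipeline: genotypes are simulated from two hypothetical populations with made-up allele frequencies, SNPs are only mean-centered (not variance-standardized), and the leading eigenvector is found by plain power iteration rather than a full eigendecomposition.

```python
import random


def simulate_genotypes(freqs, n_individuals, rng):
    """Draw 0/1/2 genotypes: two independent allele draws per SNP."""
    return [[sum(rng.random() < p for _ in range(2)) for p in freqs]
            for _ in range(n_individuals)]


def relatedness_matrix(genotypes):
    """Sample relatedness matrix K = X X^T / m from mean-centered genotypes."""
    n, m = len(genotypes), len(genotypes[0])
    means = [sum(row[j] for row in genotypes) / n for j in range(m)]
    centered = [[row[j] - means[j] for j in range(m)] for row in genotypes]
    return [[sum(a * b for a, b in zip(centered[i], centered[k])) / m
             for k in range(n)] for i in range(n)]


def leading_eigenvector(matrix, n_iter=200):
    """Power iteration for the top eigenvector (PC1) of a symmetric matrix."""
    n = len(matrix)
    v = [float(i + 1) for i in range(n)]  # arbitrary non-degenerate start
    for _ in range(n_iter):
        w = [sum(matrix[i][k] * v[k] for k in range(n)) for i in range(n)]
        norm = sum(x * x for x in w) ** 0.5
        v = [x / norm for x in w]
    return v


rng = random.Random(0)
n_snps = 200
freqs_a = [0.2] * n_snps  # hypothetical population A
freqs_b = [0.8] * n_snps  # hypothetical population B
genotypes = (simulate_genotypes(freqs_a, 10, rng)
             + simulate_genotypes(freqs_b, 10, rng))
pc1 = leading_eigenvector(relatedness_matrix(genotypes))
print([round(x, 2) for x in pc1])
```

In this deliberately clean setup PC1 splits the two groups by sign. Changing the group sizes (say, 30 individuals from A and 5 from B) is an easy way to see how the sampling process shifts where individuals land.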
2: STRUCTURE clusters individuals as mixtures of a fixed set of (k) populations. It shares strengths with PCA (sensitive, interpretable) but also limitations (distorted by sampling and parameter choice).
[Lawson 2018] provide many examples where STRUCTURE misses known, uh, structure; merges divergent populations together; finds false admixture; or models ancient genomes as mixtures of modern ones.
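The mixture idea behind STRUCTURE-style clustering can be sketched for the simplest case. To be clear about what is swapped in: this is not the STRUCTURE algorithm itself (which infers allele frequencies and memberships jointly), but a grid-search maximum-likelihood estimate of one individual's admixture proportion between k = 2 source populations whose allele frequencies are assumed known. All frequencies and genotypes are hypothetical.

```python
import math


def log_likelihood(q, genotype, freqs_a, freqs_b):
    """Log-likelihood of 0/1/2 genotypes given a fraction q of ancestry from A.

    Assumes all source frequencies lie strictly between 0 and 1,
    so the logarithm is always defined.
    """
    total = 0.0
    for g, pa, pb in zip(genotype, freqs_a, freqs_b):
        f = q * pa + (1.0 - q) * pb  # allele frequency in the mixture
        total += math.log(math.comb(2, g) * f ** g * (1.0 - f) ** (2 - g))
    return total


def estimate_admixture(genotype, freqs_a, freqs_b, grid_size=100):
    """Grid-search MLE of q over 0, 1/grid_size, ..., 1."""
    grid = [i / grid_size for i in range(grid_size + 1)]
    return max(grid, key=lambda q: log_likelihood(q, genotype, freqs_a, freqs_b))


# Hypothetical source-population allele frequencies at four SNPs.
freqs_a = [0.9, 0.8, 0.1, 0.2]
freqs_b = [0.1, 0.2, 0.9, 0.8]

q_mixed = estimate_admixture([1, 1, 1, 1], freqs_a, freqs_b)
q_like_a = estimate_admixture([2, 2, 0, 0], freqs_a, freqs_b)
print(q_mixed, q_like_a)
```

The estimate is only meaningful relative to the two source populations supplied. Swap in different references, or a different k, and the same genotype gets a different "ancestry", which is the parameter-choice sensitivity noted above.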
3: Admixture Graphs use allele-sharing statistics to fit population drift, splits, and mixtures. A very powerful approach, but identifiability is a challenge: is the graph you found significantly better than all other possible graphs? [Maier 2023] show it often is not.
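One common allele-sharing statistic of this kind is f4, the average across SNPs of (pA - pB)(pC - pD) for four populations. A minimal sketch with hypothetical allele frequencies follows. It computes a single statistic only; fitting an actual admixture graph involves comparing many such statistics against the values a candidate graph predicts, which is not shown here.

```python
def f4(freqs_a, freqs_b, freqs_c, freqs_d):
    """f4(A, B; C, D): mean over SNPs of (pA - pB) * (pC - pD)."""
    products = [(a - b) * (c - d)
                for a, b, c, d in zip(freqs_a, freqs_b, freqs_c, freqs_d)]
    return sum(products) / len(products)


# Hypothetical allele frequencies at three SNPs in four populations.
pop_a = [0.1, 0.5, 0.9]
pop_b = [0.2, 0.4, 0.9]
pop_c = [0.1, 0.6, 0.5]
pop_d = [0.3, 0.2, 0.7]

f4_value = f4(pop_a, pop_b, pop_c, pop_d)
f4_null = f4(pop_a, pop_b, pop_c, pop_c)  # C compared with itself
print(f4_value, f4_null)
```

A value near zero is what you would expect if the frequency differences between A and B are unrelated to those between C and D. A consistent departure from zero points to shared drift or gene flow across the pairs. With real data, significance is judged against a standard error, which this sketch omits.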
Got all that? Now let's look at some real data from biobanks. Again we see that race is a very poor model of human populations, which are continuous mixtures of multiple data sources with no clear boundaries or mapping to any folk racial constructs.
As predicted, PCA is sensitive. When we zoom in on populations we see clines all the way down: county-level correlations among self-reported British whites; down to neighborhood level correlations in Chinese and Japanese biobanks. Continuous relationships at all scales.
I sometimes see the argument that, even if race is flawed, genetic ancestry can tell us the "true" races. But this is clearly wrong. These methods depend on a sampling process we cannot know, and real data is full of the mixtures and continuous relationships we just saw.
This is even more apparent when looking at populations in history. Dynamic admixture, migration, and continuous structure are historically common and sometimes quite rapid. Geography-ancestry relationships have been rewritten countless times.
Yet again, historic models motivated in part by racial thinking presumed that populations largely evolved through "serial founder" events and developed in isolation. Genetic data shows us this is clearly not the case and our history is much more complicated.
Finally, in Africa, we see the limits of even our sophisticated computational models. Highly complex structure and gene flow can fit models of deep separation or continuous migration equally well, even when including ancient DNA. Our genetic history in Africa remains a mystery!
In sum, we use models to understand the causal processes in our world and race is a very poor causal model for genetics. But even models of genetic ancestry have fundamental limitations in light of our complex and dynamic, nested human history. Much more work to be done!
/fin
Oof. Polygenic scores for IQ lose 75% of their explained variance when adding family controls, even worse than the attenuation for Educational Attainment. These are the scores Silicon Valley is using to select embryos 😬.
The TEDS cohort used here is a very large study with high-quality cognitive assessments collected over multiple time points. It is probably the most impressive twin study of IQ to date. That means very little room for data quality / measurement error issues.
It is important to highlight surprising null results. Just last week we were hypothesizing that large IQ score attenuation could be a study bias or an artifact of the Wilson Effect. Now we see it replicate in an independent study with adults.
Racism twitter has taken to arguing that observed racial differences must be "in part" explained by genetic differences, though they demur on how much. Not only is this claim aggressively misleading, it is completely unsupported by data. A 🧵:
Genetic differences between any two populations can go in *either* direction, matching the phenotypic differences we observe or going against them. Genes also interact with the environment, which makes the whole notion of "explaining" differences intractable.
The mere fact that a trait is heritable within populations tells us nothing about the explanatory factors between populations. See: Lewontin's thought experiment; Freddie deBoer's analogy to a "jumping contest"; or actual derivations (pubmed.ncbi.nlm.nih.gov/38470926/).
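Lewontin's thought experiment can be written out as a toy calculation. This sketch assumes a purely additive trait with no noise and made-up numbers: the same set of genetic values is grown in two uniform environments, so heritability is 100% within each group while the gap between groups is entirely environmental.

```python
def mean(values):
    return sum(values) / len(values)


def variance(values):
    """Population variance (divides by n)."""
    mu = mean(values)
    return sum((v - mu) ** 2 for v in values) / len(values)


# Hypothetical genetic values, identical in both groups (same seed stock).
genetic_values = [-2.0, -1.0, 0.0, 1.0, 2.0]

# Each group gets one uniform environment; only the environment differs.
env_rich, env_poor = 10.0, 5.0
pheno_rich = [g + env_rich for g in genetic_values]
pheno_poor = [g + env_poor for g in genetic_values]

# Within-group heritability: genetic variance / phenotypic variance.
h2_rich = variance(genetic_values) / variance(pheno_rich)
h2_poor = variance(genetic_values) / variance(pheno_poor)

phenotypic_gap = mean(pheno_rich) - mean(pheno_poor)
genetic_gap = mean(genetic_values) - mean(genetic_values)
print(h2_rich, h2_poor, phenotypic_gap, genetic_gap)
```

Both groups show a heritability of 1.0, yet none of the gap between them is genetic.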
James Lee and @DamienMorris have an interesting perspective paper out describing "some far-reaching conclusions" about the genetics of intelligence. This type of "where are we now" paper is very fun and more people should write them! So, where are we now? 🧵
It's a short paper and it surveys three core findings from the past decade of intelligence genetics. These sections follow a structure that I would cheekily call ... "make a bold claim in the title, then walk it back in the text".
First up, they address the concern that associations with intelligence may actually be mediated by functionally irrelevant traits like physical appearance or pigment. The argument is that IQ GWAS has demonstrated enrichments for CNS/brain structure gene sets. This is true!
The SAT/meritocracy debate has always been a bit odd to me when the test makers themselves have studies showing self-reported high-school GPA is a consistently better predictor of college GPA and always adds on top of SATs.
Clearly SATs are neither the only nor even the best measure we have of college success and "holistic" admissions can be "meritocratic". It's up for debate whether the additional <10% predictive variance SATs give you are worth the high-school testing industrial complex.
A challenge with all of these analyses is they are measured after selection on the predictor variables themselves, which can induce biased estimates through range restriction. The raw correlations are even lower, and it is hard to know whether correcting is appropriate.
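Range restriction is easy to see in a small simulation. This is a sketch under assumed numbers, not a reanalysis of any admissions data: a hypothetical predictor and outcome are drawn with a true correlation of 0.5, and then only the top 20% on the predictor are "admitted".

```python
import random


def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5


rng = random.Random(0)
true_r = 0.5  # assumed correlation in the full applicant pool
n = 5000

# Hypothetical standardized predictor (e.g. a test score) and outcome.
scores = [rng.gauss(0, 1) for _ in range(n)]
outcomes = [true_r * x + (1 - true_r ** 2) ** 0.5 * rng.gauss(0, 1)
            for x in scores]

# Select only the top 20% on the predictor, then re-measure the correlation.
cutoff = sorted(scores)[int(0.8 * n)]
admitted = [(x, y) for x, y in zip(scores, outcomes) if x >= cutoff]

r_full = pearson(scores, outcomes)
r_admitted = pearson([x for x, _ in admitted], [y for _, y in admitted])
print(f"full pool r = {r_full:.2f}, admitted-only r = {r_admitted:.2f}")
```

The correlation among the admitted is noticeably weaker than in the full pool, even though the underlying relationship is unchanged. Whether and how to correct for this in real data depends on knowing the selection process, which is the difficulty raised above.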
Hanania advocated passionately against "race mixing" for years, so he knows what he's talking about here. But it's worth adding that race-IQ obsessives also tend to make very poor predictions about the future. Let's review ...
The Bell Curve, published at the peak of the 80-90's crime wave, predicted a coming dystopian urban hellscape with a "cognitive underclass" living in state-managed facilities. Not only did all this fail to materialize, but crime rates collapsed.
Charles Murray has nevertheless spent the following 30 years predicting vindication for his claims was just around the corner ... each time pointing to a new corner.
Nice! Here we have an interesting paper using genetic ancestry to classify race/ethnicity in modern data and algorithms. Let's take a look at what this paper found: 🧵
First, I don't want to get too hung up on language, but TCB's tweet starts talking about "ethnicity", then shifts to "continental ancestries", and then entirely omits the largest ethnic group in the US: Hispanics. These terms have distinct definitions (nap.nationalacademies.org/catalog/26902/…).
Anyway, how well can this paper actually impute ethnicity from genetic ancestry in a large cancer population (worldscientific.com/doi/10.1142/97…)? ~17% of the time it gets Hispanic classification completely wrong or a no-call!