Inspired by conversations with @genemodeller + @EimearEKenny + @aylwyn_scally I'd like to make an argument that human genetics needs to augment the "islands-of-ethnicity" model of human genetics + add to it an "everyone-is-admixed-some-recent-some-old" model. (my own coinage of terms)
The islands-of-ethnicity model is inspired by the classic population genetics models of islands with migration, and a simplifying assumption that individuals mate randomly within the island, and occasionally migrate between islands. Lots of good theory has been built up on this
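To make the island model concrete, here is a minimal sketch (not from the thread itself). It assumes a deterministic version with no drift or selection, in which each generation a fraction m of every island is replaced by migrants drawn from the average of all islands. The function names and the parameter m are illustrative assumptions.

```python
def island_model_step(freqs, m):
    """One generation of a deterministic island model (no drift, no selection).

    freqs: allele frequency on each island.
    m: fraction of each island replaced by migrants from the shared pool.
    """
    pool = sum(freqs) / len(freqs)  # migrant pool = mean across islands
    return [(1 - m) * p + m * pool for p in freqs]


def simulate(freqs, m, generations):
    """Apply the migration step repeatedly and return the final frequencies."""
    for _ in range(generations):
        freqs = island_model_step(freqs, m)
    return freqs
```

With any m above zero the islands drift towards a common frequency; with m equal to zero they stay as separate as they started, which is the "island" intuition the rest of the thread argues against for humans.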
(worth noting, some species really *do* fit this model reasonably well - the Glanville fritillary butterfly across the islands of Aaland between Sweden and Finland is a classic).
As well as being theoretically tractable with a variety of closed form population genetics theory (always nice) the "random mating" assumption is a useful, strong assumption to bring to genome-wide association studies + gets human common disease / trait geneticists out of a hole
Practical experience is that in many human samples you can find a group of people which seem "close enough" to this random mating assumption; by adding in some extra covariate controls on population stratification, one can hold this strong assumption of "random mating"
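A toy sketch of what the covariate control does, under loud assumptions: a single known ancestry axis stands in for the principal components that a real analysis would compute from genome-wide genotypes, the data are made up, and real GWAS use a proper regression model rather than the residual correlation shown here. All names are hypothetical.

```python
from math import sqrt


def residualize(values, covariate):
    """Residuals of values after simple linear regression on covariate."""
    n = len(values)
    mean_v = sum(values) / n
    mean_c = sum(covariate) / n
    sxx = sum((c - mean_c) ** 2 for c in covariate)
    sxy = sum((c - mean_c) * (v - mean_v) for c, v in zip(covariate, values))
    slope = sxy / sxx
    return [v - (mean_v + slope * (c - mean_c)) for c, v in zip(covariate, values)]


def correlation(xs, ys):
    """Pearson correlation of two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


# Made-up data: two groups that differ in allele frequency and in environment,
# with no genotype-phenotype relationship inside either group.
ancestry_axis = [-1, -1, -1, -1, 1, 1, 1, 1]  # stand-in for PC1 (assumption)
genotype = [0, 1, 0, 1, 1, 2, 1, 2]
phenotype = [1.0, 1.0, 2.0, 2.0, 4.0, 4.0, 5.0, 5.0]

naive = correlation(genotype, phenotype)
adjusted = correlation(residualize(genotype, ancestry_axis),
                       residualize(phenotype, ancestry_axis))
```

Here the naive association is driven entirely by the group difference, and it vanishes once both variables are adjusted for the ancestry axis.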
One can test how far off this assumption is for GWAS by looking at the QQ plot (or more sophisticatedly, LD score regression), effectively asking if the bulk of the genome fits a null model of random mating. It's a key diagnostic for a GWAS.
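One common numerical summary of a QQ plot is the genomic inflation factor (often written lambda): the median observed test statistic divided by the median expected under the null. The sketch below assumes two-sided p-values from 1-degree-of-freedom tests, each in the range (0, 1]; the function name is illustrative, and this is not LD score regression.

```python
from statistics import NormalDist, median

# Median of a chi-square distribution with 1 degree of freedom.
CHI2_1DF_MEDIAN = 0.4549364


def genomic_inflation(p_values):
    """Genomic inflation factor (lambda) from two-sided GWAS p-values."""
    norm = NormalDist()
    # Convert each p-value back to a 1-df chi-square statistic: z squared,
    # where z is the standard normal quantile at p / 2.
    chi2 = [norm.inv_cdf(p / 2) ** 2 for p in p_values]
    return median(chi2) / CHI2_1DF_MEDIAN
```

A value close to 1 says the bulk of the genome looks like the null; values well above 1 suggest residual stratification (or, for highly polygenic traits, real signal, which is the distinction LD score regression tries to make).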
A slightly underappreciated aspect of this "restrict the samples to something relatively round in genetic space" is that it also restricts the effective environment of the humans -
Basically in societies with lots of socio-economic structure (anywhere big! - certainly US, Western Europe, China and Japan) different humans experience different environments due to this structure - it's a long, messy story of human society here, including discrimination
The more structure in the society ('environment' in genetics speak), both the harder it is to pick up genetics (which is really a proportion of variance test) *and* the less likely that this "PC trick" will work to remove residual population stratification.
But this "island of ethnicity" model is not true for humans. Trivially there are billions of people worldwide who don't fit this - most obviously "mixed race" people but also all sorts of other people: people from a Finnish+American marriage, or most Voortrekker families
Large groups of people who are consistently described with ethnicity terms also fit this - Afro-Caribbeans have both a large amount of recent European ancestries and Indian (Calcutta/Bangladesh) ancestries as well as a mish-mash across Africa.
Indeed, the closer you look, the more complex human genetics is now (all Brazilians are complex; the Indian subcontinent is really complex; Europe has complex refuges, clines and others; Africa itself is a real melting pot)
Nor is this a recent thing. Most modern Europeans are a complex mixture of three quite separated, identifiable groups (the latest mixing in the Bronze Age); the complex "Bantu" migration in Africa mixed up all sorts of people. The Spanish conquistadors led to Hispanics
In fact in these populations one can sometimes ... sample people, find the modal centre of the genetics, do the PC trick to get random mating (not least in 'European ancestries') but these groupings change sample by sample. It's valid genetics, just not a good conceptual model
But there is no need for this model. Farm animal and plant genetics left behind random mating a long time ago in their analysis, embracing structured models (linear mixed models). This was largely due to bringing strong random assumptions from husbandry
We're unlikely to throw away the islands of ethnicity assumption in the *discovery* aspect of human genetics, but there is no need to hold on to it in the application of human genetics. Once we fine map to causal nucleotides (not a given!) then we can be more flexible
(Formally one needs to test that the effect size is not overly influenced by other genetic effects - epistasis - nor strong GxE effects, remembering the E is here "everything not genetic". But it looks like this holds a lot).
In between a full "everything is causally mapped, we do everything with full genome-sequence" nirvana and now I do think there is a half-way house "everyone is admixed" model;
In this model, we have both a good sample of human haplotypes worldwide, and a library of robustly discovered loci at "mapping" resolution (aka, the @GWASCatalog), and each human is considered a mixture of haplotypes (any mix is fine).
When we want to infer a genetic trait in an individual, we look at his or her particular mixture of haplotypes, and estimate from the library (perhaps ... carefully summing over uncertainties) for the trait.
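One way to picture this estimate, as a sketch only: assume an additive trait, assume each haplotype copy an individual carries comes with posterior probabilities over reference haplotype groups, and assume the library stores an effect size per locus and group. The data layout, names and numbers below are all hypothetical and are not something the thread specifies.

```python
def expected_trait_score(individual, effect_library):
    """Expected additive score for one individual, plus library coverage.

    individual: list of haplotype copies, each a dict with
        'locus'   - locus identifier,
        'dosage'  - 0 or 1 copies of the effect allele on this haplotype,
        'sources' - {reference_haplotype_group: posterior probability}.
    effect_library: {(locus, reference_haplotype_group): effect size}.

    Returns (score, covered), where covered is the fraction of posterior
    mass for which the library had an effect estimate.
    """
    score = 0.0
    covered = 0.0
    for hap in individual:
        for group, prob in hap["sources"].items():
            effect = effect_library.get((hap["locus"], group))
            if effect is None:
                continue  # locus not mapped on this haplotype background
            covered += prob
            score += prob * hap["dosage"] * effect
    return score, covered / len(individual)


# Hypothetical library and individual, for illustration only.
library = {("locus1", "A"): 0.2, ("locus1", "B"): 0.1, ("locus2", "A"): -0.3}
person = [
    {"locus": "locus1", "dosage": 1, "sources": {"A": 0.75, "B": 0.25}},
    {"locus": "locus1", "dosage": 0, "sources": {"B": 1.0}},
    {"locus": "locus2", "dosage": 1, "sources": {"A": 0.5, "C": 0.5}},
    {"locus": "locus2", "dosage": 1, "sources": {"A": 1.0}},
]
score, covered = expected_trait_score(person, library)
```

The coverage value is there because nothing in this scheme sorts people into groups; what varies between individuals is how much of their haplotype mixture the library has actually sampled.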
This in no way gets us out of needing mapped loci in many human populations - for sure there are many causal DNA changes just not found in our currently mainly European sampled haplotypes.
But in this formulation the problem is the sampling of haplotypes, not the separation of people into groups and excluding people. Furthermore this model treats every human equally in the framework (though our data sampling biases us to estimate traits on some humans far better)
(interesting discussion here on "equity" of estimation with @genemodeller - one morally consistent position would be for us to deliberately "fuzz out" some predictions so it was equal for everyone. This seems ... not utilitarian and right, but interesting framing).
This also shows the importance of sub Saharan Africa sampling for *worldwide* improvements in genetics - sub Saharan Africa has by far the largest set of haplotypes in any location - that's where we should be *oversampling*.
But this also goes to some other aspects - the "islands of ethnicity" model being wrong means that the ethnic labels which most people are super-confident about using are ... just labels not visible manifestations of a key human genetic structure
We have collectively decided that skin colour and hair types are somehow profoundly important tags of our genetics, and yet other visible traits, such as height, eye colour, finger shapes are just not important in society. When one steps back from this it is bizarre
We don't ask really tall people "oh, what's your heritage" or "do you have Baltic ancestry?", nor expect every red-haired individual to be able to recount the story of their family back to western Europe, nor try to catch a glance of finger sizes to "place" people.
And yet, it often seems a "given" that skin colour of someone on a London street is meaningful - that not only does this describe individual characteristics (like height) but also comes with ... assumptions, placement in society.
In my view we've got to really work at separating out this "grouping" aspect of humans from genetics. As @aylwyn_scally continues to remind me, groups are not a good genetic structure - trees are the way to think
(Formally the slightly daunting "Ancestral Recombination Graph" - ARG to its friends - is the "true" data model behind all genetics. It is as complex as it sounds, but it is complete. Importantly groups are not a good approximation to the ARG)
<<sermon ends>>.
Keep Current with Ewan Birney
