21 tweets, 4 min read
Sadly read another paper using "self reported ethnicity" as the way to make a GWAS-ready subset of the study population (in this case, "European American self reported ethnicity"). This is WRONG and it has two levels of wrongness.
First, this is the persistence of a commonplace view that self reported ethnicity (e.g. "White British", "European American", "African-American") is a cheap, perhaps error-prone, but information-containing piece of someone's genetics. Nope.
Self reported ethnicity is messy: a little bit of genetics, a big dollop of culture, and actively unuseful as a label for the *biology* of humans. The genetics it hooks into (skin colour, facial features) involve a minority of the genome, and are themselves very complex.
Although self reported ethnicity is not "purely orthogonal" to a more genome-wide ancestry assessment, the relationship is too messed up to be useful biologically. Even its framing and reporting are messed up.
(In other words, you will also be pulling in all sorts of other weird biases - think how a "Mixed Race" classification vs "Black British" or "African American" is really about culture and attitude, and has nothing to do with genetics; take Black British in Liverpool, for example.)
As @AdamRutherford eloquently puts it, this lack of anchoring in biology doesn't make race/ethnicity (or racism) any less real; race (and racism) is real because our cultures make it so (sadly). That's for another thread and post, best done by Adam :)
The second level of wrongness is that it is ... well ... wrong for GWAS. The goal is to find a subset of people whose relationships are close enough to equal for you to (a) not worry about background genetic effects and (b) not drag in confounders that are non-random with respect to genetics.
(It's (b) you need to be paranoid about. You can't do this perfectly, so to model out the inevitable messiness of observational studies one throws in 100 PCs from the genetics to try to overcompensate for anything left correlating with ancestry/relationships.)
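To make the "throw in PCs" step concrete, here is a toy sketch rather than a real GWAS pipeline. It assumes a single PC, made-up data, and a hand-rolled least-squares helper (`ols` and `solve` are hypothetical names, not from any genetics package). The phenotype depends only on ancestry (PC1), yet a naive regression credits the variant with an effect; adding the PC as a covariate removes it.

```python
def solve(a, b):
    """Solve a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                m[r] = [x - f * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def ols(columns, y):
    """Least-squares coefficients for y ~ intercept + columns."""
    design = [[1.0] * len(y)] + [list(c) for c in columns]
    xtx = [[sum(p * q for p, q in zip(u, v)) for v in design] for u in design]
    xty = [sum(p * q for p, q in zip(u, y)) for u in design]
    return solve(xtx, xty)


# Toy data (assumption): two ancestry clusters on PC1, allele more common in one.
pc1 = [-1.0, -1.0, -1.0, -1.0, 1.0, 1.0, 1.0, 1.0]
genotype = [0.0, 0.0, 0.0, 1.0, 1.0, 2.0, 2.0, 2.0]
# The phenotype tracks ancestry only; the variant has no effect at all.
phenotype = [10.0 + 2.0 * p for p in pc1]

naive = ols([genotype], phenotype)          # [intercept, genotype]
adjusted = ols([genotype, pc1], phenotype)  # [intercept, genotype, pc1]
print("naive genotype effect:", naive[1])
print("PC-adjusted genotype effect:", adjusted[1])
```

Real analyses use many PCs and dedicated software; the point here is only that the adjustment can soak up ancestry-correlated signal when the PCs capture it.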
Self reported ethnicity just ... is nowhere near as good as using the underlying genetics (even in the somewhat arbitrary form of "plot the first two PCs and draw a circle around the central blob"). You can *see* this from the PC plots.
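One hedged way to turn "draw a circle around the central blob" into code is sketched below. It assumes the first two PCs have already been computed elsewhere. The function name `central_blob`, the median-based centre, and the cut-off `k` are all illustrative choices, not a standard definition.

```python
import math
from statistics import median


def central_blob(pcs, k=3.0):
    """Flag samples within k median-distances of the median (PC1, PC2) point.

    pcs: list of (pc1, pc2) tuples, assumed to be computed elsewhere.
    k:   hypothetical cut-off; where to draw the circle is a judgment call.
    """
    centre = (median(p[0] for p in pcs), median(p[1] for p in pcs))
    dists = [math.hypot(p[0] - centre[0], p[1] - centre[1]) for p in pcs]
    radius = k * median(dists)
    return [d <= radius for d in dists]


# Toy coordinates (assumption): five tightly clustered samples and one far away.
pcs = [(0.0, 0.0), (0.1, 0.0), (-0.1, 0.0), (0.0, 0.1), (0.0, -0.1), (5.0, 5.0)]
keep = central_blob(pcs)
print(keep)
```

The selection uses only the genetic coordinates; no ethnicity label goes in.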
If you look at the PC plots of your study population coloured by ethnicity, you'll find that 10-15% of "self reported white European" people (assuming your study population is mainly from Europe) are not in the central blob, or more if you take older "blob" definitions.
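The same check can be done as a count instead of by eye. This is a minimal sketch, assuming you already have a self reported label and an in-blob flag per sample; `fraction_outside` is a hypothetical helper and the data are invented purely to show the shape of the calculation.

```python
from collections import Counter


def fraction_outside(labels, in_blob):
    """Per self reported label, the fraction of samples outside the blob."""
    total = Counter(labels)
    outside = Counter(lab for lab, inside in zip(labels, in_blob) if not inside)
    return {label: outside[label] / n for label, n in total.items()}


# Toy data (assumption): one label per sample plus a genetics-derived flag.
labels = ["white European"] * 4 + ["other"]
in_blob = [True, True, True, False, False]
result = fraction_outside(labels, in_blob)
print(result)
```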
(If you do a study in Japan, you will find your central blob is ... wait for it ... mainly self reported Japanese, but there will be a whole bunch of self reported Japanese who "don't fit" in just the same way as above. Guess what, human populations are messy.)
This is *not* an error, and it is *not* mainly people who have recent admixture ("are mixed race in fact and don't know it" - a surprisingly common thought). It is the fact that European ancestries are complex.
Some of this complexity is quite well known - Finns are their own genetic world (for all sorts of reasons);
Sardinians are overrepresented in Anatolian ur-Farmer genetics vs your "run-of-the-mill" Flemish/French/South British/Rhinelander blend of Anatolian Farmer + Beaker People + some early Hunter Gatherer.
But.... if I had mainly Sardinians in my study, this would all flip around, and all the other Europeans would look "odd". It's all about how you sample people.
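A deliberately oversimplified sketch of that flip: it holds each person's PC1 position fixed and only changes who else is in the cohort. In a real PCA the axes themselves would also shift with the sampling. The positions and the helper name are made up for illustration.

```python
from statistics import median


def distance_from_cohort_centre(sample_pc1, cohort_pc1):
    """How far one sample sits from the cohort's median PC1 value."""
    return abs(sample_pc1 - median(cohort_pc1))


# Hypothetical PC1 positions for two ancestry clusters.
mainland_like, sardinian_like = 0.0, 3.0

mostly_mainland = [mainland_like] * 9 + [sardinian_like]
mostly_sardinian = [mainland_like] + [sardinian_like] * 9

# The same two people, measured against each cohort's centre.
d_sard_in_mainland = distance_from_cohort_centre(sardinian_like, mostly_mainland)
d_sard_in_sardinian = distance_from_cohort_centre(sardinian_like, mostly_sardinian)
d_main_in_sardinian = distance_from_cohort_centre(mainland_like, mostly_sardinian)
print(d_sard_in_mainland, d_sard_in_sardinian, d_main_in_sardinian)
```

Who counts as the "outlier" depends entirely on who makes up the majority of the sample.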
(I get quickly lost in the population history of Europe and anywhere else, though I find it fascinating - @aylwyn_scally is my go-to source for the latest and greatest on this.)
Arguably this is even worse in the US "European American" population, which has both Native American and African haplotypes coming in at some rate *and* complex heterogeneity of European founding, also non-random in geography (Finns in Seattle; Germans in Philadelphia etc).
Basically if you use self reported ethnicity to subset or (worse still!) as a covariate, you have no idea what other confounders you are bringing into your analysis, and you are going to ask your PC genetic modelling to do even more! Why do it? Don't do it!
Science lives and operates in society, and we have a complex, historically messy, culturally screwed up thing called "race/ethnicity". These ideas are everywhere, but they are not just "not useful" in genetics; they have strange, you-don't-want-to-use-them biases.
These biases and complications - ranging from full-blown discrimination and the awful history of racism to other complex questions (what does "Irish Traveller" as an "ethnicity" really mean?) - are worth studying, but are rarely useful for understanding the biology of humans.
If you want to do genetics, do genetics; leave ethnicity/race at the door. If you want to study sociology and society - race/ethnicity is a complex part of your study (and v.complex), predominantly cultural, and don't mistake it for genetics.
Thread by Ewan Birney.