Had another moment of "well, yes, but people *are* different" and "you geneticists use continental groups in your analysis" as we skirted around discussions of ethnicity / race in health impacts. TL;DR Partially correct but the underlying mindset that ethnicity=genetics is wrong
Let's deal with the correct things first. Yes, people are different partly (sometimes mainly) due to genetics. Visibly, eg height, weight, hair colour, skin colour, smoking habits + invisibly, eg cholesterol levels, heart trabeculation levels, likelihood of getting breast cancer
Some of these visible differences we integrate into the gestalt assessment of ourselves and others for ethnicity, as represented by self identified ethnicity boxes which people tick, eg "Black British, White English, British Indian, British xxx", gloriously variable by society
Also correct is that geneticists have (relatively) complex procedures in analysis whose results are, confusingly, often given shorthand labels that sound like ethnicity labels, eg, "European American" + these have a non-random correlation to self identified ethnicity
The most "classic" use is the selection of subsets of cohorts of people, where geneticists use wide genetic relatedness to select a group of people such that (a) their environment, most importantly social, and (b) their genetic properties (eg, LD) are reasonably homogeneous
We do this by using a longstanding statistical approach to analyse variation (Principal component analysis) to project the high dimensional genetics into a lower space +often use the first 2 dimensions (easy to visualise) to crudely select people (draw a circle around the blob)
It is worth stressing there is nothing magical about PCA, and each PCA is unique to the *samples*. The "central blob" being European American in US cohorts is due to the sampling in the US; in Japanese cohorts the central blob is "Japanese" etc.
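For the more code-minded, here is a minimal sketch of the "project, then draw a circle around the blob" step. Everything in it is an assumption made for illustration: the simulated two-group cohort, the function names (`pca_project`, `select_central_blob`, `simulate_cohort`) and the "3x the median distance" radius are not a description of any real pipeline, where the circle is often drawn by eye or with cohort-specific thresholds.

```python
import numpy as np

def simulate_cohort(n_major=400, n_minor=40, n_snps=2000, seed=0):
    """Toy cohort (hypothetical): a large group and a small group with shifted allele frequencies."""
    rng = np.random.default_rng(seed)
    freq_major = rng.uniform(0.1, 0.9, n_snps)
    freq_minor = np.clip(freq_major + rng.normal(0, 0.2, n_snps), 0.05, 0.95)
    g_major = rng.binomial(2, freq_major, size=(n_major, n_snps))
    g_minor = rng.binomial(2, freq_minor, size=(n_minor, n_snps))
    genotypes = np.vstack([g_major, g_minor]).astype(float)
    is_major = np.arange(n_major + n_minor) < n_major
    return genotypes, is_major

def pca_project(genotypes, n_components=2):
    """Project a samples x SNPs matrix onto its top principal components."""
    # Centring uses the means of *these* samples, so the PCs are specific to the cohort.
    centred = genotypes - genotypes.mean(axis=0)
    u, s, _ = np.linalg.svd(centred, full_matrices=False)
    return u[:, :n_components] * s[:n_components]

def select_central_blob(pcs, radius_in_median_dists=3.0):
    """Crude circle: keep samples within a multiple of the median distance from the median point."""
    centre = np.median(pcs, axis=0)
    dist = np.linalg.norm(pcs - centre, axis=1)
    return dist <= radius_in_median_dists * np.median(dist)

genotypes, is_major = simulate_cohort()
pcs = pca_project(genotypes)
keep = select_central_blob(pcs)
print("kept from the larger group:", keep[is_major].mean())
print("kept from the smaller group:", keep[~is_major].mean())
```

Note the "central blob" here is simply whichever group is most sampled: swap the group sizes in `simulate_cohort` and the circle lands on the other group.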
Although the selection of the biggest blob in the 2-D PCA nearly always associates with a self identified ethnicity (eg. "White" in UK BioBank) *plenty* of people who tick that box don't make it into the blob. This is not because they "got it wrong" or have some family secret ...
This is because we want to be conservative in the sub-selection of people to meet some statistical criteria - we're not trying to describe ancestry in the population genetics use of that term (PCA is not a good representation of this) nor predict ethnicity.
Focusing on the social environment component first, because I think it is most important, this is leveraging the fact that recent ancestry is a good marker for social environment - eg, crudely Black British people experience a different social environment to White British people
(it's weird, but we're using genetics to "predict" social environment features here)
What we're aiming to achieve is twofold - the most important is getting to pseudorandomisation of the environment with respect to genetics - if this holds, our downstream GWAS will hold. The second is limiting variation from environment (easier to spot the genetic variation).
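A toy simulation of why the pseudorandomisation matters (a sketch under made-up assumptions, not a real analysis): two groups differ in allele frequency at one SNP and in an environment-driven trait mean, and the SNP has *no* effect on the trait. The group labels, frequencies (0.3 vs 0.7), the trait shift and the function names are all invented for illustration.

```python
import numpy as np

def simulate_stratified_cohort(n_a=2000, n_b=2000, seed=1):
    """Hypothetical cohort: the trait depends only on group (environment), never on genotype."""
    rng = np.random.default_rng(seed)
    group_b = np.concatenate([np.zeros(n_a, dtype=bool), np.ones(n_b, dtype=bool)])
    allele_freq = np.where(group_b, 0.7, 0.3)
    genotype = rng.binomial(2, allele_freq)
    trait = 1.0 * group_b + rng.normal(0.0, 1.0, n_a + n_b)
    return genotype, trait, group_b

def association_slope(genotype, trait):
    """Least-squares slope of trait on genotype (a single-SNP association estimate)."""
    g = genotype - genotype.mean()
    return float(g @ (trait - trait.mean()) / (g @ g))

genotype, trait, group_b = simulate_stratified_cohort()
pooled = association_slope(genotype, trait)
within_a = association_slope(genotype[~group_b], trait[~group_b])
print("pooled slope (confounded by environment):", round(pooled, 3))
print("slope within one group:", round(within_a, 3))
```

In the pooled cohort the SNP looks associated with the trait purely because both track group membership; within a single group, where environment is (in this toy) unrelated to genotype, the slope sits near zero.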
It is also useful that we're selecting for relatively homogeneous and random genetics - for example that linkage disequilibrium patterns of SNPs are well modelled as coming from one distribution.
Frustratingly we (as geneticists) don't much study *why* this procedure works and what precisely it is modelling (or rather, the properties of the people not selected vs selected) - partly because this walks you into some messy societal things, often not measured, also >>
because by definition the subselection is the majority of the people and the people left out of the blob are heterogeneous, probably both in their social environment as well as their genetics - the former being the big headache.
It is clunky, but we suggest using more technically precise terms (eg, first usage: "The European-associated PCA cluster, which aims to minimise variation in non-genetic factors and genetic factors"; repeated usage: "PCA-selected European subset").
We welcome feedback from genetics and non-genetics colleagues on how best to craft our language to explain what we're doing.
Back to my conversation - given geneticists do these sorts of subselections in their analysis, are we not saying there is *something* in the genetics of continental groupings / continental ancestry?
No, we're not. We're using (recent) continental ancestry to predict the *social environment* that people experience in our societies, and making sure that the genetics we do our association studies with is randomised with respect to these large societal differences.
It also gives us smoother LD properties and some other nice features, but for me this is secondary to the "elephant in the room" which is social stratification.
(I can sense some of my genetics colleagues now saying 'well what about genetic background, from "real" epistasis through to background linear polygenic scores? Surely *some* of this PCA selection/PCA covariates are modelling this?' >>
This deserves its own thread, but the simple answer: given the elephant in the room of social stratification we have to be conservative, and accept we will "model out" some "real" genetic background.
In fact, I think in human genetics there is far less "genetic background" a la laboratory mouse/maize land races/farm animal genetics but this needs more words than a tweet!)
So.... Ethnicity is associated with a little bit of genetics inherently (visible characteristics) and depressingly a lot of recent ancestry *via* the social stratification that is associated with ethnicity, but this doesn't make ethnicity a crude proxy for genetics.
In both research (basic+clinical) and clinical practice we can't be "colour blind" - human society nearly everywhere is too stratified impacting all sorts of things via many routes - but *genetics* is always best done by measuring *DNA* not self identified ethnicity box ticking
• • •
Ah. I love the smell of freshly baked data/analysis, well controlled false discovery rate (QQ plot) and just ... so many results. Which of the thousands of beautiful stars in the sky does one pull out to discuss? Biology is so endless and wonderful in its detail...
... to alter (butcher?) a passage from a far far wiser and more thoughtful man than me....
It is interesting to contemplate a tangled set of genetic results, associated to both well known genes and entirely anonymous regions of the genome, stories from physiology of old and hints of new insights, and to reflect ...
Great history of the electric car / mobility - 1890s onwards. I really like switching sometimes to a historical perspective on science and technology; it reminds one of the unchanging nature of human foibles and drivers with "you know how the technology story turns out"
There is, for me, a similar history of technology / medicine about the complex introduction of Xrays into medicine (I blogged about this 6 years - (! 6 years!) ago - ewanbirney.com/2015/10/genomi…
The journey of Xrays from spanking new whizzy technology to routine part of medicine is surprisingly complex - it involves twists and turns, inappropriate use of technology "just for fun" (echoes of 23andme), and non obvious advocates for the uptake of the technology.
A COVID perspective: TL;DR - the pandemic in the developed world has shifted due to successful vaccines, though plenty of complex and tricky scenarios to navigate; the developed world is in the midst of even harsher transmission rate from Delta.
Context: I am an expert in human genetics and bioinformatics. I know experts in viral genomics, infectious epidemiology, public health, clinical trials and immunology. I have some COIs: I am longstanding consultant to Oxford Nanopore (sequencing company) and am on the Ox/AZ trial
With the perspective of a glorious holiday in Northumberland, the last week disconnected from work and twitter, I have some bigger picture musings on the pandemic from my perspective.
A personal view point on the #AlphaFold announcement today from the @DeepMind and @emblebi team, part of @embl. TL;DR - I am *still* pinching myself about this.
When @demishassabis and the AlphaFold team first presented the results from CASP to me last November I genuinely almost fell off my chair. I think I swore quite a bit (in a British way) in amazement.
One of the reasons was I knew how rigorous CASP was - 20 years ago people published all sorts of "solving the folding problem" which then... didn't work beyond the training set. CASP cleverly used the fact that there are genuinely unknown structures each year solved by experiment
*trumpets* A new preprint by colleagues in @PHE_uk from @isaperena's group and myself (my first infectious epidemiology paper!) on single source transmission of COVID19 using viral genotyping to understand relative risk of transmission settings. papers.ssrn.com/sol3/papers.cf…
Background; we have known for a long time that there is overdispersion of SARS-CoV-2 transmission; some estimates are that 20% of settings/events account for 80% of transmission. Understanding where these transmission events occur is important for non-pharmaceutical interventions
Furthermore, if we can be confident of spotting these individual small-scale super-spreading events and inform other individuals who are at risk of infection at the same time we can highlight people who are at the higher risk for infection, eg, asking them to get a test.
There are some straightforward 'stop using this term' aspects (the use of "Caucasian" for example); there are some complex "what does this term mean" (ethnicity labels, the ethnicity/race duality in US vs just ethnicity in UK / Europe) and then technical stuff on GWAS >>
The technical piece is about how we describe the commonplace GWAS protocol of subselecting a group of people in cohorts for association analysis; a reminder that the standard process has two steps to achieve pseudo-randomisation of non-genetic factors to genetic factors