Ewan Birney

5 Aug, 26 tweets, 5 min read

Had another moment of "well, yes, but people *are* different" and "you geneticists use continental groups in your analysis" as we skirted around discussions of ethnicity / race in health impacts. TL;DR Partially correct but the underlying mindset that ethnicity=genetics is wrong

Let's deal with the correct things first. Yes, people are different partly (sometimes mainly) due to genetics. Visibly, eg height, weight, hair colour, skin colour, smoking habits + invisibly, eg cholesterol levels, heart trabeculation levels, likelihood of getting breast cancer

Some of these visible differences we integrate into the gestalt assessment of ourselves and others for ethnicity, as represented by self identified ethnicity boxes which people tick, eg "Black British, White English, British Indian, British xxx", gloriously variable by society

Also correct is that geneticists have (relatively) complex procedures in analysis, and confusingly we often describe the results with a shorthand that sounds similar to ethnicity labels, eg, "European American", and which correlates non-randomly with self identified ethnicity

The most "classic" use is the selection of subsets of cohorts of people, where geneticists use wide genetic relatedness to select a group of people such that (a) their environment, most importantly their social environment, and (b) their genetic properties (eg, LD) are reasonably homogeneous

We do this by using a longstanding statistical approach to analyse variation (Principal component analysis) to project the high dimensional genetics into a lower dimensional space, and often use the first 2 dimensions (easy to visualise) to crudely select people (draw a circle around the blob)
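
A minimal sketch of that "project, then draw a circle" step, on simulated genotypes. Everything here is an assumption made for illustration: the function names, the simulated allele frequencies, and the choice of "3x the median distance from the median point" as the circle. Real pipelines differ in the details.

```python
import numpy as np

def top_two_pcs(genotypes):
    """Project samples (rows) onto their first two principal components."""
    centred = genotypes - genotypes.mean(axis=0)
    # SVD of the centred matrix; u * s gives the per-sample PC scores.
    u, s, _ = np.linalg.svd(centred, full_matrices=False)
    return u[:, :2] * s[:2]

def select_central_blob(pcs, radius_in_medians=3.0):
    """Crude 'circle': keep samples near the coordinate-wise median point.

    The radius rule is a hypothetical choice for this sketch only.
    """
    centre = np.median(pcs, axis=0)
    dist = np.linalg.norm(pcs - centre, axis=1)
    return dist <= radius_in_medians * np.median(dist)

# Simulated cohort: 180 samples from one set of allele frequencies,
# 20 from a shifted set (all values invented).
rng = np.random.default_rng(0)
n_snps = 500
freq_a = rng.uniform(0.1, 0.9, n_snps)
freq_b = np.clip(freq_a + rng.normal(0, 0.25, n_snps), 0.05, 0.95)
majority = rng.binomial(2, freq_a, size=(180, n_snps))
minority = rng.binomial(2, freq_b, size=(20, n_snps))
genotypes = np.vstack([majority, minority]).astype(float)

pcs = top_two_pcs(genotypes)
keep = select_central_blob(pcs)
print(f"kept {keep.sum()} of {keep.size} samples")
```

The selection is deliberately conservative: the rule only looks at distance from the bulk of the samples, it knows nothing about any label a person would tick.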

It is worth stressing there is nothing magical about PCA, and each PCA is unique to the *samples*. The "central blob" being European American in US cohorts is due to the sampling in the US; in Japanese cohorts the central blob is "Japanese" etc.
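
To make the "each PCA is unique to the samples" point concrete, here is a small sketch under invented assumptions (two simulated groups, arbitrary allele frequencies, hypothetical function names). The same two groups are sampled in different proportions, and whichever group is the majority ends up nearer the origin of PC1, simply because PCA centres on the cohort mean.

```python
import numpy as np

def pc1_scores(genotypes):
    """Per-sample scores on the first principal component."""
    centred = genotypes - genotypes.mean(axis=0)
    u, s, _ = np.linalg.svd(centred, full_matrices=False)
    return u[:, 0] * s[0]

rng = np.random.default_rng(1)
# Allele frequencies for two simulated groups (invented values).
freq_a = rng.uniform(0.1, 0.9, 300)
freq_b = rng.uniform(0.1, 0.9, 300)

def group_offsets(n_a, n_b):
    """Absolute mean PC1 position of group A and group B in one cohort."""
    a = rng.binomial(2, freq_a, size=(n_a, freq_a.size))
    b = rng.binomial(2, freq_b, size=(n_b, freq_b.size))
    pc1 = pc1_scores(np.vstack([a, b]).astype(float))
    return abs(pc1[:n_a].mean()), abs(pc1[n_a:].mean())

a_major = group_offsets(180, 20)   # cohort where A is the majority
b_major = group_offsets(20, 180)   # cohort where B is the majority
print("A majority: |mean PC1| of A, B =", a_major)
print("B majority: |mean PC1| of A, B =", b_major)
```

Because PC scores sum to zero across the cohort, the larger group's mean necessarily sits closer to zero. The "central blob" is a property of who was sampled, not of the people themselves.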

Although the selection of the biggest blob in the 2-D PCA nearly always associates with a self identified ethnicity (eg. "White" in UK BioBank) *plenty* of people who tick that box don't make it into the blob. This is not because they "got it wrong" or have some family secret ...

This is because we want to be conservative in the sub-selection of people to meet some statistical criteria - we're not trying to describe ancestry in the population genetics use of that term (PCA is not a good representation of this) nor predict ethnicity.

Focusing on the social environment component first, because I think it is most important, this is leveraging the fact that recent ancestry is a good marker for social environment - eg, crudely Black British people experience a different social environment to White British people

(it's weird, but we're using genetics to "predict" social environment features here)

What we're aiming to achieve is twofold - the most important is getting to pseudorandomisation of the environment with respect to genetics - if this holds, our downstream GWAS will hold. The second is limiting variation from environment (easier to spot the genetic variation).
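
A toy illustration of why the pseudorandomisation matters, with entirely invented numbers. The SNP below has no effect on the trait; the two simulated groups differ only in allele frequency and in an environmental shift to the trait. Pooling the groups produces an apparent genotype-trait correlation; analysing within a group does not.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 5000  # samples per simulated group (arbitrary)

def simulate_group(allele_freq, env_mean):
    """Genotype at one SNP plus a trait driven only by environment."""
    genotype = rng.binomial(2, allele_freq, size=n).astype(float)
    trait = rng.normal(env_mean, 1.0, size=n)  # no genetic effect at all
    return genotype, trait

# Hypothetical groups: different allele frequency, different environment.
g1, t1 = simulate_group(0.2, 0.0)
g2, t2 = simulate_group(0.8, 1.0)

def corr(x, y):
    return float(np.corrcoef(x, y)[0, 1])

pooled = corr(np.concatenate([g1, g2]), np.concatenate([t1, t2]))
within = [corr(g1, t1), corr(g2, t2)]
print(f"pooled correlation: {pooled:.3f}")
print(f"within-group correlations: {within[0]:.3f}, {within[1]:.3f}")
```

This is only a caricature of social stratification (real environments are not a single shifted mean), but it shows the shape of the problem the subselection is trying to avoid.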

It is also useful that we're selecting for relatively homogeneous and random genetics - for example, that linkage disequilibrium patterns of SNPs are well modelled as coming from one distribution.

Frustratingly we (as geneticists) don't much study *why* this procedure works and what precisely it is modelling (or rather, the properties of the people not selected vs selected) - partly because this walks you into some messy societal things, often not measured, also>>

because by definition the subselection is the majority of the people, and the people left out of the blob are heterogeneous, probably both in their social environment and in their genetics - the former being the big headache.

As we (@aylwyn_scally @AdamRutherford @JenniferRaff @minouye271) have outlined in this preprint (arxiv.org/abs/2106.10041), we think geneticists' use of language in this setting, and other language usage, is confusing and can cause harm.

It is clunky, but we suggest using more technically precise terms (eg, first usage: "The European-associated PCA cluster, which aims to minimise variation in non-genetic factors and genetic factors"; repeated usage: "PCA-selected European subset")

We welcome feedback from genetics and non-genetics colleagues on how best to craft our language to explain what we're doing.

Back to my conversation - given geneticists do these sorts of subselections in their analysis, are we not saying there is *something* in the genetics of continental groupings / continental ancestry?

No, we're not. We're using (recent) continental ancestry to predict the *social environment* that people experience in our societies, and making sure that the genetics we do our association studies with is randomised with respect to these large societal differences.

It also gives us smoother LD properties and some other nice features, but for me this is secondary to the "elephant in the room" which is social stratification.

(I can sense some of my genetics colleagues now saying 'well what about genetic background, from "real" epistasis through to background linear polygenic scores? Surely *some* of this PCA selection/PCA covariates are modelling this?' >>

This deserves its own thread, but the simple answer: given the elephant in the room of social stratification, we have to be conservative, and accept we will "model out" some "real" genetic background.

In fact, I think in human genetics there is far less "genetic background" a la laboratory mouse/maize land races/farm animal genetics but this needs more words than a tweet!)

So.... Ethnicity is associated with a little bit of genetics inherently (visible characteristics) and, depressingly, a lot of recent ancestry *via* the social stratification that is associated with ethnicity, but this doesn't make ethnicity a crude proxy for genetics.

In both research (basic+clinical) and clinical practice we can't be "colour blind" - human society nearly everywhere is too stratified, impacting all sorts of things via many routes - but *genetics* is always best done by measuring *DNA*, not self identified ethnicity box ticking
