Pete Kraft
Dec 13, 2019 · 27 tweets · 7 min read
Okay #epitwitter and #genepitwitter, let’s talk about how statistical and biological gene-environment interactions relate to each other (or not). \thread (part 1)
TL;DR 1: the distribution of a trait conditional on genotype and exposure at the population level (whether there is a statistical interaction or not) is consistent with 1,000s of possible biological models.
TL;DR 2: conversely, knowing that a gene product and an exposure or exposure byproduct physically interact at the molecular or cellular level need not say anything about what’s happening at the population level.
Hence: "The elucidation of biological interactions by means of statistical models requires the imaginative and prudent use of inductive and deductive reasoning: it cannot be done mechanically." Siemiatycki and Thomas (1981) Int J Epidemiol.
I’ll discuss definitions and intuitions and implications in a sec—but first, I want to recommend two papers that helped clarify my thinking on this. They are well worth the read. PMIDs 1999681 7327838.
Part of the confusion around "gene environment interactions" stems from using the same words to describe different phenomena. It's important to define what we mean by statistical and biological interactions.
Statistical interactions are easier to define, because we can write them down in maths.
Statistical interaction refers to effect measure modification of one factor by another--if the measure of the effect of exposure differs by genotype, we say there is a gene-environment interaction.
The challenge here is there are many effect measures or scales that we could use--e.g. for binary traits: odds ratios, risk differences, etc. Interaction on one scale need not imply interaction on another, and absence of interaction on one scale need not imply absence on others.
Consider a simple example, with a binary genotype and binary exposure. These two factors define four risk strata, and you can parameterize the four disease probabilities in many ways--here I'm showing the absolute risk and log odds scales (i.e. linear versus logit link in a GLM).
If the risk difference comparing exposed to non-exposed individuals differs by genotype (i.e. bge != 0), then we typically call that an "additive interaction" (or more verbosely: departures from additivity on the absolute risk scale).
If the odds ratio comparing exposed to non-exposed individuals differs by genotype (i.e. betage != 0), then we typically call that a "multiplicative interaction" (or more verbosely: departures from additivity on the log odds scale).
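To make the two definitions concrete, here is a minimal Python sketch that takes the four stratum risks and returns both interaction terms. The names `p00`, `p01`, `p10`, `p11` (risk for genotype g and exposure e) and the helper functions are illustrative assumptions, not notation from the thread; `bge` and `betage` are taken to be the product-term coefficients of the saturated linear-link and logit-link models.

```python
import math

def logit(p):
    """Log odds of a probability strictly between 0 and 1."""
    return math.log(p / (1.0 - p))

def additive_interaction(p00, p01, p10, p11):
    """bge: exposure risk difference in carriers minus that in non-carriers.

    pGE is the disease risk for genotype G (0/1) and exposure E (0/1).
    """
    return (p11 - p10) - (p01 - p00)

def multiplicative_interaction(p00, p01, p10, p11):
    """betage: log of (exposure odds ratio in carriers / in non-carriers)."""
    return (logit(p11) - logit(p10)) - (logit(p01) - logit(p00))

# Hypothetical risks, chosen only for illustration.
risks = dict(p00=0.05, p01=0.10, p10=0.10, p11=0.15)
bge = additive_interaction(**risks)           # zero up to rounding: both risk differences are 0.05
betage = multiplicative_interaction(**risks)  # negative: the odds ratio is smaller in carriers
```

With these made-up risks there is no additive interaction but there is a multiplicative one, which is the scale dependence described above.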
It's pretty common for folks to test for "interaction" by running a logistic regression and testing H0: betage=0. If this is non-significant, they sadly declare “no interaction” and move on.
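For a binary genotype and binary exposure with no other covariates, that test can be sketched without a regression package: in the saturated logistic model the estimate of betage is the log ratio of the two stratum-specific odds ratios, and its large-sample variance is the sum of the reciprocals of the eight cell counts. The function name and the counts below are hypothetical, made up for illustration.

```python
import math

def wald_test_interaction(counts):
    """Wald test of H0: betage = 0 in a saturated logistic model.

    counts[(g, e)] = (cases, controls) for genotype g and exposure e (0/1).
    Returns the estimate, its standard error, and a two-sided p-value.
    """
    log_odds = {k: math.log(a / b) for k, (a, b) in counts.items()}
    betage = (log_odds[(1, 1)] - log_odds[(1, 0)]) - (log_odds[(0, 1)] - log_odds[(0, 0)])
    # Variance of a log odds ratio contrast: sum of 1/count over all eight cells.
    se = math.sqrt(sum(1.0 / n for cell in counts.values() for n in cell))
    z = betage / se
    p_value = math.erfc(abs(z) / math.sqrt(2.0))  # two-sided normal tail area
    return betage, se, p_value

# Hypothetical case-control counts: the exposure odds ratio is 2 in both genotypes.
counts = {
    (0, 0): (100, 400),
    (0, 1): (150, 300),
    (1, 0): (40, 80),
    (1, 1): (60, 60),
}
betage_hat, se_hat, p = wald_test_interaction(counts)
```

Here the estimate is essentially zero and the p-value is essentially one, so this is exactly the situation where someone would declare "no interaction."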
But it’s important to realize that lack of interaction on a multiplicative scale often implies interaction on the additive scale—which can have clinical or public health implications.
In this hypothetical case, the risk reduction from removing exposure is higher among carriers than non-carriers. If all you did was run your logistic regression to test for “interaction,” you’d miss that.
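The figure from the thread is not reproduced here, so the sketch below uses its own made-up numbers (not the thread's) to show the same pattern: the exposure odds ratio is identical in carriers and non-carriers (betage = 0), yet the risk difference is larger in carriers. The baseline odds and both odds ratios are assumptions.

```python
def risk_from_odds(odds):
    """Convert odds to a probability."""
    return odds / (1.0 + odds)

baseline_odds = 0.05  # non-carrier, unexposed (assumed)
or_genotype = 3.0     # carrier vs non-carrier (assumed)
or_exposure = 2.0     # exposed vs unexposed, same in both genotypes, so betage = 0

# Risk in each genotype-by-exposure stratum under a purely multiplicative odds model.
risk = {
    (g, e): risk_from_odds(baseline_odds * or_genotype**g * or_exposure**e)
    for g in (0, 1)
    for e in (0, 1)
}

rd_noncarriers = risk[(0, 1)] - risk[(0, 0)]  # about 0.043
rd_carriers = risk[(1, 1)] - risk[(1, 0)]     # about 0.100
```

Under these assumptions, removing the exposure prevents roughly 10 cases per 100 carriers but only about 4 per 100 non-carriers, even though a logistic regression would report no interaction.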
Okay, that’s statistical interaction. What about biological interaction?
This is harder to define. Most people think of this as a mechanistic interaction, where the gene product and the exposure or some byproduct physically touch.
For example, folks who have two defective copies of the PAH gene cannot metabolize phenylalanine, which leads to phenylketonuria.
Another example: variants in the ADH and ALDH genes that change the efficiency of enzymes metabolizing ethanol.
(Side note: some people refer to “sufficient component cause” or “counterfactual” interactions as “biological interactions.” I do not, since most people think of mechanistic interactions when you say “biological interaction.”)
What does biological interaction have to do with statistical interaction? Hard to say, according to the Thompson and Siemiatycki & Thomas articles cited above.
The intuition behind their arguments? Even if you know something about the underlying biology--say, that the rate at which an exogenous exposure is detoxified differs by genotype--this need not induce a statistical interaction at the level of trait distribution in the population.
The shapes of the function relating exogenous exposure to biologically active levels and of the function relating active levels to traits are typically unknown. I may know something about the shape of g(G,X), but h(g(G,f(E)))=l(G,E) is another matter.
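A toy sketch of that point: hold the biology g fixed (carriers end up with twice the active level for any dose) and change only the unknown dose-response h. The population-level pattern l(G,E) then shows interaction on a different scale each time. Every functional form and number below is an assumption for illustration, and the ratio scale used is the risk ratio rather than the odds ratio.

```python
import math

def f(e):
    """External dose for unexposed (e=0) and exposed (e=1); assumed values."""
    return 4.0 if e else 1.0

def g(genotype, dose):
    """Active level: carriers (genotype=1) detoxify more slowly, so the same
    dose yields twice the active level (assumed)."""
    return (2.0 if genotype else 1.0) * dose

def h_linear(active):
    """Risk proportional to the active level (assumed)."""
    return 0.01 * active

def h_log(active):
    """Risk linear in log2 of the active level (assumed)."""
    return 0.01 + 0.02 * math.log2(active)

def scales(h):
    """Return (difference of risk differences, ratio of risk ratios)
    for l(G,E) = h(g(G, f(E)))."""
    l_ge = {(G, E): h(g(G, f(E))) for G in (0, 1) for E in (0, 1)}
    additive = (l_ge[1, 1] - l_ge[1, 0]) - (l_ge[0, 1] - l_ge[0, 0])
    ratio = (l_ge[1, 1] / l_ge[1, 0]) / (l_ge[0, 1] / l_ge[0, 0])
    return additive, ratio

add_lin, ratio_lin = scales(h_linear)  # additive interaction, ratio of risk ratios = 1
add_log, ratio_log = scales(h_log)     # no additive interaction, ratio of risk ratios != 1
```

Same gene, same exposure, same mechanism at the level of g, and yet which scale shows "interaction" depends entirely on an h we usually do not know.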
This does not mean biological interaction will never produce statistical interaction. You see it with phenylketonuria. You see it with alcohol, ADH/ALDH and esophageal cancer.
So if you already know how a variant changes gene function, and you know how that change will affect the exposure’s impact, you can hypothesize about what pattern you expect to see at the population level (i.e. decide what scale is relevant). To be continued...
Also, here are the odds of esophageal cancer, broken down by ADH/ALDH genotype and alcohol intake.
