Okay #epitwitter and #genepitwitter, let’s talk about how statistical and biological gene-environment interactions relate to each other (or not). /thread (part 1)
TL;DR 1: the distribution of a trait conditional on genotype and exposure at the population level (whether there is a statistical interaction or not) is consistent with 1,000s of possible biological models.
TL;DR 2: conversely, knowing that a gene product and an exposure or exposure byproduct physically interact at the molecular or cellular level need not say anything about what’s happening at the population level.
Hence: "The elucidation of biological interactions by means of statistical models requires the imaginative and prudent use of inductive and deductive reasoning: it cannot be done mechanically." Siemiatycki and Thomas (1981) Int J Epidemiol.
I’ll discuss definitions and intuitions and implications in a sec—but first, I want to recommend two papers that helped clarify my thinking on this. They are well worth the read. PMIDs 1999681 7327838.
Part of the confusion around "gene environment interactions" stems from using the same words to describe different phenomena. It's important to define what we mean by statistical and biological interactions.
Statistical interactions are easier to define, because we can write them down in maths.
Statistical interaction refers to effect measure modification of one factor by another--if the measure of the effect of exposure differs by genotype, we say there is a gene-environment interaction.
The challenge here is that there are many effect measures or scales we could use--e.g. for binary traits: odds ratios, risk differences, etc. Interaction on one scale need not imply interaction on another, and absence of interaction on one scale need not imply absence on others.
Consider a simple example, with a binary genotype and binary exposure. These two factors define four risk strata, and you can parameterize the four disease probabilities in many ways--here I'm showing the absolute risk and log odds scales (i.e. linear versus logit link in a GLM).
If the risk difference comparing exposed to non-exposed individuals differs by genotype (i.e. bge != 0), then we typically call that an "additive interaction" (or more verbosely: departures from additivity on the absolute risk scale).
If the odds ratio comparing exposed to non-exposed individuals differs by genotype (i.e. betage != 0), then we typically call that a "multiplicative interaction" (or more verbosely: departures from additivity on the log odds scale).
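Written out, the two parameterizations look roughly like this. It's a sketch that assumes G and E are coded 0/1; only bge and betage are named in this thread, so the other coefficient labels (b0, bg, be, beta0, betag, betae) are mine, for illustration.

```latex
\begin{align*}
\Pr(D = 1 \mid G, E) &= b_0 + b_g G + b_e E + b_{ge}\, G E \\
\operatorname{logit} \Pr(D = 1 \mid G, E) &= \beta_0 + \beta_g G + \beta_e E + \beta_{ge}\, G E
\end{align*}
```

Under the first model the risk difference for exposure is b_e in non-carriers and b_e + b_ge in carriers. Under the second the odds ratio for exposure is exp(beta_e) in non-carriers and exp(beta_e + beta_ge) in carriers.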
It's pretty common for folks to test for "interaction" by running a logistic regression and testing H0: betage=0. If this is non-significant, they sadly declare “no interaction” and move on.
But it’s important to realize that lack of interaction on a multiplicative scale often implies interaction on the additive scale—which can have clinical or public health implications.
In this hypothetical case, the risk reduction from removing exposure is higher among carriers than non-carriers. If all you did was run your logistic regression to test for “interaction,” you’d miss that.
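Here's a minimal Python sketch of that point. The numbers are made up by me for illustration (baseline risk 5%, odds ratio 3 for carrying the variant, odds ratio 2 for exposure, betage = 0); they are not taken from any study or from the figure in the original thread.

```python
import math

def expit(x):
    """Inverse logit: map log odds to a probability."""
    return 1.0 / (1.0 + math.exp(-x))

def logit(p):
    """Map a probability to log odds."""
    return math.log(p / (1.0 - p))

# Hypothetical logistic-model coefficients (assumptions, not estimates).
BETA_0 = logit(0.05)      # baseline risk of 5% in unexposed non-carriers
BETA_G = math.log(3.0)    # odds ratio of 3 for carriers
BETA_E = math.log(2.0)    # odds ratio of 2 for exposure
BETA_GE = 0.0             # no interaction on the log odds scale

def risk(g, e):
    """Disease probability for genotype g and exposure e (each 0 or 1)."""
    return expit(BETA_0 + BETA_G * g + BETA_E * e + BETA_GE * g * e)

def odds_ratio(g):
    """Odds ratio for exposure within genotype stratum g."""
    odds = lambda p: p / (1.0 - p)
    return odds(risk(g, 1)) / odds(risk(g, 0))

# Risk differences for exposure within each genotype stratum.
rd_noncarriers = risk(0, 1) - risk(0, 0)
rd_carriers = risk(1, 1) - risk(1, 0)

print(f"OR for exposure, non-carriers: {odds_ratio(0):.3f}")
print(f"OR for exposure, carriers:     {odds_ratio(1):.3f}")
print(f"RD for exposure, non-carriers: {rd_noncarriers:.4f}")
print(f"RD for exposure, carriers:     {rd_carriers:.4f}")
print(f"Additive interaction (bge):    {rd_carriers - rd_noncarriers:.4f}")
```

The odds ratio is the same in both strata by construction, but the risk difference is more than twice as large in carriers, so removing exposure prevents more cases per person among carriers.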
Okay, that’s statistical interaction. What about biological interaction?
This is harder to define. Most people think of this as a mechanistic interaction, where the gene product and the exposure or some byproduct physically touch.
For example, folks who have a defective copy of the PAH gene cannot metabolize phenylalanine, which leads to phenylketonuria.
Another example: variants in the ADH and ALDH genes that change the efficiency of enzymes metabolizing ethanol.
(Side note: some people refer to “sufficient component cause” or “counterfactual” interactions as “biological interactions.” I do not, since most people think of mechanistic interactions when you say “biological interaction.”)
What does biological interaction have to do with statistical interaction? Hard to say, according to the Thompson and Siemiatycki & Thomas articles cited above.
The intuition behind their arguments? Even if you know something about the underlying biology--say, that the rate at which an exogenous exposure is detoxified differs by genotype--this need not induce a statistical interaction at the level of trait distribution in the population.
The shapes of the functions relating exogenous exposure to biologically active levels and the function relating active levels to traits are typically unknown. I may know something about the shape of g(G,X), but h(g(G,f(E)))=l(G,E) is another matter.
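To make that concrete, here is a toy sketch in Python. Every functional form and number in it is my assumption, chosen only to illustrate the argument: f is the identity, g scales the internal dose by a genotype-specific "active fraction," and h is either the log or the identity.

```python
import math

# Hypothetical fraction of the internal dose that stays biologically active.
# Genotype 1 detoxifies less efficiently than genotype 0 (an assumption).
ACTIVE_FRACTION = {0: 0.4, 1: 0.8}

def f(e):
    """Internal dose as a function of external exposure (assumed identity)."""
    return e

def g(genotype, x):
    """Biologically active level: genotype changes how much dose stays active."""
    return ACTIVE_FRACTION[genotype] * x

def exposure_effect(h, genotype, e_low=1.0, e_high=2.0):
    """Difference in mean trait between high and low exposure, within genotype."""
    return h(g(genotype, f(e_high))) - h(g(genotype, f(e_low)))

def identity(a):
    return a

for label, h in [("h = log", math.log), ("h = identity", identity)]:
    effect_0 = exposure_effect(h, 0)
    effect_1 = exposure_effect(h, 1)
    print(f"{label:13s} effect in G=0: {effect_0:.3f}, "
          f"effect in G=1: {effect_1:.3f}, "
          f"difference: {effect_1 - effect_0:.3f}")
```

The biology is identical in both rows: genotype changes how much of the exposure stays active. With h = log the exposure effect on the trait is the same in both genotypes (no statistical interaction on the additive scale); with h = identity it differs by genotype. The only thing that changed is the shape of h, which is typically unknown.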
This does not mean biological interaction will never produce statistical interaction. You see it with phenylketonuria. You see it with alcohol, ADH/ALDH and esophageal cancer.
So if you already know how a variant changes gene function, and you know how that change will affect the exposure’s impact, you can hypothesize about what pattern you expect to see at the population level (i.e. decide what scale is relevant). To be continued...
As a genetic epidemiologist who has helped develop polygenic risk scores (PRS) for common, multifactorial diseases, I have... thoughts on the launch of a company to provide pre-pregnancy counseling & preimplantation screening based on these PRS. 🧵 orchidhealth.com
1) The science doesn’t add up. 2) There are better ways to ensure your (and other) children have healthy and happy lives. 3) The message it sends is… not good.
1) The science doesn’t add up: Exhibit A. We have established the association between these PRS and common, complex, treatable and preventable late-onset diseases like breast and prostate cancer and type 2 diabetes, but…
Pancreatic cancer is a devastating disease: the average five-year survival is 9%, largely because most cancers are already advanced or metastatic when detected and cannot be removed surgically. 2/x
Identifying tumors early, when they can be treated, could improve survival. However, because pancreatic cancer is rare and current screening modalities are not 100% specific, general population screening is not recommended: too many false positives. 3/x uspreventiveservicestaskforce.org/uspstf/recomme…
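A quick back-of-the-envelope sketch of why rarity plus imperfect specificity is a problem. The prevalence, sensitivity, and specificity below are hypothetical round numbers I picked for illustration, not estimates for any real pancreatic cancer screening test.

```python
def positive_predictive_value(prevalence, sensitivity, specificity):
    """Probability that a person who screens positive truly has the disease."""
    true_positives = prevalence * sensitivity
    false_positives = (1.0 - prevalence) * (1.0 - specificity)
    return true_positives / (true_positives + false_positives)

# Hypothetical inputs (assumptions, not real test characteristics).
PREVALENCE = 1 / 10_000   # detectable disease in the screened population
SENSITIVITY = 0.90
SPECIFICITY = 0.98

ppv = positive_predictive_value(PREVALENCE, SENSITIVITY, SPECIFICITY)
false_per_true = (1.0 - ppv) / ppv

print(f"Positive predictive value: {ppv:.4%}")
print(f"False positives per true positive: {false_per_true:.0f}")
```

With these inputs fewer than 1 in 200 positive screens is a true case. Raising prevalence in the screened group (e.g. by targeting high-risk individuals) improves that ratio much faster than modest gains in specificity.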
I want to give more specifics about my main concerns with this pre-print on polygenic scores (PGS) in carriers of rare high-risk variants (e.g. pathogenic mutations in BRCA1). /THREAD
I (and many others) have been enthusiastic about the idea for years, but if we are to move from promise to reality we need to be careful.
1) This figure summarizes the main result in the paper: PGS are associated with risk among carriers of pathogenic variants—in fact the gradient of risk described by PGS is larger in carriers than non-carriers. HOWEVER…
If you know something about biology you can hypothesize about the form of the statistical interaction you expect to see—but (a) you need to know a lot already to do this, and (b) not seeing the expected pattern does not automatically mean your hypothesized mechanism is wrong.
You could also use what you (think you) know about biological mechanism to improve estimation precision when your sample size is small—e.g. if you are interested in genotype-specific treatment effects.
The recent Ganna et al. paper on same-sex sexual behavior has prompted questions about rg, the genetic correlation between two traits. What is it? How is it estimated? A technical primer.
rg is defined using the SNP-specific per-allele effects on each of two traits: b1i and b2i. (Yes, “effect” is a loaded term—I’ll come back to that. Roll with me for a sec.)
We can think of b1i as the regression coefficient from a multivariable ordinary least squares regression of Y1 on SNP i in an infinite sample from the population of interest. It’s the effect of SNP i adjusted for all the other SNPs. Ditto b2i.
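To see what "adjusted for all the other SNPs" buys you, here is a two-SNP toy calculation in Python. Everything in it is an assumption for illustration: both SNPs are standardized (variance 1), they are correlated with r = 0.5 (i.e. in LD), and only SNP 1 has a nonzero per-allele effect. In an infinite sample the OLS coefficients solve the normal equations, which for two predictors can be done by hand.

```python
def joint_coefficients(r, cov1, cov2):
    """Multivariable OLS coefficients for two standardized SNPs.

    r is the correlation between the SNPs; cov1 and cov2 are the
    covariances of each SNP with the trait. Solves the 2x2 normal equations.
    """
    det = 1.0 - r * r
    b1 = (cov1 - r * cov2) / det
    b2 = (cov2 - r * cov1) / det
    return b1, b2

# Hypothetical population values (assumptions).
R = 0.5                      # LD correlation between SNP 1 and SNP 2
TRUE_B1, TRUE_B2 = 0.3, 0.0  # only SNP 1 affects the trait

# Covariance of each SNP with the trait implied by the values above.
cov1 = TRUE_B1 + R * TRUE_B2
cov2 = R * TRUE_B1 + TRUE_B2

# Marginal (one-SNP-at-a-time) coefficients equal the covariances here,
# because each SNP has variance 1.
marginal_b1, marginal_b2 = cov1, cov2
joint_b1, joint_b2 = joint_coefficients(R, cov1, cov2)

print(f"Marginal coefficients: SNP1 = {marginal_b1:.2f}, SNP2 = {marginal_b2:.2f}")
print(f"Joint coefficients:    SNP1 = {joint_b1:.2f}, SNP2 = {joint_b2:.2f}")
```

SNP 2 picks up a nonzero marginal coefficient purely through LD with SNP 1; the multivariable coefficient, which is the one b1i and b2i refer to, removes it.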
I found the discussion of Quetelet and populations at the beginning of Chapter 3 of Epidemiology and the People’s Health particularly instructive, especially for genetic epidemiology. #epibookclub #epipeopleshealth #gwas #genepitwitter 1/18
The question “who—or what—determines populations or groups that merit comparison” is an important but tetchy one. 2/18
The concept of “population stratification bias” in genetic epidemiology is usually introduced using a toy example: say we’re studying two populations, with random mating within but no mating across populations. 3/18
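One way the toy example is often filled in, sketched in Python. All numbers are hypothetical and mine, and I simplify genotype to carrier status: the two populations differ in both carrier frequency and disease risk, and carrying the variant has no effect on risk within either population.

```python
# Hypothetical populations (assumptions): share of the pooled sample,
# carrier frequency, and disease risk. Risk does not depend on carrier
# status within a population.
POPULATIONS = {
    "A": {"share": 0.5, "carrier_freq": 0.2, "risk": 0.10},
    "B": {"share": 0.5, "carrier_freq": 0.8, "risk": 0.30},
}

def pooled_risk(carrier):
    """Disease risk among carriers (True) or non-carriers (False), pooled."""
    cases = 0.0
    people = 0.0
    for pop in POPULATIONS.values():
        freq = pop["carrier_freq"] if carrier else 1.0 - pop["carrier_freq"]
        people += pop["share"] * freq
        cases += pop["share"] * freq * pop["risk"]
    return cases / people

# Within each population the carrier vs non-carrier risk difference is zero
# by construction; pooling the two populations creates one.
pooled_rd = pooled_risk(True) - pooled_risk(False)

print(f"Pooled risk in carriers:     {pooled_risk(True):.2f}")
print(f"Pooled risk in non-carriers: {pooled_risk(False):.2f}")
print(f"Pooled risk difference:      {pooled_rd:.2f}")
```

The pooled comparison shows carriers at higher risk even though the variant does nothing in either population: carrier status is just a marker for which population you came from.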