I love the idea of this GWAS. The authors estimated the abundance of mtDNA in the blood of @uk_biobank participants by using the intensities of probes mapping to the mito genome
@HaggSara
Juulia Jylhävä
Yunzhang Wang
Kamila Czene &
Felix Grassmann
There are many analyses in this paper; I'm (naturally) only focusing on the GWAS which identified 66 lead SNPs in 50 loci.
I think I found 4 likely causal genes not mentioned in the paper, including an explanation for the strongest hit, with the largest beta, on Chr 19.
As a first step the authors used FUMA to select a functional variant for each lead SNP, picked the closest gene to each SNP and looked for shared themes among those genes. This found 3 plausible themes: 1) immune related, 2) cancer & cell cycle and 3) mitochondrial function
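The "closest gene to each SNP" step can be sketched in a few lines. This is only an illustration, not the authors' FUMA pipeline: the `Gene` class, the gene names, and all coordinates below are made-up placeholders.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Gene:
    name: str
    chrom: str
    start: int
    end: int


def distance_to_gene(pos: int, gene: Gene) -> int:
    """Distance in bp from a SNP position to the gene body (0 if inside it)."""
    if gene.start <= pos <= gene.end:
        return 0
    return min(abs(pos - gene.start), abs(pos - gene.end))


def closest_gene(chrom: str, pos: int, genes: list):
    """Return the nearest gene on the same chromosome, or None if there is none."""
    candidates = [g for g in genes if g.chrom == chrom]
    if not candidates:
        return None
    return min(candidates, key=lambda g: distance_to_gene(pos, g))


# Toy annotation and lead SNPs (hypothetical names and positions).
genes = [
    Gene("GENE_A", "1", 1_000, 5_000),
    Gene("GENE_B", "1", 20_000, 30_000),
    Gene("GENE_C", "2", 100, 900),
]
lead_snps = {"snp1": ("1", 6_000), "snp2": ("1", 25_000), "snp3": ("2", 5_000)}

assignments = {
    snp: closest_gene(chrom, pos, genes).name
    for snp, (chrom, pos) in lead_snps.items()
}
print(assignments)
```

Measuring to the gene body rather than the transcription start site is a design choice here; either convention is common, and they can disagree for long genes.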
A quick segue before we get to the new mitochondrial genes.
As revealed above, many of the genes in this GWAS relate to immune cell abundance. I assume this is because WBCs have mitochondria while RBCs do not.
Relevant genes include MYB near HBS1L for RBC count & JMJD1C for WBCs
Ok - let's focus on the mitochondrial signals.
When I explore the closest genes I see convergence for
NR1D1 encodes Rev-erbα, a nuclear hormone receptor involved in regulation of the cell cycle.
A degradation resistant mutant of NR1D1 results in increased mitochondrial abundance. ncbi.nlm.nih.gov/pmc/articles/P…
Not sure why this gene wasn't mentioned
For step 2 we use the fact that the closest gene is usually correct, so we can use the themes from the closest genes to find candidate causal genes that are further from the lead SNP.
~50 genes are included in the reactome or GO sets above.
Are any of these near SNPs?
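That question is easy to ask programmatically. Here is a rough sketch, assuming you have lead SNP positions, gene coordinates, and a theme gene set from Reactome or GO. The function name, the 250 kb window, and all names and positions are placeholders I made up for illustration.

```python
def theme_genes_near_snps(lead_snps, gene_positions, theme_genes, window=250_000):
    """List (snp, gene, distance_bp) for theme genes within `window` bp of a lead SNP.

    lead_snps:      {snp_id: (chrom, pos)}
    gene_positions: {gene_name: (chrom, start, end)}
    theme_genes:    iterable of gene names from the pathway/GO set
    """
    hits = []
    for snp, (chrom, pos) in lead_snps.items():
        for gene in theme_genes:
            if gene not in gene_positions:
                continue  # gene set member missing from the annotation
            g_chrom, g_start, g_end = gene_positions[gene]
            if g_chrom != chrom:
                continue
            if g_start <= pos <= g_end:
                dist = 0
            else:
                dist = min(abs(pos - g_start), abs(pos - g_end))
            if dist <= window:
                hits.append((snp, gene, dist))
    # Closest candidates first.
    return sorted(hits, key=lambda h: h[2])


# Toy inputs (hypothetical identifiers and coordinates).
toy_snps = {"snpA": ("11", 1_000_000), "snpB": ("19", 5_000_000)}
toy_genes = {
    "MITO_1": ("11", 1_020_000, 1_040_000),   # 20 kb from snpA
    "MITO_2": ("19", 5_400_000, 5_450_000),   # 400 kb from snpB, outside window
    "MITO_3": ("19", 4_900_000, 4_926_000),   # 74 kb from snpB
}
toy_theme = ["MITO_1", "MITO_2", "MITO_3", "NOT_ANNOTATED"]

hits = theme_genes_near_snps(toy_snps, toy_genes, toy_theme)
print(hits)
```

The window is the main knob: too narrow and you miss distal candidates, too wide and every locus picks up a gene from a large set by chance.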
Just 20kb from rs741735 we find SIRT3. SIRT3 is a mitochondrial-specific deacetylase. SIRT3 works with FOXO3 to regulate mitochondrial biogenesis:
A novel AMPK-dependent FoxO3A-SIRT3 intramitochondrial complex sensing glucose levels
Because of the unusual localization pattern, reminiscent of twinkling stars, we designate the full-length protein Twinkle
(for *T*7 gp4-like protein *w*ith *i*ntramitochondrial *n*ucleoid *l*ocalization).
So we still have that signal with a p-value of 3e-62 on Chr19, locus 43, which actually comprises 4 independent variants covering a 200 kb window.
rs806709 is 74kb from LONP1 (rs566777150, p=1e-18, is only 20kb from LONP1).
Why is LONP1 annotated with GOCC: mitochondrial nucleoid?
LONP1 encodes a mitochondrial protease which regulates numerous mitochondrial functions:
1) it degrades damaged proteins 2) it degrades specific oxphos components 3) it regulates mtDNA replication by degrading the phosphorylated form of TFAM
Interestingly, that lead GWAS SNP is in high LD (r2 0.8-0.9) with a missense variant in LONP1: Arg241Gln. I found no information on whether that variant impacts the activity of the protease. ldlink.nci.nih.gov/?var=rs806709&…
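For anyone unfamiliar with the r2 statistic quoted above, here is a minimal sketch of how it is computed from phased haplotypes. LDlink works from reference panel haplotypes; the function name and the toy 0/1 vectors below are invented purely to show the arithmetic.

```python
def ld_r2(hap_a, hap_b):
    """r^2 between two biallelic variants from phased haplotypes coded 0/1."""
    n = len(hap_a)
    if n == 0 or n != len(hap_b):
        raise ValueError("haplotype vectors must be non-empty and equal length")
    p_a = sum(hap_a) / n                      # alt allele frequency at variant A
    p_b = sum(hap_b) / n                      # alt allele frequency at variant B
    p_ab = sum(1 for a, b in zip(hap_a, hap_b) if a == 1 and b == 1) / n
    d = p_ab - p_a * p_b                      # disequilibrium coefficient D
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denom == 0:
        raise ValueError("r^2 is undefined when a variant is monomorphic")
    return d * d / denom


# Toy haplotypes (made up): the alt allele at B sits on 3 of the 4 haplotypes carrying alt at A.
hap_a = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
hap_b = [1, 1, 1, 0, 0, 0, 0, 0, 0, 0]
r2 = ld_r2(hap_a, hap_b)
print(round(r2, 3))
```

An r2 of 0.8-0.9 means the two variants are close to interchangeable as GWAS tags, which is why the association alone can't say whether the missense variant is the functional one.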
LONP1 is responsible for a severe Mendelian disease, CODAS syndrome:
Thanks for the shout out, and welcome any new followers.
I like looking at GWAS and trying to decipher the causal biology behind the hits.
I use this account to highlight interesting results and provide links to the tools and approaches I find most useful.
One theme I come back to is that, because the closest gene is usually the correct causal gene, any analysis of a new GWAS should start there.
Here's a story from almost a year ago, a great study on heart trabeculae that initially ignored the closest genes
Fortunately, this paper was on bioRxiv. After I tweeted about the omissions and before it was published in Nature, the paper was amended to include 2 of the prominent heart structure genes: pubmed.ncbi.nlm.nih.gov/32814899/
I count 1,199 lead SNPs in this Manhattan plot! There is nothing particularly special about leg fat-free mass; this trait is highly correlated with other body size traits: ukbb-rg.hail.is/rg_summary_231…
About half the genes in the diagram (the ones with a 7) are also involved in closely related monogenic diseases. This is generally a reliable way to identify a true causal gene.
I looked across all the loci at all genes involved in "rare cardiac diseases" orpha.net/consor/cgi-bin…
First up are genes involved in depolarization and repolarization of the heart. These are all previously known loci, but fall into that nice category of closest gene and also rare disease gene that makes them highly likely to be causal (ok: SCN5A/SCN10A is a special case)
Here's how I see the SNP->gene gold standard issue.
This map separates the problem of identifying the causal transcript for a disease from the issue of identifying which transcripts are altered by a SNP.
As we know from, e.g., lactase, many mRNAs are altered but only 1 is causal.
The map acknowledges that a GWAS association is probably acting through a functional variant that impacts a transcript that (usually) impacts a protein that may alter a biomarker or intermediate phenotype which manifests as a change in disease risk or complex phenotype.
From left to right:
cis-eQTLs and splicing-QTLs reveal mechanisms by which a DNA variant can impact mRNA abundance. It's good to model and predict these.
At a particular locus these may or may not translate into elucidation of the causal transcript for the disease phenotype.