Happy to have been a part of this METSIM and @FinnGen_FI effort combining metabolomics, transcriptomics and disease traits.
Among many other results, we again see lessons about appropriate and inappropriate ways to interpret eQTLs. pubmed.ncbi.nlm.nih.gov/36055244/
We confirm (again) that you cannot use eQTLs to identify, select or prioritize the true causal gene. As in the 2020 paper by Ndungu & @markmccarthyoxf, we find only 8% precision using TWAS alone.
On average, TWAS will flag roughly 11 wrong genes for every 1 correct gene. pubmed.ncbi.nlm.nih.gov/31978332/
You can improve the situation by adding colocalization and a probabilistic framework (P-TWAS), which in our hands raises precision to 37%: only about 2 wrong genes for every 1 correct gene.
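Going from precision to "wrong genes per correct gene" is just arithmetic: precision = TP/(TP + FP), so FP/TP = (1 - precision)/precision. A quick Python sketch (the function name is mine, purely illustrative):

```python
def wrong_per_correct(precision: float) -> float:
    """Expected false-positive genes per true causal gene.

    precision = TP / (TP + FP)  =>  FP / TP = (1 - precision) / precision
    """
    if not 0 < precision <= 1:
        raise ValueError("precision must be in (0, 1]")
    return (1 - precision) / precision


for label, p in [("TWAS alone", 0.08), ("P-TWAS + colocalization", 0.37)]:
    print(f"{label}: {p:.0%} precision -> "
          f"{wrong_per_correct(p):.1f} wrong genes per correct gene")
```

8% works out to about 11.5 wrong genes per correct one and 37% to about 1.7, which round to the 11 and 2 quoted in the thread.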
The Manhattan and QQ plots show at least one robust signal, on chromosome 3, with lead SNP rs34951015.
Who's that causal gene, you say? ncbi.nlm.nih.gov/pmc/articles/P…
Another fantastic gene story from the METSIM metabolomics GWAS, now available in Nature Communications rdcu.be/cJYVY
The trait is “carotene diol”. The @Metabolon platform identifies 3 distinct metabolites, but the GWAS reveals consistent signals across these 3 molecules: pheweb.org/metsim-metab/p…
Carotenes are long-chain hydrocarbons produced by plants.
Carotene diols carry 2 extra hydroxyl groups.
Lycophyll is a diol of lycopene, the pigment that gives tomatoes their red color.
Zeaxanthin gives corn its yellow color.
Folks who follow me on Twitter will have seen bits of this before, but with the help of my @pfizer colleague Craig Hyde we have now provided some mathematical structure to my observations about distances from GWAS lead SNPs to causal genes: biorxiv.org/content/10.110…
We started with the recent pQTL study from @pietznerm et al: pubmed.ncbi.nlm.nih.gov/34648354/
It is well known that the distance from a lead SNP to its cognate gene follows an approximately exponential decay:
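One simple way to put a number on an exponential decay is to fit its scale: for exponentially distributed distances, the maximum-likelihood scale is just the mean distance, and the expected fraction of lead SNPs within distance d of the cognate gene is 1 - exp(-d/scale). A minimal Python sketch on simulated data; the 50 kb scale is an arbitrary assumption, not an estimate from the pQTL study:

```python
import math
import random


def fit_exponential_scale(distances_kb: list[float]) -> float:
    """Maximum-likelihood scale (mean) of an exponential distribution."""
    return sum(distances_kb) / len(distances_kb)


def fraction_within(d_kb: float, scale_kb: float) -> float:
    """P(distance <= d) under an exponential decay with the given scale."""
    return 1 - math.exp(-d_kb / scale_kb)


# Simulated SNP-to-gene distances with a made-up 50 kb scale (NOT real data).
random.seed(1)
true_scale_kb = 50.0
sim = [random.expovariate(1 / true_scale_kb) for _ in range(10_000)]

scale_hat = fit_exponential_scale(sim)
print(f"fitted scale ~ {scale_hat:.1f} kb")
print(f"P(d <= 100 kb) ~ {fraction_within(100, scale_hat):.2f}")
```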
But at distances > 10 Mb, SNP-to-gene distances no longer follow an exponential decay; instead they are perfectly described by the mathematics of picking 2 random points on a string:
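For the "2 random points on a string" picture: if two points fall independently and uniformly on a string of length L, the distance x between them has the triangular density f(x) = 2(L - x)/L² for 0 ≤ x ≤ L, with mean L/3, so it declines linearly rather than exponentially. A short Python sketch comparing a simulation with those formulas; the 150 Mb length is an illustrative assumption, and this is the textbook result rather than necessarily the exact model fitted in the preprint:

```python
import random


def random_pair_density(x: float, length: float) -> float:
    """Density of |U - V| for U, V ~ Uniform(0, length): f(x) = 2(L - x) / L^2."""
    if x < 0 or x > length:
        return 0.0
    return 2 * (length - x) / length**2


def random_pair_cdf(x: float, length: float) -> float:
    """P(|U - V| <= x) = 1 - (1 - x/L)^2, clamped to [0, L]."""
    x = min(max(x, 0.0), length)
    return 1 - (1 - x / length) ** 2


random.seed(0)
L_mb = 150.0  # illustrative string (chromosome) length in Mb
dists = [abs(random.uniform(0, L_mb) - random.uniform(0, L_mb)) for _ in range(100_000)]

mean_sim = sum(dists) / len(dists)
frac_sim = sum(d <= 50 for d in dists) / len(dists)
print(f"mean distance: simulated {mean_sim:.1f} Mb vs analytic {L_mb / 3:.1f} Mb")
print(f"P(d <= 50 Mb): simulated {frac_sim:.3f} vs analytic {random_pair_cdf(50, L_mb):.3f}")
```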
While it is true that the gene closest to a GWAS peak is not always the causal gene, it is also true that it usually is.
In fact, we can quantify how often we should expect the causal gene to be the closest gene: about 70% of the time.
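The "closest gene is causal" rate is easy to compute once each locus has a lead SNP position, candidate gene positions and a gene taken as causal (for pQTLs, the cognate gene). A minimal Python sketch; measuring distance to the TSS, the `Locus` structure, and the toy loci and gene names are all assumptions for illustration only:

```python
from dataclasses import dataclass


@dataclass
class Locus:
    lead_snp_pos: int         # base-pair position of the lead SNP
    causal_gene: str          # gene taken as causal (e.g. the cognate gene of a pQTL)
    gene_tss: dict[str, int]  # candidate gene -> TSS position on the same chromosome


def nearest_gene(locus: Locus) -> str:
    """Candidate gene whose TSS is closest to the lead SNP (first one wins ties)."""
    return min(locus.gene_tss, key=lambda g: abs(locus.gene_tss[g] - locus.lead_snp_pos))


def fraction_nearest_is_causal(loci: list[Locus]) -> float:
    """Share of loci where the nearest gene is the causal gene."""
    hits = sum(nearest_gene(locus) == locus.causal_gene for locus in loci)
    return hits / len(loci)


# Hypothetical toy loci, not real data.
loci = [
    Locus(1_000_000, "GENE_A", {"GENE_A": 1_010_000, "GENE_B": 1_200_000}),
    Locus(5_000_000, "GENE_C", {"GENE_C": 5_300_000, "GENE_D": 4_950_000}),
    Locus(9_000_000, "GENE_E", {"GENE_E": 8_990_000, "GENE_F": 9_100_000}),
]
print(f"nearest gene is causal at {fraction_nearest_is_causal(loci):.0%} of loci")
```

With these three made-up loci the nearest gene is the causal one in 2 of 3 cases; the ~70% figure comes from real data, not from anything like this toy example.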
3 papers from 2021 help pin this down:
Activity-by-contact (ABC-Max) predicts a causal gene for a GWAS SNP by combining cell-type-specific chromatin accessibility, epigenomic marks and chromatin conformation; the conformation (contact) component can also be estimated from SNP-TSS distance: pubmed.ncbi.nlm.nih.gov/33828297/
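The ABC score for an element E and gene G is usually written as E's activity times its contact with G, divided by the same product summed over all candidate elements for G; activity is the geometric mean of accessibility (DNase-seq) and H3K27ac signal, and ABC-Max roughly assigns a variant to the gene with the highest score for the element it sits in. The Python sketch below is a simplified illustration, not the published implementation: it swaps Hi-C contact for a distance power law whose exponent (gamma = 1.0) and 5 kb floor are assumptions, and the elements, signals and genes are made up:

```python
import math


def activity(dnase: float, h3k27ac: float) -> float:
    """Element activity as the geometric mean of DNase-seq and H3K27ac signal."""
    return math.sqrt(dnase * h3k27ac)


def contact_power_law(distance_bp: int, gamma: float = 1.0, min_dist_bp: int = 5_000) -> float:
    """Distance-only proxy for 3D contact, C ~ d^-gamma (gamma and floor are assumptions)."""
    return max(distance_bp, min_dist_bp) ** -gamma


def abc_scores(elements: dict, genes: dict) -> dict:
    """ABC(E, G) = A_E * C_EG / sum over candidate elements e of A_e * C_eG."""
    scores = {}
    for gene, tss in genes.items():
        weighted = {
            name: activity(dnase, ac) * contact_power_law(abs(pos - tss))
            for name, (pos, dnase, ac) in elements.items()
        }
        total = sum(weighted.values())
        for name, w in weighted.items():
            scores[(name, gene)] = w / total
    return scores


def abc_max_gene(snp_element: str, elements: dict, genes: dict) -> str:
    """Gene with the highest ABC score for the element containing the SNP."""
    scores = abc_scores(elements, genes)
    return max(genes, key=lambda g: scores[(snp_element, g)])


# Hypothetical elements: name -> (position, DNase signal, H3K27ac signal).
elements = {"enh1": (1_000_000, 10.0, 10.0), "enh2": (1_150_000, 40.0, 40.0)}
genes = {"GENE_A": 1_020_000, "GENE_B": 1_160_000}  # hypothetical TSS positions
for e in elements:
    print(e, "->", abc_max_gene(e, elements, genes))
```

In this toy example each element ends up assigned to its nearer gene, which is the kind of behavior you would expect once contact is approximated from SNP-TSS distance.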
There were several large pQTL studies published in 2021; I've been referencing this one by @pietznerm et al. When protein abundance is the trait, the working hypothesis is that the cognate gene (the one encoding the protein) is the causal gene: pubmed.ncbi.nlm.nih.gov/34648354/