Here's another way of looking at the relationship between protein QTLs and the genes encoding those proteins.
For the primary cis-pQTLs this coding gene is usually the closest gene (blue wedge). If not the closest gene, it's usually the second closest gene (orange wedge).
Interestingly, this holds less well for the secondary cis-pQTL signals. The coding gene is still more likely to be the closest gene than to sit in any other position, but that is the case for only 30 of 99 secondary signals (about 30%).
I'd be interested in people's thoughts on this.
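For anyone who wants to try the same ranking on their own loci, here is a minimal Python sketch of one way to work out where the coding gene falls in the distance ordering. The gene names, coordinates and the helper names (`Gene`, `distance_to_gene`, `coding_gene_rank`) are made up for illustration, and measuring distance to the gene body (rather than, say, the TSS) is an assumption, not necessarily how the figure was built.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Gene:
    # Hypothetical minimal gene record: name plus start/end on one chromosome.
    name: str
    start: int
    end: int


def distance_to_gene(pos: int, gene: Gene) -> int:
    """Distance in bp from a variant to the gene body (0 if the variant is inside).

    Assumption: gene-body distance; the thread does not say which definition was used.
    """
    if gene.start <= pos <= gene.end:
        return 0
    return min(abs(pos - gene.start), abs(pos - gene.end))


def coding_gene_rank(pos: int, genes: List[Gene], coding_gene: str) -> Optional[int]:
    """Return 1 if the coding gene is the closest gene, 2 if second closest, and so on.

    Returns None if the coding gene is not among the candidate genes.
    """
    ranked = sorted(genes, key=lambda g: distance_to_gene(pos, g))
    for rank, gene in enumerate(ranked, start=1):
        if gene.name == coding_gene:
            return rank
    return None


# Toy locus with placeholder genes (not real annotations).
genes = [
    Gene("GENE_A", 1000, 2000),
    Gene("GENE_B", 5000, 6000),
    Gene("GENE_C", 9000, 9500),
]

# A signal just downstream of GENE_A: the coding gene is the closest gene.
rank_primary = coding_gene_rank(2100, genes, "GENE_A")

# A signal inside GENE_B: the coding gene GENE_A drops to second position.
rank_secondary = coding_gene_rank(5200, genes, "GENE_A")

# Share of secondary signals where the coding gene is closest, using the
# counts quoted in the thread (30 of 99).
share_closest = 30 / 99
```

Tallying `coding_gene_rank` over all signals would give the wedge sizes; ties in distance are broken here by input order, which is another arbitrary choice.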
About half the genes in the diagram (the ones with a 7) are also involved in closely related monogenic diseases. This is generally a reliable way to identify a true causal gene.
I looked across all the loci at all genes involved in "rare cardiac diseases" orpha.net/consor/cgi-bin…
First up are genes involved in depolarization and repolarization of the heart. These are all previously known loci, but they fall into that nice category of being both the closest gene and a rare disease gene, which makes them highly likely to be causal (OK: SCN5A/SCN10A is a special case).
Here's how I see the SNP->gene gold standard issue.
This map separates the problem of identifying the causal transcript for a disease from the issue of identifying which transcripts are altered by a SNP.
As we know from, e.g., lactase, many mRNAs are altered but only one is causal.
The map acknowledges that a GWAS association is probably acting through a functional variant that impacts a transcript that (usually) impacts a protein that may alter a biomarker or intermediate phenotype which manifests as a change in disease risk or complex phenotype.
From left to right:
cis-eQTLs and splicing-QTLs reveal mechanisms by which a DNA variant can impact mRNA abundance. It's good to model and predict these.
At a particular locus these may or may not translate into elucidation of the causal transcript for the disease phenotype.