Having this enormous collection of pQTLs allows us to answer the question (again):
Which is more relevant:
Distance of a GWAS SNP to the TSS (transcription start site) or to the gene body of a candidate gene? pubmed.ncbi.nlm.nih.gov/34648354/
Usually you get the same closest gene measuring to TSS or to gene body.
But in the top case, a pQTL for ACAA1 sits inside an unrelated gene yet is closer to the TSS of ACAA1.
In the other case, a pQTL for DNAJC17 sits closer to the TSS of an unrelated gene despite sitting within DNAJC17.
Turns out if TSS_closest_gene and gene_body_closest_gene disagree, the gene body metric is right twice as often.
This is especially true if the SNP sits within a gene, even if it is not a missense variant.
(Again though, TSS and gene_body usually agree on the closest gene: 77% of the time.)
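To make the two metrics concrete, here is a minimal sketch of how they can disagree. The gene names, coordinates, and helper functions are made up for illustration (they are not from the pQTL paper), and the TSS is simplified to the strand-aware start of the gene.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Gene:
    symbol: str
    start: int   # leftmost coordinate of the gene body
    end: int     # rightmost coordinate of the gene body
    strand: str  # "+" or "-"

    @property
    def tss(self) -> int:
        # Simplification: TSS is the 5' end of the gene on its strand.
        return self.start if self.strand == "+" else self.end


def tss_distance(pos: int, gene: Gene) -> int:
    """Distance from the SNP to the gene's transcription start site."""
    return abs(pos - gene.tss)


def gene_body_distance(pos: int, gene: Gene) -> int:
    """Distance from the SNP to the nearest edge of the gene; 0 if inside it."""
    if gene.start <= pos <= gene.end:
        return 0
    return min(abs(pos - gene.start), abs(pos - gene.end))


def closest_gene(pos: int, genes: list[Gene], distance) -> str:
    """Symbol of the gene minimizing the chosen distance metric."""
    return min(genes, key=lambda g: distance(pos, g)).symbol


# Hypothetical locus: the SNP sits near the 3' end of a long gene,
# just upstream of a neighbouring gene's TSS.
genes = [
    Gene("GENE_A", 1_000, 50_000, "+"),
    Gene("GENE_B", 52_000, 60_000, "+"),
]
snp_pos = 49_000

by_tss = closest_gene(snp_pos, genes, tss_distance)
by_body = closest_gene(snp_pos, genes, gene_body_distance)
print(by_tss, by_body)
```

In this toy locus the SNP lies within GENE_A but is nearer to the TSS of GENE_B, the same kind of disagreement as the DNAJC17 example.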
This is important because MOST GWAS SNPs SIT WITHIN A GENE. That is, even though most GWAS SNPs are non-coding, they are not intergenic.
This table counts how often each "context" term is used in the @GWASCatalog where p<5e-8 and a context is provided.
Combining @GWASCatalog tally with pQTL data from @pietznerm et al:
* Most GWAS SNPs (all traits) sit within a gene; that gene is ~70% likely to be the causal gene (pQTL data)
* Intergenic SNPs are only 27% of GWAS SNPs; here the closest gene is causal 59% of the time
* Expected weighted avg = 68%
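The weighted average can be roughly reproduced from the rounded numbers in the bullets. This sketch assumes every SNP that is not intergenic counts as "within a gene" (so 73%), which is a simplification of the catalog context tally.

```python
# Rounded inputs taken from the bullets above.
frac_intergenic = 0.27                   # share of GWAS SNPs that are intergenic
frac_within_gene = 1 - frac_intergenic   # assumption: everything else is within a gene
p_causal_within = 0.70                   # "~70%": containing gene is the causal gene
p_causal_intergenic = 0.59               # closest gene is causal for intergenic SNPs

# Weight each success rate by how common that SNP class is.
expected = (frac_within_gene * p_causal_within
            + frac_intergenic * p_causal_intergenic)
print(f"{expected:.1%}")
```

With these rounded inputs the result lands at about 67%, close to the 68% quoted; the small gap presumably comes from rounding (the ~70% in particular).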
When protein abundance is the trait, the simplest assumption is that the gene encoding the protein is the causal gene.
This catalog of 10,674 pQTLs from @pietznerm et al provides a rare unbiased look at GWAS SNP->causal gene genomic properties.
I took a quick look at the SNP-gene distances for all cases where the lead SNP had an rsID and the trait had a unique HGNC gene symbol. In 3,475 cases the SNP and cognate gene are on the same chromosome, and in 2,985 of those they are within 500kb, with a very strong distance dependence.
For this study the authors provided the VEP consequence for each pQTL, so we can look at how often the cognate gene is the closest gene as a function of that consequence.
Even ignoring missense variants, if the variant falls within a gene that's a strong predictor.
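The tally behind this kind of comparison is simple to sketch. The records below are toy values invented for illustration (not the real pQTL data), and `hit_rate_by_consequence` is a hypothetical helper; each record is (VEP consequence, cognate gene, closest gene).

```python
from collections import defaultdict

# Toy records, made up for illustration only:
# (VEP consequence, cognate gene of the protein, closest gene to the SNP)
records = [
    ("missense_variant", "GENE_A", "GENE_A"),
    ("missense_variant", "GENE_B", "GENE_B"),
    ("intron_variant", "GENE_C", "GENE_C"),
    ("intron_variant", "GENE_D", "GENE_E"),
    ("intergenic_variant", "GENE_F", "GENE_G"),
    ("intergenic_variant", "GENE_H", "GENE_H"),
    ("intergenic_variant", "GENE_I", "GENE_J"),
    ("intergenic_variant", "GENE_K", "GENE_L"),
]


def hit_rate_by_consequence(rows):
    """Fraction of pQTLs per consequence where closest gene == cognate gene."""
    hits = defaultdict(int)
    totals = defaultdict(int)
    for consequence, cognate, closest in rows:
        totals[consequence] += 1
        hits[consequence] += int(cognate == closest)
    return {c: hits[c] / totals[c] for c in totals}


rates = hit_rate_by_consequence(records)
for consequence, rate in rates.items():
    print(f"{consequence}: {rate:.0%}")
```

Run on the real table, the same grouping would show how the closest-gene hit rate changes between coding, intronic, and intergenic variants.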
In mapping SNPs to genes we clearly can do better than taking the closest gene, but that should be the baseline against which we compare other methods. @cr_farber et al, I hope you'll consider this before submitting this for publication.
In this preprint the authors started with 1,097 lead SNPs for bone mineral density from pubmed.ncbi.nlm.nih.gov/30598549/ and applied TWAS and eQTL colocalization to identify "potentially causal genes"
To validate the approach the authors constructed a list of 1,399 "known bone" genes and noted enrichment of their TWAS/eQTL selected genes.
But the enrichment for "known bone" genes is much greater for genes closest to the lead SNPs.
A well-behaved GWAS yields strong signals for the kinds of genes that contribute to the phenotypic variation.
This provides strong priors for discerning likely causal genes hidden at other loci.
With this in mind, let's revisit the telomere GWAS.
Today's GWAS of urolithiasis, kidney stones and other stones of the urinary tract, provides a wonderful window into calcium, phosphate and vitamin D metabolism.
One nice thing about putting my GWAS interpretations here in Twitter is I can always quickly find what I may have written about a gene or a trait before.
Here's my write-up on urolithiasis from 2 years ago in a completely different cohort, Biobank Japan.
On the left are the top hits from @finngen; on the right the top hits from Biobank Japan.
5 of 6 loci from FinnGen were also found in BBJ.
Note some of the lead SNPs may differ, but the causal genes line up.
Welcome all! We've added several hundred followers over the past few weeks, so as a quick intro, I use this account mainly to explore interesting issues related to the biological interpretation of GWAS. @SbotGwa from @andganna provides me with a steady diet of interesting material.
@SbotGwa alternates between GWAS from @uk_biobank and @FinnGen_FI.
Yesterday's Manhattan plot from FinnGen yielded a single hit for the trait "other and unspecified corneal deformities and disorders"
Let's dive in.
I often say it's good to take the most significant association at a locus as we start to interpret it. @FinnGen_FI uses a modified PheWeb server to show results.
Here is the PheWAS for this SNP.
Top association is Keratitis, inflammation of the cornea.
Thanks for the shout out, and welcome any new followers.
I like looking at GWAS and trying to decipher the causal biology behind the hits.
I use this account to highlight interesting results and provide links to the tools and approaches I find most useful.
One theme I come back to is that because the closest gene is usually the correct causal gene, any analysis of a new GWAS should start there.
Here's a story from almost a year ago, a great study on heart trabeculae that initially ignored the closest genes.
Fortunately this paper was on bioRxiv. After I tweeted about the omissions and before it was published in Nature, the paper was amended to include 2 of the prominent heart structure genes: pubmed.ncbi.nlm.nih.gov/32814899/