While it is true that the gene closest to a GWAS peak is not always the causal gene, it is also true that it usually is.
In fact, we can quantify how often we should expect the causal gene to be the closest gene, and that number is about 70%.
3 papers from 2021 help pin this down:
Activity-by-contact (ABC-Max) predicts a causal gene for a GWAS SNP using a combination of cell-type specific chromatin accessibility, epigenome marks and chromatin conformation, which can also be estimated by SNP-TSS distance: pubmed.ncbi.nlm.nih.gov/33828297/
There were several large pQTL studies published in 2021. I've been referencing this one by @pietzner et al. When protein abundance is the trait, the hypothesis is that the cognate gene (the one encoding the protein) is the causal gene: pubmed.ncbi.nlm.nih.gov/34648354/
These papers provide 3 independent approaches to quantifying the distribution of the causal gene's ordinal rank by distance from a lead GWAS SNP.
Here I'm defining distance as the distance to the "gene body" (TSS-TES), so a SNP inside a gene has distance 0.
At least in ABCmax the lead variant has been fine-mapped.
closest gene: 70%-76%
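To make the "ordinal rank" idea concrete, here is a minimal Python sketch of how one might rank genes by gene-body distance from a lead SNP. The `Gene` record, the function names and the toy coordinates in the usage note are my own illustration, not code from any of the papers, and it assumes all genes are on the same chromosome as the SNP.

```python
from typing import NamedTuple, Optional, Sequence


class Gene(NamedTuple):
    """Hypothetical gene record; start/end span the gene body (TSS-TES)."""
    symbol: str
    start: int  # leftmost coordinate of the gene body
    end: int    # rightmost coordinate of the gene body


def gene_body_distance(snp_pos: int, gene: Gene) -> int:
    """Distance from a SNP to the gene body; 0 if the SNP falls inside it."""
    if gene.start <= snp_pos <= gene.end:
        return 0
    return min(abs(snp_pos - gene.start), abs(snp_pos - gene.end))


def ordinal_rank(snp_pos: int, genes: Sequence[Gene], causal_symbol: str) -> Optional[int]:
    """Rank of the causal gene among genes sorted by distance (1 = closest).

    Ties keep their input order because sorted() is stable.
    Returns None if the causal gene is not in the list.
    """
    ranked = sorted(genes, key=lambda g: gene_body_distance(snp_pos, g))
    for rank, gene in enumerate(ranked, start=1):
        if gene.symbol == causal_symbol:
            return rank
    return None
```

With made-up genes A (100-200), B (500-900) and C (1000-1100) and a SNP at position 450, B is rank 1, A is rank 2 and C is rank 3. Tallying how often the rank equals 1 across loci gives the "closest gene" percentage.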
When protein abundance is the trait, the simplest assumption is that the gene encoding the protein is the causal gene.
This catalog of 10,674 pQTLs from @pietznerm et al provides a rare unbiased look at GWAS SNP->causal gene genomic properties.
I took a quick look at the SNP-gene distances for all cases where the lead SNP had an rsID and the trait had a unique HGNC gene symbol. In 3,475 cases the SNP and cognate gene are on the same chromosome, and in 2,985 of those they are within 500kb, with a very strong distance dependence.
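For anyone who wants to repeat this kind of tally, here is a rough Python sketch. The dictionary keys are hypothetical (the published table's column names will differ), and measuring the 500kb window to the gene body rather than to the TSS is an assumption on my part.

```python
from typing import Dict, Iterable


def body_distance(pos: int, start: int, end: int) -> int:
    """Distance from a position to a gene body; 0 if the position is inside."""
    if start <= pos <= end:
        return 0
    return min(abs(pos - start), abs(pos - end))


def summarize_pairs(pairs: Iterable[dict], window: int = 500_000) -> Dict[str, int]:
    """Count SNP/cognate-gene pairs on the same chromosome and within a window.

    Each pair is a dict with hypothetical keys:
    snp_chrom, snp_pos, gene_chrom, gene_start, gene_end.
    """
    pairs = list(pairs)
    # Pairs on different chromosomes have no defined distance, so drop them first.
    same_chrom = [p for p in pairs if p["snp_chrom"] == p["gene_chrom"]]
    within = [
        p for p in same_chrom
        if body_distance(p["snp_pos"], p["gene_start"], p["gene_end"]) <= window
    ]
    return {
        "total": len(pairs),
        "same_chrom": len(same_chrom),
        "within_window": len(within),
    }
```

The same-chromosome filter comes first because only those pairs have a distance to compare against the window.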
For this study the authors provided the VEP consequence for each pQTL, so we can look at how often the cognate gene is the closest gene as a function of that consequence.
Even ignoring missense variants, a variant falling within a gene is a strong predictor that this gene is the causal one.
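The breakdown by consequence is a simple group-by. Here is a small Python sketch, assuming each pQTL has already been reduced to a (VEP consequence, cognate-gene-is-closest) pair; that input shape and the function name are mine, not the authors'.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple


def closest_rate_by_consequence(records: Iterable[Tuple[str, bool]]) -> Dict[str, float]:
    """Fraction of pQTLs, per VEP consequence, where the cognate gene is the closest gene.

    records: (consequence, cognate_is_closest) pairs, e.g. ("intron_variant", True).
    """
    counts = defaultdict(lambda: [0, 0])  # consequence -> [hits, total]
    for consequence, cognate_is_closest in records:
        counts[consequence][1] += 1
        if cognate_is_closest:
            counts[consequence][0] += 1
    return {c: hits / total for c, (hits, total) in counts.items()}
```

To check the "even ignoring missense variants" point, drop the `missense_variant` records before calling the function and compare the within-gene consequences against the intergenic ones.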
In mapping SNPs to genes we clearly can do better than taking the closest gene, but that should be the baseline against which we compare other methods. @cr_farber et al, I hope you'll consider this before submitting this for publication.
In this preprint the authors started with 1,097 lead SNPs for bone mineral density from pubmed.ncbi.nlm.nih.gov/30598549/ and applied TWAS and eQTL colocalization to identify "potentially causal genes"
To validate the approach the authors constructed a list of 1,399 "known bone" genes and noted enrichment of their TWAS/eQTL selected genes.
But the enrichment for "known bone" genes is much greater for genes closest to the lead SNPs.
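The comparison I have in mind can be sketched as a fold enrichment computed the same way for both gene lists. This is a minimal Python illustration, not the preprint's method: the choice of background gene set is an assumption you would have to make, and all names here are hypothetical.

```python
from typing import Set


def fold_enrichment(selected: Set[str], known: Set[str], background: Set[str]) -> float:
    """Fold enrichment of `known` genes among `selected` genes relative to `background`.

    Computed as (hits / n_selected) / (n_known / n_background), with every
    set restricted to the background first.
    """
    selected = selected & background
    known = known & background
    if not selected or not known:
        raise ValueError("selected and known must each overlap the background")
    hits = len(selected & known)
    # Rearranged to a single division to avoid needless rounding error.
    return (hits * len(background)) / (len(selected) * len(known))
```

Run it once with the TWAS/eQTL-selected genes and once with the genes closest to the lead SNPs, using the same `known` list and the same `background`. The closest-gene value is the baseline the other method has to beat.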
A well-behaved GWAS yields strong signals for the kinds of genes that contribute to the phenotypic variation.
This provides strong priors for discerning likely causal genes hidden at other loci.
With this in mind, let's revisit the telomere GWAS
Today's GWAS of urolithiasis, kidney stones and other stones of the urinary tract, provides a wonderful window into calcium, phosphate and vitamin D metabolism.
One nice thing about putting my GWAS interpretations here on Twitter is I can always quickly find what I may have written about a gene or a trait before.
Here's my write-up on urolithiasis from 2 years ago in a completely different cohort, Biobank Japan.
On the left are the top hits from @finngen; on the right the top hits from Biobank Japan.
5 of 6 loci from FinnGen are also found in BBJ.
Note some of the lead SNPs may differ, but the causal genes line up.