Our #COVID19 pan-ancestry exome-wide meta-analysis across 586,157 individuals (20,952 SARS-CoV-2 positive cases, 4928 hospitalized, 1304 severe) is now out in @AJHGNews.
What did we find? To be frank, not much, but here's a 🧵.
Within 3 cohorts with #COVID19 phenotypes (@uk_biobank, @GeisingerHealth, @PennMedicine), we analyzed rare variants (MAF<0.5%; ~7 million) and burden tests within each ancestry separately, then meta-analyzed the results. Incorporating non-EUR data increased our case N by ~10%.
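For anyone wondering what "analyze within each ancestry, then meta-analyze" can look like in practice, here is a minimal sketch of a fixed-effects inverse-variance-weighted meta-analysis for a single variant. This is an assumption for illustration: the thread doesn't say which meta-analysis method was used, and the per-ancestry betas and standard errors below are made-up numbers, not results from the paper.

```python
import math

def ivw_meta(betas, ses):
    """Fixed-effects inverse-variance-weighted meta-analysis.

    Returns (pooled beta, pooled standard error, two-sided p-value).
    """
    weights = [1.0 / se ** 2 for se in ses]           # weight = 1 / variance
    total_w = sum(weights)
    beta = sum(w * b for w, b in zip(weights, betas)) / total_w
    se = math.sqrt(1.0 / total_w)
    z = beta / se
    p = math.erfc(abs(z) / math.sqrt(2.0))            # two-sided normal p-value
    return beta, se, p

# Hypothetical per-ancestry estimates (beta, SE) for one variant.
per_ancestry = {"EUR": (0.20, 0.05), "AFR": (0.10, 0.15), "OTHER": (0.30, 0.20)}
betas = [b for b, _ in per_ancestry.values()]
ses = [s for _, s in per_ancestry.values()]

beta, se, p = ivw_meta(betas, ses)
print(f"pooled beta={beta:.3f}, SE={se:.3f}, P={p:.2e}")
```

The smaller, noisier strata get less weight, so adding them nudges the pooled estimate and shrinks its standard error a little rather than dominating it.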
For any given burden test or rare variant to be significant, we set a conservative Bonferroni significance threshold at 9.6e-10 given the number of traits & variants tested.
Here are the Manhattan plots for 20,952 SARS-CoV-2 cases vs 565,205 controls, with nothing remotely significant.
Focusing on 4928 hospitalized COVID19 cases sadly didn't yield anything Bonferroni significant here either.
In case you are curious, the top gene was MARK1 (P=2.9e-8) and the top individual RV was a splice region variant in WDR78 (P=2.8e-9).
Lastly, the comparison of 1304 severe COVID19 cases (those on a ventilator or who had passed away) to 528,758 controls was similarly flat.
The sub-significant tower in the burden tests is TLR7 (P=4.3e-8), a gene first implicated by @ahoischen and co. in ja.ma/34OEDJT.
A full list of sub-significant variants and burden tests that would pass a standard genome-wide significance threshold of 5e-8 (which is really inappropriate for rare variants) can be found in Table 1. But please, treat these like you would any other non-significant variant!
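As a rough sanity check on the two thresholds in this thread, the sketch below back-calculates how many tests a Bonferroni cutoff of 9.6e-10 would imply and compares the three p-values quoted above against both cutoffs. The family-wise alpha of 0.05 is an assumption (the thread doesn't state it), so the implied test count is derived from that assumption and not taken from the paper.

```python
ALPHA = 0.05           # assumed family-wise error rate (not stated in the thread)
BONFERRONI = 9.6e-10   # exome-wide Bonferroni threshold quoted in the thread
GWAS = 5e-8            # standard genome-wide threshold quoted in the thread

# Number of tests the Bonferroni threshold would correspond to under ALPHA.
implied_tests = ALPHA / BONFERRONI

# P-values quoted in the thread for the top burden tests and rare variant.
top_hits = {
    "MARK1 burden": 2.9e-8,
    "WDR78 splice region variant": 2.8e-9,
    "TLR7 burden": 4.3e-8,
}

def classify(p: float) -> str:
    """Label a p-value relative to the two thresholds."""
    if p < BONFERRONI:
        return "significant"
    if p < GWAS:
        return "sub-significant"
    return "not significant"

results = {name: classify(p) for name, p in top_hits.items()}

print(f"implied number of tests: {implied_tests:,.0f}")
for name, label in results.items():
    print(f"{name}: P={top_hits[name]:.1e} -> {label}")
```

All three hits clear 5e-8 but miss the Bonferroni cutoff, which is the point of the "treat these like any other non-significant variant" caveat.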
Even had we focused on say 1) 281 genes with GWAS loci from @covid19_hgi, or 2) 32 COVID19 therapeutic-target genes or genes involved in SARS-CoV-2 (e.g., ACE2, TMPRSS2) we still wouldn't have found anything statistically convincing (Tables S6 and S8).
Lastly, we tried to replicate the other COVID19 exome studies such as the type 1 interferon signaling @ScienceMagazine paper from Zhang et al. 2020 that found a small increase of rare pLoFs in 13 interferon genes (P=0.01).
Despite having 8x the cases and 1000x the controls of the original paper, we couldn't replicate this association regardless of 1) using PTVs or PTVs+missense, 2) restricting to singletons, or 3) only using severe cases (Table 2 and Table S7).
Our study is now the 2nd paper reporting an inability to replicate the biologically-attractive work of Zhang et al., @ScienceMagazine 2020. @GPovysil, @kirylukk, and co. also couldn't replicate it as seen in their recent paper jci.org/articles/view/…
A huge amount of work across all of @Regeneron went into this paper, and it couldn't have been done without the help of my fellow co-first author, Julie Horowitz, and leadership from Manuel Ferreira, @aris_baras, @marchini, @jgreid and so many others.
However, the real heroes of any exome paper are those working tirelessly on QC!
Huge props to @jdbackman and @marchini for building an SVM to perform some fantastic exome QC (in the same manner as used by @gnomad_project and TOPMed). You can't trust exome data without it!
Lastly, a sensible meta-analysis of rare variants and burden tests wouldn't have been possible without REGENIE developed by Joelle Mbatchou and @marchini.
In case you missed it, I presented results on behalf of @Regeneron from a trans-ancestry #COVID19 meta-analysis of common and rare variants + gene burden tests in >883k imputed samples and >592k exomes. #ASHG20
Using REGENIE developed by @joellembatchou and @marchini (SAIGE gave us some bizarre results with rare variants) to run our common and rare variant GWASes, we found 2 loci associated with susceptibility and 3 loci with hospitalization. #ASHG20
Curiously, despite losing ~75% of the cases, we found more loci with hospitalization than using all COVID19 positive individuals - @covid19_hgi sees the same pattern.
One might suspect a severe COVID19 GWAS would be even more powerful at the same sample size #ASHG20
Up next in the first #ASHG20 plenary session is my former Daly lab colleague, @HHeyne, presenting work from @FinnGen_FI on "Recessive effects of 82,516 coding variants in 176,899 Finns."
Bottleneck in Finland makes the Finnish population and @FinnGen_FI ideal for understanding ultra-rare variants that stochastically rose in frequency #ASHG20
Using SAIGE, they ran GWAS under 1) a recessive model and 2) an additive model on all coding variants in @FinnGen_FI.
For the majority of coding variants, the additive model performed better. However, the recessive model performed 2x better for some variants. #ASHG20
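For readers less familiar with the two models, here is a minimal sketch of how the same genotypes get coded under each one. The function names and toy genotype list are hypothetical, and this only illustrates the standard textbook codings; it is not how SAIGE or the FinnGen pipeline implements them.

```python
def additive(genotype: int) -> int:
    """Additive coding: the score is the alternate-allele count (0, 1 or 2)."""
    return genotype

def recessive(genotype: int) -> int:
    """Recessive coding: only homozygous-alternate carriers (2 copies) score 1."""
    return 1 if genotype == 2 else 0

# Toy alternate-allele counts for six individuals.
genotypes = [0, 1, 2, 1, 0, 2]

additive_codes = [additive(g) for g in genotypes]
recessive_codes = [recessive(g) for g in genotypes]

print("additive: ", additive_codes)
print("recessive:", recessive_codes)
```

Under the recessive coding, heterozygotes are grouped with non-carriers, so a variant whose effect only shows up in homozygotes can be diluted by the additive model but picked up by the recessive one.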
1st Plenary #ASHG20 session: Meredith Course identified a human-specific 69bp repeat expansion in the last exon of WDR7.
This repeat was associated with ALS (repeats are on average longer in ALS cases than in controls). Longest repeat observed was 86bp. #ASHG20
Observed periodicity in the WDR7 repeat and found that it expands in multiples of 2 in the 3' - 5' direction. #ASHG20
The initial analysis was done in a EUR cohort, so it was repeated in non-EUR cohorts, finding population-specific repeats in individuals of Han Chinese descent and AFR ancestry #ASHG20
Up next is @joellembatchou from @Regeneron presenting REGENIE - a computationally efficient whole genome regression for quantitative and binary traits #ASHG20
Both BOLT and SAIGE are fantastic LMMs for quantitative and binary traits respectively, but have high memory requirements and long computational times. #ASHG20
In SAIGE, betas become inflated as MAF decreases, and the inflation worsens with larger case/control imbalance - meaning that for rare variants, SAIGE produces nonsensical results. REGENIE resolves this issue #ASHG20