excluderanges is a Bioconductor data resource providing sets of problematic genomic regions (previously, "blacklisted"). It covers six organisms, 12 genome assemblies, and new exclusion sets for mouse mm39 and human telomere-to-telomere (T2T) genomes. doi.org/10.1101/2022.1…
Most exclusion sets from ENCODE, the UCSC Genome Browser, and GitHub lack curation methods. We uniformly processed and annotated 84 exclusion sets and made them available on AnnotationHub and via the BEDbase.org API. Tutorial at dozmorovlab.github.io/excluderanges/
For hg38, @anshulkundaje recommends 'GRCh38_unified_blacklist.bed', also the most comprehensive and annotated set in our tests. We also recommend explicitly combining exclusion sets with centromeres, telomeres, and short arm gap regions.
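To illustrate what "explicitly combining" exclusion sets with centromeres and telomeres could look like, here is a minimal Python sketch that unions several BED-style region lists per chromosome. It is not the excluderanges workflow: the function name `merge_regions` and all coordinates are made up for illustration, and intervals are assumed to be half-open `(chrom, start, end)` tuples as in BED files.

```python
from collections import defaultdict


def merge_regions(*region_sets):
    """Union several lists of (chrom, start, end) half-open intervals.

    Hypothetical helper for illustration; overlapping or directly
    adjacent intervals on the same chromosome are merged into one.
    """
    by_chrom = defaultdict(list)
    for regions in region_sets:
        for chrom, start, end in regions:
            by_chrom[chrom].append((start, end))

    merged = []
    for chrom in sorted(by_chrom):
        cur_start, cur_end = None, None
        for start, end in sorted(by_chrom[chrom]):
            if cur_start is None:
                cur_start, cur_end = start, end
            elif start <= cur_end:
                # Overlaps or touches the current interval: extend it.
                cur_end = max(cur_end, end)
            else:
                merged.append((chrom, cur_start, cur_end))
                cur_start, cur_end = start, end
        merged.append((chrom, cur_start, cur_end))
    return merged


# Toy coordinates, NOT real genomic annotations.
exclusion = [("chr1", 100, 200), ("chr1", 500, 600)]
centromeres = [("chr1", 150, 300)]
telomeres = [("chr1", 0, 50), ("chr2", 0, 40)]

combined = merge_regions(exclusion, centromeres, telomeres)
for chrom, start, end in combined:
    print(chrom, start, end, sep="\t")
```

In practice the same union is usually done with dedicated interval tools; the sketch only shows the logic of treating all region types as one combined exclusion set.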
Interestingly, although the T2T genome has been fully sequenced, centromeres, telomeres, and short arms remain detected as problematic by the Blacklist software github.com/Boyle-Lab/Blac…. Low-complexity and repetitive elements continue to be an issue for short-read sequencing.
We also include recent CUT&RUN-specific hg38 and mm10 exclusion sets. They require more investigation, as 60% of the hg38 set overlaps centromeres on chr1 and chr13 but not on other chromosomes.
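A check like the centromere overlap above can be sketched in a few lines of Python. This assumes the overlap is counted per region (not per base), which the thread does not specify; `overlap_summary` and the toy coordinates are hypothetical and do not reproduce the reported numbers.

```python
from collections import Counter


def overlap_summary(regions, centromeres):
    """Return (fraction of regions hitting a centromere, hits per chromosome).

    Regions and centromeres are (chrom, start, end) half-open intervals.
    Hypothetical helper for illustration only.
    """
    hits = Counter()
    for chrom, start, end in regions:
        for c_chrom, c_start, c_end in centromeres:
            # Two half-open intervals overlap if each starts before the other ends.
            if chrom == c_chrom and start < c_end and c_start < end:
                hits[chrom] += 1
                break  # count each region at most once
    fraction = sum(hits.values()) / len(regions) if regions else 0.0
    return fraction, dict(hits)


# Toy coordinates, NOT real annotations.
toy_regions = [
    ("chr1", 100, 200),
    ("chr13", 10, 20),
    ("chr2", 500, 600),
    ("chr3", 5, 10),
]
toy_centromeres = [("chr1", 150, 400), ("chr13", 0, 50), ("chr2", 0, 100)]

fraction, per_chrom = overlap_summary(toy_regions, toy_centromeres)
print(f"{fraction:.0%} of regions overlap a centromere: {per_chrom}")
```

Breaking the hits down per chromosome is what would reveal a pattern such as overlaps concentrated on only a few chromosomes.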
development under github.com/nullranges/nul… project, and with J. Chuck Harrell, Nathan Sheffield, others 🙏
Special thanks to @_StuartLee, @timtriche and others for many great suggestions on the #nullranges Slack channel. And, to the @Bioconductor team for making this data resource possible - thank you!
If you know of other resources about problematic genomic regions, please reply or open an issue on GitHub: github.com/dozmorovlab/ex…