1/ Do you need reference sequence files from #Ensembl? All of the different files available can be confusing. Here’s a thread to help you decide which files you need…🧵
3/ Let’s explore the different directories and files available
4/ The FTP site contains directories for all file types from the current Ensembl release, as well as directories that contain all files from previous Ensembl releases
E.g. ftp.ensembl.org/pub/release-10…
6/ The reference #FASTA files can be found in the ‘current_fasta’ directory. You’ll then need to navigate to the directory for your species of interest
7/ From here, you can find directories for DNA, cDNA, CDS, peptide or ncRNA sequences
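If you would rather build these paths in a script than click through them, here is a minimal sketch. It assumes the layout pub/current_fasta/<species>/<type>/ for the current release and pub/release-<N>/fasta/<species>/<type>/ for archived ones. It also assumes the directory names dna, cdna, cds, pep and ncrna. The helper name fasta_dir_url is hypothetical, so check the paths against the FTP site before relying on them.

```python
from typing import Optional

# Assumed base URL of the Ensembl FTP site
BASE = "https://ftp.ensembl.org/pub"

# Assumed directory name for each sequence type mentioned in the thread
SEQ_DIRS = {"dna": "dna", "cdna": "cdna", "cds": "cds", "peptide": "pep", "ncrna": "ncrna"}


def fasta_dir_url(species: str, seq_type: str, release: Optional[int] = None) -> str:
    """Hypothetical helper: FTP directory URL for one species and sequence type."""
    # Species directories are lowercase with underscores, e.g. "homo_sapiens"
    species_dir = species.strip().lower().replace(" ", "_")
    seq_dir = SEQ_DIRS[seq_type.lower()]
    if release is None:
        # Current release lives under current_fasta
        return f"{BASE}/current_fasta/{species_dir}/{seq_dir}/"
    # Archived releases live under release-<N>/fasta
    return f"{BASE}/release-{release}/fasta/{species_dir}/{seq_dir}/"


print(fasta_dir_url("Homo sapiens", "dna"))
# https://ftp.ensembl.org/pub/current_fasta/homo_sapiens/dna/
print(fasta_dir_url("homo_sapiens", "peptide", release=110))
# https://ftp.ensembl.org/pub/release-110/fasta/homo_sapiens/pep/
```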
8/ In the DNA directory, you will find files named following this pattern: <species>.<assembly>.<sequence type>.<id type>.<id>.fa.gz
9/ The <sequence type> indicates whether the sequence is unmasked (dna), hard-masked (dna_rm) or soft-masked (dna_sm).
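To make the masking styles concrete, here is a toy illustration on a made-up sequence (not real Ensembl data). It uses the usual conventions: soft-masked repeats are lowercase, hard-masked repeats are replaced with N, and unmasked sequence is plain uppercase. It only illustrates these conventions. It is not claimed to reproduce Ensembl's dna_rm files exactly, since the repeat annotation used for each file type may differ.

```python
def soft_to_hard(seq: str) -> str:
    """Convert soft-masking (lowercase) to hard-masking (N)."""
    return "".join("N" if base.islower() else base for base in seq)


def soft_to_unmasked(seq: str) -> str:
    """Remove soft-masking by upper-casing every base."""
    return seq.upper()


def masked_fraction(seq: str) -> float:
    """Fraction of bases that are soft-masked (lowercase)."""
    if not seq:
        return 0.0
    return sum(base.islower() for base in seq) / len(seq)


soft = "ACGTacgtacGTAC"  # toy soft-masked sequence
print(soft_to_hard(soft))      # ACGTNNNNNNGTAC
print(soft_to_unmasked(soft))  # ACGTACGTACGTAC
print(masked_fraction(soft))
```

Soft-masking keeps the original bases, so you can recover the unmasked or hard-masked forms from it. That is one reason soft-masked files are popular inputs for aligners.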
10/ The <id type> tells us whether the sequence is a single 'chromosome', 'nonchromosomal' or 'seqlevel'.
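The naming pattern can also be read by a script, for example to sort a directory listing. This sketch assumes the <id> part is present only for single-sequence files such as one chromosome. It also assumes the id types are the ones mentioned in this thread. Assembly names can contain dots (e.g. Sscrofa11.1), so it matches on the sequence type rather than splitting on every dot. The function name parse_fasta_name is hypothetical.

```python
import re
from typing import Dict, Optional

# <species>.<assembly>.<sequence type>.<id type>[.<id>].fa.gz
FASTA_NAME = re.compile(
    r"^(?P<species>[^.]+)\."
    r"(?P<assembly>.+)\."  # may itself contain dots
    r"(?P<seq_type>dna_rm|dna_sm|dna)\."
    r"(?P<id_type>chromosome|nonchromosomal|seqlevel|toplevel|primary_assembly)"
    r"(?:\.(?P<id>[^.]+))?"  # e.g. "1" for chromosome 1; absent otherwise
    r"\.fa\.gz$"
)


def parse_fasta_name(filename: str) -> Optional[Dict[str, Optional[str]]]:
    """Split a DNA FASTA filename into its fields, or return None if it doesn't fit."""
    match = FASTA_NAME.match(filename)
    return match.groupdict() if match else None


print(parse_fasta_name("Homo_sapiens.GRCh38.dna.chromosome.1.fa.gz"))
print(parse_fasta_name("Sus_scrofa.Sscrofa11.1.dna_sm.toplevel.fa.gz"))
```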
11/ But, what’s the ‘seqlevel’? 🤔
12/ TOPLEVEL sequence files contain all sequence regions flagged as toplevel in Ensembl. This includes chromosomes, regions not assembled into chromosomes and N-padded haplotype/patch regions.
13/ PRIMARY ASSEMBLY files contain all toplevel sequence regions excluding haplotypes and patches.
14/ This file is best used for performing sequence similarity searches where patch and haplotype sequences would confuse analysis. If the primary assembly file is not present, that indicates that there are no haplotype/patch regions, and the 'toplevel' file is equivalent.
15/ If you are performing alignments using a program that requires a genome FASTA, such as #HTSeq, TopHat or #HISAT, then the best choice for most cases is the primary assembly.
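The rule from the last few tweets can be written as a small sketch: use the primary assembly if it exists, otherwise fall back to toplevel. It assumes you already have the directory listing as a list of filenames. The function name choose_genome_fasta and the example filenames are illustrative.

```python
from typing import Iterable, Optional


def choose_genome_fasta(filenames: Iterable[str], seq_type: str = "dna") -> Optional[str]:
    """Prefer the primary assembly; fall back to toplevel when it is absent."""
    names = list(filenames)
    for id_type in ("primary_assembly", "toplevel"):
        suffix = f".{seq_type}.{id_type}.fa.gz"
        for name in names:
            if name.endswith(suffix):
                return name
    return None  # neither file found for this sequence type


human = [
    "Homo_sapiens.GRCh38.dna.toplevel.fa.gz",
    "Homo_sapiens.GRCh38.dna.primary_assembly.fa.gz",
    "Homo_sapiens.GRCh38.dna_sm.primary_assembly.fa.gz",
]
print(choose_genome_fasta(human))            # Homo_sapiens.GRCh38.dna.primary_assembly.fa.gz
print(choose_genome_fasta(human, "dna_sm"))  # Homo_sapiens.GRCh38.dna_sm.primary_assembly.fa.gz

# Hypothetical species with no haplotypes/patches, so only toplevel exists
print(choose_genome_fasta(["Example_species.Asm1.dna.toplevel.fa.gz"]))
```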
2/ The way you approach this problem will depend on whether you are starting with a #gene of interest or already have the ID (e.g. rs699) of a variant for which you want to find the observed allele frequencies.
3/ If you are starting with a gene, search for the gene name or ID from the #Ensembl homepage and navigate to the Gene tab.
Want to learn about a gene's function, but there’s no functional data in your species of interest? Or maybe looking for a homologue of your fav gene in a model organism to carry out functional work? Look no further! This #tweetorial will show you how to find orthologues in @ensembl
2/14
Let’s start on the Ensembl homepage and search for our #gene of interest SCP2 by typing its name into the search box. Then go to the gene tab by clicking on the gene name in the search results.
3/14
You can learn more about the #gene's function by exploring gene ontology terms and associated phenotypes. Let’s click on Phenotypes in the side menu. This view shows phenotypes associated with our gene of interest and variants in this gene.