
Ensembl

Jun 23 • 16 tweets • 6 min read

1/ Do you need reference sequence files from #Ensembl? All of the different files available can be confusing. Here’s a thread to help you decide which files you need…🧵

#genomics #bioinformatics #tweetorial #Ensembltraining🧬

2/ Whole-genome reference files for each species in Ensembl can be found on the FTP site 🧑‍🤝‍🧑🐭🐄🐶🐟

👉 ftp.ensembl.org

If you’re studying non-vertebrate species, you’ll need to use the Ensembl Genomes FTP site 🌾🦠🦟

👉 ftp.ensemblgenomes.org

3/ Let’s explore the different directories and files available

4/ The FTP site contains directories for all file types from the current Ensembl release, as well as directories that contain all files from previous Ensembl releases
E.g. ftp.ensembl.org/pub/release-10…

5/ There is also a folder for the human GRCh37 #genome assembly and related #data files:
👉 ftp.ensembl.org/pub/grch37/

6/ The reference #FASTA files can be found in the ‘current_fasta’ directory. You’ll then need to navigate to the directory for your species of interest

7/ From here, you can find directories for DNA, cDNA, CDS, peptide or ncRNA sequences
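To make the directory layout concrete, here is a minimal sketch that builds the address of a FASTA directory. The function name `fasta_dir_url`, the `SEQUENCE_DIRS` mapping and the lower-case, underscore-separated species folder are assumptions for illustration, so check them against the FTP listing before relying on them.

```python
from typing import Optional

# Assumed folder names under each species directory (not spelled out in the thread).
SEQUENCE_DIRS = {
    "DNA": "dna",
    "cDNA": "cdna",
    "CDS": "cds",
    "peptide": "pep",
    "ncRNA": "ncrna",
}


def fasta_dir_url(species: str, sequence: str, release: Optional[int] = None) -> str:
    """Build the URL of a FASTA directory on the Ensembl FTP site.

    With no release given, the 'current_fasta' directory is used;
    otherwise the path goes through 'release-<N>/fasta'.
    """
    root = "current_fasta" if release is None else f"release-{release}/fasta"
    # Assumption: species folders are lower case with underscores, e.g. homo_sapiens.
    species_dir = species.strip().lower().replace(" ", "_")
    return f"https://ftp.ensembl.org/pub/{root}/{species_dir}/{SEQUENCE_DIRS[sequence]}/"


print(fasta_dir_url("Homo sapiens", "DNA"))
```

Passing a release number, e.g. `fasta_dir_url("Mus musculus", "peptide", release=104)`, points at the archived directory for that release instead of the current one.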

8/ In the DNA directory, you will find files named following this pattern:

<species>.<assembly>.<sequence type>.<id type>.<id>.fa.gz

to indicate the contents of the file.
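As a rough illustration, the pattern can be split back into its parts with a regular expression. This is a sketch, not an official parser: `parse_dna_filename` and `DNA_FILE` are made-up names, and the example file name is assumed to be typical of what the human DNA directory holds.

```python
import re
from typing import Dict, Optional

# <species>.<assembly>.<sequence type>.<id type>.<id>.fa.gz
# The <id> part is optional, since some files cover more than one region.
DNA_FILE = re.compile(
    r"^(?P<species>[^.]+)\."
    r"(?P<assembly>.+)\."  # assembly names may themselves contain dots
    r"(?P<sequence_type>dna_rm|dna_sm|dna)\."
    r"(?P<id_type>[^.]+)"
    r"(?:\.(?P<id>.+))?"
    r"\.fa\.gz$"
)


def parse_dna_filename(filename: str) -> Dict[str, Optional[str]]:
    """Return the named parts of a DNA FASTA file name, or raise ValueError."""
    match = DNA_FILE.match(filename)
    if match is None:
        raise ValueError(f"not a recognised DNA FASTA file name: {filename}")
    return match.groupdict()


print(parse_dna_filename("Homo_sapiens.GRCh38.dna.chromosome.1.fa.gz"))
```

Anchoring on the sequence type rather than splitting on every dot keeps assembly names that contain a dot in one piece.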

9/ The <sequence type> indicates whether the sequence is unmasked (dna), hard-masked (dna_rm) or soft-masked (dna_sm).
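To see how the three masking styles relate, here is a small sketch on a toy sequence. It relies on the usual convention that soft-masked files write repeat regions in lower case and hard-masked files replace them with N; the helper names are hypothetical.

```python
def hard_mask(sequence: str) -> str:
    """Turn a soft-masked sequence into a hard-masked one (lower case -> N)."""
    return "".join("N" if base.islower() else base for base in sequence)


def unmask(sequence: str) -> str:
    """Turn a soft-masked sequence into an unmasked one (all upper case)."""
    return sequence.upper()


def masked_fraction(sequence: str) -> float:
    """Fraction of bases that are soft-masked (lower case)."""
    if not sequence:
        return 0.0
    return sum(base.islower() for base in sequence) / len(sequence)


soft = "ACGTacgtACGT"  # toy example, not real genome sequence
print(hard_mask(soft))
print(unmask(soft))
print(masked_fraction(soft))
```

Because the soft-masked form keeps both the bases and the repeat annotation, the other two forms can be derived from it, but not the other way round.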

10/ The <id type> tells us whether the file holds a single 'chromosome', the 'nonchromosomal' sequences or the 'toplevel'.

11/ But, what’s the ‘toplevel’? 🤔

12/ TOPLEVEL sequence files contain all sequence regions flagged as toplevel in Ensembl. This includes chromosomes, regions not assembled into chromosomes, and N-padded haplotype/patch regions.

13/ PRIMARY ASSEMBLY files contain all toplevel sequence regions excluding haplotypes and patches.

14/ This file is best used for performing sequence similarity searches where patch and haplotype sequences would confuse analysis. If the primary assembly file is not present, that indicates that there are no haplotype/patch regions, and the 'toplevel' file is equivalent.

15/ If you are performing alignments using a program that requires a genome FASTA, such as #STAR, TopHat or #HISAT, then the best choice for most cases is the primary assembly.
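The rule of thumb above (primary assembly if it exists, otherwise toplevel) can be sketched as a small helper. The name `pick_genome_fasta` and the example file names are assumptions for illustration; the list would normally come from the species' DNA directory listing.

```python
from typing import Iterable, Optional


def pick_genome_fasta(filenames: Iterable[str], sequence_type: str = "dna") -> Optional[str]:
    """Prefer the primary assembly file; fall back to toplevel if it is absent."""
    names = list(filenames)
    for id_type in ("primary_assembly", "toplevel"):
        suffix = f".{sequence_type}.{id_type}.fa.gz"
        for name in names:
            if name.endswith(suffix):
                return name
    return None  # neither file found for this sequence type


listing = [
    "Homo_sapiens.GRCh38.dna.chromosome.1.fa.gz",
    "Homo_sapiens.GRCh38.dna.primary_assembly.fa.gz",
    "Homo_sapiens.GRCh38.dna.toplevel.fa.gz",
]
print(pick_genome_fasta(listing))
```

For a species with no haplotype/patch regions, only the toplevel file is listed, so the helper returns that one.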

16/ You can find more information in the README:
👉ftp.ensembl.org/pub/current_fa…


