
Liz Tseng

Jun 9 • 11 tweets • 5 min read


#bioinformatics tip of the day: FOLLOW THE DATA. Trace a read through its journey. My example is how I look at a fresh @PacBio Iso-Seq run (but this really applies to any sequencing data, for any application).

I start with the CCS reads and scroll through them to eyeball the primers and polyA tail. /1
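A minimal sketch of that first eyeball pass, for when scrolling gets tedious. The two primer sequences are hypothetical placeholders (not any kit's real primers), and exact string matching is an assumption made for brevity: real reads carry errors, which is why a dedicated tool does the actual primer calling.

```python
import re

# Hypothetical placeholder primers; substitute the sequences from your own prep.
FIVE_PRIME = "GATCGGAATCCTAGCA"
THREE_PRIME = "TTGCAATTGCAATTGC"

_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def revcomp(seq):
    """Reverse-complement a DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def longest_run(seq, base):
    """Length of the longest uninterrupted run of `base` in `seq`."""
    return max((len(run) for run in re.findall(base + "+", seq)), default=0)


def describe_read(seq):
    """Summarize the landmarks visible in one read, in either orientation."""
    seq = seq.upper()
    return {
        "has_5p": FIVE_PRIME in seq or revcomp(FIVE_PRIME) in seq,
        "has_3p": THREE_PRIME in seq or revcomp(THREE_PRIME) in seq,
        # An antisense read shows the tail as a polyT run instead.
        "tail_len": max(longest_run(seq, "A"), longest_run(seq, "T")),
    }


# Toy read: 5' primer, a short insert, a 30-base polyA tail, then the 3' primer.
read = FIVE_PRIME + "GATTACAGATTACG" + "A" * 30 + revcomp(THREE_PRIME)
print(describe_read(read))
```

Running this over the first few hundred reads gives a quick feel for how many show both primers and a tail before any trimming happens.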


2/ I run lima to strip the cDNA primers, then check the summary file to see that a good percentage of reads actually have the expected 5'/3' cDNA primers. if a high % failed, look for reasons why - is it user error (wrong primers?), prep error, or biological?
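Checking that percentage can be scripted. This is a sketch only: it assumes the summary is plain text made of "label : count" lines, and the labels and numbers in `EXAMPLE` are made up for illustration, so compare them against your own file before trusting the output.

```python
import re

# Made-up numbers in an assumed "label : count" layout; check your own file.
EXAMPLE = """\
ZMWs input                (A) : 1000
ZMWs above all thresholds (B) : 900 (90%)
ZMWs below any threshold  (C) : 100 (10%)

ZMW marginals for (C):
Below min length              : 10 (10%)
"""


def parse_summary(text):
    """Map each label to its leading integer count; skip lines without one."""
    counts = {}
    for line in text.splitlines():
        label, sep, value = line.rpartition(":")
        match = re.match(r"\s*(\d+)", value)
        if sep and match:
            # Collapse the padding spaces inside the label.
            counts[" ".join(label.split())] = int(match.group(1))
    return counts


def pass_fraction(counts):
    """Fraction of input reads that passed every threshold."""
    total = next(v for k, v in counts.items() if k.startswith("ZMWs input"))
    passed = next(
        v for k, v in counts.items() if k.startswith("ZMWs above all thresholds")
    )
    return passed / total


counts = parse_summary(EXAMPLE)
print(f"{pass_fraction(counts):.1%} of reads kept")
```

The remaining labels in `counts` are where to look next when the kept fraction is low.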

3/ then run `isoseq3 refine` to identify (unintended) concatemers and missing polyA tails. if reads are lost here, it usually indicates a serious library prep issue rather than a bioinformatics one.

*always look at your trash reads to understand why they're trashed*
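One way to look at trashed reads is to ask whether a primer sits in the middle of the read, which is what a concatemer would look like. The sketch below is an assumption-heavy illustration, not how `isoseq3 refine` works internally: the primers are hypothetical placeholders, matching is exact, and the `margin` of 50 bases is an arbitrary choice.

```python
# Hypothetical placeholder primers; substitute the sequences from your own prep.
PRIMERS = {"5p": "GATCGGAATCCTAGCA", "3p": "TTGCAATTGCAATTGC"}

_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def revcomp(seq):
    """Reverse-complement a DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def internal_primer_hits(seq, primers=PRIMERS, margin=50):
    """Return (position, primer_name) for exact primer matches that lie
    at least `margin` bases away from both ends of the read."""
    seq = seq.upper()
    hits = set()
    for name, primer in primers.items():
        for pattern in (primer, revcomp(primer)):
            start = seq.find(pattern)
            while start != -1:
                end = start + len(pattern)
                if start >= margin and len(seq) - end >= margin:
                    hits.add((start, name))
                start = seq.find(pattern, start + 1)
    return sorted(hits)


# Toy read with a primer stuck in the middle, as a concatemer would have.
suspect = "C" * 100 + PRIMERS["5p"] + "G" * 100
print(internal_primer_hits(suspect))
```

A read that returns an empty list here but was still discarded is worth opening by hand.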

4/ now that you have your full-length transcripts, map them to the genome.

common user error in read alignment: giving it the wrong genome / running aligner with wrong parameters

common interpretation error in alignment outcome: not looking at secondary or supplementary alignments
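A quick way to stop ignoring those alignments is to count them. This sketch reads SAM text (for example, the output of `samtools view -h`) and sorts records by the standard FLAG bits: 0x4 unmapped, 0x100 secondary, 0x800 supplementary. The function names and the toy records are made up for illustration.

```python
from collections import Counter

UNMAPPED, SECONDARY, SUPPLEMENTARY = 0x4, 0x100, 0x800


def classify(flag):
    """Name the alignment type encoded in a SAM FLAG value."""
    if flag & UNMAPPED:
        return "unmapped"
    if flag & SECONDARY:
        return "secondary"
    if flag & SUPPLEMENTARY:
        return "supplementary"
    return "primary"


def tally_alignments(sam_text):
    """Count alignment types and collect reads with non-primary records."""
    counts = Counter()
    reads_with_extra = set()
    for line in sam_text.splitlines():
        if not line or line.startswith("@"):  # skip blank and header lines
            continue
        fields = line.split("\t")
        kind = classify(int(fields[1]))
        counts[kind] += 1
        if kind in ("secondary", "supplementary"):
            reads_with_extra.add(fields[0])
    return counts, reads_with_extra


# Toy SAM records, invented for this example.
SAM = "\n".join([
    "@HD\tVN:1.6",
    "read1\t0\tchr1\t100\t60\t50M\t*\t0\t0\t*\t*",
    "read1\t2048\tchr5\t900\t60\t20M30H\t*\t0\t0\t*\t*",
    "read2\t256\tchr2\t500\t0\t50M\t*\t0\t0\t*\t*",
    "read3\t4\t*\t0\t0\t*\t*\t0\t0\t*\t*",
])
counts, extra = tally_alignments(SAM)
print(dict(counts), sorted(extra))
```

The names in `extra` are the reads to pull up in a browser first: a supplementary record can mean a chimeric read or a fusion, and telling those apart takes a look.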

5/ with reads aligned, export the aligned BAM to IGV or UCSC genome browser and LOOK AT IT. scroll through gene by gene. do they line up with the annotation?

common user error: wrong genome version or wrong annotation version (hg19 vs hg38, different GENCODE versions)
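For human data, the genome version can be sanity-checked from the alignment header alone, since chr1 is 249,250,621 bases long in hg19 and 248,956,422 in hg38. A minimal sketch, assuming a human reference and a standard SAM header with `@SQ` lines; `guess_build` is a made-up helper name, and this says nothing about which annotation version was used.

```python
# chr1 lengths for the two common human assemblies.
CHR1_LENGTHS = {249250621: "hg19/GRCh37", 248956422: "hg38/GRCh38"}


def guess_build(sam_header):
    """Guess the human genome build from the chr1 length in a SAM header."""
    for line in sam_header.splitlines():
        if not line.startswith("@SQ"):
            continue
        # @SQ fields are TAG:VALUE pairs, e.g. SN:chr1 and LN:248956422.
        tags = dict(field.split(":", 1) for field in line.split("\t")[1:])
        if tags.get("SN") in ("chr1", "1"):
            return CHR1_LENGTHS.get(int(tags["LN"]), "unknown")
    return "unknown"


header = "@HD\tVN:1.6\n@SQ\tSN:chr1\tLN:248956422\n@SQ\tSN:chr2\tLN:242193529"
print(guess_build(header))
```

If the build reported here differs from the one your annotation was made for, the reads will not line up with the gene models, no matter how good the library was.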

6/ if everything checks out at this step as in *you have visually confirmed the reads were processed and mapped correctly with the right genome + annotations* --- now you can finally start thinking about writing tools to interpret the work!

until you've done so...

7/ ...don't tell me you did your due diligence in #Bioinformatics if you simply clicked a button or ran a bunch of commands and "the (final) output does not look right".

there's no way to know whether the data looks wrong if you don't know what all the intermediate steps were...

8/ ...and what those intermediate outcomes are.

the whole point of #Bioinformatics troubleshooting is to know what each step is supposed to do (you do not need to know how to write code to know that) and how to separate user error from library error from biology

9/ ...don't look at the final report figures or stats and panic. follow the sequencing data through its journey. knowing a bit of python coding helps, but you can get away with just having an arsenal of unix command line skills at your disposal. (links in next comment)

10/ good #Bioinformatics #unix cheat sheet: …vis-bioinformatics-training.github.io/2019-Winter-Bi…


11/ get familiar with @GenomeBrowser, @igvteam (I don't use @galaxyproject myself), and plain old NCBI BLAST. when in doubt, BLAST your sequence.

ok, coming off my soapbox & back to work

//end
