Liz Tseng
Jun 9 11 tweets 5 min read
#bioinformatics tip of the day: FOLLOW THE DATA. Trace a read through its journey. For example, here's what I do when I look at a fresh @PacBio Iso-Seq run (but really this applies to any sequencing data for any application).

I start w CCS reads, scroll through them to viz the primers and polyA tail. /1
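If scrolling by eye gets tiring, the same check can be roughed out in a few lines of Python. This is only a sketch: the function names, the 60-base window and the 15-base minimum run are all made up for illustration and are not defaults of any PacBio tool. It assumes the primers are still attached (so the polyA sits a little way in from the read end) and that a read may be in either orientation.

```python
def longest_run(seq: str, base: str) -> int:
    """Length of the longest uninterrupted run of `base` in `seq`."""
    best = current = 0
    for ch in seq.upper():
        current = current + 1 if ch == base else 0
        best = max(best, current)
    return best


def polya_orientation(read: str, window: int = 60, min_run: int = 15) -> str:
    """Classify a read by where its polyA signal sits.

    Returns "forward" if there is an A-run near the 3' end, "reverse" if
    there is a T-run near the 5' end (the read is the reverse complement),
    and "none" otherwise. `window` and `min_run` are illustrative
    thresholds chosen for this sketch, not values taken from any tool.
    """
    if longest_run(read[-window:], "A") >= min_run:
        return "forward"
    if longest_run(read[:window], "T") >= min_run:
        return "reverse"
    return "none"
```

Run it over a few hundred reads and tally the three labels. A large "none" pile is a cue to go back and look at those reads by eye.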
2/ I run lima to strip the cDNA primers, then check the summary file to see that a good percentage of reads actually have the expected 5'/3' cDNA primers. if a high % failed, look for reasons why - is it user error (wrong primers?), prep error, or biological?
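A minimal sketch of pulling the pass rate out of the summary text. It assumes the summary contains lines shaped like `ZMWs input (A) : 1000` and `ZMWs above all thresholds (B) : 820 (82%)`; that layout and those labels are an assumption here, so check them against your own file. The function names are hypothetical and the numbers in the usage note are invented.

```python
import re


def parse_summary_counts(text: str) -> dict:
    """Collect lines shaped like 'label (X) : count' into {label: count}.

    The line layout is an assumption about the summary file; lines that
    do not match it are silently skipped.
    """
    counts = {}
    for line in text.splitlines():
        match = re.match(r"^(.*?)\s*\(\w+\)\s*:\s*(\d+)", line)
        if match:
            counts[match.group(1).strip()] = int(match.group(2))
    return counts


def pass_fraction(counts: dict,
                  total_key: str = "ZMWs input",
                  pass_key: str = "ZMWs above all thresholds") -> float:
    """Fraction of input ZMWs that passed; the key names are assumed labels."""
    total = counts[total_key]
    return counts[pass_key] / total if total else 0.0
```

For a summary reporting 1000 input ZMWs and 820 passing, `pass_fraction` returns 0.82. What counts as "a good percentage" depends on the prep, so compare against your own earlier runs rather than a fixed cutoff.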
3/ then run `isoseq3 refine` to identify (unintended) concatemers and missing polyA tails. if reads are lost here, it usually indicates serious library prep issues, less about bioinfx.

*always look at your trash reads to understand why they're trashed*
4/ now that you have your full-length transcripts, you map them to the genome.

common user error in read alignment: giving it the wrong genome / running aligner with wrong parameters

common interpretation error in alignment outcome: not looking at secondary or supp alignments
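One way to stop overlooking secondary and supplementary alignments is to count them. The sketch below reads SAM text lines and buckets each record by its FLAG field, using the bit values from the SAM specification (0x4 unmapped, 0x100 secondary, 0x800 supplementary). The function names are made up for illustration.

```python
from collections import Counter

# FLAG bits as defined in the SAM specification.
FLAG_UNMAPPED = 0x4
FLAG_SECONDARY = 0x100
FLAG_SUPPLEMENTARY = 0x800


def classify_flag(flag: int) -> str:
    """Bucket one SAM FLAG value into a single category."""
    if flag & FLAG_UNMAPPED:
        return "unmapped"
    if flag & FLAG_SUPPLEMENTARY:
        return "supplementary"
    if flag & FLAG_SECONDARY:
        return "secondary"
    return "primary"


def tally_alignments(sam_lines) -> Counter:
    """Count alignment categories over SAM text lines, skipping headers."""
    tally = Counter()
    for line in sam_lines:
        if not line.strip() or line.startswith("@"):
            continue
        fields = line.rstrip("\n").split("\t")
        tally[classify_flag(int(fields[1]))] += 1
    return tally
```

Feeding it the text output of `samtools view -h` on the aligned BAM gives a quick breakdown. A surprising number of supplementary records is worth a look in the browser before deciding whether it is artifact or biology.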
5/ with reads aligned, export the aligned BAM to IGV or UCSC genome browser and LOOK AT IT. scroll through gene by gene. do they line up with the annotation?

common user error: wrong genome version or wrong annotation version (hg19 v hg38, diff gencode versions)
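A quick sanity check for the hg19 v hg38 mix-up is to read the contig lengths out of the BAM header and compare chromosome 1 against the known lengths for each build. This is a sketch for human data only: the function names are hypothetical, the lookup table covers just these two builds, and it says nothing about which annotation version you loaded.

```python
# Length of chromosome 1 in the two common human assemblies.
KNOWN_CHR1_LENGTHS = {
    249250621: "hg19 / GRCh37",
    248956422: "hg38 / GRCh38",
}


def parse_sq_lines(header_lines) -> dict:
    """Return {contig_name: length} from SAM @SQ header lines."""
    contigs = {}
    for line in header_lines:
        if not line.startswith("@SQ"):
            continue
        fields = line.rstrip("\n").split("\t")[1:]
        tags = dict(field.split(":", 1) for field in fields)
        contigs[tags["SN"]] = int(tags["LN"])
    return contigs


def guess_human_build(contigs: dict) -> str:
    """Guess the assembly from the chr1 length; 'unknown' if no match."""
    for name in ("chr1", "1"):
        if name in contigs:
            return KNOWN_CHR1_LENGTHS.get(contigs[name], "unknown")
    return "unknown"
```

The header lines can come from `samtools view -H`. If the guess disagrees with the genome you have loaded in IGV or the UCSC browser, that explains reads that refuse to line up with the annotation.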
6/ if everything checks out at this step as in *you have visually confirmed the reads were processed and mapped correctly with the right genome + annotations* --- now you can finally start thinking about writing tools to interpret the work!

until you've done so...
7/ ...don't tell me you did your due diligence in #Bioinformatics if you simply clicked a button or ran a bunch of commands and "the (final) output does not look right".

there's no way to know whether the data "doesn't look right" if you don't know what all the intermediate steps are...
8/ ...and what those intermediate outcomes are.

the whole point of #Bioinformatics troubleshooting is to know what each step is supposed to do (you do not need to know how to write code to know that) and how to separate user error from lib error from biology
9/ ...don't look at the final report figures or stats and panic. follow the sequencing data through its journey. knowing a bit of python coding helps, but you can get away with just having an arsenal of unix command line skills at your disposal. (links in next comment)
11/ get familiar with @GenomeBrowser , @igvteam , (I don't use @galaxyproject myself), and plain old NCBI BLAST. when in doubt, BLAST your sequence.

ok, coming off my soapbox & back to work

//end
