Latest Twitter Threads by @tangming2005 on Thread Reader App

Jul 24 • 19 tweets • 3 min read

1/ You’re merging gene data across tools. Suddenly nothing matches.
ENSEMBL, ENTREZ, TP53, P53…
Why so many gene IDs?

2/
Gene ID chaos is real.
One gene, three names: ENSEMBL ID, ENTREZ ID, gene symbol.
Different formats, same biology—but not always.
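
To make the mismatch concrete, here is a minimal Python sketch of re-keying two result tables on one identifier before merging. It assumes a tiny hand-written lookup table; in practice the mapping would come from an annotation resource. The table layout, the helper name `to_ensembl`, the expression values, and the `.17` version suffix are all made up for illustration. Only the TP53 identifiers themselves are real.

```python
# Toy lookup table: one row per gene, three identifier systems plus aliases.
# Only TP53 is listed; a real table would come from an annotation resource.
GENE_TABLE = [
    {
        "ensembl": "ENSG00000141510",
        "entrez": "7157",
        "symbol": "TP53",
        "aliases": {"P53"},
    },
]


def to_ensembl(gene_id):
    """Return the ENSEMBL gene ID for any known identifier, or None."""
    key = gene_id.strip().upper()
    # ENSEMBL IDs may carry a version suffix such as ".17"; drop it.
    if key.startswith("ENSG"):
        key = key.split(".")[0]
    for row in GENE_TABLE:
        if key in (row["ensembl"], row["entrez"], row["symbol"]):
            return row["ensembl"]
        if key in row["aliases"]:
            return row["ensembl"]
    return None


# Two hypothetical tools reporting the same gene under different names.
# The numeric values are invented placeholders.
tool_a = {"TP53": 12.5}
tool_b = {"7157": 0.8, "ENSG00000141510.17": 3.1}

# Re-key every table on ENSEMBL ID so the rows line up.
merged = {}
for source in (tool_a, tool_b):
    for gene_id, value in source.items():
        merged.setdefault(to_ensembl(gene_id), []).append(value)

print(merged)
```

Anything that comes back as `None` is an identifier the table does not know, which is worth counting before trusting a merge.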

Jul 9 • 12 tweets • 4 min read

I had a roadmap for biology -> computation. What about the reverse?

Many people with a computer science background ask where they should start learning biology.

1/ Textbook: Molecular Biology of the Cell, amazon.com/Molecular-Biol…

Jun 14 • 13 tweets • 3 min read

1/ If you're in bioinformatics, you're staring at matrices all day.
RNA-seq? Gene x sample.
scRNA-seq? Gene x cell.
Everything is a matrix.
But I never learned how to think in matrices. And I regret it.

2/
No one told me in school:
To survive bioinformatics, you don’t just need R or Python.
You need linear algebra.
Not fancy. Just the fundamentals.
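
As a rough illustration of what "thinking in matrices" can look like, the sketch below treats a toy gene x sample count table as a matrix and expresses two routine steps as matrix products. It is plain Python so the arithmetic stays visible; the counts and the helper names `transpose` and `matmul` are made up for this example, and a real analysis would normally use a numerical library.

```python
# Toy gene x sample count matrix (rows = genes, columns = samples).
# The numbers are invented purely to illustrate the shapes.
counts = [
    [10, 20],    # gene A
    [30, 60],    # gene B
    [60, 120],   # gene C
]


def transpose(m):
    """Swap rows and columns."""
    return [list(col) for col in zip(*m)]


def matmul(a, b):
    """(n x k) times (k x m) gives (n x m); each entry is a dot product."""
    b_cols = transpose(b)
    return [[sum(x * y for x, y in zip(row, col)) for col in b_cols] for row in a]


# Library size per sample = column sums = a row of ones times the matrix.
ones = [[1] * len(counts)]
library_sizes = matmul(ones, counts)[0]

# Dividing each column by its library size is multiplication by a
# diagonal matrix with 1 / library_size on the diagonal.
n_samples = len(library_sizes)
scale = [
    [1 / library_sizes[i] if i == j else 0 for j in range(n_samples)]
    for i in range(n_samples)
]
proportions = matmul(counts, scale)

# Sample-by-sample similarity: (samples x genes) times (genes x samples).
similarity = matmul(transpose(proportions), proportions)

print(library_sizes)
print(proportions)
print(similarity)
```

The second sample here is just the first one sequenced twice as deep, so after scaling the two columns agree and every entry of the similarity matrix is the same.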

May 15 • 18 tweets • 3 min read

Your data is lying to you. Here’s how technical artifacts distort biology—and how to see the truth. 👇
1/ Beautiful t-SNE? Shiny heatmap?
Look closer.
Technical artifacts can fake whole cell types.
Here’s where the ghosts hide.

2/
Spatial RNA: slice a tissue and transcripts “bleed” into neighbors.
A fibroblast + T cell spot?
Maybe just spillover, not a franken-cell.

May 3 • 20 tweets • 3 min read

1/ Want to ruin your analysis in one move?
Ignore the biology behind the data.
Let me show you why that mistake keeps happening.

2/
Biology has context. And biology data? It has bias.
Not political bias. Technical bias. Built into how the data is generated.

Apr 17 • 18 tweets • 3 min read

🧵Bioinformatics evolves fast. New tech. New data. New analysis.

But here's how to stay grounded and not get overwhelmed:

1/

Single-cell took the spotlight.

Then came spatial transcriptomics.

Now we have spatial epigenomics and proteomics.

Apr 1 • 11 tweets • 3 min read

🧵 If you’re doing bioinformatics manually, you’re wasting time and prone to errors.

1/ Bioinformatics is full of repetitive tasks. The best bioinformaticians don’t just analyze data—they automate. Let’s break it down. 👇

2/ There are 4 levels of bioinformatics skills, from manual work to full automation. The higher you go, the faster and more efficient you become.

Feb 9 • 12 tweets • 2 min read

🧵 Looking for a job in biotech sucks right now. With layoffs happening across biotech and pharma, here's some advice to navigate these challenges.

1/ Layoffs are happening left and right in biotech and big pharma. I’m lucky to still have a job, but many aren’t. Here’s what I’ve learned.

Feb 4 • 15 tweets • 3 min read

🧵 How to Use samtools – A Must-Know Tool for NGS Data
If you're working with sequencing data, samtools is essential. It was developed by Heng Li, who also created BWA and minimap2. Let's dive into its usage. 👇

1/ What is samtools?
samtools is a toolkit for handling SAM/BAM/CRAM files, the standard formats for storing sequence alignments. It allows you to sort, index, filter, and query alignment files efficiently.
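
The sort, index, filter, and summarize steps can be sketched as a small Python wrapper around the samtools command line. This is a hedged example, not the thread's own commands: the file names are placeholders, and the script only prints the commands unless samtools is on the PATH and the input file exists.

```python
import shutil
import subprocess
from pathlib import Path

# Placeholder file names (assumptions); replace with your own files.
raw_bam = "sample.bam"
sorted_bam = "sample.sorted.bam"
filtered_bam = "sample.q30.bam"

commands = [
    # Sort alignments by coordinate and write a new BAM.
    ["samtools", "sort", "-o", sorted_bam, raw_bam],
    # Index the sorted BAM so regions can be looked up quickly.
    ["samtools", "index", sorted_bam],
    # Keep alignments with mapping quality >= 30, written as BAM.
    ["samtools", "view", "-b", "-q", "30", "-o", filtered_bam, sorted_bam],
    # Print summary counts of alignments by flag category.
    ["samtools", "flagstat", sorted_bam],
]


def run_all(cmds):
    """Run each command in order, stopping at the first failure."""
    for cmd in cmds:
        subprocess.run(cmd, check=True)


if shutil.which("samtools") and Path(raw_bam).exists():
    run_all(commands)
else:
    # Dry run: show what would be executed.
    for cmd in commands:
        print(" ".join(cmd))
```

Keeping the commands as lists of arguments avoids shell quoting problems when file names contain spaces.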

Jan 28 • 13 tweets • 2 min read

1/ Why is bioinformatics so complicated? Because biology is. Here’s a quick example to show just how nuanced even a "simple" analysis can be.

2/ Genes aren’t simple entities. Most genes have multiple transcripts. Different transcripts can have unique TSSs (transcription start sites), TESs (transcription end sites), and exon compositions.

Jan 21 • 12 tweets • 3 min read

Why understanding biology matters in bioinformatics
One big lesson: RNA and protein levels aren’t always correlated. If you don’t know this, you might draw the wrong conclusions. 🧵👇

1/ Why does this matter?

In bioinformatics, you often analyze RNA-seq or proteomics data. If you only rely on one, you risk missing the full picture.

For example:

mRNA and protein levels of the same gene can tell different stories due to regulation at multiple levels.

Jan 9 • 9 tweets • 2 min read

FASTQ files are fundamental in bioinformatics, but working with them efficiently requires Unix skills. Here's a handy one-liner to count read lengths in a compressed FASTQ file 👇

The command: zless example.fastq.gz | awk 'NR%4 == 2 {lengths[length($0)]++} END {for (l in lengths) {print l, lengths[l]}}'
Let's break it down:
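
One way to see each step is a rough Python equivalent of the one-liner. This is a sketch for illustration, not a replacement for the awk version: the function name `read_length_counts` and the two-read demo file are assumptions made up for the example.

```python
import gzip
import os
import tempfile
from collections import Counter


def read_length_counts(path):
    """Count how many reads of each length a gzipped FASTQ file contains."""
    lengths = Counter()
    # gzip.open in text mode plays the role of zless: decompress on the fly.
    with gzip.open(path, "rt") as handle:
        for line_number, line in enumerate(handle, start=1):
            # A FASTQ record is 4 lines and the sequence is the 2nd one,
            # which is what NR % 4 == 2 selects in awk.
            if line_number % 4 == 2:
                # length($0) in awk excludes the newline, so strip it here.
                lengths[len(line.rstrip("\n"))] += 1
    return lengths


# Tiny made-up FASTQ with two reads (4 and 6 bases) to exercise the function.
demo_records = "@read1\nACGT\n+\nIIII\n@read2\nACGTAC\n+\nIIIIII\n"
with tempfile.TemporaryDirectory() as tmp:
    demo_path = os.path.join(tmp, "example.fastq.gz")
    with gzip.open(demo_path, "wt") as out:
        out.write(demo_records)
    demo_counts = read_length_counts(demo_path)

# Same shape of output as the END block: "length count" per line.
for length, n in sorted(demo_counts.items()):
    print(length, n)
```

Unlike the awk `for (l in lengths)` loop, whose order is unspecified, the Python version sorts by read length before printing.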

Dec 30, 2024 • 8 tweets • 2 min read

🔥 chatomics! I regret not learning this well in college—and it changed how I approach bioinformatics today. Here’s my story and why you should avoid my mistake. 🧬

1/ My biggest regret? 💭 I wish I had learned linear algebra properly in college.
I barely passed the exam (and calculus wasn’t much better!). It felt boring and disconnected from real-world applications.
But years later, bioinformatics taught me how critical it is.

Dec 24, 2024 • 12 tweets • 4 min read

🎯 Do you really understand p-values?
The p-value histogram can reveal a LOT about your data. Let's break it down using real examples.👇

1/ First, a quick fact: P-values follow a uniform distribution under the null hypothesis.
What does that mean? 🤔
If there’s truly no difference between groups, the p-value behaves like rolling a fair die:
• P(p < 0.01) = 0.01
• P(p < 0.02) = 0.02
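
A quick simulation can make the uniform claim tangible. The sketch below is an assumption-laden toy, not the thread's real examples: both groups are drawn from the same standard normal distribution, and a two-sided z-test with known variance stands in for whatever test you would actually run.

```python
import random
from statistics import NormalDist, mean

random.seed(0)  # fixed seed so reruns give the same draws
STD_NORMAL = NormalDist()


def null_p_value(n=10):
    """Two-sided z-test p-value for two groups drawn from the same N(0, 1)."""
    a = [random.gauss(0, 1) for _ in range(n)]
    b = [random.gauss(0, 1) for _ in range(n)]
    # Variance is known to be 1 in both groups, so the difference
    # in means has standard deviation sqrt(2 / n).
    z = (mean(a) - mean(b)) / (2 / n) ** 0.5
    return 2 * (1 - STD_NORMAL.cdf(abs(z)))


p_values = [null_p_value() for _ in range(20000)]

# Under the null, P(p < t) should be close to t for any threshold t.
for t in (0.01, 0.02, 0.05, 0.5):
    frac = sum(p < t for p in p_values) / len(p_values)
    print(f"P(p < {t}) is about {frac:.3f}")

# Crude histogram: ten equal-width bins should be roughly equally full.
bins = [0] * 10
for p in p_values:
    bins[min(int(p * 10), 9)] += 1
print(bins)
```

If a real p-value histogram is far from flat on the right-hand side, that is usually a sign the test's assumptions do not hold, not a sign of biology.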

Dec 22, 2024 • 8 tweets • 3 min read

1/ 💡 Want to level up your bioinformatics game?
If you can master these 6 plots, you’ll be able to recreate 90% of figures in genomics papers:
• 📊 Barplot
• ⚡ Scatterplot
• 📈 Line plot
• 📉 Histogram
• 📦 Boxplot
• 🔥 Heatmap
Let’s dive into how YOU can do it. 🧬

2/ But first, a holiday story! 🎄
Boston is blanketed in snow, and our neighbor cleared it for us this morning with his snowblower. When I thanked him, he said:

"I just love helping others." 💛

As they say:

"Giving roses to others leaves fragrance on your hands."

Dec 21, 2024 • 10 tweets • 3 min read

1/ Exploratory Data Analysis (EDA) is the first step in any data analysis journey. When working with RNA-seq data, one of the most commonly used techniques is Principal Component Analysis (PCA). But what exactly is PCA, and why does it matter? Let’s break it down. 🧵👇

2/ What is PCA?

PCA is a mathematical method used to simplify complex datasets. It finds patterns by identifying directions (called principal components) that capture the most variation in the data. Read my post: divingintogeneticsandgenomics.com/post/pca-in-ac…
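
To show what "the direction that captures the most variation" means in numbers, here is a minimal sketch that finds the first principal component of a toy samples x genes matrix. It is not taken from the linked post: the data are invented, and power iteration on the covariance matrix is one simple way to get the top component, chosen here because it needs no libraries.

```python
# Toy samples x genes matrix (invented values). Genes 1 and 2 rise
# together across samples; gene 3 barely changes.
data = [
    [2.0, 1.9, 0.5],
    [4.0, 4.1, 0.4],
    [6.0, 5.9, 0.6],
    [8.0, 8.1, 0.5],
]


def center(rows):
    """Subtract each gene's mean so every column averages to zero."""
    means = [sum(col) / len(col) for col in zip(*rows)]
    return [[x - m for x, m in zip(row, means)] for row in rows]


def covariance(rows):
    """Gene x gene sample covariance of already-centered data."""
    n = len(rows)
    cols = list(zip(*rows))
    return [[sum(a * b for a, b in zip(ci, cj)) / (n - 1) for cj in cols] for ci in cols]


def first_component(cov, iterations=200):
    """Power iteration: multiply by cov and renormalise until stable."""
    vec = [1.0] * len(cov)
    for _ in range(iterations):
        vec = [sum(c * v for c, v in zip(row, vec)) for row in cov]
        norm = sum(v * v for v in vec) ** 0.5
        vec = [v / norm for v in vec]
    return vec


centered = center(data)
cov = covariance(centered)
pc1 = first_component(cov)

# Project each sample onto PC1 to get its score along that direction.
scores = [sum(x * w for x, w in zip(row, pc1)) for row in centered]

# Share of total variance captured by PC1.
total_variance = sum(cov[i][i] for i in range(len(cov)))
pc1_variance = sum(s * s for s in scores) / (len(scores) - 1)
variance_explained = pc1_variance / total_variance

print(pc1)
print(scores)
print(variance_explained)
```

In this toy case PC1 puts nearly equal weight on the two co-varying genes and almost none on the flat one, so one number per sample summarizes most of the table.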

Jun 29, 2024 • 14 tweets • 4 min read

1/ 12 web tools to explore genomics data 🧵

2/ cbioportal: explore genomic datasets at your fingertips. cbioportal.org

Jun 26, 2024 • 19 tweets • 6 min read

1/ 16 resources for re-analyzing public expression data. 🧵

2/ RNA meta Analysis has ~26,700 studies (5,717 RNA-Seq and 20,955 Microarray). rnama.com/docs/search-ev…

Jun 25, 2024 • 13 tweets • 4 min read

1/ 10 courses to get you started with bioinformatics 🧵

2/ By Rafa Irizarry at Dana-Farber. rafalab.dfci.harvard.edu/pages/harvardx…

Sep 20, 2023 • 14 tweets • 7 min read

12 books (some are free online) that I bought for learning (genomic) data science 👇 🧵 #python #rstats #bioinformatics

1/ You need to learn Linux commands first. Read it for free: buff.ly/46f3FQ3

Sep 7, 2023 • 14 tweets • 4 min read

Are protein and RNA correlated? 12 papers and examples 👇 🧵 What do you think?

1/ It is gene-specific; see Figure 2D from Quantitative Proteomics of the Cancer Cell Line Encyclopedia
buff.ly/3PqSvSL
