Ming
Director of bioinformatics. YouTube at chatomics. On my way to helping 1 million people learn bioinformatics. Also talks about leadership. Views are my own.
Feb 9 12 tweets 2 min read
🧵 Looking for a job in biotech sucks right now. With layoffs happening across biotech and pharma, here's some advice to navigate these challenges. 1/ Layoffs are happening left and right in biotech and big pharma. I’m lucky to still have a job, but many aren’t. Here’s what I’ve learned.
Feb 4 15 tweets 3 min read
🧵 How to Use samtools – A Must-Know Tool for NGS Data
If you're working with sequencing data, samtools is essential. It was developed by Heng Li, who also created BWA and minimap2. Let's dive into its usage. 👇
1/ What is samtools?
samtools is a toolkit for handling SAM/BAM/CRAM files, the standard formats for storing sequence alignments. It allows you to sort, index, filter, and query alignment files efficiently.
Jan 28 13 tweets 2 min read
1/ Why is bioinformatics so complicated? Because biology is. Here’s a quick example to show just how nuanced even a "simple" analysis can be.
2/ Genes aren’t simple entities. Most genes have multiple transcripts. Different transcripts can have different transcription start sites (TSS), transcription end sites (TES), and exon compositions.
Jan 21 12 tweets 3 min read
Why understanding biology matters in bioinformatics
One big lesson: RNA and protein levels aren’t always correlated. If you don’t know this, you might draw the wrong conclusions. 🧵👇
1/ Why does this matter?

In bioinformatics, you often analyze RNA-seq or proteomics data. If you only rely on one, you risk missing the full picture.

For example:

mRNA and protein levels of the same gene can tell different stories due to regulation at multiple levels.
Jan 9 9 tweets 2 min read
FASTQ files are fundamental in bioinformatics, but working with them efficiently requires Unix skills. Here's a handy one-liner to count read lengths in a compressed FASTQ file 👇
The command:
zless example.fastq.gz | awk 'NR%4 == 2 {lengths[length($0)]++} END {for (l in lengths) {print l, lengths[l]}}'
Let's break it down:
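In short, NR%4 == 2 keeps the second line of every four-line FASTQ record (the sequence line), and the awk array tallies how many reads have each length. The same counting logic can be sketched in Python. The function name below is hypothetical, not part of the original one-liner, and the sketch assumes a well-formed FASTQ file with exactly four lines per record.

```python
import gzip
from collections import Counter


def count_read_lengths(fastq_gz_path):
    """Tally read lengths in a gzip-compressed FASTQ file (hypothetical helper)."""
    lengths = Counter()
    # Open in text mode so each line is a str, much like awk sees $0.
    with gzip.open(fastq_gz_path, "rt") as handle:
        for line_number, line in enumerate(handle, start=1):
            # Line 2 of each 4-line record is the sequence (awk: NR%4 == 2).
            if line_number % 4 == 2:
                lengths[len(line.rstrip("\n"))] += 1
    return lengths
```

Calling count_read_lengths("example.fastq.gz") returns a Counter that maps each read length to the number of reads with that length, which is the same table the awk command prints.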
Dec 30, 2024 8 tweets 2 min read
🔥 chatomics! I regret not learning this well in college, and it changed how I approach bioinformatics today. Here’s my story and why you should avoid my mistake. 🧬
1/ My biggest regret? 💭 I wish I had learned linear algebra properly in college.
I barely passed the exam (and calculus wasn’t much better!). It felt boring and disconnected from real-world applications.
But years later, bioinformatics taught me how critical it is.
Dec 24, 2024 12 tweets 4 min read
🎯 Do you really understand p-values?
The p-value histogram can reveal a LOT about your data. Let's break it down using real examples. 👇
1/ First, a quick fact: P-values follow a uniform distribution under the null hypothesis.
What does that mean? 🤔
If there’s truly no difference between groups, the p-value behaves like rolling a fair die:
• P(p < 0.01) = 0.01
• P(p < 0.02) = 0.02
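A quick simulation can make the uniform-under-the-null claim concrete. This is a minimal sketch, not the thread's own analysis: it assumes two groups drawn from the same standard normal distribution and a two-sided z-test with known variance, and the function name and parameters are illustrative.

```python
import random
from statistics import NormalDist, fmean


def simulate_null_p_values(n_tests=20000, n_per_group=20, seed=0):
    """Simulate two-sided z-test p-values when both groups share one distribution."""
    rng = random.Random(seed)
    standard_normal = NormalDist()
    # Standard error of the difference in means for two N(0, 1) groups.
    standard_error = (2 / n_per_group) ** 0.5
    p_values = []
    for _ in range(n_tests):
        group_a = [rng.gauss(0, 1) for _ in range(n_per_group)]
        group_b = [rng.gauss(0, 1) for _ in range(n_per_group)]
        z = (fmean(group_a) - fmean(group_b)) / standard_error
        # Two-sided p-value from the standard normal distribution.
        p_values.append(2 * (1 - standard_normal.cdf(abs(z))))
    return p_values


p_values = simulate_null_p_values()
for threshold in (0.01, 0.02, 0.05, 0.5):
    fraction = sum(p < threshold for p in p_values) / len(p_values)
    print(f"Fraction of p-values below {threshold}: {fraction:.3f}")
```

Each printed fraction should land close to its threshold, which is what a flat p-value histogram looks like in numbers.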
Dec 22, 2024 8 tweets 3 min read
1/ 💡 Want to level up your bioinformatics game?
If you can master these 6 plots, you’ll be able to recreate 90% of figures in genomics papers:
• 📊 Barplot
• ⚡ Scatterplot
• 📈 Line plot
• 📉 Histogram
• 📦 Boxplot
• 🔥 Heatmap
Let’s dive into how YOU can do it. 🧬
2/ But first, a holiday story! 🎄
Boston is blanketed in snow, and our neighbor cleared it for us this morning with his snowblower. When I thanked him, he said:

"I just love helping others." 💛

As they say:

"Giving roses to others leaves fragrance on your hands."
Dec 21, 2024 10 tweets 3 min read
1/ Exploratory Data Analysis (EDA) is the first step in any data analysis journey. When working with RNA-seq data, one of the most commonly used techniques is Principal Component Analysis (PCA). But what exactly is PCA, and why does it matter? Let’s break it down. 🧵👇
2/ What is PCA?

PCA is a mathematical method used to simplify complex datasets. It finds patterns by identifying directions (called principal components) that capture the most variation in the data. Read my post: divingintogeneticsandgenomics.com/post/pca-in-ac…
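To show what "a direction that captures the most variation" means, here is a small pure-Python sketch that finds only the first principal component by power iteration on the covariance matrix. It is an illustration under stated assumptions, not the method from the linked post: the function name is hypothetical, the toy data are made up (rows are samples, columns are features such as genes), and a real analysis would normally use a library routine such as prcomp in R.

```python
import random


def first_principal_component(rows, n_iterations=200):
    """Return (direction, explained_fraction) for the first PC via power iteration."""
    n_samples, n_features = len(rows), len(rows[0])
    # Center each feature (column) at zero.
    means = [sum(row[j] for row in rows) / n_samples for j in range(n_features)]
    centered = [[row[j] - means[j] for j in range(n_features)] for row in rows]
    # Sample covariance matrix of the features.
    cov = [
        [
            sum(r[i] * r[j] for r in centered) / (n_samples - 1)
            for j in range(n_features)
        ]
        for i in range(n_features)
    ]

    def multiply(vector):
        return [
            sum(cov[i][j] * vector[j] for j in range(n_features))
            for i in range(n_features)
        ]

    # Power iteration: multiply by the covariance matrix and renormalize.
    # Assumes the start vector is not orthogonal to the top eigenvector.
    vector = [1.0] * n_features
    for _ in range(n_iterations):
        product = multiply(vector)
        norm = sum(x * x for x in product) ** 0.5
        vector = [x / norm for x in product]
    # The Rayleigh quotient gives the variance along that direction.
    top_variance = sum(v * p for v, p in zip(vector, multiply(vector)))
    total_variance = sum(cov[i][i] for i in range(n_features))
    return vector, top_variance / total_variance


# Made-up toy data: features 1 and 2 follow a shared signal, feature 3 is noise.
rng = random.Random(0)
samples = []
for _ in range(50):
    signal = rng.gauss(0, 3)
    samples.append(
        [signal + rng.gauss(0, 0.3), 2 * signal + rng.gauss(0, 0.3), rng.gauss(0, 0.3)]
    )

direction, explained = first_principal_component(samples)
print("PC1 direction:", [round(x, 2) for x in direction])
print("Fraction of variance explained by PC1:", round(explained, 3))
```

In this toy setup the first component should load mostly on the two correlated features and account for nearly all of the variance, while the noise feature contributes almost nothing.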
Jun 29, 2024 14 tweets 4 min read
1/ 12 web tools to explore genomics data 🧵
2/ cBioPortal: explore genomic datasets at your fingertips. cbioportal.org
Jun 26, 2024 19 tweets 6 min read
1/ 16 resources for re-analyzing public expression data. 🧵
2/ RNA meta Analysis has ~26,700 studies (5,717 RNA-Seq and 20,955 Microarray). rnama.com/docs/search-ev…
Jun 25, 2024 13 tweets 4 min read
1/ 10 courses to get you started with bioinformatics 🧵
2/ By Rafa Irizarry at Dana-Farber. rafalab.dfci.harvard.edu/pages/harvardx…
Sep 20, 2023 14 tweets 7 min read
12 books (some are free online) that I bought for learning (genomic) data science 👇 🧵 #python #rstats #bioinformatics
1/ You need to learn Linux commands first. Read it for free: buff.ly/46f3FQ3
Sep 7, 2023 14 tweets 4 min read
Are protein and RNA levels correlated? 12 papers and examples 👇 🧵 What do you think?
1/ It is gene-specific; see Figure 2D of "Quantitative Proteomics of the Cancer Cell Line Encyclopedia":
buff.ly/3PqSvSL
Jul 28, 2023 12 tweets 4 min read
10 tools/papers related to bulk-RNAseq deconvolution. 👇 🧵 #computationalbiology #RNAseq Even in the era of single-cell RNAseq, bulk-RNAseq data are still very valuable.
1. Benchmarking of cell type deconvolution pipelines for transcriptomics data: buff.ly/458vpp3
Jul 21, 2023 12 tweets 3 min read
10 FREE #rstats books to uplevel your R skills. 👇 🧵 1/ R Programming for Data Science buff.ly/31g1Y36
Jun 28, 2023 18 tweets 6 min read
16 databases of scRNAseq datasets. 👇 🧵 Reuse them! #singlecell #bioinformatics
1/ CuratedAtlasQueryR is a query interface that allows the programmatic exploration and retrieval of the harmonized, curated and reannotated CELLxGENE single-cell human cell atlas. buff.ly/46DKAIv
Jun 26, 2023 14 tweets 4 min read
Data visualization is key to any data analysis. Make sure you know your data by doing EDA.
12 resources for data visualization 🧵 👇 1/ The R Graph Gallery buff.ly/2lCZxbU
Jun 22, 2023 9 tweets 3 min read
Pathway or gene set enrichment analysis is frequently used in genomic studies. Make sure you understand it with these 8 resources: 👇 🧵 1/ Ten Years of Pathway Analysis: Current Approaches and Outstanding Challenges
buff.ly/46hQNt5
May 31, 2023 18 tweets 6 min read
16 resources for re-analyzing public expression data. 👇 🧵 1/ buff.ly/3MJfshd RNA meta Analysis has ~26,700 studies (5,717 RNA-Seq and 20,955 Microarray)
May 29, 2023 8 tweets 1 min read
Want to get lucky and be successful? There are four flavors of luck🧵👇 1/ There are four levels of luck: blind luck, luck through motion, luck favoring the prepared mind, and luck finding you through reputation.