Ming
Director of bioinformatics. Chatomics! On my way to helping 1 million people learn bioinformatics. Also talks about leadership. Views are my own, not my employer's.
Jan 21 12 tweets 3 min read
Why understanding biology matters in bioinformatics
One big lesson: RNA and protein levels aren’t always correlated. If you don’t know this, you might draw the wrong conclusions. 🧵👇
1/ Why does this matter?

In bioinformatics, you often analyze RNA-seq or proteomics data. If you only rely on one, you risk missing the full picture.

For example:

mRNA and protein levels of the same gene can tell different stories due to regulation at multiple levels.
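One way to check this on your own data is to correlate mRNA and protein levels gene by gene across samples. The sketch below is only an illustration: the function `pearson` and the four lists are hypothetical, and the numbers are made up to show one concordant and one discordant gene, not taken from any study.

```python
import math

def pearson(x, y):
    """Pearson correlation between two equal-length lists of numbers."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(ss_x * ss_y)

# Made-up expression values across five samples (NOT real data).
mrna_gene_a = [1.0, 2.0, 3.0, 4.0, 5.0]
protein_gene_a = [2.0, 4.0, 6.0, 8.0, 10.0]  # protein tracks its mRNA
mrna_gene_b = [1.0, 2.0, 3.0, 4.0, 5.0]
protein_gene_b = [5.0, 1.0, 4.0, 2.0, 3.0]   # protein does not track its mRNA

r_gene_a = pearson(mrna_gene_a, protein_gene_a)
r_gene_b = pearson(mrna_gene_b, protein_gene_b)
print("gene A mRNA vs protein r =", round(r_gene_a, 2))
print("gene B mRNA vs protein r =", round(r_gene_b, 2))
```

A gene like the hypothetical gene B would look unremarkable in RNA-seq alone, which is why relying on one data type can mislead.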
Jan 9 9 tweets 2 min read
FASTQ files are fundamental in bioinformatics, but working with them efficiently requires Unix skills. Here's a handy one-liner to count read lengths in a compressed FASTQ file 👇
The command:
zless example.fastq.gz | awk 'NR%4 == 2 {lengths[length($0)]++} END {for (l in lengths) {print l, lengths[l]}}'
Let's break it down:
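For comparison, here is a rough Python sketch of the same idea. It assumes a standard FASTQ layout of four lines per record, with the sequence on the second line of each record, which is exactly what `NR%4 == 2` selects in the awk version. The function name `count_read_lengths` is my own, not from the thread.

```python
import gzip
from collections import Counter

def count_read_lengths(path):
    """Tally read lengths in a gzip-compressed FASTQ file.

    Assumes four lines per record: header, sequence, '+', qualities.
    """
    lengths = Counter()
    with gzip.open(path, "rt") as handle:
        for line_number, line in enumerate(handle, start=1):
            # Line 2 of every 4-line record is the sequence (awk: NR%4 == 2).
            if line_number % 4 == 2:
                lengths[len(line.rstrip("\n"))] += 1
    return lengths

# Usage (example.fastq.gz is the file name used in the one-liner):
#   for length, count in sorted(count_read_lengths("example.fastq.gz").items()):
#       print(length, count)
```

The `Counter` plays the role of the awk array `lengths`, and the final loop corresponds to the `END` block. Unlike the awk version, the usage loop sorts by read length before printing.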
Dec 30, 2024 8 tweets 2 min read
🔥 chatomics! I regret not learning this well in college, and it changed how I approach bioinformatics today. Here’s my story and why you should avoid my mistake. 🧬
1/ My biggest regret? 💭 I wish I had learned linear algebra properly in college.
I barely passed the exam (and calculus wasn’t much better!). It felt boring and disconnected from real-world applications.
But years later, bioinformatics taught me how critical it is.
Dec 24, 2024 12 tweets 4 min read
🎯 Do you really understand p-values?
The p-value histogram can reveal a LOT about your data. Let's break it down using real examples.👇
1/ First, a quick fact: P-values follow a uniform distribution under the null hypothesis.
What does that mean? 🤔
If there’s truly no difference between groups, the p-value behaves like rolling a fair die:
• P(p < 0.01) = 0.01
• P(p < 0.02) = 0.02
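You can convince yourself of this with a small simulation. The sketch below is a minimal illustration under stated assumptions: both groups are drawn from the same normal distribution with a known standard deviation of 1, and the test is a two-sided z-test (chosen only to stay within the Python standard library, not because the thread uses it). The function names and parameters are hypothetical.

```python
import math
import random

def two_sided_z_pvalue(group_a, group_b, sigma=1.0):
    """Two-sided z-test p-value for a difference in means, with known sigma."""
    n_a, n_b = len(group_a), len(group_b)
    diff = sum(group_a) / n_a - sum(group_b) / n_b
    z = diff / (sigma * math.sqrt(1.0 / n_a + 1.0 / n_b))
    # erfc(|z| / sqrt(2)) equals 2 * (1 - Phi(|z|)), the two-sided normal tail.
    return math.erfc(abs(z) / math.sqrt(2.0))

def simulate_null_pvalues(n_tests=5000, n_per_group=20, seed=0):
    """Draw both groups from the SAME distribution, so the null is true."""
    rng = random.Random(seed)
    pvalues = []
    for _ in range(n_tests):
        a = [rng.gauss(0.0, 1.0) for _ in range(n_per_group)]
        b = [rng.gauss(0.0, 1.0) for _ in range(n_per_group)]
        pvalues.append(two_sided_z_pvalue(a, b))
    return pvalues

pvalues = simulate_null_pvalues()
for cutoff in (0.01, 0.02, 0.05):
    fraction = sum(p < cutoff for p in pvalues) / len(pvalues)
    print(f"fraction of p-values below {cutoff}: {fraction:.3f}")
```

Each printed fraction should land close to its cutoff, which is what a flat p-value histogram looks like in numbers.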
Dec 22, 2024 8 tweets 3 min read
1/ 💡 Want to level up your bioinformatics game?
If you can master these 6 plots, you’ll be able to recreate 90% of figures in genomics papers:
• 📊 Barplot
• ⚡ Scatterplot
• 📈 Line plot
• 📉 Histogram
• 📦 Boxplot
• 🔥 Heatmap
Let’s dive into how YOU can do it. 🧬
2/ But first, a holiday story! 🎄
Boston is blanketed in snow, and our neighbor cleared it for us this morning with his snowblower. When I thanked him, he said:

"I just love helping others." 💛

As they say:

"Giving roses to others leaves fragrance on your hands."
Dec 21, 2024 10 tweets 3 min read
1/ Exploratory Data Analysis (EDA) is the first step in any data analysis journey. When working with RNA-seq data, one of the most commonly used techniques is Principal Component Analysis (PCA). But what exactly is PCA, and why does it matter? Let’s break it down. 🧵👇
2/ What is PCA?

PCA is a mathematical method used to simplify complex datasets. It finds patterns by identifying directions (called principal components) that capture the most variation in the data. Read my post: divingintogeneticsandgenomics.com/post/pca-in-ac…
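To make "the direction that captures the most variation" concrete, here is a small sketch that finds the first principal component by power iteration on the covariance matrix. This is one simple way to get the leading component and is not taken from the linked post. The function name, the toy matrix (rows as samples, columns as two hypothetical genes), and the iteration count are all assumptions, and real analyses would use an established PCA implementation.

```python
import math

def first_principal_component(rows, n_iter=200):
    """Return (direction, explained_fraction) for the first PC.

    rows: list of samples, each a list of feature values (samples x features).
    """
    n, d = len(rows), len(rows[0])
    # Center each feature so the components describe variation around the mean.
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    centered = [[r[j] - means[j] for j in range(d)] for r in rows]
    # Sample covariance matrix (features x features).
    cov = [[sum(c[i] * c[j] for c in centered) / (n - 1) for j in range(d)]
           for i in range(d)]
    # Power iteration: repeatedly multiply by cov and renormalize.
    v = [1.0 / (j + 1) for j in range(d)]
    for _ in range(n_iter):
        w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            raise ValueError("no variation along the starting direction")
        v = [x / norm for x in w]
    # Variance along v, divided by total variance, is the fraction explained.
    eigenvalue = sum(v[i] * sum(cov[i][j] * v[j] for j in range(d))
                     for i in range(d))
    total_variance = sum(cov[i][i] for i in range(d))
    return v, eigenvalue / total_variance

# Toy matrix (made up): four samples, two genes that rise together.
samples = [[1.0, 1.1], [2.0, 1.9], [3.0, 3.2], [4.0, 3.8]]
direction, explained = first_principal_component(samples)
print("PC1 direction:", [round(x, 3) for x in direction])
print("fraction of variance explained by PC1:", round(explained, 3))
```

Because the two toy genes move together, a single component captures nearly all the variation, which is the sense in which PCA simplifies a dataset.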
Jun 29, 2024 14 tweets 4 min read
1/ 12 web tools to explore genomics data 🧵
2/ cBioPortal: explore genomic datasets at your fingertips. cbioportal.org
Jun 26, 2024 19 tweets 6 min read
1/ 16 resources for re-analyzing public expression data. 🧵
2/ RNA meta Analysis has ~26,700 studies (5,717 RNA-Seq and 20,955 Microarray) rnama.com/docs/search-ev…
Jun 25, 2024 13 tweets 4 min read
1/ 10 courses to get you started with bioinformatics 🧵
2/ By Rafa Irizarry at Dana-Farber. rafalab.dfci.harvard.edu/pages/harvardx…
Sep 20, 2023 14 tweets 7 min read
12 books (some are free online) that I bought for learning (genomic) data science 👇 🧵 #python #rstats #bioinformatics
1/ You need to learn Linux commands first. Read it for free: buff.ly/46f3FQ3
Sep 7, 2023 14 tweets 4 min read
Are protein and RNA correlated? 12 papers and examples 👇 🧵 What do you think?
1/ It is gene-specific; see figure 2D from Quantitative Proteomics of the Cancer Cell Line Encyclopedia
buff.ly/3PqSvSL
Jul 28, 2023 12 tweets 4 min read
10 tools/papers related to bulk-RNAseq deconvolution. 👇 🧵 #computationalbiology #RNAseq Even in the era of single-cell RNAseq, bulk-RNAseq data are still very valuable.
1. Benchmarking of cell type deconvolution pipelines for transcriptomics data buff.ly/458vpp3
Jul 21, 2023 12 tweets 3 min read
10 FREE #rstats books to uplevel your R skills. 👇 🧵
1/ R Programming for Data Science buff.ly/31g1Y36
Jun 28, 2023 18 tweets 6 min read
16 databases of scRNAseq datasets. 👇 🧵 Reuse them! #singlecell #bioinformatics
1/ CuratedAtlasQueryR is a query interface that allows the programmatic exploration and retrieval of the harmonized, curated and reannotated CELLxGENE single-cell human cell atlas. buff.ly/46DKAIv
Jun 26, 2023 14 tweets 4 min read
Data visualization is key to any data analysis. Make sure you know your data by doing EDA.
12 resources for data visualization 🧵 👇
1/ The R Graph Gallery buff.ly/2lCZxbU
Jun 22, 2023 9 tweets 3 min read
Pathway or gene set enrichment analysis is frequently used in genomic studies. Make sure you understand it with these 8 resources: 👇 🧵
1/ Ten Years of Pathway Analysis: Current Approaches and Outstanding Challenges
buff.ly/46hQNt5
May 31, 2023 18 tweets 6 min read
16 resources for re-analyzing public expression data. 👇 🧵
1/ buff.ly/3MJfshd RNA meta Analysis has ~26,700 studies (5,717 RNA-Seq and 20,955 Microarray)
May 29, 2023 8 tweets 1 min read
Want to get lucky and be successful? There are four flavors of luck 🧵👇
1/ There are four levels of luck: blind luck, luck through motion, luck favoring the prepared mind, and luck finding you through reputation.
Mar 21, 2023 8 tweets 2 min read
Sequencing data ====> Fancy figures in the paper.

The missing pieces? 👇 🧵

#rstats #computationalbiology
1/ Quality control of the data. Any technical bias? Differences in sequencing depth?
Mar 14, 2023 7 tweets 3 min read
1/ False belief: I need to learn fancy machine learning stuff or algorithms for computational biology.

Reality: most of us will only need to learn the data skills to answer biological questions.

Find the roadmap below 👇 🧵
2/ If you are like me, you will not need to develop a read aligner such as STAR. You only need to learn how to use those tools: get the reads mapped and get the counts table for DESeq2 for RNAseq. Learn Unix buff.ly/3FITwR1 and RNAseq buff.ly/3mR61mV
Mar 9, 2023 11 tweets 3 min read
Public genomic data and reference data are treasures to researchers. 10 tools to get the data easily from the public repositories.👇 🧵

1/ Fastq-dump buff.ly/41YKQiB
2/ fasterq-dump: a faster fastq-dump buff.ly/3Jqhe6A