Director of bioinformatics. Chatomics! On my way to helping 1 million people learn bioinformatics. Also talks about leadership. Views are my own, not my employer's.
Dec 24 • 12 tweets • 4 min read
🎯 Do you really understand p-values?
The p-value histogram can reveal a LOT about your data. Let's break it down using real examples.
1/ First, a quick fact: P-values follow a uniform distribution under the null hypothesis.
What does that mean? 🤔
If there's truly no difference between groups, the p-value behaves like rolling a fair die:
• P(p < 0.01) = 0.01
• P(p < 0.02) = 0.02
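The uniformity claim above can be checked with a small simulation. This is a minimal sketch, not the thread's own analysis: it assumes a two-sided z-test with known variance (chosen so that only the Python standard library is needed), and the function names, group size, and number of tests are illustrative choices.

```python
import random
from statistics import NormalDist, fmean


def null_p_values(n_tests=20000, n_per_group=20, seed=1):
    """Two-sided z-test p-values when both groups come from the same N(0, 1)."""
    rng = random.Random(seed)
    std_normal = NormalDist()
    # Standard error of the difference in means when sigma = 1 in both groups.
    se = (2 / n_per_group) ** 0.5
    p_values = []
    for _ in range(n_tests):
        a = [rng.gauss(0, 1) for _ in range(n_per_group)]
        b = [rng.gauss(0, 1) for _ in range(n_per_group)]
        z = (fmean(a) - fmean(b)) / se
        p_values.append(2 * (1 - std_normal.cdf(abs(z))))
    return p_values


def fraction_below(p_values, cutoff):
    """Empirical estimate of P(p < cutoff)."""
    return sum(p < cutoff for p in p_values) / len(p_values)


def histogram(p_values, n_bins=10):
    """Counts of p-values in equal-width bins covering [0, 1]."""
    counts = [0] * n_bins
    for p in p_values:
        counts[min(int(p * n_bins), n_bins - 1)] += 1
    return counts


p_values = null_p_values()
for cutoff in (0.01, 0.02, 0.05):
    print(f"P(p < {cutoff}) is about {fraction_below(p_values, cutoff):.3f}")
print(histogram(p_values))  # a flat histogram: roughly equal counts per bin
```

Under these assumptions each printed fraction should land close to its cutoff, and the ten bin counts should be roughly equal, which is the flat p-value histogram expected when there is no true difference.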
Dec 22 • 8 tweets • 3 min read
1/ 💡 Want to level up your bioinformatics game?
If you can master these 6 plots, you'll be able to recreate 90% of figures in genomics papers:
• Barplot
• Scatterplot
• Line plot
• Histogram
• Boxplot
• Heatmap
Let's dive into how YOU can do it. 🧬
2/ But first, a holiday story!
Boston is blanketed in snow, and our neighbor cleared it for us this morning with his snowblower. When I thanked him, he said:
"I just love helping others."
As they say:
"Giving roses to others leaves fragrance on your hands."
Dec 21 • 10 tweets • 3 min read
1/ Exploratory Data Analysis (EDA) is the first step in any data analysis journey. When working with RNA-seq data, one of the most commonly used techniques is Principal Component Analysis (PCA). But what exactly is PCA, and why does it matter? Let's break it down. 🧵
2/ What is PCA?
PCA is a mathematical method used to simplify complex datasets. It finds patterns by identifying directions (called principal components) that capture the most variation in the data. Read my post: divingintogeneticsandgenomics.com/post/pca-in-ac…
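To make "the direction that captures the most variation" concrete, here is a minimal sketch that finds the first principal component of a toy expression matrix. Everything in it is an assumption for illustration: the data are simulated (6 samples by 3 genes, two groups), the helper names are hypothetical, and power iteration on the covariance matrix is used only because it needs nothing beyond the standard library. In practice you would call a library routine rather than hand-roll this.

```python
import random


def center(rows):
    """Subtract each column's mean so every gene has mean zero."""
    n = len(rows)
    means = [sum(col) / n for col in zip(*rows)]
    return [[x - m for x, m in zip(row, means)] for row in rows]


def covariance(rows):
    """Sample covariance matrix (genes x genes) of already-centered rows."""
    n, d = len(rows), len(rows[0])
    return [[sum(r[i] * r[j] for r in rows) / (n - 1) for j in range(d)]
            for i in range(d)]


def first_pc(cov, n_iter=200):
    """Power iteration: returns (unit-length direction, variance along it)."""
    d = len(cov)
    v = [1.0 / d ** 0.5] * d
    for _ in range(n_iter):
        w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = sum(x * x for x in w) ** 0.5
        v = [x / norm for x in w]
    variance = sum(v[i] * cov[i][j] * v[j] for i in range(d) for j in range(d))
    return v, variance


# Simulated toy data: genes 1 and 2 differ between groups, gene 3 does not.
rng = random.Random(0)
groups = [0, 0, 0, 1, 1, 1]
samples = [[10 + 4 * g + rng.gauss(0, 0.3),
            5 + 3 * g + rng.gauss(0, 0.3),
            7 + rng.gauss(0, 0.3)] for g in groups]

centered = center(samples)
cov = covariance(centered)
pc1, pc1_variance = first_pc(cov)
total_variance = sum(cov[i][i] for i in range(len(cov)))
explained = pc1_variance / total_variance
# PC1 score of each sample: projection of the centered sample onto pc1.
scores = [sum(x * w for x, w in zip(row, pc1)) for row in centered]

print("PC1 direction:", [round(w, 2) for w in pc1])
print("Fraction of variance explained by PC1:", round(explained, 2))
print("PC1 scores:", [round(s, 2) for s in scores])
```

In this toy setup PC1 is dominated by the two genes that differ between groups, so the PC1 scores separate the first three samples from the last three, which is the kind of pattern a PCA plot of RNA-seq samples is meant to reveal.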
Jun 29 • 14 tweets • 4 min read
1/ 12 web tools to explore genomics data 🧵
2/ cBioPortal: explore genomic datasets at your fingertips. cbioportal.org
Jun 26 • 19 tweets • 6 min read
1/ 16 resources for re-analyzing public expression data. 🧵
2/ RNA meta Analysis has ~26,700 studies (5,717 RNA-Seq and 20,955 Microarray) rnama.com/docs/search-ev…
12 books (some are free online) that I bought for learning (genomic) data science 🧵 #python #rstats #bioinformatics
1/ You need to learn the Linux command line first. Read it for free: buff.ly/46f3FQ3
Sep 7, 2023 • 14 tweets • 4 min read
Are protein and RNA correlated? 12 papers and examples 🧵 What do you think?
1/ It is gene-specific; see Figure 2D from Quantitative Proteomics of the Cancer Cell Line Encyclopedia buff.ly/3PqSvSL
Jul 28, 2023 • 12 tweets • 4 min read
10 tools/papers related to bulk-RNAseq deconvolution. 🧵 #computationalbiology #RNAseq Even in the era of single-cell RNAseq, bulk-RNAseq data are still very valuable.
1. [Benchmarking of cell type deconvolution pipelines for transcriptomics data](buff.ly/458vpp3)
Jul 21, 2023 • 12 tweets • 3 min read
10 FREE #rstats books to uplevel your R skills. 🧵
1/ R Programming for Data Science buff.ly/31g1Y36
Jun 28, 2023 • 18 tweets • 6 min read
16 databases of scRNAseq datasets. 🧵 Reuse them! #singlecell #bioinformatics
1/ [CuratedAtlasQueryR](buff.ly/46DKAIv) is a query interface that allows the programmatic exploration and retrieval of the harmonized, curated and reannotated CELLxGENE single-cell human cell atlas.
Jun 26, 2023 • 14 tweets • 4 min read
Data visualization is key to any data analysis. Make sure you know your data by doing EDA.
12 resources for data visualization 🧵
1/ The R Graph Gallery buff.ly/2lCZxbU
Jun 22, 2023 • 9 tweets • 3 min read
Pathway or gene set enrichment analysis is frequently used in genomic studies. Make sure you understand it with these 8 resources: 🧵
1/ Ten Years of Pathway Analysis: Current Approaches and Outstanding Challenges buff.ly/46hQNt5
May 31, 2023 • 18 tweets • 6 min read
16 resources for re-analyzing public expression data. 🧵
1/ buff.ly/3MJfshd RNA meta Analysis has ~26,700 studies (5,717 RNA-Seq and 20,955 Microarray)
May 29, 2023 • 8 tweets • 1 min read
Want to get lucky and be successful? There are four flavors of luck 🧵
1/ There are four levels of luck: blind luck, luck through motion, luck favoring the prepared mind, and luck finding you through reputation.
1/ False belief: I need to learn fancy machine learning stuff or algorithms for computational biology.
Reality: most of us will only need to learn the data skills to answer biological questions.
Find the roadmap below 🧵
2/ If you are like me, you will not need to develop a read aligner such as STAR. You will only need to learn how to use those tools: get the reads mapped and get the counts table that DESeq2 needs for RNAseq. Learn Unix buff.ly/3FITwR1 and RNAseq buff.ly/3mR61mV
Mar 9, 2023 • 11 tweets • 3 min read
Public genomic data and reference data are treasures to researchers. 10 tools to get the data easily from the public repositories. 🧵
There was little online material to learn bioinformatics 10 years ago when I started.
I curated ten resources to learn bioinformatics for FREE 🧵
1/ Data Analysis for the Life Sciences Series buff.ly/3Z7F1ha by Rafa at DFCI. You can find the courses on edX buff.ly/3mapP4m
Feb 23, 2023 • 12 tweets • 5 min read
Spatial transcriptomics is the next wave after single-cell RNAseq. Resources to bookmark to get into the field 🧵
1/ 8 Review papers:
* [The emerging landscape of spatial profiling technologies](buff.ly/3cwcApw)
* [The expanding vistas of spatial transcriptomics](buff.ly/3m1x9zb)
* [Exploring tissue architecture using spatial transcriptomics](buff.ly/3Sq7Z9f)
Feb 16, 2023 • 6 tweets • 3 min read
People always ask how the protein is expressed if I show the RNA data. Here are 6 resources for protein data 🧵
1/ CPTAC, the biggest database for cancer proteomics: proteomic.datacommons.cancer.gov/pdc/ Python package to access it: github.com/PayneLab/cptac