Bioinformatics one-liner * Day 19 1/ Create a tx2gene mapping file from an Ensembl GTF, retaining the version numbers of genes and transcripts. A thread. #bioinformatics #oneliner #unix
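A minimal sketch of one way to build such a mapping, not necessarily the command from the original thread. It assumes an Ensembl-style GTF in which each `transcript` row carries `gene_id`, `gene_version`, `transcript_id` and `transcript_version` in column 9. The function name `tx2gene_from_gtf` is hypothetical.

```shell
# Hypothetical helper: read a GTF on stdin and print
# "transcript_id.version<TAB>gene_id.version" for every transcript row.
# Assumes Ensembl-style attributes (gene_version / transcript_version);
# if a version attribute is absent, the bare ID is printed instead.
tx2gene_from_gtf() {
  awk -F '\t' '$3 == "transcript" {
    gid = ""; gver = ""; tid = ""; tver = ""
    n = split($9, attrs, ";")            # column 9 holds key "value"; pairs
    for (i = 1; i <= n; i++) {
      split(attrs[i], kv, " ")           # kv[1] = key, kv[2] = quoted value
      gsub(/"/, "", kv[2])               # strip the surrounding quotes
      if (kv[1] == "gene_id")            gid  = kv[2]
      if (kv[1] == "gene_version")       gver = kv[2]
      if (kv[1] == "transcript_id")      tid  = kv[2]
      if (kv[1] == "transcript_version") tver = kv[2]
    }
    if (tid != "" && gid != "")
      print tid (tver != "" ? "." tver : "") "\t" gid (gver != "" ? "." gver : "")
  }'
}
```

Usage, with a placeholder file name: `zcat annotation.gtf.gz | tx2gene_from_gtf > tx2gene.tsv`.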
1/ Real-world data are inherently messy. Generating biological insights requires good data analysis skills. Artifacts are widespread too. How to prevent and identify artifacts due to data analysis? a thread 👇
2/ Have a good understanding of your data. Do exploratory data analysis (EDA).
1/ Do not give me Excel files (which is impossible :)).
8 tools to deal with tsv/csv files on the command line: visidata.org Data exploration at your fingertips
1/ How I started my bioinformatics journey. A thread:
Back in my PhD, I was studying gene regulation in the context of cancer. My first paper was on CTCF functioning as an enhancer blocker at the VEGFA locus. divingintogeneticsandgenomics.rbind.io/publication/20… yeah, CTCF and VEGFA are my two favorites!
2/ My second paper identified SFMBT as a cofactor of the LSD1 complex divingintogeneticsandgenomics.rbind.io/publication/20… Both papers are pure biochemistry studies, and I am so proud that I did western blots, northern blots, lentivirus knockdown, ChIP-qPCR, etc.
3/ It was around 2012. Sequencing technology was booming, and an assay called ChIP-seq, which identifies transcription factor binding sites genome-wide, was really popular. I naturally wanted to map the binding sites of CTCF and LSD1, as well as histone modifications, across the genome.
1/ Bioinformatics one-liner Day 15
get all the folders' sizes in the current folder: du -h --max-depth=1
the total size of the current directory: du -sh .
display disk space: df -h
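A small sketch that builds on the `du` command above by sorting the folders from largest to smallest. It assumes GNU coreutils (`du --max-depth` and `sort -h`). The function name `folder_sizes` is hypothetical.

```shell
# Hypothetical helper: list the size of each sub-folder of a directory,
# largest first. Defaults to the current directory.
# Assumes GNU coreutils: du --max-depth and sort -h (human-numeric sort).
folder_sizes() {
  du -h --max-depth=1 "${1:-.}" | sort -hr
}
```

Usage: `folder_sizes` for the current directory, or `folder_sizes /path/to/project` for another one.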
2/
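memory usage: free -h (human-readable; use free -m for MiB or free -g for GiB)
open top with human-readable sizes in MB/GB: `top -M` (if your version of top does not accept -M, press E inside top to cycle the memory units).
Install htop (htop.dev) for better visualization. #unix #oneliner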
That's a wrap!
If you enjoyed this thread:
1. Follow me @tangming2005 for more of these
2. RT the tweet below to share this thread with your audience