1/ Do not give me Excel files (which is impossible :)).
8 tools to deal with tsv/csv files on the command line: visidata.org
1/ How I started my bioinformatics journey. A thread:
Back in my PhD, I was studying gene regulation in the context of cancer. My first paper was on CTCF functioning as an enhancer blocker at the VEGFA locus. divingintogeneticsandgenomics.rbind.io/publication/20… Yeah, CTCF and VEGFA are my two favorites!
2/ My second paper identified SFMBT as a cofactor of the LSD1 complex divingintogeneticsandgenomics.rbind.io/publication/20… Both papers are pure biochemistry studies, and I am so proud that I did western blot, northern blot, lentiviral knockdown, ChIP-qPCR etc.
3/ It was around 2012. Sequencing technology was booming, and an assay called ChIP-seq, which identifies transcription factor binding sites genome-wide, was really popular. Naturally, I wanted to map the binding sites of CTCF and LSD1, as well as histone modifications, across the genome.
1/ Bioinformatics one-liner Day 15
get all the folders' sizes in the current folder: du -h --max-depth=1
the total size of the current directory: du -sh .
display disk space: df -h
2/
memory usage: free -h (human-readable; use free -m for MiB or free -g for GiB)
run `top -M` to show memory sizes in human-readable units (MB, GB).
install htop (htop.dev) for better visualization. #unix #onliner
That's a wrap!
If you enjoyed this thread:
1. Follow me @tangming2005 for more of these
2. RT the tweet below to share this thread with your audience
The takeaway from this article is that the default importance strategies of the most popular RF implementation in Python (scikit-learn) and of R's RF do not give reliable feature importances
3/
when “... potential predictor variables vary in their scale of measurement or their number of categories.” (Strobl et al). Rather than figuring out whether your data set conforms to one that gets accurate results, simply use permutation importance.
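A minimal sketch of the permutation importance idea, in plain Python: shuffle one feature column at a time and measure how much the model's score drops. Everything here is an assumption for illustration and not from the article: the toy data, the `toy_predict` function standing in for a fitted random forest, and the choice of R² as the score. scikit-learn also ships its own version as `sklearn.inspection.permutation_importance`.

```python
import random


def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_y = sum(y_true) / len(y_true)
    ss_tot = sum((y - mean_y) ** 2 for y in y_true)
    ss_res = sum((y - p) ** 2 for y, p in zip(y_true, y_pred))
    return 1.0 - ss_res / ss_tot


def permutation_importance(predict, X, y, n_repeats=10, seed=0):
    """Mean drop in R^2 when each feature column is shuffled in turn.

    `predict` is any function mapping one row (a list of floats) to a
    prediction, so the procedure is model-agnostic.
    """
    rng = random.Random(seed)
    baseline = r2_score(y, [predict(row) for row in X])
    importances = []
    for j in range(len(X[0])):
        drops = []
        for _ in range(n_repeats):
            # Shuffle column j only; all other columns stay untouched.
            column = [row[j] for row in X]
            rng.shuffle(column)
            X_perm = [row[:j] + [v] + row[j + 1:] for row, v in zip(X, column)]
            permuted = r2_score(y, [predict(row) for row in X_perm])
            drops.append(baseline - permuted)
        importances.append(sum(drops) / n_repeats)
    return importances


# Hypothetical toy data: feature 0 matters a lot, feature 1 a little,
# feature 2 is pure noise that the target never uses.
data_rng = random.Random(42)
X = [[data_rng.gauss(0, 1) for _ in range(3)] for _ in range(200)]
y = [3.0 * row[0] + 0.5 * row[1] for row in X]


def toy_predict(row):
    # Stand-in for a fitted model (an assumption, not a random forest).
    return 3.0 * row[0] + 0.5 * row[1]


importances = permutation_importance(toy_predict, X, y)
for j, imp in enumerate(importances):
    print(f"feature {j}: mean drop in R^2 = {imp:.3f}")
```

With a real model you would pass its prediction function instead of `toy_predict`, and it is usually better to compute the drops on held-out data rather than on the training set.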