1/ Do not give me Excel files (which is impossible :)).
8 tools to deal with tsv/csv files on the command line:
visidata.org Data exploration at your fingertips

#unix #data #commandline
4/ GNU datamash gnu.org/software/datam…
6/ xsv is a command line program for indexing, slicing, analyzing, splitting and joining CSV files. Commands should be simple, fast and composable: github.com/BurntSushi/xsv
7/ eBay's TSV Utilities opensource.ebay.com/tsv-utils/
That's a wrap!

If you enjoyed this thread:

1. Follow me @tangming2005 for more of these
2. RT the tweet below to share this thread with your audience
3. Sign up for my new book "cell line to command line": divingintogeneticsandgenomics.ck.page


More from @tangming2005

Sep 12
Bioinformatics one-liner Day 18
1/

delete the blank lines
```
sed '/^$/d'
```
delete the last line
```
sed '$d'
```
use sed '1d' to remove the header; here, for all csv files, cut out the second comma-separated column and drop its header line

```
ls *csv | parallel "cut -d, -f2 {} | sed '1d' > {/.}.list"
```
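For readers without GNU parallel, the same idea can be sketched as a plain shell loop. This is only an illustration under assumptions: the sample files `a.csv` and `b.csv`, their contents, and the temporary directory are made up for the demo, and column 2 is assumed to be the column of interest.

```shell
# Hypothetical demo data: two small comma-separated files in a temp directory.
workdir=$(mktemp -d)
printf 'id,gene\n1,CTCF\n2,VEGFA\n' > "$workdir/a.csv"
printf 'id,gene\n3,LSD1\n' > "$workdir/b.csv"

# For each csv file: keep column 2, drop the header line,
# and write the result to <basename>.list next to the input.
for f in "$workdir"/*.csv; do
  base=$(basename "$f" .csv)
  cut -d, -f2 "$f" | sed '1d' > "$workdir/$base.list"
done

cat "$workdir/a.list" "$workdir/b.list"
```

The loop runs the files one after another, so it gives up the parallelism but needs nothing beyond `cut` and `sed`.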
2/ print the second line of a LARGE file and quit:

sed -n '2{p;q}'

#unix #oneliner #bioinformatics #sed
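To try the sed one-liners from this thread safely, here is a small sketch that runs them on a throwaway file. The file contents are invented for the demo, and the last command is written as `2{p;q;}` (with a trailing semicolon) only because some non-GNU sed versions require it.

```shell
# Hypothetical sample: header, one record, a blank line, two more records.
tmp=$(mktemp)
printf 'id,gene\n1,CTCF\n\n2,VEGFA\n3,LSD1\n' > "$tmp"

no_blank=$(sed '/^$/d' "$tmp")      # delete blank lines
no_last=$(sed '$d' "$tmp")          # delete the last line
no_header=$(sed '1d' "$tmp")        # delete the first (header) line
second=$(sed -n '2{p;q;}' "$tmp")   # print line 2, then quit without reading the rest

printf 'second line: %s\n' "$second"
printf '%s\n' "$no_blank"
rm -f "$tmp"
```

Quitting right after printing is what makes the last form useful on a LARGE file: sed stops at line 2 instead of scanning to the end.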
Sep 9
1/ How I started my bioinformatics journey. A thread:
Back in my PhD, I was studying gene regulation in the context of cancer. My first paper was on CTCF functioning as an enhancer blocker at the VEGFA locus. divingintogeneticsandgenomics.rbind.io/publication/20… yeah, CTCF and VEGFA are my two favorites!
2/ my second paper was on identifying SFMBT as a cofactor of the LSD1 complex divingintogeneticsandgenomics.rbind.io/publication/20… both papers are pure biochemistry studies, and I am so proud that I did western blot, northern blot, lentivirus knock down and ChIP-qPCR etc
3/ it was around 2012. the sequencing technology was booming and a particular assay called ChIP-seq to identify global transcription factor binding sites was really popular. I naturally wanted to identify the binding sites of CTCF, LSD1, and histone modifications in the genome.
Sep 9
1/ Bioinformatics one-liner Day 15
get all the folders' sizes in the current folder: du -h --max-depth=1
the total size of the current directory: du -sh .
display disk space: df -h
2/
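memory usage: free -m (in MB) or free -g (in GB)
open `top -M` to show memory in human-readable units (MB, GB); this flag comes from older procps releases and newer versions of top may not accept it.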
install htop htop.dev for better visualization.
#unix #onliner
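A quick way to see what the du and df commands above report is to run them on a scratch directory. This is a sketch only: the directory layout is made up, `-k` is used instead of `-h` so the sizes can be sorted numerically, and `--max-depth` assumes GNU du.

```shell
# Hypothetical scratch directory with two sub-folders.
workdir=$(mktemp -d)
mkdir -p "$workdir/a" "$workdir/b"
printf 'some data\n' > "$workdir/a/file.txt"

# Size (in KB) of each first-level folder plus the directory itself, smallest first.
per_dir=$(du -k --max-depth=1 "$workdir" | sort -n)
printf '%s\n' "$per_dir"

# Total size (in KB) of the directory as a single number.
total=$(du -sk "$workdir" | cut -f1)
printf 'total KB: %s\n' "$total"

# Free space on the filesystem that holds the directory.
df -h "$workdir"
```

Piping `du` into `sort -n` is a common follow-up when hunting for the folder that is eating the disk.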
Jul 27
1/ "What's the most important factor for your success?"
2/ I have heard many answers.

The most common one I got is "luck".
3/ A little story:

Qi Lu was earning $27 a month when he was 27 years old. At 47, he was the president of Microsoft.
Jul 26
1/ Using random forest to calculate feature importance?
The importance score might be biased. #machinelearning #featureimportance Thanks @Matthew_N_B for pointing it out
A thread 👇
2/ explained.ai/rf-importance/

The takeaway from this article is that the default importance strategies of the most popular RF implementation in Python (scikit-learn) and of R's RF do not give reliable feature importances
3/
when “... potential predictor variables vary in their scale of measurement or their number of categories.” (Strobl et al). Rather than figuring out whether your data set conforms to one that gets accurate results, simply use permutation importance.
Feb 25
1/ collecting scRNAseq data in the context of immunotherapy. I will share what I know here. welcome to contribute. nature.com/articles/s4159…