Bioinformatics one-liner * Day 19
1/ Create a tx2gene mapping file from an Ensembl GTF, retaining the version numbers of genes and transcripts. A thread. #bioinformatics #oneliner #unix
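2/ Get the gene IDs with their versions. Once tr turns each "; " into a single space, the gene ID and gene version are fields 2 and 4 of the attribute column (this relies on the attribute order gene_id, gene_version, transcript_id, transcript_version):
awk -F "\t" '$3 == "transcript" { print $9 }' myensembl.gtf | tr -s ";" " " | cut -d " " -f2,4 | sed 's/"//g' | awk '{print $1"."$2}' > genes.txt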
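3/ Do the same for the transcript IDs (fields 6 and 8), then paste the two columns together, transcripts first:
awk -F "\t" '$3 == "transcript" { print $9 }' myensembl.gtf | tr -s ";" " " | cut -d " " -f6,8 | sed 's/"//g' | awk '{print $1"."$2}' > transcripts.txt
paste transcripts.txt genes.txt > tx2genes.txt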
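The one-liners above pick attributes by position, so they depend on the attribute order in the GTF. As a hedged alternative (not from the original thread), here is a minimal sketch that looks each attribute up by its key instead. The function name tx2gene_from_gtf and the helper attr are hypothetical names made up for this illustration, and the sketch assumes the transcript lines carry gene_id, gene_version, transcript_id and transcript_version attributes.

```shell
# Hypothetical helper (not part of the original thread): build the tx2gene
# table in a single awk pass, finding attributes by key rather than by position.
tx2gene_from_gtf() {
  awk -F '\t' '
    # Return the unquoted value of attribute `key` from attribute string `s`.
    function attr(s, key,    n, parts, i, kv) {
      n = split(s, parts, /; */)          # one "key value" pair per element
      for (i = 1; i <= n; i++) {
        split(parts[i], kv, " ")          # kv[1] = key, kv[2] = quoted value
        if (kv[1] == key) { gsub(/"/, "", kv[2]); return kv[2] }
      }
      return ""
    }
    # Only transcript features; print transcript.version <TAB> gene.version
    $3 == "transcript" {
      tx   = attr($9, "transcript_id") "." attr($9, "transcript_version")
      gene = attr($9, "gene_id") "." attr($9, "gene_version")
      print tx "\t" gene
    }
  ' "$1"
}

# Usage, with the same file names as in the thread:
# tx2gene_from_gtf myensembl.gtf > tx2genes.txt
```

This gives the same two-column layout as the paste step and skips the intermediate genes.txt and transcripts.txt files. It does not handle attribute values that themselves contain a semicolon, which should not matter for the four ID and version keys used here.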
That's a wrap!

If you enjoyed this thread:

1. Follow me @tangming2005 for more of these
2. RT the tweet below to share this thread with your audience
3. Sign up for my new book divingintogeneticsandgenomics.ck.page/cellline2comma…

