Ming "Tommy" Tang (@tangming2005)
Jul 24 · 19 tweets · 3 min read
1/ You’re merging gene data across tools. Suddenly nothing matches.
ENSEMBL, ENTREZ, TP53, P53…
Why so many gene IDs?
2/
Gene ID chaos is real.
One gene, three names: ENSEMBL ID, ENTREZ ID, gene symbol.
Different formats for the same biology, but they don't always map one-to-one.
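Here is a rough way to tell the three formats apart in code. This is a sketch only. `guess_id_type()` is a hypothetical helper, and its patterns are assumptions. It recognizes only human Ensembl gene IDs (the ENSG prefix; mouse IDs start with ENSMUSG) and all-digit Entrez IDs. Everything else is treated as a candidate symbol.

```r
# Hypothetical helper: guess which identifier system each string uses.
# Assumed patterns: human Ensembl gene IDs = "ENSG" + 11 digits, with an
# optional ".N" version suffix; Entrez (NCBI Gene) IDs = digits only.
# Everything else is treated as a candidate gene symbol.
guess_id_type <- function(ids) {
  ids <- as.character(ids)
  type <- rep("symbol", length(ids))
  type[grepl("^[0-9]+$", ids)] <- "entrez"
  type[grepl("^ENSG[0-9]{11}(\\.[0-9]+)?$", ids)] <- "ensembl"
  type[is.na(ids) | ids == ""] <- NA_character_  # missing or empty input
  type
}

guess_id_type(c("ENSG00000141510", "7157", "TP53", "P53"))
# returns c("ensembl", "entrez", "symbol", "symbol")
```

A pattern check cannot tell an outdated alias (P53) from an approved symbol (TP53). That takes a lookup against the nomenclature itself.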
3/
ENTREZ ID is a stable integer from NCBI.
Example: TP53 = 7157
4/
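ENSEMBL ID is specific and versioned.
TP53 = ENSG00000141510
That's the stable ID. Versioned forms add a suffix (ENSG00000141510.N) that increments when the gene model changes.
Used in Ensembl, GENCODE, and many RNA-seq pipelines.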
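If one file carries versioned IDs and another doesn't, joins fail silently. A common fix is to strip the suffix before merging. Below is a minimal sketch. `strip_ensembl_version()` is a hypothetical name, and the helper warns when two different versions in one vector would collapse into a single ID.

```r
# Hypothetical helper: remove a trailing ".N" version suffix from Ensembl IDs
# so versioned IDs can be matched against unversioned ones.
strip_ensembl_version <- function(ids) {
  stripped <- sub("\\.[0-9]+$", "", ids)
  # Distinct versioned IDs that collapse to the same stable ID would be
  # merged silently downstream, so warn about them.
  distinct <- unique(ids)
  collapsed <- sub("\\.[0-9]+$", "", distinct)
  clashes <- unique(collapsed[duplicated(collapsed)])
  if (length(clashes) > 0) {
    warning("different versions collapse to the same ID: ",
            paste(clashes, collapse = ", "))
  }
  stripped
}

# ".1" is an illustrative version number, not TP53's actual current version
strip_ensembl_version(c("ENSG00000141510.1", "TP53"))
# returns c("ENSG00000141510", "TP53")
```

A clash within a single vector usually means the IDs came from mixed sources, and those IDs deserve a closer look before you merge anything.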
5/
Then you have gene symbols.
TP53 is the official gene symbol, approved by HGNC.
But some still write P53. It's not wrong, just an alias, and p53 is also the usual name of the protein.
6/
Here’s the problem:
One ENSEMBL ID can link to multiple symbols.
One symbol can map to many ENSEMBL IDs.
Now you’re stuck.
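To see how bad the many-to-many problem is in a mapping table, you can count partners in both directions. Below is a minimal base-R sketch. `mapping_cardinality()` is a hypothetical helper, and the IDs in the example are toy placeholders, not real genes.

```r
# Hypothetical helper: label each distinct (from, to) pair by how many
# partners each side has. Rows with a missing ID on either side are dropped.
mapping_cardinality <- function(from, to) {
  pairs <- unique(data.frame(from = as.character(from), to = as.character(to),
                             stringsAsFactors = FALSE))
  pairs <- pairs[!is.na(pairs$from) & !is.na(pairs$to), , drop = FALSE]
  n_to   <- as.integer(table(pairs$from)[pairs$from])  # targets per source ID
  n_from <- as.integer(table(pairs$to)[pairs$to])      # source IDs per target
  pairs$type <- ifelse(n_to == 1 & n_from == 1, "one-to-one",
                ifelse(n_to > 1 & n_from == 1, "one-to-many",
                ifelse(n_to == 1 & n_from > 1, "many-to-one", "many-to-many")))
  rownames(pairs) <- NULL
  pairs
}

# Toy mapping: ENSG_A has two symbols, SYM3 is claimed by two IDs
res <- mapping_cardinality(
  from = c("ENSG_A", "ENSG_A", "ENSG_B", "ENSG_C", "ENSG_D"),
  to   = c("SYM1",   "SYM2",   "SYM3",   "SYM3",   "SYM4"))
res$type
# returns c("one-to-many", "one-to-many", "many-to-one", "many-to-one", "one-to-one")
```

Keeping only the one-to-one rows is the conservative choice. The other categories are where a silent merge can double-count or drop genes.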
7/
Why so messy?
Gene annotations evolve.
A new transcript is discovered.
A gene gets renamed.
Science moves, but your data might not.
8/
How do you fix this?
Use biomaRt in R to map IDs reliably.
It pulls live data from Ensembl. Here’s a working example:
9/
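library(biomaRt)  # Bioconductor: BiocManager::install("biomaRt")
# Queries the current Ensembl release, so results can change between releases.
mart <- useMart("ensembl", dataset = "hsapiens_gene_ensembl")
# To pin a release for reproducibility, e.g.:
# mart <- useEnsembl(biomart = "genes", dataset = "hsapiens_gene_ensembl", version = 110)
getBM(attributes = c("ensembl_gene_id", "entrezgene_id", "external_gene_name"),
      filters = "ensembl_gene_id",
      values = "ENSG00000141510",
      mart = mart)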
Returns one row: ENSG00000141510, 7157, TP53.
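Once you have a mapping table, you need to attach it to your own IDs without reordering them. This sketch uses `match()`, which keeps the input order and length and gives NA for IDs missing from the table. By contrast, `merge()` sorts the result and drops unmatched rows by default. `annotate_ids()` is a hypothetical helper. `bm` is a hand-typed stand-in for getBM() output, containing only the single TP53 row above.

```r
# Hypothetical helper: look up each input ID in a getBM()-shaped table.
# match() returns the position of the FIRST hit, so duplicated keys in the
# table would be resolved arbitrarily; warn if there are any.
annotate_ids <- function(ids, bm) {
  if (anyDuplicated(bm$ensembl_gene_id) > 0) {
    warning("mapping table has duplicated ensembl_gene_id values")
  }
  idx <- match(ids, bm$ensembl_gene_id)  # NA where an ID is not in the table
  data.frame(ensembl_gene_id    = ids,
             entrezgene_id      = bm$entrezgene_id[idx],
             external_gene_name = bm$external_gene_name[idx],
             stringsAsFactors   = FALSE)
}

# Stand-in for getBM() output (typed by hand, same columns as the query above)
bm <- data.frame(ensembl_gene_id    = "ENSG00000141510",
                 entrezgene_id      = 7157L,
                 external_gene_name = "TP53",
                 stringsAsFactors   = FALSE)

annotate_ids(c("ENSG00000141510", "ENSG_NOT_IN_TABLE"), bm)
# second row comes back with NA for entrezgene_id and external_gene_name
```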
10/
Prefer the browser?
Use Ensembl BioMart: ensembl.org/biomart/martvi…
Or NCBI Gene: ncbi.nlm.nih.gov/gene
Paste your ID—get all the links.
11/
Still getting P53 instead of TP53?
That’s a gene symbol alias.
Fix it with the HGNChelper package in R.
12/
install.packages("HGNChelper")
library(HGNChelper)
checkGeneSymbols("P53")

It returns a data frame that flags P53 as not approved and suggests TP53 in the Suggested.Symbol column.
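To apply the suggestions in bulk, you can collapse the checkGeneSymbols() result to one symbol per input. This sketch makes two assumptions: the returned columns are x, Approved and Suggested.Symbol, and an alias that matches several approved symbols comes back joined by " /// ". Check both against your installed HGNChelper version. `resolve_symbols()` is hypothetical. The data frame below is typed by hand rather than produced by the package.

```r
# Hypothetical helper: one resolved symbol per input, or NA.
# Assumes checkGeneSymbols()-style columns: x, Approved, Suggested.Symbol.
resolve_symbols <- function(chk) {
  out <- ifelse(chk$Approved,
                as.character(chk$x),                 # already approved: keep
                as.character(chk$Suggested.Symbol))  # otherwise take suggestion
  # Refuse to guess when an alias points at more than one approved symbol
  out[grepl(" /// ", out, fixed = TRUE)] <- NA_character_
  out
}

# Hand-typed stand-in for checkGeneSymbols() output; "SYMA /// SYMB" is a placeholder
chk <- data.frame(x                = c("TP53", "P53", "NOTAGENE", "ALIAS1"),
                  Approved         = c(TRUE, FALSE, FALSE, FALSE),
                  Suggested.Symbol = c("TP53", "TP53", NA, "SYMA /// SYMB"),
                  stringsAsFactors = FALSE)

resolve_symbols(chk)
# returns c("TP53", "TP53", NA, NA)
```

Ambiguous aliases are set to NA on purpose. Picking one of several suggestions would reintroduce the silent mismatches the check is meant to catch.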
13/
Now let’s go across species.
You’re comparing human and mouse data.
Use babelgene to find orthologs.
14/
install.packages("babelgene")
library(babelgene)
orthologs("BRCA1", species = "mouse")

Returns mouse Brca1 with corresponding IDs.
15/
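You can also convert ENTREZ IDs with org.Hs.eg.db (a Bioconductor package):
library(org.Hs.eg.db)  # install with BiocManager::install("org.Hs.eg.db")
# multiVals = "first" silently keeps one match per key; "asNA" or "list" expose ambiguous keys
mapIds(org.Hs.eg.db, keys = "1017", column = "SYMBOL",
       keytype = "ENTREZID", multiVals = "first")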

Maps 1017 to CDK2.
16/
Gene names are messy:
- Synonyms (P53 = TP53)
- Aliases
- Orthologs
- Outdated IDs
They break pipelines if unchecked.
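One way to keep these problems from breaking a pipeline silently is to add a guard after every conversion step that fails loudly. This is a sketch. `check_mapping()` is hypothetical, and the 5% unmapped threshold is an arbitrary assumption you should tune for your data.

```r
# Hypothetical guard: stop if too many IDs failed to convert, warn if several
# inputs collapsed onto the same target. 'mapped' is the converted vector,
# aligned with 'input' (NA or "" = not mapped).
check_mapping <- function(input, mapped, max_unmapped = 0.05) {
  stopifnot(length(input) == length(mapped))
  if (length(input) == 0) {
    return(invisible(list(unmapped = input, duplicated_targets = mapped)))
  }
  unmapped <- is.na(mapped) | mapped == ""
  frac <- mean(unmapped)
  if (frac > max_unmapped) {
    stop(sprintf("%.1f%% of IDs failed to map (limit %.1f%%)",
                 100 * frac, 100 * max_unmapped))
  }
  dup_targets <- unique(mapped[!unmapped & duplicated(mapped)])
  if (length(dup_targets) > 0) {
    warning("several inputs map to the same target: ",
            paste(dup_targets, collapse = ", "))
  }
  invisible(list(unmapped = input[unmapped], duplicated_targets = dup_targets))
}

# Usage with a mapIds() call like the one in tweet 15 (assumes org.Hs.eg.db is installed):
# sym <- mapIds(org.Hs.eg.db, keys = entrez_ids, column = "SYMBOL",
#               keytype = "ENTREZID", multiVals = "first")
# check_mapping(entrez_ids, sym)   # 'entrez_ids' is a hypothetical input vector
```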
17/
Key takeaways:
- Gene IDs are not interchangeable
- Use HGNChelper to fix aliases
- Use biomaRt and babelgene to convert reliably
- Always double-check across species
I hope you've found this post helpful.

Follow me for more.

Subscribe to my FREE newsletter chatomics to learn bioinformatics divingintogeneticsandgenomics.ck.page/profile x.com/433559451/stat…
