Ming "Tommy" Tang

@tangming2005

May 28 • 21 tweets • 3 min read

1/ You can't bolt AI onto chaos.
In biotech, if your data is a mess, your AI won't save you.
Build the data strategy first. Here's how.

2/
Real-world data isn't AI-ready.
Without structure, governance, and clarity, it’s noise.
AI needs fuel. And that fuel is clean data.

3/
At a biotech startup, we learned this the hard way.
Here’s what I took away from a panel discussion and years of practice.
The essentials:
Governance
Management
Metadata
Team dynamics
Tool choices

4/
Start with data governance.
Access control.
Versioning.
Basic security.
Do it early. Fixing leaks later costs 10x more.
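
For the versioning piece, one minimal approach is a checksum manifest, assuming raw data sits on an ordinary filesystem. You hash every file once and store the result. Diffing later manifests against it then catches files that were silently edited, added, or deleted. The function names are illustrative and not from any particular tool. Dedicated tools (for example DVC, or versioning on a cloud bucket) cover far more.

```python
from __future__ import annotations

import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large FASTQ/BAM files never load fully into memory."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root: Path) -> dict[str, str]:
    """Map each file's path (relative to root) to its SHA-256 checksum."""
    return {
        str(p.relative_to(root)): sha256_of(p)
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def changed_files(old: dict[str, str], new: dict[str, str]) -> list[str]:
    """Files added, removed, or modified between two manifests."""
    keys = set(old) | set(new)
    return sorted(k for k in keys if old.get(k) != new.get(k))
```

Save the manifest as JSON next to the data after each delivery. A non-empty `changed_files` result then means someone touched "raw" data.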

5/
Cloud is great—but only if you use it right.
Define who sees what.
Set folder rules.
Use the Google Cloud or AWS security best-practice guides. They’re free and solid.
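
"Define who sees what" can start as a written-down policy before it becomes cloud configuration. Below is a hypothetical role-to-folder table with a default-deny check. The role and folder names are assumptions. Real enforcement belongs in your cloud provider's access controls (IAM), not in a Python dict.

```python
from __future__ import annotations

from pathlib import PurePosixPath

# Hypothetical policy: which roles may read which folders.
ACCESS_RULES: dict[str, set[str]] = {
    "wet_lab": {"Internal/RNAseq", "Public"},
    "bioinformatics": {"Internal", "Public"},
    "external_collaborator": {"Public"},
}


def can_read(role: str, path: str) -> bool:
    """True if the path is, or sits inside, a folder the role may read."""
    target = PurePosixPath(path)
    allowed = ACCESS_RULES.get(role, set())  # unknown roles get nothing (default deny)
    return any(
        target == PurePosixPath(prefix) or PurePosixPath(prefix) in target.parents
        for prefix in allowed
    )
```

The check compares whole path components via `parents` rather than string prefixes. That keeps a folder like `PublicX/` from matching a rule written for `Public/`.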

6/
Next up: Data management.
Chaos begins with "just toss it in the drive."
Don’t.
Structure folders. Standardize metadata.
Make it findable again.
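
One way to make files findable is a strict naming convention that a script can parse back into metadata. This is a minimal sketch. The convention itself (project, assay, sample ID, ISO date) and the allowed assays and extensions are hypothetical, so adapt the pattern to your own scheme.

```python
from __future__ import annotations

import re
from datetime import date

# Hypothetical convention: <project>_<assay>_<sampleID>_<YYYY-MM-DD>.<ext>
NAME_PATTERN = re.compile(
    r"^(?P<project>[A-Za-z0-9]+)_"
    r"(?P<assay>RNAseq|WGS|ATACseq)_"
    r"(?P<sample>S\d{3,})_"
    r"(?P<date>\d{4}-\d{2}-\d{2})"
    r"\.(?P<ext>fastq\.gz|bam|tsv)$"
)


def parse_filename(name: str) -> dict[str, str] | None:
    """Return the metadata encoded in a file name, or None if it breaks the convention."""
    match = NAME_PATTERN.match(name)
    if match is None:
        return None
    fields = match.groupdict()
    try:
        date.fromisoformat(fields["date"])  # rejects impossible dates like 2024-13-40
    except ValueError:
        return None
    return fields
```

Running this over a directory listing gives you a searchable metadata table for free. Every `None` is a file someone needs to rename.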

7/
Spreadsheets are fine—until they aren’t.
Start smart:
“Female” not “F”
No weird characters
Train your wet lab team. Seriously.
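
"Female" not "F" is easiest to enforce with a controlled vocabulary plus a check on column names. This is a minimal sketch. The vocabulary and the naming rule (letters, digits, underscores, starting with a letter) are assumptions, not a standard.

```python
from __future__ import annotations

import re

# Hypothetical controlled vocabulary: every spelling seen so far maps to one value.
SEX_VOCAB = {
    "f": "Female", "female": "Female",
    "m": "Male", "male": "Male",
}
# Column names: a letter first, then only letters, digits, or underscores.
SAFE_NAME = re.compile(r"[A-Za-z][A-Za-z0-9_]*")


def normalize_sex(value: str) -> str:
    """Map entries like 'F' or ' female ' onto the controlled vocabulary."""
    key = value.strip().lower()
    if key not in SEX_VOCAB:
        raise ValueError(f"Unrecognized sex value: {value!r}")
    return SEX_VOCAB[key]


def bad_column_names(columns: list[str]) -> list[str]:
    """Column names with spaces, symbols, or a leading digit."""
    return [c for c in columns if not SAFE_NAME.fullmatch(c)]
```

Raising on unknown values is deliberate. A silent pass-through is how "F", "f", "fem" and "Female" end up as four groups in a plot.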

8/
You’ll accrue technical debt. That’s fine.
If someone curses your naming scheme 5 years from now, congrats.
You survived.

9/
But please—do the basics right.
Read this paper. Print it. Frame it.
“Data Organization in Spreadsheets”
tandfonline.com/doi/full/10.10…
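
That paper is Broman and Woo (2018). Some of its rules are easy to check automatically: write dates as YYYY-MM-DD, don't leave cells empty, and keep the data a rectangle. This is a minimal sketch using only Python's standard library. Which columns hold dates is an assumption passed in by the caller.

```python
from __future__ import annotations

import csv
import io
import re

ISO_DATE = re.compile(r"\d{4}-\d{2}-\d{2}")


def spreadsheet_problems(text: str, date_columns: set[str]) -> list[str]:
    """Flag ragged rows, empty cells, and non-ISO dates in a CSV export."""
    rows = list(csv.reader(io.StringIO(text)))
    if not rows:
        return ["file is empty"]
    header, body = rows[0], rows[1:]
    problems = []
    for line_no, row in enumerate(body, start=2):  # line 1 is the header
        if len(row) != len(header):
            problems.append(f"line {line_no}: {len(row)} cells, expected {len(header)}")
            continue
        for name, cell in zip(header, row):
            if cell.strip() == "":
                problems.append(f"line {line_no}: empty cell in '{name}'")
            elif name in date_columns and not ISO_DATE.fullmatch(cell):
                problems.append(f"line {line_no}: '{cell}' in '{name}' is not YYYY-MM-DD")
    return problems
```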

10/
Public data is cheap.
In-house data is gold.
Use a LIMS to track it.
Know where each sample came from.
Know what each file means.
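
A real LIMS is a database with an audit trail. The toy registry below only illustrates the two questions above: where each sample came from, and what each file means. All class and field names are hypothetical and do not come from any LIMS product.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Sample:
    sample_id: str
    source: str        # e.g. patient, cell line, or mouse ID
    collected_on: str  # ISO date, YYYY-MM-DD


@dataclass(frozen=True)
class DataFile:
    path: str
    sample_id: str
    description: str   # what the file means, in plain words


class SampleRegistry:
    """Toy stand-in for a LIMS: links every file back to the sample it came from."""

    def __init__(self) -> None:
        self.samples: dict[str, Sample] = {}
        self.files: dict[str, DataFile] = {}

    def add_sample(self, sample: Sample) -> None:
        self.samples[sample.sample_id] = sample

    def add_file(self, data_file: DataFile) -> None:
        # Refuse orphan files: every file must point at a registered sample.
        if data_file.sample_id not in self.samples:
            raise KeyError(f"Unknown sample {data_file.sample_id!r} for {data_file.path}")
        self.files[data_file.path] = data_file

    def provenance(self, path: str) -> tuple[DataFile, Sample]:
        """Answer 'what is this file, and where did it come from?'"""
        data_file = self.files[path]
        return data_file, self.samples[data_file.sample_id]
```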

11/
Aim for FAIR:
Findable
Accessible
Interoperable
Reusable
Even doing 70% right will put you ahead.
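
The 70% figure is a rule of thumb. One way to make it concrete for an internal dataset is a yes/no checklist per letter. The FAIR principles are guidance, not a scoring rubric, so the check names below are hypothetical and are not part of FAIR itself.

```python
from __future__ import annotations

# Hypothetical, deliberately crude checklist: two yes/no questions per FAIR letter.
FAIR_CHECKS = {
    "Findable": ["has_persistent_id", "has_readme"],
    "Accessible": ["location_documented", "access_policy_documented"],
    "Interoperable": ["open_file_format", "controlled_vocabulary"],
    "Reusable": ["license_stated", "provenance_recorded"],
}


def fair_score(dataset: dict[str, bool]) -> float:
    """Fraction of checklist items a dataset satisfies (missing answers count as no)."""
    items = [item for checks in FAIR_CHECKS.values() for item in checks]
    return sum(dataset.get(item, False) for item in items) / len(items)
```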

12/
Keep it simple:
Internal/
├── RNAseq/
├── WGS/
Public/
├── TCGA/
├── ENCODE/
Each with a README. Just say what the data is and where it came from.
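
The layout above can be scaffolded with a short script that also lists which dataset folders still lack a README. This is a sketch. The file name `README.md` is an assumption, and any README format works.

```python
from __future__ import annotations

from pathlib import Path

# Internal vs. public data, one folder per data type or source.
LAYOUT = {
    "Internal": ["RNAseq", "WGS"],
    "Public": ["TCGA", "ENCODE"],
}


def create_layout(root: Path) -> list[Path]:
    """Create the Internal/ and Public/ tree; return the dataset folders."""
    folders = []
    for group, datasets in LAYOUT.items():
        for name in datasets:
            folder = root / group / name
            folder.mkdir(parents=True, exist_ok=True)  # safe to re-run
            folders.append(folder)
    return folders


def missing_readmes(root: Path) -> list[Path]:
    """Dataset folders (two levels below root) with no README.md."""
    return sorted(
        folder
        for folder in root.glob("*/*")
        if folder.is_dir() and not (folder / "README.md").exists()
    )
```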

13/
README template:
When was this data generated?
What experiment?
Where’s the preprocessing code?
Who should I ask?
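
The four questions can live in a fill-in template, so every README asks the same things. This is a minimal sketch. The field names are hypothetical. Unanswered questions are marked TODO rather than left blank, so the gaps stay visible.

```python
from __future__ import annotations

# The four README questions as a fill-in template (field names are hypothetical).
README_TEMPLATE = (
    "# {title}\n"
    "\n"
    "- When was this data generated? {generated_on}\n"
    "- What experiment? {experiment}\n"
    "- Where's the preprocessing code? {code_location}\n"
    "- Who should I ask? {contact}\n"
)
FIELDS = ["title", "generated_on", "experiment", "code_location", "contact"]


def render_readme(**answers: str) -> str:
    """Fill the template; any question left unanswered becomes TODO."""
    return README_TEMPLATE.format(**{k: answers.get(k, "TODO") for k in FIELDS})
```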

14/
You’ll get pressure to move fast.
Investors want plots, not pipelines.
But for big projects—do it right.
Rushed analysis rots from the inside.

15/
Custom tools give you power.
Commercial tools give you speed.
Pick based on your team’s skill—not vendor marketing decks.

16/
And finally—people.
Your computational and wet lab teams must sit together.
Talk daily. Argue weekly. Trust always.

17/
Example:
Bioinformaticians prep the Seurat object.
Wet lab explores it in Shiny.
This builds insight AND independence.

18/
Good data strategy isn't sexy.
But it's the foundation.
It makes your R&D faster, your AI smarter, and your team happier.

19/
Startups die from disorganized data.
Don’t be one of them.
Fix your foundation now—before the chaos scales.

20/
Have you seen data disasters in biotech?
How did you fix them (or not)?
Reply and let’s trade war stories.

x.com/433559451/stat…

I hope you've found this post helpful.

Follow me for more.

Subscribe to my FREE newsletter, chatomics, to learn bioinformatics: divingintogeneticsandgenomics.ck.page/profile
x.com/433559451/stat…
