Ming "Tommy" Tang (@tangming2005)
May 28 · 21 tweets · 3 min read
1/ You can't bolt AI onto chaos.
In biotech, if your data is a mess, your AI won't save you.
Build the data strategy first. Here's how.
2/
Real-world data isn't AI-ready.
Without structure, governance, and clarity, it’s noise.
AI needs fuel. And that fuel is clean data.
3/
At a biotech startup, we learned this the hard way.
Here’s what I took from a panel and years of practice.
The essentials:
Governance
Management
Metadata
Team dynamics
Tool choices
4/
Start with data governance.
Access control.
Versioning.
Basic security.
Do it early. Fixing leaks later costs 10x more.
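One lightweight way to start on the versioning piece is a checksum manifest: record a hash for every file so you can tell later what changed. This is a minimal sketch, not a replacement for real version control or a data platform; the function names are made up for illustration.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Return the SHA-256 hex digest of a file, reading it in 1 MiB chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(data_dir: Path) -> dict[str, str]:
    """Map each file's path (relative to data_dir) to its checksum."""
    return {
        p.relative_to(data_dir).as_posix(): sha256_of(p)
        for p in sorted(data_dir.rglob("*"))
        if p.is_file()
    }


def changed_files(old: dict[str, str], new: dict[str, str]) -> list[str]:
    """List files added, removed, or modified between two manifests."""
    return sorted(k for k in old.keys() | new.keys() if old.get(k) != new.get(k))
```

Save the manifest next to the data, rebuild it before each analysis, and compare the two. An unexpected entry from changed_files is worth a conversation.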
5/
Cloud is great—but only if you use it right.
Define who sees what.
Set folder rules.
Use Google or AWS security playbooks. They’re free and solid.
6/
Next up: Data management.
Chaos begins with "just toss it in the drive."
Don’t.
Structure folders. Standardize metadata.
Make it findable again.
7/
Spreadsheets are fine—until they aren’t.
Start smart:
“Female” not “F”
No weird characters
Train your wet lab team. Seriously.
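A small check like the one below can catch these problems before a sheet reaches the analysis folder. It is a sketch under assumptions: the column names (sample_id, sex), the allowed values, and the "safe character" rule are all hypothetical, so swap in your own vocabulary.

```python
import re

# Hypothetical controlled vocabulary: abbreviations map to the full label.
SEX_VALUES = {"f": "Female", "female": "Female", "m": "Male", "male": "Male"}

# Assumed rule: IDs may contain only letters, digits, underscore, dot, hyphen.
SAFE_ID = re.compile(r"^[A-Za-z0-9_.-]+$")


def normalize_sex(value: str) -> str:
    """Expand abbreviations such as "F" to the full label, or raise."""
    key = value.strip().lower()
    if key not in SEX_VALUES:
        raise ValueError(f"Unrecognized sex value: {value!r}")
    return SEX_VALUES[key]


def check_row(row: dict[str, str]) -> list[str]:
    """Return a list of problems found in one metadata row."""
    problems = []
    sample_id = row.get("sample_id", "")
    if not SAFE_ID.match(sample_id):
        problems.append(f"sample_id has unsafe characters: {sample_id!r}")
    try:
        normalize_sex(row.get("sex", ""))
    except ValueError as err:
        problems.append(str(err))
    return problems
```

Run check_row over every row of the sample sheet and send the list of problems back to whoever filled it in.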
8/
You’ll accrue technical debt. That’s fine.
If someone curses your naming scheme 5 years from now, congrats.
You survived.
9/
But please—do the basics right.
Read this paper. Print it. Frame it.
“Data Organization in Spreadsheets”
tandfonline.com/doi/full/10.10…
10/
Public data is cheap.
In-house data is gold.
Use a LIMS to track it.
Know where each sample came from.
Know what each file means.
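If you don't have a LIMS yet, even a tiny structured record per sample covers those two questions. The sketch below is not a LIMS and does not model any particular product; the field names and the helper are assumptions chosen for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class SampleRecord:
    """Minimal provenance for one sample (hypothetical fields)."""
    sample_id: str
    source: str  # where the sample came from
    assay: str   # what was run on it
    files: dict[str, str] = field(default_factory=dict)  # file path -> what it means


def undocumented_files(record: SampleRecord, paths: list[str]) -> list[str]:
    """Return the paths that have no description in the record."""
    return [p for p in paths if p not in record.files]
```

Any file that shows up in undocumented_files is a file nobody will understand in a year.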
11/
Aim for FAIR:
Findable
Accessible
Interoperable
Reusable
Even doing 70% right will put you ahead.
12/
Keep it simple:
Internal/
├── RNAseq/
└── WGS/
Public/
├── TCGA/
└── ENCODE/
Each with a README. Just say what the data is and where it came from.
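The layout above is easy to create and audit with a short script. This is a sketch: the folder names come from the tree, but the README.md filename and the function names are assumptions.

```python
from pathlib import Path

# Folder names taken from the layout above.
LAYOUT = {"Internal": ["RNAseq", "WGS"], "Public": ["TCGA", "ENCODE"]}

# Assumed filename; the thread only says "README".
README_NAME = "README.md"


def create_layout(root: Path) -> None:
    """Create every data folder under root, skipping ones that already exist."""
    for top, children in LAYOUT.items():
        for child in children:
            (root / top / child).mkdir(parents=True, exist_ok=True)


def missing_readmes(root: Path) -> list[str]:
    """Return the data folders (relative to root) that lack a README."""
    missing = []
    for top, children in LAYOUT.items():
        for child in children:
            folder = root / top / child
            if not (folder / README_NAME).is_file():
                missing.append(folder.relative_to(root).as_posix())
    return missing
```

Running missing_readmes on a schedule is a cheap way to keep the "each with a README" rule honest.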
13/
README template:
When was this data generated?
What experiment?
Where’s the preprocessing code?
Who should I ask?
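Those four questions fit in a fill-in-the-blanks template, so nobody starts from an empty file. A minimal sketch using Python's standard string.Template; the field names and wording are assumptions.

```python
from string import Template

# One line per question in the README template above.
README_TEMPLATE = Template(
    "Generated: $generated\n"
    "Experiment: $experiment\n"
    "Preprocessing code: $code_location\n"
    "Contact: $contact\n"
)


def render_readme(generated: str, experiment: str, code_location: str, contact: str) -> str:
    """Fill in the template; substitute() raises KeyError if a field is missing."""
    return README_TEMPLATE.substitute(
        generated=generated,
        experiment=experiment,
        code_location=code_location,
        contact=contact,
    )
```

Write the returned text to the README in each data folder and edit from there.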
14/
You’ll get pressure to move fast.
Investors want plots, not pipelines.
But for big projects—do it right.
Rushed analysis rots from the inside.
15/
Custom tools give you power.
Commercial tools give you speed.
Pick based on your team’s skill—not vendor marketing decks.
16/
And finally—people.
Your computational and wet lab teams must sit together.
Talk daily. Argue weekly. Trust always.
17/
Example:
Bioinformaticians prep the Seurat object.
Wet lab explores it in Shiny.
This builds insight AND independence.
18/
Good data strategy isn't sexy.
But it's the foundation.
It makes your R&D faster, your AI smarter, and your team happier.
19/
Startups die by disorganized data.
Don’t be one of them.
Fix your foundation now—before the chaos scales.
20/
Have you seen data disasters in biotech?
How did you fix them (or not)?
Reply and let’s trade war stories.
I hope you've found this post helpful.

Follow me for more.

Subscribe to my FREE newsletter, chatomics, to learn bioinformatics: divingintogeneticsandgenomics.ck.page/profile
x.com/433559451/stat…
