Picard's MarkDuplicates or just check the duplication stats from STAR. High duplication (>60%) usually means low library complexity: you started with too little input material.
Your "20 million reads" might actually be 5 million unique reads. That changes everything for statistical power.
8/9 Check 7: Count distribution and filtering.
After quantification, look at the distribution of counts per gene. Filter low-count genes (I typically require >10 counts in at least n samples where n = your smallest group size).
Also check for genes driving >5% of total counts. One mitochondrial gene eating half your library is more common than you think.
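Both filters fit in a few lines. A minimal NumPy sketch with a toy count matrix (the thresholds are the ones above; the variable names and simulated data are mine, not from any real pipeline):

```python
import numpy as np

rng = np.random.default_rng(0)
# Toy count matrix: 1000 genes x 6 samples (two groups of 3)
counts = rng.poisson(lam=20, size=(1000, 6))
counts[0, :] = 0       # an all-zero gene that should be filtered out
counts[1, :] = 50000   # a gene dominating every library

smallest_group = 3
# Keep genes with >10 counts in at least `smallest_group` samples
keep = (counts > 10).sum(axis=1) >= smallest_group
filtered = counts[keep]

# Flag genes eating >5% of total counts in any sample
frac = counts / counts.sum(axis=0, keepdims=True)
dominant = np.where((frac > 0.05).any(axis=1))[0]

print(filtered.shape)
print(dominant)  # the runaway gene shows up here
```

The `>10 in at least n samples` rule is deliberately conservative; the point is to drop genes that can never reach significance, not to tune the threshold per dataset.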
9/9 I run these 7 checks on every single dataset. No exceptions.
It takes about 30 minutes for a typical experiment. I've written Snakemake pipelines that automate most of it.
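One such rule, sketched in Snakemake for the duplication check from earlier in the thread (paths and naming are illustrative, not my actual pipeline):

```
rule mark_duplicates:
    input:
        bam="aligned/{sample}.bam"
    output:
        bam="dedup/{sample}.bam",
        metrics="qc/{sample}.dup_metrics.txt"
    shell:
        "picard MarkDuplicates I={input.bam} O={output.bam} "
        "M={output.metrics}"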
The alternative is spending two weeks on a differential expression analysis, presenting results, and having someone ask "did you check for batch effects?" while you stare at the floor.
Anthropic just dropped a 33-page guide on building skills for Claude.
I read the whole thing. Here's what actually matters:
1/ First, what's a skill?
A folder with a SKILL.md file. That's it. You write instructions in markdown, and Claude follows them every time instead of you re-explaining your workflow in every conversation.
Think of it as persistent memory for how you want things done.
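A minimal example of what that folder can contain (the skill name and instructions here are invented for illustration; only the SKILL.md-with-frontmatter shape comes from the guide):

```markdown
---
name: release-notes
description: Drafts release notes from merged PRs in my preferred format.
---

When asked for release notes:
1. Group changes into Features / Fixes / Internal.
2. One line per change, imperative mood.
3. Link each line to its PR.
```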
Single biggest improvement I made to my CLAUDE.md:
"When I report a bug, don't start by trying to fix it. Instead, start by writing a test that reproduces the bug. Then, have subagents try to fix the bug and prove it with a passing test."
1/ Most developers (and AI agents) see a bug and immediately start hacking at the code.
That's backwards.
You're guessing at the fix before you even understand the failure.
2/ Here's what happens when you let Claude Code jump straight to fixing:
- It changes 3 files
- The bug looks fixed
- You ship it
- A week later the same bug is back, slightly different
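Test-first debugging in miniature. A hypothetical off-by-one bug (the function and the bug report are invented for illustration): the regression test is written first, fails against the buggy code, and only then does the fix get attempted and proven.

```python
# Hypothetical bug report: "the last item on each page is missing."
# Step 1: write a test that reproduces the failure before touching the fix.

def paginate_buggy(items, page, per_page):
    start = page * per_page
    return items[start : start + per_page - 1]  # off-by-one drops an item

def reproduces_bug(paginate):
    """Regression test: page 1 of 10 items, 5 per page, must be [5..9]."""
    return paginate(list(range(10)), 1, 5) == [5, 6, 7, 8, 9]

# Step 2: only once the test fails for the right reason, attempt the fix.
def paginate_fixed(items, page, per_page):
    start = page * per_page
    return items[start : start + per_page]

print(reproduces_bug(paginate_buggy))  # False: the test captures the bug
print(reproduces_bug(paginate_fixed))  # True: the fix is proven by the test
```

The test stays in the suite afterward, which is exactly what prevents the "same bug, slightly different, a week later" failure mode.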