1/n Super proud of work published this week by @anzheng25 et al. in @NatMachIntell using #deeplearning to identify sequence context features predictive of transcription factor binding. rdcu.be/cdMmE Some key points:
2/n The main idea: TFs typically bind short motifs of 6-12bp. But only a small fraction of motifs in the genome are actually bound. How well can the question of “to bind or not to bind” be predicted from the sequence context (1kb) around the motif using #DeepSEA-style CNNs?
3/n Pretty well! For most TFs we tried, we could predict very well (mean auROC ~0.94) whether a motif was bound (as measured by ChIP-seq) from local sequence context alone.
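For anyone curious what the CNN input looks like: a minimal sketch of one-hot encoding a sequence-context window, the standard input format for DeepSEA-style models (toy code, not our actual pipeline):

```python
import numpy as np

BASES = "ACGT"

def one_hot(seq):
    """Encode a DNA string as a (len(seq), 4) one-hot matrix.
    Ambiguous bases (e.g. N) become all-zero rows."""
    idx = {b: i for i, b in enumerate(BASES)}
    mat = np.zeros((len(seq), 4), dtype=np.float32)
    for j, b in enumerate(seq.upper()):
        if b in idx:
            mat[j, idx[b]] = 1.0
    return mat

# A 1 kb context around a motif would be encoded the same way,
# giving a (1000, 4) matrix fed to the first convolutional layer.
x = one_hot("ACGTN")
print(x.shape)  # (5, 4)
```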
4/n We also did a bunch of simulations to compare model interpretation methods to find out *why* we predict some sequences to be bound and others not. We looked at Grad-CAM, #DeepLIFT, saliency maps, and in silico saturation mutagenesis. Which is best?
5/n Answer: it depends. In silico mutagenesis typically did best but is really slow. DeepLIFT was better at finding important regions, while Grad-CAM seemed to have better base-pair resolution for scoring individual bases.
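To see why in silico saturation mutagenesis is slow: it needs one model evaluation per possible substitution at every position. A toy sketch (the `score` function here is a stand-in for a trained CNN, not a real model):

```python
import numpy as np

BASES = "ACGT"

def score(seq):
    # Placeholder scorer (illustrative only): GC fraction.
    return sum(b in "GC" for b in seq) / len(seq)

def saturation_mutagenesis(seq):
    """Return a (len(seq), 4) matrix of score changes for every
    single-base substitution relative to the reference sequence."""
    ref = score(seq)
    deltas = np.zeros((len(seq), 4))
    for i in range(len(seq)):
        for j, b in enumerate(BASES):
            mutant = seq[:i] + b + seq[i + 1:]
            deltas[i, j] = score(mutant) - ref
    return deltas

d = saturation_mutagenesis("ACGT")
# For a 1 kb window that is 4,000 forward passes per sequence --
# hence "typically does best but is really slow".
```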
6/n But, how you choose training datasets for these models **really matters**. E.g. when randomly choosing “negative” (unbound) sequences, we mostly learn “pioneer factor” motifs also predictive of open chromatin, probably only indirectly related to the target TF.
7/n On the other hand, restricting all training data to open chromatin regions results in learning very different features, probably more directly related to the TF, such as co-binding partners. It also really reduced our prediction accuracy, as expected.
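A toy sketch of the two negative-set strategies in tweets 6-7: unbound motif instances drawn genome-wide vs. only from open-chromatin peaks. Coordinates and the overlap test are illustrative, not the paper's actual pipeline:

```python
def in_open_chromatin(pos, peaks):
    """True if a position falls inside any (start, end) accessibility peak."""
    return any(start <= pos < end for start, end in peaks)

def pick_negatives(unbound_motifs, peaks, condition_on_open):
    """Select unbound motif positions as negatives, optionally
    restricted to open chromatin (matching the positives)."""
    if condition_on_open:
        return [p for p in unbound_motifs if in_open_chromatin(p, peaks)]
    return list(unbound_motifs)

peaks = [(100, 200), (500, 600)]
unbound = [150, 300, 550, 900]
print(pick_negatives(unbound, peaks, condition_on_open=True))   # [150, 550]
print(pick_negatives(unbound, peaks, condition_on_open=False))  # [150, 300, 550, 900]
```

Genome-wide negatives let the model win by detecting accessibility itself (pioneer-factor motifs); matched negatives force it onto TF-specific features, at the cost of a harder task and lower accuracy.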
8/n Thinking about applying #deeplearning to interpret #GWAS and other variants: depending on how you train the model, you can end up with very different variant-level scores. I think there’s more work to do to figure out the best way to apply these for variant interpretation.
9/n On a related note, I am very proud of @anzheng25 for spearheading our lab’s first paper that is *not* about STRs 😀. (although, we’ve been thinking about how to apply these methods to them as well. Ideas welcome!)