New paper from @joans and me! A pan-cancer, cross-platform analysis identifies >100,000 genomic biomarkers for cancer outcomes. Plus, a website to explore the data (survival.cshl.edu) and a (controversial?) discussion of “cause” vs. “correlation” in cancer genome analysis.
We used every type of data collected by TCGA (RNASeq, CNAs, methylation, mutation, protein expression, and miRNASeq) to generate survival models for each individual gene across 10,884 cancer patients. In total, we produced more than 3,000,000 Cox models for 33 cancer types.
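The screen itself fits Cox models, but the core comparison behind each of those models (and each Kaplan-Meier plot on the website) can be sketched with a simple log-rank test: split patients into high- and low-expression groups for one gene and ask whether their survival differs. A minimal pure-Python version on made-up data (patient values are random stand-ins, not TCGA data, and tied event times are ignored for simplicity):

```python
import random

random.seed(0)

# Made-up cohort: (follow-up time, death observed?, expression) per patient.
# In the real analysis these values would come from TCGA.
patients = [(random.uniform(1, 120), random.random() < 0.6, random.gauss(0, 1))
            for _ in range(200)]

# Split at the median expression level into "high" vs. "low" groups.
cutoff = sorted(p[2] for p in patients)[len(patients) // 2]
in_high = [p[2] >= cutoff for p in patients]

def logrank_chi2(times, events, in_group1):
    """Log-rank chi-square statistic for two survival groups (no tie handling)."""
    data = sorted(zip(times, events, in_group1))
    n = len(data)
    n1 = sum(in_group1)          # group-1 patients currently at risk
    obs_minus_exp, var = 0.0, 0.0
    for i, (t, died, g1) in enumerate(data):
        if died:
            at_risk = n - i      # everyone with follow-up >= t
            e1 = n1 / at_risk    # expected group-1 deaths under the null
            obs_minus_exp += (1 if g1 else 0) - e1
            var += e1 * (1 - e1)
        if g1:
            n1 -= 1              # patient leaves the risk set
    return obs_minus_exp ** 2 / var

chi2 = logrank_chi2([p[0] for p in patients],
                    [p[1] for p in patients],
                    in_high)
```

The resulting statistic is approximately chi-square with 1 degree of freedom; a Cox model adds effect-size estimates and covariate adjustment on top of this same ranking of event times.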
Within each cancer type, we identified thousands of biomarkers for favorable and dismal patient outcomes. The most common adverse biomarkers included overexpression of the mitotic kinase PLK1, methylation of the transcription factor HOXD12, and mutations in TP53.
GO term analysis revealed common gene groups among adverse and favorable biomarkers, including cell cycle genes (upregulated in deadly cancers) and developmental transcription factors (methylated in deadly cancers).
We could use these biomarkers to stratify patient outcomes in clinically-ambiguous situations, including Stage 1a breast cancer and Gleason 7 prostate cancer. In general, gene expression and DNA methylation biomarkers provided the most prognostic information.
So now here’s where it gets weird: aside from mutations in TP53, we didn’t see many cancer driver genes score as strong biomarkers in our prognostic analysis. KRAS, EGFR, RB1, PIK3CA, NF1… mutation, methylation, or altered expression of these genes wasn’t really prognostic.
In the literature, if some gene is associated with worse cancer outcomes, then that is typically presented as evidence that that gene is an important cancer driver. But clearly KRAS and PIK3CA are important cancer drivers and they didn’t score in our analysis… so what gives?
To investigate this, we analyzed lists of cancer driver genes, and then we compared their prognostic significance to randomly-permuted gene sets. Surprisingly, verified oncogenes were no more likely to be prognostic than any randomly chosen gene in the genome!
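The driver-vs-random comparison is a permutation test. A hedged sketch of the idea (the p-values and the "driver" list here are randomly generated placeholders, not the paper's data):

```python
import random

random.seed(0)

# Hypothetical screen output: one prognostic p-value per gene.
# In the paper these come from the per-gene Cox models.
pvals = {f"gene{i}": random.random() for i in range(2000)}
drivers = random.sample(sorted(pvals), 50)  # stand-in "driver gene" list

def frac_prognostic(genes, alpha=0.05):
    """Fraction of a gene set that is 'prognostic' at threshold alpha."""
    return sum(pvals[g] < alpha for g in genes) / len(genes)

observed = frac_prognostic(drivers)

# Null distribution: randomly chosen gene sets of the same size.
null = [frac_prognostic(random.sample(sorted(pvals), len(drivers)))
        for _ in range(5000)]

# Empirical p-value: how often a random set is at least as prognostic.
p_perm = sum(f >= observed for f in null) / len(null)
```

If verified oncogenes really were enriched for prognostic power, `observed` would sit in the tail of the null distribution; in our analysis, it didn't.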
For instance - KRAS mutations clearly drive lung cancer. But KRAS mutations in lung cancer are *not* associated with worse patient outcomes. In some cases, mutations in specific oncogenes are associated with *better* outcomes, not worse outcomes.
If you infer the importance of a gene from survival analysis (which is exceptionally common in the literature, and is something I’ve previously done myself) - you could accidentally conclude that CENPA is a more important driver of prostate cancer progression than MYC:
In general, our analysis provides genome-wide evidence that inferring *causation* (gene A is a driver of cancer progression) from *correlation* (gene A is overexpressed in deadly cancers) is not appropriate for patient outcome analysis, even if it’s commonly done.
Next, we looked at cancer drug targets. Again, it is routine to see the fact that a gene is associated with deadly cancers presented as evidence that that gene is a good drug target. But is this link justified by the data?
We looked at the targets of all FDA-approved cancer drugs, and we found that these drug targets were no more likely to be prognostic than any randomly-selected gene in the genome!
Consider PD1 as a drug target. High levels of PD1 (PDCD1) are associated with longer patient survival. So you might think that PD1 inhibitors would kill people! But cancers don’t work like that - survival correlation is not causation - and PD1 inhibitors in fact prolong survival.
(You could imagine that this is a type of post-hoc fallacy - maybe these genes are non-prognostic because of the existing therapies. But we did a sub-analysis on drugs approved after 2017 [post-TCGA], and we observed the same pattern).
Then we asked - what happens if you target the worst adverse features in the genome? Maybe those are still the best drug targets? Among the top 50 prognostic factors in the genome, we found that 16 have been targeted in clinical trials, and 15 of them have failed.
We believe this is because the most prognostic factors are not selective oncogenes. They’re housekeeping cell cycle genes that are ubiquitously expressed, and they’re essential across cell types. No cell type-selectivity = systemic toxicity and trial failure.
Successful cancer drug targets may be adverse biomarkers, favorable biomarkers, or they may have no survival correlation whatsoever. Our data demonstrates that this type of prognostic analysis should be uncoupled from therapeutic target development.
To put this in perspective - imagine a KM plot of 10,000 senior citizens: “people receiving dialysis” vs “people not receiving dialysis”. Individuals receiving kidney dialysis are more likely to die than individuals who are not receiving dialysis...
Based strictly on this correlative observation, one could assume that kidney dialysis kills people! Yet, we know that people receiving dialysis are likely to be older and have several medical comorbidities, and dialysis saves their lives. Same thing in cancer genomics!
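For anyone who wants to poke at the dialysis analogy directly, the Kaplan-Meier estimator behind those plots is tiny: at each observed death, multiply the running survival probability by (1 - deaths/at-risk). A minimal version on toy data (tied event times ignored for simplicity):

```python
def kaplan_meier(times, events):
    """Kaplan-Meier survival curve.

    times  : follow-up time for each individual
    events : True if a death was observed, False if censored
    Returns a list of (time, survival probability) pairs.
    """
    at_risk = len(times)
    surv, curve = 1.0, []
    for t, died in sorted(zip(times, events)):
        if died:
            surv *= 1 - 1 / at_risk
        at_risk -= 1          # this individual leaves the risk set
        curve.append((t, surv))
    return curve

# Toy data: deaths at t=1, 2, 4; one censored individual at t=3.
curve = kaplan_meier([1, 2, 3, 4], [True, True, False, True])
```

Note that the estimator only describes *who died when* within each group - nothing in it distinguishes "dialysis is killing these patients" from "sicker patients get dialysis". That confounding lives entirely outside the math.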
Inferring functional relationships and prioritizing drug targets based on correlative outcomes analysis may be inappropriate, as these relationships can be fraught with confounding variables and spurious associations.
So, let me know what you think, and take a look at our website - survival.cshl.edu. 3 million Kaplan-Meier plots to explore and lots more exciting findings to uncover. Feedback welcome!
I should add - I was playing around with some of the ideas in the paper in the thread linked below. It goes a little deeper into the drug target analysis and the misinterpretation of what survival curves mean:
In a blinded name-swap experiment, black female high school students were significantly less likely to be recommended for AP Calculus compared to other students with identical academic credentials. Important new paper from @DaniaFrancis:
Some background: one of the best ways to collect real-world evidence of discrimination is through name-swapping "audit" studies. In these experiments, people are presented with job applications, resumes, mortgage applications, etc., that are identical except for the name…
The applicant’s name is varied to suggest the individual’s race/ethnicity/gender. Think “John” vs “Juan” or “Michael” vs. “Michelle”.
Angelika Amon passed away this morning. She was the greatest scientist I’ve ever met. This is a huge loss for her family, her friends, and for every biologist.
As a grad student with Kim Nasmyth and then an independent fellow at the Whitehead, Angelika changed our understanding of the cell cycle.
People thought that cell cycle kinases just got degraded at the end of mitosis, but she showed that regulated phosphatase activity was actually crucial to completing the cell cycle and re-entering G1:
In two weeks, the Nobel Assembly at the Karolinska Institute will award the 2020 Nobel Prize in Physiology or Medicine.
Who will win? We don’t know for sure - but I think that we can make some educated guesses.
Science is dominated by a phenomenon called “the Matthew effect”. In short, the rich get richer. Getting one grant makes it more likely you’ll get the next. Winning one prize makes it more likely you’ll win another.
Here are the award rates for 11 different postdoc fellowships in 2019.
There’s a huge variation in success rates: four different organizations fund fewer than 6% of applications that they receive, while the success rates for the K99 and F32 are >24%.
To back up - my appointment at CSHL let me run a lab without doing a postdoc, so I never had the experience of applying for these grants. To help out my current postdocs, I wanted to make up for my lack of experience by doing some research.
I collected the award rates for each of these grants either from the org’s website or by emailing them directly. (I included an asterisk to indicate uncertainty. For instance, Beckman said they received “over” 150 applications, and I used 150 as the denominator).
Question: can anyone name a paper whose findings were challenged by a “matters arising” or “technical comment”-type rebuttal, but subsequent research proved that the original paper was actually correct?
One example: Charles Sawyers published that leukemia patients who relapsed on Gleevec developed ABL-T315I mutations.
Science then published 2 technical comments reporting that other groups didn't find this mutation in independent patient populations:
Larger surveys subsequently confirmed that T315I was a common (though not universal) cause of Gleevec resistance, T315I became the paradigmatic example of a “gatekeeper” resistance mutation, and Sawyers won the Lasker prize.
What happens to a paper submitted to a top journal?
Among a set of manuscripts sent out for review by Cell in 2018:
-33% were published in Cell
-26% were published in another Cell-family journal
-7% are still under review at Cell
-The median time to publication was 391 days
To back up: in 2018, Cell started the “Sneak Peek” program, in which authors had the option of posting a preprint of their manuscript if it was sent out for review by a Cell-family journal. cell.com/sneakpeek
Using this site, I found 46 papers that were sent out for review at Cell and posted on “Sneak Peek” between June 1st and Dec 31st, 2018. Each paper’s current status was also noted: “Published”, “Under review”, or “Review Complete” (a nice euphemism for “rejected”).