1/n. Recently there’s been a lot of hype surrounding the @NatureMedicine @GoogleAI paper on E2E lung cancer screening with 3D deep learning on LDCTs. Is the hype warranted? Thread 👇
2/n. @GoogleAI reported building a model that takes an entire low-dose chest CT as input and outputs two ROIs and an overall lung cancer malignancy score. They train and validate on the NLST and external datasets, claiming super-human performance in predicting cancer.
3/n. The same problem statement was posed in @Kaggle’s Data Science Bowl 2017 challenge, which also provided a subset of the NLST dataset. 394 teams attempted the challenge and many showed impressive results on the task. The evaluation metric, however, was log-loss and not AUC.
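A quick aside on why the metric choice matters: log-loss rewards calibrated probabilities while AUC only rewards ranking, so scores on the two are not directly comparable. A minimal sketch with made-up predictions (not numbers from either study):

```python
# Toy illustration (made-up numbers): two score sets with identical ranking
# get the same AUC, but the miscalibrated one is punished by log-loss.
from sklearn.metrics import log_loss, roc_auc_score

y_true = [0, 0, 1, 1]
p_calibrated    = [0.1, 0.2, 0.8, 0.9]  # well calibrated, perfect ranking
p_miscalibrated = [0.6, 0.7, 0.8, 0.9]  # same ranking, negatives pushed high

for name, p in [("calibrated", p_calibrated), ("miscalibrated", p_miscalibrated)]:
    print(f"{name:>13}: AUC={roc_auc_score(y_true, p):.2f}, "
          f"log-loss={log_loss(y_true, p):.3f}")
# Both get AUC = 1.00, but the miscalibrated scores have a much higher log-loss.
```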
4/n. The @kaggle winners had a similar approach: 1) Lung mask segmentation trained on LUNA16, 2) Nodule detection trained on LIDC-IDRI, 3) Malignancy estimation using a noisy-OR combination of nodule outputs, trained on NLST. Ref: arxiv.org/abs/1711.08324
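For readers unfamiliar with the noisy-OR step in 3): each detected nodule gets its own malignancy probability, and the scan-level score assumes the scan is positive if at least one nodule is malignant. A minimal sketch of the combination rule (illustrative only, not the winners' actual code):

```python
import numpy as np

def noisy_or(nodule_probs):
    """Combine per-nodule malignancy probabilities into one scan-level score.

    P(cancer) = 1 - prod(1 - p_i): the scan is positive if *any* nodule is
    malignant, treating nodules as independent.
    """
    p = np.asarray(nodule_probs, dtype=float)
    return 1.0 - np.prod(1.0 - p)

# e.g. three detected nodules with individual malignancy scores
print(noisy_or([0.05, 0.10, 0.60]))  # ~0.658
```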
5/n. @GoogleAI’s implementation is probably better tuned and incorporates more recent advances: they use a RetinaNet with focal loss for detection, and a 3D Inception model on a larger ROI to extract nodule features for malignancy scoring.
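For context, focal loss down-weights the abundant easy negatives so detection training focuses on the hard examples, which matters when nearly every candidate region in a chest CT is a non-nodule. A minimal sketch of the binary form (alpha/gamma are the usual RetinaNet defaults, not values reported by @GoogleAI):

```python
import torch
import torch.nn.functional as F

def binary_focal_loss(logits, targets, alpha=0.25, gamma=2.0):
    """FL(p_t) = -alpha_t * (1 - p_t)**gamma * log(p_t).

    Easy examples (p_t close to 1) are down-weighted by (1 - p_t)**gamma,
    so the loss is dominated by hard, misclassified candidates.
    """
    p = torch.sigmoid(logits)
    ce = F.binary_cross_entropy_with_logits(logits, targets, reduction="none")
    p_t = p * targets + (1 - p) * (1 - targets)
    alpha_t = alpha * targets + (1 - alpha) * (1 - targets)
    return (alpha_t * (1 - p_t) ** gamma * ce).mean()
```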
6/n. Multiple papers have achieved similar performance statistics on the same task with the NLST dataset, reporting over 95% sensitivity/specificity. @DrHughHarvey recently tweeted a list:
7/n. @GoogleAI also claims to have carried out external validation on a dataset with an inconsistently reported number of patients: 1,039, 1,139, or 1,739? Not sure how this got through the peer-review process. @NatureMedicine
8/n. Is this a large-scale study? Not really - they had 27 cancer+ studies in the external validation dataset. Add another 86 cancer+ studies in the NLST test set, which is split from the training set.
9/n. Expecting CT imaging analysis to match biopsy status is a tall ask, so the @GoogleAI team decided to compare against six general radiologists who were instructed to score the nodules using Lung-RADS criteria.
10/n. Now it is important to note that Lung-RADS is a patient management protocol used for standardized reporting and follow-up in screening programs. It is formalized by @RadiologyACR based on literature and data, but it is not exactly a data-driven model.
11/n. The Lung-RADS takes into account the nodule size, texture, and growth to assign a risk category. It is a fairly simple model to implement, though there have been studies of inter-rater variability on its reporting. Ref: ncbi.nlm.nih.gov/pubmed/30066248
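To give a sense of how simple the core logic is, here is a heavily simplified sketch of the size-based categories for a solid nodule on a baseline screen (thresholds from Lung-RADS v1.0; the real criteria also cover part-solid/ground-glass nodules, growth on follow-up, and 4X modifiers):

```python
def lungrads_baseline_solid(diameter_mm):
    """Very rough Lung-RADS category for a solid nodule on a baseline LDCT.

    Simplified for illustration only; the full ACR criteria handle nodule
    texture, growth, and special cases that are omitted here.
    """
    if diameter_mm < 6:
        return "2"    # benign appearance -> continue annual screening
    elif diameter_mm < 8:
        return "3"    # probably benign -> 6-month follow-up LDCT
    elif diameter_mm < 15:
        return "4A"   # suspicious -> 3-month follow-up LDCT
    else:
        return "4B"   # very suspicious -> further work-up / tissue sampling
```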
12/n. However, an ideal assessment for such an algorithm would have been against thoracic radiologists (not general radiologists) evaluating malignancy independently based on their skills and experience, since they are the ones actually reporting LDCTs in the clinic.
13/n. The paper goes on to show that the DL-based model performs better at assigning an equivalent risk bucket than the Lung-RADS categories do, achieving an AUC of 0.944 with biopsy status as the ground truth.
14/n. On the external validation dataset with confusing sample sizes, 27-odd cancer+ cases, and no comparative Lung-RADS scores, they showed an overall AUC of 0.955 against biopsy status.
15/n. However, this is not the first time an ML model has been used to assess malignancy. The Vancouver Lung Cancer Risk Prediction Model (‘Vancouver’, ‘PanCan’ or ‘Brock’ model) was developed on the Pan-Canadian Lung Cancer study in 2013 and has been studied extensively.
16/n. When the Vancouver model was applied to a subset of the NLST dataset, it showed an AUC of 0.963! It has consistently shown AUCs of 0.85-0.9 or higher on the BCCA and Danish LCS Trial datasets too. Ref: ncbi.nlm.nih.gov/pubmed/27740906
17/n. What is noteworthy here is that the Vancouver model is a 9-parameter model that can be run in MS Excel, compared to a 1M+ parameter model requiring GPU clusters. 😂
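To make the "9 parameters in Excel" point concrete: the Brock/PanCan model is essentially a logistic regression over patient and nodule features. The sketch below shows only the structure; the coefficients are placeholders, not the published values (those are in McWilliams et al., NEJM 2013):

```python
import math

# PLACEHOLDER coefficients: the real values come from McWilliams et al.
# (NEJM 2013) and are deliberately not reproduced here.
COEFFS = {
    "intercept": 0.0, "age": 0.0, "sex_female": 0.0, "family_history": 0.0,
    "emphysema": 0.0, "nodule_size_mm": 0.0, "part_solid": 0.0,
    "upper_lobe": 0.0, "nodule_count": 0.0, "spiculation": 0.0,
}

def brock_style_risk(features):
    """Logistic regression over ~9 patient/nodule predictors.

    features: dict with the same keys as COEFFS except 'intercept'.
    Returns a malignancy probability in [0, 1].
    """
    x = COEFFS["intercept"] + sum(COEFFS[k] * v for k, v in features.items())
    return 1.0 / (1.0 + math.exp(-x))
```

A single spreadsheet formula can evaluate this, which is the contrast being drawn with a million-parameter CNN on a GPU cluster.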
18/n. The Vancouver model has been validated against human performance and was found to be less accurate than thoracic radiologists. This is a good reference study for comparing human vs. AI analysis. Ref: journal.chestnet.org/article/S0012-…
19/n. So @GoogleAI showed that it is possible to fit a generalizable function in an overparameterized space to model malignancy in lung cancer screening. But will radiologists use this black-box model in the clinic, over far simpler models with comparable performance?
20/n. More importantly, will patients be OK learning that a million+ parameter algorithm is telling them they can wait a year for follow-up, or that they need a biopsy now? For reasons that can be visualized as bright spots on a grainy heatmap in a large 3D space?
21/n. Regardless, this work will certainly enable experts to improve and update their guidelines faster. E.g.: Lung-RADS recently classified perifissural nodules as category 2, which technically could have been discovered earlier using AI.
22/n. Before closing, it's key to note that NLST only includes patients aged 55-74 years, with at least 30 pack-years of smoking history and no comorbidities. Given this selection bias, the model will likely not extend to routine CTs, which may show infections, ILDs, or metastatic nodules.
23/n. Being able to predict malignancy risk for incidental nodules remains a valuable problem. E.g.: similar-appearing calcified nodules could be caused by granulomatous infections, metastatic malignancy, and more. Unfortunately, there aren't any large open-source datasets for this.
24/n. Finally, does this mean GoogleAI is entering the radiology industry? Unless they have a different data-sharing agreement, the NIH only allows use of NLST data for research, not commercial purposes. Their proposal is public: biometry.nci.nih.gov/cdas/approved-…
25/25. Full disclosure: I co-founded @PredibleH, where we have been building lung nodule detection and malignancy estimation algorithms; these insights have been fueled by our experience in the field.