1) they compared two different metrics as if they were the same thing
2) they used average human performance
3) they seemed to cheat when picking an operating point for the model
Each of these biases the comparison in favour of the model.
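To make this concrete, here is a minimal sketch with entirely simulated readers and model scores (none of these numbers come from the paper): it contrasts the "average human point versus model ROC curve" comparison with a fairer per-reader comparison at matched specificity.

```python
# A minimal sketch of the comparison problem, with entirely made-up numbers
# (readers, labels and model scores are simulated; nothing comes from the paper).
import numpy as np
from sklearn.metrics import roc_curve

rng = np.random.default_rng(0)

# Simulated test set: labels plus model scores that separate the classes reasonably well.
y = rng.integers(0, 2, size=2000)
scores = 0.35 * y + rng.normal(0.5, 0.15, size=2000)
fpr, tpr, _ = roc_curve(y, scores)

def model_sens_at(spec):
    """Model sensitivity read off its ROC curve at a given specificity."""
    return np.interp(1 - spec, fpr, tpr)

# Hypothetical readers, each sitting at a different sensitivity/specificity trade-off.
readers = np.array([[0.95, 0.70],
                    [0.85, 0.85],
                    [0.70, 0.95]])  # columns: sensitivity, specificity

# Problems 1+2: collapse the readers into one "average human" point and compare it
# with the model's whole ROC curve. Averaging across different trade-offs puts the
# point below the readers' own interpolated curve, which flatters the model.
avg_sens, avg_spec = readers.mean(axis=0)
order = np.argsort(readers[:, 1])
readers_curve_sens = np.interp(avg_spec, readers[order, 1], readers[order, 0])
print(f"average reader point:  sens {avg_sens:.3f} at spec {avg_spec:.3f}")
print(f"readers' curve there:  sens {readers_curve_sens:.3f}")
print(f"model at that spec:    sens {model_sens_at(avg_spec):.3f}")

# Fairer: compare each reader with the model at that reader's own specificity.
for sens, spec in readers:
    better = "reader" if sens > model_sens_at(spec) else "model"
    print(f"reader ({sens:.2f}, {spec:.2f}) vs model sens {model_sens_at(spec):.2f} -> {better} ahead")
```

With readers spread across different trade-offs, the averaged point sits below the readers' own interpolated curve, so the "average human" is a weaker opponent than the reader data actually supports.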
But I also understand that journals have processes to follow, so I did what they asked.
This boggles my mind.
Is there any world where this does not need to be fixed within a week of being pointed out? No letters to the editor needed.
None of the beginners/trainees "outperformed" the model.
Of the non-specialised group, only 1 out of 30 outperformed it.
I suspected, but wasn't certain, that they had cheated to find the model's specificity. By cheating, I mean using the test results to select the operating point. This is a big no-no, because in clinical practice you don't have these results in advance.
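For what it's worth, here is a minimal sketch of the difference, again with simulated data and a made-up 90% specificity target rather than anything from the paper: the legitimate version picks the threshold on data that would be available before deployment, while the "cheat" tunes it on the very test set being reported.

```python
# Simulated illustration of operating-point selection; nothing here is from the paper.
import numpy as np
from sklearn.metrics import roc_curve

rng = np.random.default_rng(1)

def simulate(n):
    """Fake labels and model scores for one data split."""
    y = rng.integers(0, 2, size=n)
    s = 0.35 * y + rng.normal(0.5, 0.15, size=n)
    return y, s

y_val, s_val = simulate(500)    # data you could legitimately tune on
y_test, s_test = simulate(500)  # held-out test set you report results from

def threshold_for_specificity(y, s, target_spec=0.90):
    """Lowest threshold whose specificity on (y, s) still meets the target."""
    fpr, _, thresholds = roc_curve(y, s)
    return thresholds[fpr <= 1 - target_spec][-1]

def sens_spec(y, s, threshold):
    """Sensitivity and specificity of the thresholded scores."""
    pred = s >= threshold
    sens = (pred & (y == 1)).mean() / (y == 1).mean()
    spec = (~pred & (y == 0)).mean() / (y == 0).mean()
    return sens, spec

# Legitimate: choose the operating point on validation data, report it on the test set.
t_honest = threshold_for_specificity(y_val, s_val)
# The "cheat": choose the operating point using the test results themselves.
t_cheat = threshold_for_specificity(y_test, s_test)

print("threshold picked on validation: ", sens_spec(y_test, s_test, t_honest))
print("threshold tuned on the test set:", sens_spec(y_test, s_test, t_cheat))
```

By construction, the second threshold lands exactly on the most favourable point of the test ROC curve that meets the specificity target; the first carries the usual validation-to-test slippage, which is what real deployment looks like.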
IMO, the six-month-long process has done nothing useful except provide a paywalled citation to the original paper. A paper that should have been fixed in peer review remains unchanged and is lauded as a top paper of the year.
Instead, I just feel like the time I spent was wasted.