1st, definitions matter. "Superhuman" forecasting should mean beating top Metaculus forecasters, or the overall crowd, not a random sample of them like this paper.
You wouldn’t say a coding AI was “superhuman” if it was top 50% of a coding competition. (2/7)
2nd, methods matter. “Retroactive forecasting”, e.g. predicting the past, requires a robust training window cutoff and hiding present web information. Neither are easy, and a quick check shows data leakage is happening in these papers, see . (3/7) lesswrong.com/posts/4kuXNhPf…