I've never really answered the critics of @BetterSkeptics' first challenge in one place, so I should probably write this down so I can refer people to it in the future. 🧵
The criticism that has been coming our way should have been somewhat expected, given that in any sensemaking exercise where there's significant disagreement, someone will feel like their side was not fairly represented. That said, it may well be that they were right. Let's see:
Before we get into it, it's important to explain that the main audience for the result of the challenge was myself. Having found the Quillette criticism to be of poor quality, I wanted to see what the *strongest* possible criticism would be, as I don't want to believe falsehoods.
The project was never intended to prove anything to anyone else, especially people with pre-determined opinions. A criticism coming at me these days is that I'm some sort of DarkHorse "superfan".
Admittedly, I hold them in high esteem, but "superfan" is a derogatory label, intended to attack my intellectual honesty.
If anything, two of the three referees held an anti-DarkHorse bias. I believe they all did their best to handle the challenge with integrity, but a pro-DH bias they did not have.
I do appreciate the huge amount of effort the referees put in, and the emotional pressure that they were under throughout the process. The whole thing was wild!
There have been, however, some arguments about the mechanism of the challenge itself. I'll focus on three in particular:
(1) The referees were not experts in the field.
Given that the mechanism of the challenge was such that the referees did not have to do any research of their own, only evaluate others' claims for *logical consistency*, I am not really sure what to make of this objection.
(2) There was a limit to the number of claims that could be submitted.
This was proposed by the referees to avoid spam. We could have done it better, but the person who made this argument the loudest only got one claim through the whole process, so maybe the issue wasn't the limit.
(3) The scoring system was biased against finding false claims
This one sounds plausible, so let's dig in: the standard we used for the Ground Truth Challenge was falsification, inspired by Popper. That is, the challenge for participants was to show that claims were false.
The reason is that pretending to be "fact checkers" in the middle of a pandemic would be silly, when the science is being done in real time. What we felt we could do was help identify falsehoods, as I did in my threads on the Sam Harris podcast and the Quillette article.
The referees had a scale from 0 to 5 for each claim. It's important to understand that this was a *concession* during the design process, to allow some "not definitely false, but highly likely to be false" claims to be identified too. If we had wanted to be stricter, we would have used a binary "definitely disproven" / "not definitely disproven" scale, as per strict falsificationism. Instead, we allowed "high likelihood of falsity" claims to be validated, understanding that not everything would meet the bar. What the critics said is that *any* level of likelihood of falsity should be enough to find fault with the podcasts we examined. That is, in brief, madness. See my threads on the Quillette article, as well as on the Sam Harris podcast: I had no issue finding black-and-white false claims. This isn't hard when they're there.
What I found most interesting is that the referees themselves each felt that we found far too few things wrong with the material we examined. However, and this is crucial, they did not agree on which claims those were!
The final report produced is, in my opinion, pretty high quality, even though I do not agree 100% with everything in it. In fact, that was the objective: to reach some level of consensus that would not necessarily make everyone happy, but would be better than what we had before.
The persistent critique of the project remains that it makes it "too hard" to prove fault with the material, even though no specific examples are cited and no concrete mechanistic explanation is offered.
I do still think that if we had examined the Sam Harris podcast with Eric Topol, or other such material, we would have found **many, many** false claims, and that the process would not have been in the way. I suppose one day we'll try it on something else and see what happens.
If there's a criticism of the project that does have legs, it's that we crowdsourced claims against Dark Horse that were strip-mined by bad faith actors to make their own critiques of Dark Horse more plausible, even if the claims themselves were rejected from the process.
Even with that in mind, however, I'm still very proud that we accomplished our objective: take divergent points of view and coalesce them into a high-quality result. We learned a lot about the good side and the bad side of sensemaking, and about what's possible in the era of Twitter.
There's "The mechanisms of action of Ivermectin against SARS-CoV-2: An evidence-based clinical review article" by Asiya Kamber Zaidi & Puya Dehgani-Mobaraki that was pulled by the editor... nature.com/articles/s4142…
There's Tess Lawrie's meta-analysis which got rejected by the Lancet AFTER passing peer review, costing months of delay until it passed review again at a different journal. The interview video where she described the events has been pulled down...
A big reason I got pulled into this whole debate around the pandemic was being baffled by the behavior of people like Sam. Having held him in high esteem, I felt it was important to "unpack" our disagreement, to make sure I hadn't lost my mind.
I started by doing a comment thread on his podcast with Eric Topol, which, as you will discover, left me deeply unsatisfied.
That was, it turns out, a bad start, as he ended up calling me out in his AMA 17, never having spoken to me before, and completely straw-manned my argument. Completely out of character for the Sam Harris of old.
I want to investigate the Hector Carvallo situation. This 🧵 is likely to become overlong and meandering, as I'll be trying to figure things out in real time, so if you want to help, please tag along; if you want "just the facts," it's best to mute this one and wait for the summary.
I'm aware of 3 main articles I'll try to comb through, and I've read none of them closely. Please comment with other resources.
I'm realizing that the insistence on Randomized Controlled Trials (RCTs) as the only evidence that matters when deciding whether a medicine/supplement should be used structurally biases against generics, over-the-counter meds/supplements, and those with few side-effects. Here's why: 🧵
The first class of problems has to do with wide availability once the question of effectiveness against a new disease is raised.
1. Cheap OTC generics with few side-effects get used a lot in an emergency, as word of mouth spreads, making it much harder to form a control group.
2. These substances, when there's a suspicion they can be effective in an important disease, will spark many studies all over the world. This means there will be many small trials, of varying protocol/dosage and study quality. This is a big problem for two reasons: