Each day until publication at 5pm ET on 14th April, I'll give a brief on the paper. The trial was a two-arm, double-blind randomised controlled trial in 59 people with major depressive disorder: 30 in what we call the psilocybin group and 29 in the other. Obvs we wanted 60, but circumstances conspired against that. Separately, due to the covid lockdown in the UK (end of March 2020), we also had to cancel the second of the two dosing sessions for the final 3 patients, 2 of whom, it transpired, were from the psilocybin condition. I won't reveal results until they are published but will merely brief on design.
In the psilocybin condition, there were two 25mg dosing sessions, separated by 3 weeks. Preparation, guide supervision of the sessions and post-dose integration were done as standard. In addition, patients took placebo capsules daily for 6 weeks: one capsule per day for the first 3 weeks and two capsules per day for the final 3 weeks. In the escitalopram group, patients had two 1mg dosing sessions, separated by 3 weeks. In addition, they took escitalopram daily for 6 weeks: 1 capsule of 10mg per day for the first 3 weeks and 2 capsules (20mg total) per day for the final 3 weeks. The main aim of the study was to compare the side-effect profiles and action of the 2 treatments. Our primary hypothesis was that the mechanisms would differ. More tomorrow...
We also performed fMRI/MRI in the trial: at baseline and at the 6-week endpoint. In fact, the pre-registered primary outcome was on these fMRI data, with the QIDS-SR-16 as the primary efficacy outcome. It may create some confusion, however, that our primary hypotheses were for no difference on the QIDS, but rather for separation on well-being and brain mechanisms. We don't report on the brain mechanisms in the paper (still under analysis) but do show multiple secondary efficacy outcomes, particularly in the supplementary appendix, which I IMPLORE YOU TO READ!
More tomorrow...
You'll see we have a lot of outcomes, e.g.: QIDS, BDI, HAM-D and MADRS for depression; WEMWBS and the Flourishing Scale for well-being; a work and social functioning scale; a suicidality scale; STAI for anxiety; LEIS for emotional responsivity; BEAQ for avoidance; and a self-constructed scale for assessing specific side effects that have been associated with psychedelics and SSRIs, including psychotic symptoms (i.e. paranoia and mania), HPPD symptoms, emotional blunting, restlessness, drowsiness and more. We assess response & remission rates for all depression scales in the standard way, and we also included a 24-hour version of the QIDS to look at potential rapid antidepressant action. We do not report the imaging or behavioural outcomes, so I won't describe them here. More tomorrow...
We used local research networks, word of mouth with local clinicians and MQ to aid recruitment, but ultimately recruitment required the patient to approach us. I won't reveal the nature of the sample until Wednesday evening, as this is considered results. The procedure was a telephone screening, during which a HAM-D was done, plus a psychiatric interview to determine eligibility. These calls were very thorough and were done by our amazing team of psychiatrists and psychologists. The HAM-D score had to be 17+ to be eligible. If patients were medicated, discontinuation had to be managed with agreement from the patient's clinician. Discontinuation of meds or psychotherapy happened 2 or 3 weeks prior to the trial. After enrolment, visit 1 was a baseline assessment and psychological prep session; there was also a pre-dose MRI. The first of the two dosing sessions took place the next day, at visit 2. At the end of this dosing day, participants received a bottle of capsules for the 6-week course of either placebo or escitalopram. Capsules were taken every morning for the subsequent 6 weeks. Visit 3 was 1 day after dosing day 1 & was the first integration session.
Tomorrow I'll cover our analyses. Many of our results are reported with 95% confidence intervals: a range of values in which we are 95%+ sure the population average lies. If you're unfamiliar with CIs, I'd advise you to look them up, as they are important for understanding our findings. More tomorrow...
When it comes to contrasts between groups, as with the 2 groups in our trial, you will see 95% CIs stated. If the values do not cross zero, e.g. if they range from, say, 1 to 10, it means we can be at least 95% confident that the difference is 'real', meaning it is not a fluke of chance but rather reflects a real difference that would hold even if you were to take another sample from the population or to extend your sample. This video touches on this at the end:
More tomorrow...
A more precise way of referring to a 95% CI is to say that we are 95%+ confident that the population average lies within the given range. So if that range doesn't cross zero, e.g. if the range is -15 to -2, then we are 95%+ confident that if we sampled the entire population, the average would lie in that range. CIs are related to p-values: if the 95% CI doesn't cross zero, you know you will have a p-value of < 0.05, the agreed standard threshold for concluding statistical significance; it means you are over 95% confident in your between-group difference.
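For anyone who wants to see the CI/p-value relationship in action, below is a minimal sketch in Python. The numbers are invented (the group sizes echo ours, but this is not our data and not our trial's actual analysis):

```python
# Hedged sketch: hypothetical 6-week change scores for two groups,
# analysed with Welch's t-test computed by hand.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
a = rng.normal(-8.0, 5.0, 30)   # hypothetical change scores, group A
b = rng.normal(-4.0, 5.0, 29)   # hypothetical change scores, group B

diff = a.mean() - b.mean()                        # between-group difference
va, vb = a.var(ddof=1) / len(a), b.var(ddof=1) / len(b)
se = np.sqrt(va + vb)                             # standard error of the difference
# Welch-Satterthwaite degrees of freedom
df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
tcrit = stats.t.ppf(0.975, df)                    # critical t for a 95% CI

ci_low, ci_high = diff - tcrit * se, diff + tcrit * se
p = 2 * stats.t.sf(abs(diff / se), df)            # two-sided p-value

print(f"difference: {diff:.2f}, 95% CI: ({ci_low:.2f}, {ci_high:.2f}), p = {p:.4f}")
# If the 95% CI excludes zero, p < 0.05, and vice versa.
```

However you change the inputs, the two agree: the 95% CI excludes zero exactly when the two-sided p-value is below 0.05.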
When you run a trial, it is correct practice to define a primary hypothesis, your main prediction. We do this because otherwise you could make loads of predictions, casting your net far and wide, and so pull up fish other than the one you're really predicting to be out there.
One issue here, however, is that the fish are often closely related, i.e., they correlate with each other: even though you predicted haddock, you pull up cod, but cod and haddock tend to appear together and are part of the same family. We might say the same about anxiety and depression measures, or about more than one depression measurement scale. To address the fact that fish/symptoms often appear together, we could treat them as one inter-related thing & then run a test on whether the post-treatment change on that one thing differs between the groups.
If you don't do this & you're casting a wide net, trawling for positive findings, then you should really tighten your threshold for concluding 'significance'. You do this because otherwise you might pull up what we call 'false positives', i.e. things that appear significant but are the product of casting a wide net. The proper procedure when casting a wide net/doing lots of tests is to correct for this by tightening the threshold needed to conclude significance. This is called 'correcting for multiple comparisons'; there's a sketch of what it looks like below. Now let's rewind a bit...
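A minimal sketch of the two most common corrections, Bonferroni and Holm, applied to hypothetical p-values (the scale names echo ours, but the numbers are invented, not from the trial):

```python
# Hedged sketch: hypothetical p-values for several outcome measures.
pvals = {"QIDS": 0.030, "BDI": 0.004, "HAM-D": 0.020, "MADRS": 0.060, "WEMWBS": 0.001}
alpha = 0.05
m = len(pvals)

# Bonferroni: test each p against alpha / m (here 0.05 / 5 = 0.01)
bonferroni = {name: p < alpha / m for name, p in pvals.items()}

# Holm: step through p-values smallest first; the i-th (0-indexed) is
# tested against alpha / (m - i); stop rejecting at the first failure.
holm, rejecting = {}, True
for i, (name, p) in enumerate(sorted(pvals.items(), key=lambda kv: kv[1])):
    rejecting = rejecting and p < alpha / (m - i)
    holm[name] = rejecting

print("Bonferroni:", bonferroni)
print("Holm:      ", holm)
```

Notice that a p of 0.03, 'significant' at the usual 0.05 threshold, survives neither correction once five tests are being run: that is the wide-net problem in miniature.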
Before beginning a trial, it is best practice to state very clearly the statistical analyses you will do at the end of the trial. It is a regret that we did not do this thoroughly enough: we did not state that we would correct for multiple comparisons, nor that we would collapse interrelated measures into 1 or 2 factors. As a result, a fair but potentially conservative view is to say: "you're doing too many tests; you need to correct for casting such a wide net, & if you don't, then you can't conclude your results are significant, as they might contain false positives".
A fair counter to this is to either do the correction for multiple testing or show that the measures inter-relate and so can be reduced to 1 or 2 factors. We intend to do both in the future & invite others to do the same, but as there was no pre-registration of how we'd do this, we don't apply or report these (actually very appropriate) correction steps in the paper's main analysis. As a result, the charge of casting too wide a net can be made. This may look conservative, & it is, but it is also consistent with good practice. More tomorrow...
... ok, so more today. Re CIs, I'd also like to explain a certain kind of plot, called a forest plot; below is an example. The midline lies on zero. The horizontal bars are 95% CIs. Remember, the range must not cross zero for the between-condition difference to be considered statistically significant at a threshold of 95%+ confidence. The CIs shown in green meet that criterion, in the sense that if you were able to run the study many times, you would be >95% confident of finding the population average within the green range, and so favouring one particular treatment group. Each row in the plot reflects a measure, e.g. of depressive symptom severity. The red triangles and blue dots on the left reflect the averages on each measure, red for one group and blue for the other. The CIs on the right reflect confidence of a true difference between the groups. Again, if the bars do not cross zero, the scores are considered significantly different between the groups. In this case, you can tell which were significant by the colours of the bars: black = non-significant and green = significant. There's also a small sketch below of how such a plot is built. More later...
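A minimal matplotlib sketch of the CI half of a forest plot, using invented differences and CIs (the scale names echo ours; none of these numbers are from the trial):

```python
# Hedged sketch: made-up between-group differences and 95% CIs.
import matplotlib.pyplot as plt

measures = ["QIDS", "BDI", "HAM-D", "MADRS", "WEMWBS"]
diff     = [-2.0, -5.5, -4.0, -1.5,  6.0]   # between-group mean differences
ci_low   = [-5.0, -9.0, -7.5, -4.5,  2.0]
ci_high  = [ 1.0, -2.0, -0.5,  1.5, 10.0]

fig, ax = plt.subplots(figsize=(6, 3))
for y, (lo, hi, d) in enumerate(zip(ci_low, ci_high, diff)):
    significant = lo > 0 or hi < 0               # CI does not cross zero
    colour = "green" if significant else "black"
    ax.plot([lo, hi], [y, y], color=colour)      # the 95% CI bar
    ax.plot(d, y, "o", color=colour)             # the point estimate
ax.axvline(0, color="grey", linestyle="--")      # the zero midline
ax.set_yticks(range(len(measures)))
ax.set_yticklabels(measures)
ax.set_xlabel("Between-group difference (95% CI)")
plt.tight_layout()
plt.show()
```

The green/black colouring is exactly the rule described above: green bars sit wholly to one side of zero, black bars cross it.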
What you will find in the paper publishing online at 5pm ET/10pm UK tomorrow is a very conservative description of the trial results. Please do bear this in mind. Again, I strongly encourage readers to check out the supplementary appendix. This contains some figures and tables we would have preferred to be available to readers within the main paper, but the journal moved them to the appendix. Some of the framing of the results is not in our words but rather those of the paper's handling editor. It is for these reasons that I'm emphasising you read the supplementary appendix. Look at the values. Values are just values, free of any narrative. View those values and see what you think. We tried to be extremely thorough in terms of caveats and limitations in the discussion. I sense people might think there is a narrative slant against psilocybin, but bear in mind that NEJM is the top medical journal, and it requires the highest levels of rigour in reporting and process. I am pleased we have been so thorough. As for the framing of the results, conservative is far better than promotional - BUT VIEW FOR YOURSELVES & interpret how you see fit. Read the supplementary appendix and remind yourselves about 95% CIs. Remember, if they do not cross zero, the range implies that you can feel >95% confident that the population difference lies in that range; in other words, the between-group difference is reliable. The paper will splash at 5pm ET/10pm UK tomorrow. I might say a little bit more later or tomorrow...
When published, look within the supplementary appendix & pay particular attention to Table S1 and Figure S4, and remember my little lesson about 95% CIs... More later...
The Vice article is pretty fair. Big up @shayla__love ; however, this needs addressing: "the people in the study had already been the subjects of previous published work: an open label trial from 2017, and a double blind randomized control trial (DB-RCT) from 2021 that was published in the New England Journal of Medicine." - This could be read to imply that the same patients took part in both trials, which isn't accurate: the samples are entirely independent. Another thing: I appreciate the effort to explain the interaction test to readers, but we address that test in both of our responses, which helps qualify an understanding of it further, as well as a misunderstanding of its importance/role by Doss et al. Readers should understand, therefore, that we too agree with statements like this: "there wasn't enough of a difference in the change in modularity..
Thanks to those who've recently sent sweet words of support, and also to the friends who were honest with me about "improvements" & for the useful reflections this has invited.
There will be a Vice article appearing in the next day or two covering the topic. I engaged with the author & asked that they print or link to our full response; they've agreed, which I respect.
I invite you to read that full response. You'll find it consistent with the preprint response - minus the defensiveness that triggered the reference to Doss et al.'s motivations & the reference to personal credentials, the former being 'blog style' and unacademic, & the latter...
One of my least favourite things: professional spats. I also dislike online trolling and the echo-chambering via social media that drives polarization. I like pluralism and wisdom teaching & am keen to step back from this forum for a while and focus more on family, mindfulness & metta.
I felt I had to write a response to what I regarded as an unfair and inaccurate critique of a recent paper of mine, led by the super recent PhD graduate @neuroDaws, alongside my wise & valued mentor of many years, @ProfDavidNutt . I value critique & so thank those who've critiqued our paper in @NatureMedicine . I haven't enjoyed the inaccurate portrayals & some of the bad manners I've seen from peers but, be that as it may, some learning has happened. For those interested, here is our response to the specific critique I'm speaking of:
Most psychedelic research has examined average outcomes and compared them across time and conditions. One consequence of this approach is that it may neglect the relevance of extreme cases and values.
'Extreme value analysis' is a different approach, one that endeavours to examine potentially salient extreme values or cases which, despite their relative rarity, are nevertheless important.
This approach motivates the present study, which seeks to focus the microscope on prolonged negative psychological responses to psychedelics. Despite their relative rarity, such responses may be important - and therefore deserve not to be 'averaged out' of data and conclusions.
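As a hedged illustration of the basic idea (invented numbers, not our data), note how an average can hide the very cases this approach cares about:

```python
# Hedged sketch: hypothetical symptom-change scores, where negative = improvement
# and positive = worsening.
import numpy as np

rng = np.random.default_rng(1)
change = rng.normal(-5.0, 4.0, 100)     # most people improve somewhat
change[:3] = [12.0, 15.0, 11.0]         # a few rare, large worsenings

print(f"average change: {change.mean():.2f}")   # the group average still shows improvement

# Flag 'extreme' cases, e.g. those more than 2 SDs above the mean
threshold = change.mean() + 2 * change.std(ddof=1)
extremes = change[change > threshold]
print(f"extreme worsenings (> mean + 2 SD): {np.round(extremes, 1)}")
```

The group average says 'improvement', yet a few people got markedly worse; extreme value analysis is about not letting those cases vanish into the mean.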