It took a national holiday, but I finally get to sit down and try to digest this paper! So excited 🎉 been looking forward to this for days 😁 and yesterday couldn't help but bring up @zerdeve's model-centric framework of scientific discovery (affectionately called it a "little people model" when explaining the idea of agent-based modelling and it stuck, sorry) and @djnavarro's Devil & Deep Blue Sea over wine w/ partner. Both of us got teary-eyed (shh) contemplating how this type of work feels so much more like "real" (meta?)science -- careful, nuanced, powerful, and not shying away from controversy but also not going in with an agenda & huge ego. Refreshing. Also, has anyone thought of reading those two papers in sequence as an exercise in epistemic humility & living with apparent contradiction? Just a thought. Ok, back to the paper!
Already good! @naps_and_wine you will like these parts ;) As someone who often naively believes there is an unmoving objective "truth" out there and that doing science means trying to approach it, it's great to be reminded of my naivete every so often.
This is the gold I came for! "We conclude that methodological reform first needs a mature theory of reproducibility to be able to identify whether *sufficient* conditions exist that may justify labeling reproducibility as a measure of true regularities."
Who could have predicted that the criticism that we could use more formal modelling in psychology could be applied to meta-science / meta-psychology as well? 😈
Mmm, this will need some digesting. But I got time, so let's jump into the appendices!

Anyway, first boundary of my ignorance: reached! I think (?) I know what some of this means, but googling is not helping with the rest (e.g. that strange V, I don't think I've seen that before?). Oh well, must needs move along while pondering for the 100th time that I really should do a MSc in stats. One day Vasco, one day.
Oh this I've seen before! 😈 Also good spot to plug Berna Devezer's great talk @interact_minds that goes into detail on this & includes an awesome introduction on the history of reproducibility going waay back :)
"This conditional independe of sequence of results ... implies that irrespective of whether a result is true or false, there is a true reproducibility rate of *any given result*, conditional on the properties of the study." -> determined by >
1. true model generating the data,
2. assumed model under which inference is performed,
3. methods with which inference is performed.
"In this sense, the true reproducibility rate is a parameter of the population of studies." Now that you put it like that, seems really obvious! >
some might say it "just" follows from your premises ;)
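Let me try to make that concrete with a tiny simulation of my own (emphatically mine, not the paper's; all the numbers are invented). Fix the true data-generating model, the assumed model, and the inference method, and the reproducibility rate of a "significant result" falls out as a single parameter you can estimate:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
true_effect, n, alpha, n_studies = 0.4, 30, 0.05, 10_000

significant = 0
for _ in range(n_studies):
    a = rng.normal(0.0, 1.0, n)          # condition A under the true model
    b = rng.normal(true_effect, 1.0, n)  # condition B under the true model
    # Assumed model + method: two-sample t-test, decision rule p < alpha
    if stats.ttest_ind(a, b).pvalue < alpha:
        significant += 1

print(f"Estimated reproducibility rate of 'p < alpha': {significant / n_studies:.3f}")
```

Every simulated study here has the same properties, so the fraction of significant results estimates one true reproducibility rate for this population of studies -- exactly the "parameter" reading above.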

This is making me think about whether we can conceive of the reproducibility of a result in other ways. E.g. how does the idea of multiverse analysis play into this? But I have nothing of substance to contribute right now, and "I'm just asking questions" won't do. Forgive my musings while I go branch off an earlier tweet for the funs.
That reminds me! Could the highlighted part here be a nod to Briggs's "Uncertainty" (springer.com/gp/book/978331…)? Picked it up when @zerdeve tweeted about it a while back; it was much too difficult for me to follow all the arguments, but I will not forget the lesson that THERE IS NO UNCONDITIONAL PROBABILITY.
That was fun! Time for a short break, imaginary audience
Yummy espresso

Now ready to keep going 🤓
Appendix 2 and Box 1 also make me face my ignorance. Turns out even espresso doesn't save you from lacking the background knowledge to understand careful arguments. Still, what I can take from this:
- it is probably easier than we think to obtain non-reproducible true results and reproducible false results
- ensuring the opposite requires necessary conditions (Box 1) that I doubt most studies fulfill. E.g.: "If inference is performed under one assumed model, that model should correctly specify the true mechanism generating the data." Wait, so adding more covariates in my linear regression won't help? Gasp. (See my toy sketch of a reproducible false result right after this list.) Also if you came this far and still haven't read Between the devil & the deep blue sea (link.springer.com/article/10.100…) go do that. I'll wait. Then we can bask in our own ignorance together :)
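Since I apparently only trust things I can simulate, here's a toy of my own making (setup, variable names, and numbers all invented by me, not taken from the paper) showing how easy a reproducible *false* result can be: an omitted confounder makes a nonexistent "x affects y" effect come out significant in essentially every study.

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(1)
n, n_studies, alpha = 200, 2_000, 0.05

significant = 0
for _ in range(n_studies):
    z = rng.normal(0, 1, n)      # unobserved common cause
    x = z + rng.normal(0, 1, n)  # x is driven by z
    y = z + rng.normal(0, 1, n)  # y is driven by z, not by x
    fit = sm.OLS(y, sm.add_constant(x)).fit()  # misspecified: z is omitted
    significant += fit.pvalues[1] < alpha      # p-value of the x coefficient

print(f"How often the false 'x affects y' result reproduces: {significant / n_studies:.2f}")
```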

Another example: "The sample on which inference is performed is representative of the population from which it is drawn." All I can say is good luck with that 🙏

Anyhoo, back to the main body. We're at section 1.2, end of p. 3, in case you were wondering.
I must smile when reading this bit. You see, @EJWagenmakers is used to this kind of jab, as revealed in his well thought-out response to *that preprint* that so many people were "discussing" on here (see EJ's blog here bayesianspectacles.org/a-breakdown-of…) and the 2nd highlighted quote 👇
So much to chew on here. I leave you with two poignant quotes:
1. "However, statistics’ ability to quantify uncertainty and inform decision making does not guarantee that we will be able to correctly specify our scientific model."
2. 👇
Love me a good footnote. I think I will follow up on these discussions at some point, thank you very much :)
Oof. Section 1.3 (until about line 280) takes you on a journey illustrating the nasty consequences of a blind pursuit of reproducibility, with a punch right at the end: "The mechanistic explanation of this process is that reproducibility-as-a-criterion can be optimized by the researcher independently of the underlying truth of their hypothesis." Ouch.

This is a good moment to recommend listening to Beethoven's Piano Concerto No. 3 in C minor while reading. Epic, melancholic, precise, like this paper.
Next the authors go broader again, and I can only assume @djnavarro played a big role in writing this. At the very least I am reminded of Paths in Strange Spaces, Part II (2nd pic, djnavarro.net/post/paths-in-…), which I also highly recommend to anyone reading this thread.
And so we end this section with our confidence in Claim 1 (Reproducibility is the cornerstone of, or a demarcation criterion for, science) severely shaken. Quite pleasurable, if you enjoy the feeling that everything you took for granted has been destroyed. Ironically, Zimerman wants to move on to the Rondo Allegro. I can't right now, Beethoven. I'm not allegro. A break is in order before exploring how using data more than once may *not* invalidate statistical inference. Wondering what will remain of me at the end of this.
Some chores and a delicious serving of homemade kefir and espresso with a dash of cinnamon, turmeric, and cocoa powder later...
While I let Zimerman finish the Rondo Allegro, I thought I'd start the next reading session with some reflection on the prior knowledge (or lack thereof) I bring to this claim. At the risk of just highlighting my ignorance...

When I think of "using data more than once invalidates statistical inference", this is what comes to mind:

First, I assume we're using NHST to infer whether some effect exists. E.g., in an experiment, we may want to infer whether some value is different in condition A vs condition B. If we first explore the data, e.g. by looking at the means, that could inform the statistical test we choose. If we see that the mean in cond. A is higher than in B, then we might test that difference in means with a one-sided test. The counterfactual is that, had we not looked at the means, we might have used a two-sided test. Are we then invalidating our p-value? All I've read on this suggests that we are. But then again, who said using p < alpha as a decision strategy is an appropriate way to tackle our scientific question of interest?
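Before reading on, let me simulate my own worry. A minimal sketch, assuming exactly the setup I just described (everything here is my invention): under a true null, always testing in the direction the observed means suggest roughly doubles the nominal Type I error rate.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
alpha, n_sims, n = 0.05, 10_000, 30

false_positives = 0
for _ in range(n_sims):
    a = rng.normal(0, 1, n)  # condition A, true effect is zero
    b = rng.normal(0, 1, n)  # condition B, true effect is zero
    # "Explore" first: peek at the means, then test in that direction
    direction = "greater" if a.mean() > b.mean() else "less"
    p = stats.ttest_ind(a, b, alternative=direction).pvalue
    false_positives += p < alpha

print(f"Empirical Type I error: {false_positives / n_sims:.3f}")  # ~0.10, not 0.05
```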

The other situation that comes to mind is when fitting more complicated models. Say I look at the relationship between two variables and can distinguish some U shape. Then I use a regression with a quadratic term instead of a linear regression (which I would have otherwise used) and perform a test. Is my p-value invalid now, because I selected a model based on looking at the data?
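And the same exercise for this second worry, again purely my own toy. The selection rule (compare adjusted R²) is just my stand-in for "I eyeballed a U shape"; with it, testing the winning model's highest-order term pushes the false-positive rate above the nominal level.

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(7)
alpha, n_sims, n = 0.05, 2_000, 50

false_positives = 0
for _ in range(n_sims):
    x = rng.uniform(-1, 1, n)
    y = rng.normal(0, 1, n)  # true null: y is unrelated to x
    lin = sm.OLS(y, sm.add_constant(x)).fit()
    quad = sm.OLS(y, sm.add_constant(np.column_stack([x, x ** 2]))).fit()
    # "I saw a U shape": keep the quadratic model if it fits better
    chosen = quad if quad.rsquared_adj > lin.rsquared_adj else lin
    # ... then test the highest-order term of whichever model was chosen
    false_positives += chosen.pvalues[-1] < alpha

print(f"Empirical Type I error: {false_positives / n_sims:.3f}")  # above 0.05
```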

I realize the above is a rather unsophisticated way of looking at this question, but it is more or less what is swimming in my mind when I read Claim 2. Let's now see what the experts have to say while being mesmerized by Barenboim's rendition of Beethoven's Waldstein sonata 😍
Ok, so more or less what I was thinking
... and there's the rub, and perhaps the main qualitative contribution I see throughout the paper: "These verbally stated terms [eg "double-dipping"] are ambiguous and create a confusion that is non-existent in statistical theory." And a little later: "Conditional inferences are statistically valid when their interpretation is properly conditioned on the information extracted from the observed data, which are sufficient for model parameters."
It's as if focusing too much on some 'long term error rate' displaces the emphasis away from where it truly belongs: making valid inferences. Luckily, it seems that's where we're going next!
Casually dropping truth bombs in footnotes
This is taking me quite a while because I am not well-versed enough in statistical theory to follow the formal arguments. However, I think even I can take some nuggets from this section:
- we ought to be careful in "exposing" issues in a discipline we don't understand well enough. E.g., the authors show how my previously stated concern with choosing a test based on observing means (or choosing to conduct a test based on an observed pattern) can actually be addressed, see pics (and my attempt at a sketch after this list)
- strengthening the previous point, a final quote from this subsection: "The key to successfully implement these solutions is a good understanding of statistical theory and a careful interpretation of results under clearly stated assumptions."
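For the curious, here is my naive guess at what such a fix can look like. To be clear: this is my own sketch under my own assumptions, not the authors' procedure. Keep the data-driven choice of direction, but pay for it by testing at alpha/2, which restores the nominal error rate (overall it's equivalent to the two-sided test):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)
alpha, n_sims, n = 0.05, 10_000, 30

rejections = 0
for _ in range(n_sims):
    a = rng.normal(0, 1, n)
    b = rng.normal(0, 1, n)
    direction = "greater" if a.mean() > b.mean() else "less"
    p = stats.ttest_ind(a, b, alternative=direction).pvalue
    rejections += p < alpha / 2  # pay for the data-driven choice of direction

print(f"Empirical Type I error: {rejections / n_sims:.3f}")  # back to ~0.05
```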
~ and we are moving on, aptly, I think, to Pathétique ~
Once more for the people in the back: "All well-established statistical procedures deliver their claims when their assumptions are satisfied."
Now for the dreaded truth about preregistration. Section 2.2 and Box 3 build on the previous results to make the argument for why prereg is not going to save us. Some highlights:
- the distinction between "confirmatory" and "exploratory" tests has no meaning in statistical theory (more on that later)
- prereg does not solve the problem of looking at data multiple times, since there is no such problem
- prereg cannot, in itself, guarantee a valid statistical inference
- valid inference remains valid without prereg
- a silver lining? (see pic)
and that's all for today, folks! This has been lovely, and I'll finish the paper over the weekend. Now I will change mode -- very soon it will be Fancy Portuguese Wine o'clock 😍