Noah Haber
Apr 21, 2021
New project on causal language and claims, and I want you to see how everything goes down live, to a mind-boggling level of transparency.

That includes live public links to all the major documents as they are being written, live discussion on major decisions, etc.

HERE WE GO!
Worth noting: this is the second time I've tried this kind of public transparency; the previous paper got canned due to COVID-related things.

NEW STUDY TIME!

Here's the idea (at the moment, anyway): health research has a very complicated relationship with "causal" language.
There is a semi-ubiquitous standard that if your study isn't the right method or isn't "good enough" for causal estimation, you shouldn't use the word cause, but instead just say things are associated/correlated/whatever, and you're good to go.

This is ... problematic.
Lots of potential rants here for why, but suffice to say this standard creates all kinds of issues with study strength, communication, and usefulness. This is a problem I've been working on for years.

But how common is this, and is it a big problem?
One symptom of this disconnect is that a lot of papers may make action claims that would require causal estimation, even though, according to the language, they're "just association."

So, what do we want to know?
1) How do *typical* journal publications phrase the relationships between exposures and outcomes?

What actual words are used (correlate, effect on, associate, cause, etc)?

With what modifiers (may be, strongly, etc)?

How common is "just say association"?
2) Do the claims and action implications made in the paper imply or require a causal estimate?

E.g., do the implications from the paper suggest doing more, less, or the same amount of X in order to increase or decrease Y?

That's implicitly causal in nature.
So, the plan:

1) Take a giant randomly selected, screened sample of X vs Y-type articles in the health literature.

2) Recruit a giant multidisciplinary team of awesome people

3) Have them determine what phrases are used and what claims are made, based on guidance.

Easy!*
* narrator: this is not easy, but at least it's fun!**

** narrator: fun-ish.

First thing that needs to be done is developing a rough protocol.

My task of the day is making a messy terrible outline of what I think this should look like.
Good news is that I've had enough proposals in this arena that I can stitch it together from scraps left over from ~4 years of failed grant proposals and cancelled projects, so this shouldn't be that hard, right?

RIGHT???
Some time this afternoon, I'll open a blank document for the protocol draft, and share the link with the world, so you can watch / comment / suggest everything as I go, and can see just how terrible I am at writing.

It's gonna be great. Or something.
And here we go!

Link to document, feel free to leave comments and whatnot any time. This will be fully open docs.google.com/document/d/1dG…

And as a bonus from now to about 5ish, I'm gonna stream it all on Twitch, come join!: twitch.tv/noahhaber
Day 1: Got a decent outline of the protocol draft. Goal is to get a full shitty draft for tomorrow, and send around for potential protocol co-authors / revisions.

I want a full, presentable protocol by the end of the month, because this is going to be an aggressive timeline.
Bad protocol chalkboard draft #1 written and done.

Now in get-core-team-together mode, to be followed shortly by absolutely-massacre-original-draft-and-like-a-phoenix-a-decent-draft-will-be-reborn-from-the-ashes mode

docs.google.com/document/d/1dG…
Core team is now being constructed, people getting invited to collaborate.

Fun part of this is that I am under a small time crunch, since my fellowship ends August 31, with no clear employment after that.

4 month hard deadline, here we go!
Update: two weeks later, we have a full core team, and the protocol is well on its way.

Things, they are happening. 3.5 months to go.

docs.google.com/document/d/1dG…
Status update: protocol is getting close to done, protocol coauthor team finalized, reviewers are being recruited and we're having one of several intro meetings tomorrow morning.

Good thing I definitely for sure planned ahead and made slides.

docs.google.com/presentation/d…
Hypertransparency part 2:

The.

Entire.

Project.

Folder.

drive.google.com/drive/folders/…
Only thing that's not public is stuff that contains personal information; everything else is public.
Welp, things are going. Here's where we're at:
- Team recruited and on Slack
- Currently putting the final touches on the protocol
- Wrote/ran the search code (rough sketch below)
- Team divided into screeners and review tool piloters
- Meetings scheduled for the training sessions
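
(For the curious, the search step is roughly this flavor of thing. A minimal sketch, assuming a PubMed pull via the NCBI E-utilities API in Python; the journal, date window, and query terms here are placeholders, not the study's actual search.)

```python
import requests

# Hypothetical example: pull PubMed IDs for one journal over a date window.
# The real search used the project's own query; these terms are placeholders.
ESEARCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"

def search_pubmed(journal, year_from, year_to, retmax=500):
    """Return a list of PubMed IDs matching a journal + publication-date window."""
    term = f'"{journal}"[Journal] AND {year_from}:{year_to}[pdat]'
    params = {"db": "pubmed", "term": term, "retmode": "json", "retmax": retmax}
    resp = requests.get(ESEARCH, params=params, timeout=30)
    resp.raise_for_status()
    return resp.json()["esearchresult"]["idlist"]

pmids = search_pubmed("JAMA", 2018, 2020)
print(len(pmids), "candidate articles")
```

The real pipeline layers the per-journal sampling and screener assignments on top of a pull like this; this is just the basic shape.
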
This is definitely a work weekend for me, lots of moving parts and administration for which I am the bottleneck.

Good news is that the team is AWESOME. Particular shout out to @SarahWieten for taking a whole bunch of responsibility (including boring stuff).
Relatedly, we had so much interest in this study that we had to narrow a list of 150+ people down to a 50ish-person final team.

Decisions were based on a lot of things, but notably on maximally diverse representation among qualified people.
Which is to say we had to say no to a whole lot of people who are super awesome and super qualified.

If I had known how much interest there would be, it's possible I could have redesigned things to work with a bigger team. But alas.
Hard deadline looms, though. We've used up just about the entire buffer already (granted, planning is the "high risk of delays" stage).

Doing a first-of-its-kind project with a massive team and lots of unknown unknowns is a particularly Noah style of bad idea.
Protocol pre-registered, screening process and review tool piloting start roughly simultaneously tomorrow.

Feels like things are a touch more rushed than I would like, but so it goes. Good news: pre-registration is not a stone tablet. If we need to make changes, we'll make them.
For whatever reason, the screening is always always always the most chaotic part of these projects.

Hiccups abounded, but screening is well underway (albeit a touch behind schedule due to said hiccups).

Main review training starts on Monday!
One hiccup was just a straight up coding error that was my fault, but others were more about the sampling and screening design due to some unexpected interactions. Lessons learned.

Pretty much inevitable with a first-of-its-kind sort of project, but it can be frustrating.
While the screening's been going on, @SarahWieten has been leading a team to pilot the review tool and giving really incredibly helpful suggestions.

The many-commenters model is a lot of work for sure, but it absolutely makes a HUGE difference to the end product.
Really really looking forward to the main review phase starting (after the inevitable round of fires have been put out, of course).

I've been going nonstop on this project for a few weeks now. Will be nice to take a break.
Inching ever closer to launching the main review phase, currently desperately putting the final touches on a dozen things before we commit.

Side note: I think I've worked harder on this over the last few weeks than I've worked on just about anything.
A brief recap of the last 2 weeks:
Estimated person-time for the main review alone is just a hair over 1,000 person hours between ~50 coauthor reviewers.

That's not even counting the screening and piloting process, design, admin, analysis/writing, etc.

This thing is a MONSTER.
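
For perspective, here's the back-of-envelope arithmetic behind that estimate. The minutes-per-review figure is my illustrative assumption, not a measured number:

```python
# Rough arithmetic behind the ~1,000 person-hour estimate for the main review.
reviewers = 50            # ~50 coauthor reviewers
total_reviews = 3000      # ~3,000+ reviews across the main phase
minutes_per_review = 20   # assumed average time per review; illustrative only

total_hours = total_reviews * minutes_per_review / 60
per_reviewer = total_hours / reviewers
print(f"~{total_hours:.0f} person-hours total, ~{per_reviewer:.0f} hours per reviewer")
```
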
AND WE'RE OFF! Data collection has officially started for the main review.

I've been working on getting to this moment for YEARS and it's awesome to see it happening
Progress is happening
Primary review phase wraps up (ish) today! Next week is the arbitration review phase, plus a bit of extra ratings and such.

But the end of the data collection phase is in sight.

Cool.
One side effect of this study is that a lot of extremely smart people are seeing what a reasonably representative random sample of the high-impact medical / epi journal literature actually looks like.

Reactions have been pretty interesting.
By request, I am doing an improvised stream of how the back end of all this works on Thursday, July 22 at 10am eastern.

How do you organize the code and interface for a complex, multi-phase study with 50+ people, 1k+ articles, and 3,000+ reviews?

DM for pwd.

stanford.zoom.us/j/92540957829
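
Spoiler: a big chunk of the back end is just assignment bookkeeping. A minimal sketch of the idea, assuming each article gets a fixed number of independent reviewers with roughly balanced workloads; the names and the reviews-per-article number are illustrative, not the project's actual code or design.

```python
import random

def assign_reviews(article_ids, reviewers, per_article=3, seed=42):
    """Give each article `per_article` distinct reviewers, keeping loads balanced."""
    rng = random.Random(seed)
    load = {r: 0 for r in reviewers}
    assignments = {}
    for art in article_ids:
        # Least-loaded reviewers first, ties broken at random.
        ranked = sorted(reviewers, key=lambda r: (load[r], rng.random()))
        assignments[art] = ranked[:per_article]
        for r in assignments[art]:
            load[r] += 1
    return assignments

# e.g. 1,000+ articles x ~3 reviews each is how you end up with 3,000+ reviews.
demo = assign_reviews([f"pmid_{i}" for i in range(10)], [f"reviewer_{j}" for j in range(5)])
```
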
Be prepared for a bizarre combination of good design and some hacky nonsense.

All code and almost* the entire file infrastructure are fully public if you want to poke around.

drive.google.com/drive/folders/…

* files containing personal info are private (some are needed for code to run)
Major milestones!

1) Arbitration round reviews are wrapping up. One more piece of data collection next week and some cleanup work to do, but we're so, so close!

2) I've made an analysis coding file!

3) The manuscript is getting written!
docs.google.com/document/d/1iR…

EEEEEP!!!!
These mega collabo projects can be monstrous, but good golly it's magical sometimes.

I was short on time to write, so I sent a quick message to the group to see if someone could handle the intro, and BOOM @dingding_peng wrote an awesome 1st draft, WAY better than I would have.
MAJOR MAJOR MILESTONE hit last night:

100% of article reviews completed!

Still so, so much left to do, but this is the point at which we officially have enough data to meet our primary analysis goals.
Going to reflect on a few things about getting here.

Firstly, the screening part turned out to be the most chaotic phase, while the main review went fairly smoothly.

Screening is the point where you have a logistically hard problem, the least info, and untuned systems.
It was EXTRA chaotic due to the requirement of accepting the same number of articles per journal as a stopping rule, with wildly different acceptance rates per journal and feedback loops for screener assignments.

Doing that involved a lot of pain and chaos. Do not recommend.
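
If you want to picture the mechanics, here's a minimal sketch of that stopping rule. The quota and the stand-in eligibility function are illustrative; the real process had human screeners and reassignment feedback loops on top of this.

```python
def screen_until_quota(candidates_by_journal, quota, is_eligible):
    """Screen each journal's candidates in order until `quota` articles are accepted.

    `is_eligible` stands in for the human screening decision. Because acceptance
    rates differ wildly across journals, you can't know up front how many
    candidates each journal will need -- the source of most of the chaos.
    """
    accepted, n_screened = {}, {}
    for journal, candidates in candidates_by_journal.items():
        accepted[journal], n_screened[journal] = [], 0
        for article in candidates:
            if len(accepted[journal]) >= quota:
                break
            n_screened[journal] += 1
            if is_eligible(article):
                accepted[journal].append(article)
    return accepted, n_screened
```
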
I also messed up and created some extra work due to a very stupid code bug that resulted in excluding two very important journals, which was not caught until late in the process.

Fortunately, the system was built such that fixing it wasn't a huge problem. But still.
Then there's just the general chaos of doing a complicated, way-out-of-the-ordinary project with very unusual framing and methods, which requires constant tweaking and changes, etc.

Doing something weird is always tough.
And then there's the fact that this project involves carefully coordinating, training, and synchronizing 50 (!!!) people, where everything needs to mesh at precise times and multiple phases, and any one unmeshing issue throws the whole thing out of whack.
As before, the only thing that isn't public is personal info, so I can't and won't talk about specifics.

But some tough situations arose, some unavoidable, others perhaps avoidable.

By and large though, the crew is/was ASTOUNDINGLY amazing, and my favorite part of these things.
Now we're in the cleanup phase, where there is a tough balance to be struck. I have to maintain three conflicting goals:

1) Data quality
2) Being a reasonably neutral party to avoid over-influencing reviewer decisions
and 3) Timelines

Can't get all 3 perfectly.
Have a bit more data collection to do, but the next phases are the analysis and manuscript writing phases.

And because I am me, I am going to do this the hard way, with hypertransparency engaged.

That means everyone can see all the not-so-pretty parts of the sausage making.
And so, without further ado, some public links!

The manuscript is being written here (currently public comments off): docs.google.com/document/d/1iR…

The analysis code can be found in the code folder, here: drive.google.com/drive/folders/…

I know that I should be using git. Next time.
AHHHHH I JUST WANT TO GET TO SHARE THE RESULTS AND THE DATA WITH EVERYONEEEEEE
Final day of data collection today. The results are super super super cool.

Also reminder: everything is being done EXTREMELY openly, including the manuscript AS IT IS BEING WRITTEN.

docs.google.com/document/d/1iR…
jeez. we did this.
For a sense of scale: what you see in that chart was the work of 49 people across the world, carefully synced and coordinated, with a complex multi-phase process, using first-of-its-kind guidance and review...

In *42 days* from first screen to last data collected.
I am looking forward to never working this hard ever again.

But no rest yet.

Because I have 28 days left of my fellowship to get this written and submitted.
The results section is being written, figures and statistics are being dropped, come check it out!

docs.google.com/document/d/1iR…
The "big" result and data are being dropped and written right now.

To what degree does the strength of causal implications in the sentence linking exposure to outcome match the causal implication of the action recommendations (i.e., what the authors say you should *do* with the data)?
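
The comparison itself is conceptually simple once the ratings exist. A minimal sketch, assuming each article ends up with an ordinal causal-strength rating for its linking sentence and another for its action recommendation; the labels and column names are placeholders, not our actual coding scheme.

```python
import pandas as pd

# Hypothetical ratings; the categories and data are illustrative only.
ratings = pd.DataFrame({
    "linking_strength": ["none", "weak", "weak", "moderate", "strong", "none"],
    "action_strength":  ["strong", "none", "strong", "strong", "strong", "none"],
})

# Row-normalized cross-tab: given the causal strength of the linking language,
# how strong are the causal implications of the action recommendations?
xtab = pd.crosstab(ratings["linking_strength"], ratings["action_strength"], normalize="index")
print(xtab.round(2))
```
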
Nearly done writing up a first draft of the results.

Also, just drafted a nearly 2 page document detailing changes from the original protocol, of which there were many.

Doing something new and weird means running into unexpected weird problems, and plans change.
Big one was that we ended up using a much more direct and context-sensitive measure of linking language causal strength, scrapping the original (over-complex and probably worse in every way) assignment and rating process.

Preregistration is SUPER useful, but not a stone tablet.
Aaaaand first (bad) draft of the results section is written. On to the discussion section this week.

And boy howdy what a discussion section it's gonna be. I tend to think the results are pretty damning (including in some ways that surprised me).
Now first bad draft of the Discussion!

I expect most of this to get rewritten a few times over, but the first bad draft is the hardest part.

Entering the phase where 90% of the paper is done.

docs.google.com/document/d/1iR… Image
Now uploading the final datasets and (not so final) code to the OSF repository.

Everything's always been open and accessible via Google Drive, but having it all on an open science repository is MUCH nicer and more reliable.

osf.io/jtdaz/

Thanks @OSFramework!
I always find this stage of a paper to be tough. We know what the results are and what we want to say. The big stuff is done; we're 95% of the way there.

But there are a thousand small tasks that make up the other 95%.
To make a woodworking analogy: all parts are built and more or less assembled.

Everything else from here is sanding, finishing, and getting it installed.

There's just so, so much sanding.
One REALLY tough thing in this paper is just how much tiptoeing we have to do for internal consistency in how we describe things.

In our case, we can't merely "just use the right words," we have to make DAMN sure that we also don't make any possible inappropriate implication.

More from @NoahHaber

Mar 1, 2022
"DAG With Omitted Objects Displayed (DAGWOOD)" is out in Annals of Epi!

What is DAGWOOD?

A framework?
A method for revealing and reviewing causal inference model assumptions?
A tool for model building?
A statement on epistemic humility?

Answer: Yes.

doi.org/10.1016/j.anne…
This weird paper could only be brought to you by the weird collective minds of @anecdatally, @SarahWieten, @BreskinEpi, and me.

But before I run through it, an acknowledgement:

It's March 1, 2022, and events in Ukraine and across the globe far overshadow any celebration here.
The problem:

Folks often say that DAGs make our causal inference assumptions explicit. But that's only kinda true.

The biggest assumptions in a DAG aren't actually IN the DAG; they're in what we assume ISN'T in the DAG. It's all the stuff that's hidden in the white space.
Feb 21, 2022
Time to make it official: short of some unbelievably unlikely circumstances, my academic career is over.

I have officially quit/failed/torpedoed/given up hope on/been failed by the academic system and a career within it.
To be honest, I am angry about it, and have been for years. Enough so that I took a moonshot a few years ago to do something different that might change things or fail trying, publicly.

I could afford to fail since I have unusually awesome outside options.

And here we are.
Who knows what combination of things did me in: incredibly unlucky timing, not fitting in boxes, less "productivity," lack of talent, etc.

In the end, I was rejected from 100% of my TT job and major grant applications.

Always had support from people, but not institutions.
Aug 30, 2021
Causal language study is now up on medRxiv!

medrxiv.org/content/10.110…

Ever wondered what words are commonly used to link exposures and outcomes in health/med/epi studies? How strongly language implies causality? How strongly studies hint at causality in other ways?

READ ON!
Health/med/epi studies commonly avoid using "causal" language for non-RCTs to link exposures and outcomes, under the assumption that "non-causal" language is more "careful."

But this gets murky, particularly if we want to inform causal q's but use "non-causal" language.
To find answers, we did a kinda bonkers thing:

GIANT MEGA INTERDISCIPLINARY COLLABORATION LANGUAGE REVIEW

As if that wasn't enough, we also tried to push the boundaries on open science, in hyper transparency and public engagement mode.

Aug 17, 2021
I've done a fair bit of generating simulated data for teaching exercises, methodological demonstrations, etc.

It's really, really hard to make simulated data look "real," and it usually doesn't take much to see it.

That pops up in a lot of these cases.
Granted, we only see the ones that get caught, so "better" frauds are harder to see.

But I think people don't appreciate just how hard it is to make simulated data that don't have an obvious tell, usually because something is "too clean" (e.g. the uniform distribution here).
At some point, it's just easier to actually collect the data for real.

BUT.

The ones that I think are going to be particularly hard to catch are the ones that are *mostly* real but fudged a little haphazardly.

If I had to guess, this is probably more common.
Aug 16, 2021
Perpetual reminder: cases going up when there are NPIs (e.g. stay at home orders) in place generally does not tell us much about the impact of the NPIs.

Lots of folks out there making claims based on reading tea leaves from this kind of data and shallow analysis; be careful.
What we want to know is what would have happened if the NPIs were not there. That's EXTREMELY tricky.

How tricky? Well, we would usually expect cases/hospitalizations/deaths to have an upward trajectory *even when the NPIs are extremely effective at preventing those outcomes.*
The interplay of timing, infectious disease dynamics, social changes, data, etc. make it really really difficult to isolate what the NPIs are doing alongside the myriad of other stuff that is happening.

More here: pubmed.ncbi.nlm.nih.gov/34180960/
Jul 22, 2021
The resistance to teaching regression discontinuity as a standard method in epi continues to be baffling.
I can't think of a field for which RDD is a more obviously good fit than epi/medicine.

It's honestly a MUCH better fit for epi and medicine than econ, since healthcare and medicine are just absolutely crawling with arbitrary threshold-based decision metrics.
(psssssst to epi departments: if you want this capability natively for your students and postdocs - and you absolutely do - you should probably hire people with cross-disciplinary training to support it)