@PausalZ@fediscience.org
Professional epidemiologist / causal inference researcher / python programmer, amateur mycologist #Python #epitwitter https://t.co/cuewGX6vWD
Nov 17, 2021 17 tweets 5 min read
a 🧵 on M-Estimation and why I think it's a valuable tool that epidemiologists should be using more often

M-Estimation is a general approach of defining an estimator \hat{\theta} as the solution to estimating equations of the form \sum_{i=1}^{n} \psi(O_i, \hat{\theta}) = 0. Importantly, the obs O_i are independent and \psi is a known function that doesn't depend on i or n
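To make the definition concrete, here is a minimal sketch for the simplest case, the mean, where \psi(O_i, \theta) = O_i - \theta. The function names, the bisection root-finder, and the toy data are all illustrative assumptions, not code from the thread. The variance uses the usual sandwich form, with the derivative of \psi approximated numerically.

```python
def psi_mean(o, theta):
    """Estimating function for the mean: psi(O_i, theta) = O_i - theta."""
    return o - theta


def solve_estimating_equation(psi, data, lo, hi, tol=1e-10):
    """Find theta in [lo, hi] with sum_i psi(O_i, theta) = 0 by bisection.

    Assumes a scalar theta and a sign change of the summed psi on [lo, hi].
    """
    def total(theta):
        return sum(psi(o, theta) for o in data)

    f_lo, f_hi = total(lo), total(hi)
    if f_lo == 0:
        return lo
    if f_hi == 0:
        return hi
    if f_lo * f_hi > 0:
        raise ValueError("no sign change on [lo, hi]")
    while hi - lo > tol:
        mid = (lo + hi) / 2
        f_mid = total(mid)
        if f_mid == 0:
            return mid
        if f_lo * f_mid < 0:
            hi = mid
        else:
            lo, f_lo = mid, f_mid
    return (lo + hi) / 2


def sandwich_variance(psi, data, theta_hat, step=1e-6):
    """Sandwich variance for scalar theta: meat / (bread**2 * n)."""
    n = len(data)
    # Bread: average of -d psi / d theta, via a central difference.
    bread = sum(
        -(psi(o, theta_hat + step) - psi(o, theta_hat - step)) / (2 * step)
        for o in data
    ) / n
    # Meat: average of psi squared at the solution.
    meat = sum(psi(o, theta_hat) ** 2 for o in data) / n
    return meat / (bread ** 2 * n)


data = [2.0, 4.0, 4.0, 5.0, 10.0]  # toy observations, made up for illustration
theta_hat = solve_estimating_equation(psi_mean, data, min(data), max(data))
var_hat = sandwich_variance(psi_mean, data, theta_hat)
print(theta_hat, var_hat)
```

The appeal is that swapping in a different \psi (for example, the score function of a regression model) reuses the same two steps: solve for the root, then plug into the sandwich.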
Jul 7, 2021 12 tweets 5 min read
Big fan of the "I forced a bot to [...] over 1000" memes. But most of those posts are fake (i.e. human-generated). That's why I decided to make a real one

So I forced a bot to read over 1000 PubMed abstracts in order to generate new abstracts [images]

Basically, I pulled a random sample of 5000 abstracts from PubMed using the search terms: (causal inference) AND English[Language]

A random sample of the returned abstracts was used to train a recurrent neural network (RNN)
Sep 24, 2020 15 tweets 4 min read
Herd immunity is a far squishier concept than many seem to be describing in their "shielding" or "stratified herd immunity" plans. Here is the formula for the herd immunity threshold for a SIR model [image]

where \beta is the effective contact rate, N is the number of individuals, and r is the inverse of the duration

The threshold says that if we are above that level, the disease will disappear / we expect no outbreaks of disease. However, that threshold is neither sufficient nor necessary
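The formula image is not reproduced here, so the following is a sketch under an assumption: that it is the standard density-dependent SIR result, with R_0 = \beta N / r and threshold 1 - 1/R_0. That is a reconstruction from the symbols \beta, N, and r defined above, not necessarily the exact expression in the image. The function names and parameter values are made up for illustration.

```python
def basic_reproduction_number(beta, n, r):
    """R_0 for a density-dependent SIR model (assumed form): beta * N / r."""
    return beta * n / r


def herd_immunity_threshold(beta, n, r):
    """Proportion immune at which R_0 * (1 - p) drops to 1, i.e. 1 - 1/R_0.

    Returns 0.0 when R_0 <= 1, since no immunity is needed in that case.
    """
    r0 = basic_reproduction_number(beta, n, r)
    if r0 <= 1:
        return 0.0
    return 1 - 1 / r0


# Illustrative values only: contact rate 0.0005, 1000 people, 5-day duration.
beta, n, r = 0.0005, 1000, 1 / 5
print(basic_reproduction_number(beta, n, r))
print(herd_immunity_threshold(beta, n, r))
```

Note how much hangs on a single \beta and a single N here: the formula treats the population as one well-mixed group, which is part of why the threshold is squishy in practice.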
Sep 20, 2020 13 tweets 6 min read
8: WHEN CAN I IGNORE THE METHODOLOGISTS
Section 8 discusses when standard analytic approaches are fine (aka time-varying confounding isn't an issue for us). Keeping with the occupation theme, it is presented in the context of when employment history can be ignored [image]

First we go through the simpler case of point exposures (i.e. only treatment assignment at baseline matters). Note that while we get something similar to the modern definition, I don't think the differentiation from colliders is quite there yet (in the language) [images]
Sep 19, 2020 6 tweets 4 min read
7: MORE ASSUMPTIONS
Section 7 adds some additional a priori assumptions that can allow us to estimate in the context where we don't have all necessary confounders.
We have the beautifully named: A-complete Stage 0 PL-sufficient reduced graph of R CISTG A [image]

We start with some rules for reducing graph G_A to a counterpart G_B. Honestly the language in this section isn't clear to me despite reading it several times... [images]
Sep 15, 2020 9 tweets 5 min read
6: NONPARAM TESTS
Section 6 goes through the sharp null hypothesis (that there is no effect of exposure on any individual). Note that this is a stronger hypothesis than the null of no _average_ effect in the population [image]

Another way of thinking about this: if there is no individual causal effect (ICE), then there must be no average causal effect (ACE). The reverse (no ACE then no ICE) is not guaranteed
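A toy illustration of why the reverse fails, using made-up potential outcomes for four people (we never get to see both outcomes for anyone in real data, so this is purely a thought experiment). The names and values are illustrative assumptions.

```python
# Hypothetical potential outcomes for four individuals.
y_treated = [1, 0, 1, 0]    # outcome each person would have if exposed
y_untreated = [0, 1, 1, 0]  # outcome each person would have if unexposed

# Individual causal effects (ICE): one difference per person.
ice = [y1 - y0 for y1, y0 in zip(y_treated, y_untreated)]

# Average causal effect (ACE): the mean of the ICEs.
ace = sum(ice) / len(ice)

# The sharp null holds only if every single ICE is zero.
sharp_null_holds = all(effect == 0 for effect in ice)

print(ice)               # one person is helped, one is harmed, two are unaffected
print(ace)               # the effects cancel on average
print(sharp_null_holds)
```

Person 1 is harmed and person 2 is protected, so the sharp null is false even though the average effect is exactly zero.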
Sep 13, 2020 8 tweets 3 min read
5: ESTIMATION
After a little hiatus, back to discussing Robins 1986 (with a new keyboard)! Robins starts by reminding us (me) that we are assuming the super-population model for inference [image]

If we had an infinite n in our study, we could use NPMLE. However, time-varying exposures have a particularly large number of possible intervention plans. We probably don't have anywhere near enough obs to consider all the possible plans [image]
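To get a feel for the scale, here is a small sketch that assumes a binary exposure set at each of K time points and counts only static plans (exposure fixed in advance at every time). Plans that respond to covariate history would add even more. The function name is hypothetical.

```python
from itertools import product


def static_plans(n_times):
    """All static plans for a binary exposure over n_times time points."""
    return list(product((0, 1), repeat=n_times))


# The count doubles with every additional time point: 2 ** K.
for k in (1, 2, 5, 10):
    print(k, len(static_plans(k)))
```

With even ten time points there are over a thousand static plans, so most plans will be followed by few or no people in a study of ordinary size.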
Aug 22, 2020 7 tweets 3 min read
4: FORMAL CAUSAL INFERENCE (ATTIRE REQUESTED)
Math on twitter dot com? Should be fine /s
Shorter thread though [image]

In Section 4.C we get a quirk of the deterministic results. Essentially within the deterministic system that nature created, the exposure pattern between t_0 and the end of the study has been ‘set’, no matter when outcomes occur. This is used to extend to competing risks [image]
Aug 15, 2020 10 tweets 4 min read
Quick thread on tree graphs for causal inference. Let L be a binary confounder, A be a binary treatment, and Y be the binary outcome at end of follow-up. Subscripts indicate time. Our data look like this [image]

It is a lot to look at, so I am going to simplify the graph to only indicate the columns. But remember that branch splits indicate the different values [image]
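Since the images are not reproduced here, a sketch of what such a tree contains may help. It assumes two time points with the column ordering L0, A0, L1, A1, Y, which is an assumption for illustration (the original figure may use a different number of time points). Each root-to-leaf path is one possible history.

```python
from itertools import product

# Assumed column ordering for two time points (illustrative only).
COLUMNS = ["L0", "A0", "L1", "A1", "Y"]


def tree_paths(columns):
    """Every root-to-leaf path when each column splits into the values 0 and 1."""
    return [
        dict(zip(columns, values))
        for values in product((0, 1), repeat=len(columns))
    ]


def nodes_per_column(columns):
    """Number of nodes in each column of the tree: it doubles at every split."""
    return {col: 2 ** (depth + 1) for depth, col in enumerate(columns)}


paths = tree_paths(COLUMNS)
print(len(paths))                 # number of distinct histories (leaves)
print(paths[0])                   # the all-zeros branch
print(nodes_per_column(COLUMNS))  # how wide the tree is at each column
```

This is why collapsing the picture to its columns is so much easier on the eyes: the full tree doubles in width at each column.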
Aug 14, 2020 30 tweets 10 min read
3: GRAPHS FOR CAUSAL INFERENCE
This section tells us the theoretical framework for when causality can be inferred from obs. data (under the FFR-CISTG model). We start with the process [images]

First, we draw a tree to represent the data we get to see (MPISTG)
Next, we define the causal parameters through a tree graph (MCISTG)
Next, we determine the causal parameters of interest
Lastly, we use an algorithm to estimate causal parameters from a final tree graph
Aug 7, 2020 44 tweets 14 min read
Because I have been meaning to read through it fully and this is a better way to keep myself accountable, a thread as I (we) read through Robins 1986. (Also writing it out helps me think better)

sciencedirect.com/science/articl…

I will probably do a section every (or every few) day(s)

Some of my nomenclature:
CI – Causal inference
RR – Risk Ratio
HWE – Healthy worker effect
(going to regret not putting more here)
Jun 25, 2020 12 tweets 2 min read
Since everyone has been becoming amateur infectious disease epidemiologists, let's talk about estimands in infectious disease epi studies, and why you are probably misinterpreting them. I will focus on RCTs, but the same concepts apply to observational studies

Infectious diseases mean there is interference (person 1's treatment / exposure may affect person 2's infection with influenza). Causal inference with interference is difficult to interpret, even with randomized trials
Jun 14, 2020 9 tweets 3 min read
In light of current discussions, here is a breakdown of the @chapelhillgov 2019-2020 adopted budget [image]

Breaking down the General Government budget further [image]
Apr 7, 2020 6 tweets 2 min read
Good a time as any to plug the recent review paper I worked on regarding social media and public health surveillance annualreviews.org/doi/abs/10.114…

The summary is that digital (internet) sources of data for surveillance are difficult. They assume that exogenous shocks that result in increased traffic are due solely to disease incidence