Women in Statistics and Data Science

15 Oct, 16 tweets, 7 min read

One of my favorite things to talk about in causal inference is the target *estimand*

🤓 Let's talk about a few different estimands and how to estimate them!

(image cred: @simongrund89)

A lot of my research is in the observational study space. This basically means that participants in the study were not randomly assigned treatments or exposures, but rather we just observe how a certain exposure affects an outcome.

♥️ For example: Is a diabetes drug associated with heart disease?

Instead of randomly giving some patients drug A and some drug B, we evaluated the electronic health records of patients who were already taking the drugs & assessed their health after.

lucymcgowan.com/writing/2017_c…
3/

There are some issues with this analysis - since we didn’t randomly assign patients to drug A and drug B, it is possible that doctors selected one drug over the other for certain reasons that reflect patient characteristics
4/

Perhaps healthier patients are often prescribed drug A – this could make it look like those who take drug B are more likely to have heart disease simply based on their pre-treatment characteristics

✨Propensity scores can help to adjust for these pre-treatment characteristics
5/

✨ A propensity score is the probability of being assigned to a certain treatment, conditional on pre-treatment (or baseline) characteristics

Here is a mirrored histogram of propensity scores for treatment (top) and control (bottom) groups

6/
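To make the definition concrete, here is a minimal Python sketch of one simple way to estimate a propensity score: the fraction of people treated within each level of a single baseline characteristic. The data, the `stratum` labels, and the helper `stratum_propensity` are all made up for illustration. In practice propensity scores are usually fit with a model (for example logistic regression) on many baseline characteristics, and the plots in this thread were not necessarily produced this way.

```python
from collections import defaultdict


def stratum_propensity(treated, stratum):
    """Estimate P(treated = 1 | stratum) as the treated fraction in each stratum."""
    counts = defaultdict(lambda: [0, 0])  # stratum -> [number treated, number total]
    for t, s in zip(treated, stratum):
        counts[s][0] += t
        counts[s][1] += 1
    return {s: n_treated / n_total for s, (n_treated, n_total) in counts.items()}


# Hypothetical toy data: 1 = treated, 0 = control.
treated = [1, 1, 1, 0, 1, 0, 0, 0]
# Hypothetical baseline (pre-treatment) characteristic for each participant.
stratum = ["healthy", "healthy", "healthy", "healthy", "sick", "sick", "sick", "sick"]

propensity_by_stratum = stratum_propensity(treated, stratum)

# Each participant's propensity score is the estimate for their own stratum.
propensity_scores = [propensity_by_stratum[s] for s in stratum]
print(propensity_by_stratum)
```

Every participant gets a score, whether or not they were actually treated, which is what lets us compare the two groups on the same scale.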

Let's spend a second staring at the graph. Two things jump out to me:

☝️ More mass on the *right* in the treatment group (top) means that more people in that group had a higher probability of receiving treatment (makes sense!)
✌️ More people received the treatment vs control

7/

Ultimately, to make an apples to apples comparison, we want to make these two groups comparable. There are lots of ways to do this! That is where the *estimand* of interest comes in.

8/

We could estimate the *average treatment effect*. Here the target population is the *whole* population. To make these populations comparable I could *upweight* everyone by the inverse of the probability of receiving the treatment they actually got, which comes from their propensity score.

This graph overlays the pseudo-population created by doing this

Here, the light green distribution (the up-weighted treatment group) is pretty comparable to the blue distribution (the up-weighted control group). The weights are:

💊 treatment: 1 / propensity score
💨 control: 1 / (1 - propensity score)

10/
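The two ATE weights above can be written as one small function. This is a sketch in Python, with a hypothetical function name (`ate_weight`) and made-up propensity scores, assuming the scores have already been estimated.

```python
def ate_weight(treated, propensity):
    """ATE weight: 1 / p for treated participants, 1 / (1 - p) for controls."""
    if not 0 < propensity < 1:
        raise ValueError("propensity score must be strictly between 0 and 1")
    return 1 / propensity if treated else 1 / (1 - propensity)


# Hypothetical (treated, propensity score) pairs.
participants = [(1, 0.8), (1, 0.5), (0, 0.5), (0, 0.2)]

ate_weights = [ate_weight(t, p) for t, p in participants]
for (t, p), w in zip(participants, ate_weights):
    group = "treatment" if t else "control"
    print(f"{group:<9} propensity={p:.2f} weight={w:.2f}")
```

A treated participant who was likely to be treated (score 0.8) barely gets upweighted, while one with a 50/50 chance stands in for two people in the pseudo-population.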

Notice those weights can range from 1 to infinity! Yikes! If someone in the treatment group has a really small propensity score (or someone in the control group has a really large one) they could count a whole lot in our analysis. This can lead to finite-sample bias / variance issues (boo!)

11/
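A quick numeric illustration of why that matters, as a Python sketch with made-up propensity scores: one treated participant with a very small score can end up carrying most of the treated group's total weight.

```python
# Hypothetical propensity scores for four treated participants.
treated_propensities = [0.5, 0.5, 0.5, 0.01]

# ATE weight for a treated participant: 1 / propensity score.
treated_weights = [1 / p for p in treated_propensities]

# Fraction of the treated group's total weight carried by each participant.
total_weight = sum(treated_weights)
weight_shares = [w / total_weight for w in treated_weights]

for p, w, share in zip(treated_propensities, treated_weights, weight_shares):
    print(f"propensity={p:<5} weight={w:6.1f} share of total weight={share:.1%}")
```

In this toy example the participant with a score of 0.01 gets a weight of 100 and accounts for roughly 94% of the weighted treatment group, so the estimate would hinge almost entirely on that one person's outcome.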

Another estimand is the average treatment effect among the *treated*

💊 everyone in the treatment group gets a weight of 1
💨 control: propensity score / (1 - propensity score)

Notice the blue and green distributions still match! But they look different from the ATE graphs

12/
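Here is the same idea for the ATT weights, again as a Python sketch with a hypothetical function name (`att_weight`) and made-up propensity scores.

```python
def att_weight(treated, propensity):
    """ATT weight: 1 for treated participants, p / (1 - p) for controls."""
    if not 0 < propensity < 1:
        raise ValueError("propensity score must be strictly between 0 and 1")
    return 1.0 if treated else propensity / (1 - propensity)


# Hypothetical (treated, propensity score) pairs.
participants = [(1, 0.9), (1, 0.4), (0, 0.75), (0, 0.2)]

att_weights = [att_weight(t, p) for t, p in participants]
for (t, p), w in zip(participants, att_weights):
    group = "treatment" if t else "control"
    print(f"{group:<9} propensity={p:.2f} weight={w:.2f}")
```

The treated group is left as it is, and controls who look like treated participants (high propensity scores) are upweighted while controls who look unlike them are downweighted.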

Because this particular example has more treated folks than control, we ended up having to upweight a bunch of the control arm to match -- again this can be unstable

13/

One of my *favorite* estimands is the average treatment effect among the overlap population. The weights are quite simple:

💊 treated: (1 - propensity score)
💨 control: propensity score

These weights are bounded by 0 and 1, so they have nice variance properties!
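A short Python sketch of the overlap weights, with a hypothetical function name (`overlap_weight`) and made-up propensity scores, shows the bound directly: no matter how extreme the score, the weight never leaves the interval from 0 to 1.

```python
def overlap_weight(treated, propensity):
    """Overlap weight: (1 - p) for treated participants, p for controls."""
    if not 0 <= propensity <= 1:
        raise ValueError("propensity score must be between 0 and 1")
    return 1 - propensity if treated else propensity


# Hypothetical (treated, propensity score) pairs, including extreme scores.
participants = [(1, 0.99), (1, 0.5), (1, 0.01), (0, 0.99), (0, 0.5), (0, 0.01)]

overlap_weights = [overlap_weight(t, p) for t, p in participants]
for (t, p), w in zip(participants, overlap_weights):
    group = "treated" if t else "control"
    print(f"{group:<8} propensity={p:.2f} weight={w:.2f}")
```

Compare the treated participant with a score of 0.01: the ATE weight would be 100, while the overlap weight is 0.99. Participants who could plausibly have ended up in either group get the most weight, and nobody can dominate the analysis.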

Check out this amazing preprint by @noah_greifer & @Lizstuartdc on how to choose an estimand based on your question (and how that maps to particular weighting / matching choices)

arxiv.org/pdf/2106.10577…

https://twitter.com/WomenInStat/status/1449068312873287682?s=20

For an #rstats code walkthrough on the plots I used for this thread, check out this blog post:

✍️ livefreeordichotomize.com/2019/01/17/und…

