Today, I would like to share some resources on causal inference - a thread ⬇️
I came to this topic while working with clinicians who use IPW and matching on a daily basis (they are not familiar with doubly robust approaches). I don't know about you, but I so admire them: they combine their work with patients with research to advance knowledge.
@Susan_Athey @heinzedeml I can also share the Google Doc "Getting started with causal inference", with many pointers: tinyurl.com/du69bt3m. It was put together mainly by @imkemay for our Causal Inference and Missing Data (CIMD) group at @Inria. I share it with anyone trying to navigate this productive field.
@Susan_Athey @heinzedeml @imkemay @Inria Note that this is a field where women are well represented and where there are many inspiring figures! See for instance the speakers at the Online Causal Inference Seminar.
Now, I would like to mention an R package, FactoMineR, that I use on a daily basis to explore and visualize heterogeneous data: quantitative and categorical variables, group structures, and (multiple) contingency tables.
At its core, SVD! (I am also an SVD fan, @daniela_witten ;-).
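To see why SVD sits at the core of this kind of exploratory analysis, here is a minimal sketch of PCA via SVD in numpy. The data values are made up for illustration; centering and scaling the columns mirrors the usual default for PCA on quantitative variables.

```python
import numpy as np

# Toy data: 6 observations x 4 quantitative variables (hypothetical values).
X = np.array([
    [5.1, 3.5, 1.4, 0.2],
    [4.9, 3.0, 1.4, 0.2],
    [6.3, 3.3, 6.0, 2.5],
    [5.8, 2.7, 5.1, 1.9],
    [7.0, 3.2, 4.7, 1.4],
    [6.4, 3.2, 4.5, 1.5],
])

# Center and scale the columns, as is standard for PCA on quantitative data.
Z = (X - X.mean(axis=0)) / X.std(axis=0)

# SVD: Z = U S V^T. Coordinates of observations on principal axes are U * S;
# the columns of V carry the variable loadings.
U, S, Vt = np.linalg.svd(Z, full_matrices=False)
scores = U * S

# Share of variance carried by each principal axis.
explained = S**2 / np.sum(S**2)
```

Truncating U, S, V to the first few axes gives the low-rank view of the data that the factor maps visualize.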
@daniela_witten Note it was also the case for the famous @SherlockpHolmes, a role model for reproducibility, who I admire both from a scientific and personal point of view.
Hello!
So today, I will share a few thoughts and advice I usually give to my PhD students. I hope this might be helpful to a wider audience, even if it is obvious and has already been said by others. Anyway, as teachers, we know repetition is important ;) - a thread ⬇️
1) Ask questions
Ask questions
Ask questions
….
Ask questions!
I mean it: don't hesitate to ask questions in seminars (in France in particular, we don't dare enough). Be curious, don't be shy.
2) If you are tired and can't work, just don't. Take a break, take a walk if you can. I've never regretted doing that, but I've often regretted staying in front of my computer all day because I couldn't get anything done.
Missing Data, a thread ⬇️
Missing values are everywhere! We have listed more than 150 R packages in cran.r-project.org/web/views/Miss…
So let us give a few pointers:
The method of handling missing data depends on the purpose of the analysis: estimation, completion, prediction, etc.
1) For inference with missing values (estimating a parameter as well as possible and giving a confidence interval), consider likelihood approaches (using EM algorithms) or multiple imputation.
2) Single imputation/matrix completion aims at completing a dataset (predicting the missing entries) as well as possible. Multiple imputation aims at estimating parameters and their variability, taking into account the uncertainty due to missing values.
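To make the matrix-completion idea concrete, here is a minimal sketch of iterative low-rank (SVD) imputation in numpy. This is one standard flavor of matrix completion, not a specific package's implementation; the toy matrix, the chosen rank, and the iteration count are all assumptions for illustration.

```python
import numpy as np

def impute_svd(X, rank=1, n_iter=100):
    """Iterative SVD imputation: fill missing cells with column means,
    then alternate between a rank-`rank` SVD fit and re-filling only
    the missing cells, keeping observed entries fixed."""
    X = X.astype(float)
    mask = np.isnan(X)
    col_means = np.nanmean(X, axis=0)
    X_hat = np.where(mask, col_means, X)
    for _ in range(n_iter):
        U, S, Vt = np.linalg.svd(X_hat, full_matrices=False)
        low_rank = (U[:, :rank] * S[:rank]) @ Vt[:rank]
        X_hat = np.where(mask, low_rank, X)  # observed entries stay untouched
    return X_hat

# Toy rank-1 matrix with one missing entry.
X = np.outer([1.0, 2.0, 3.0], [1.0, 2.0])  # [[1,2],[2,4],[3,6]]
X[2, 1] = np.nan
completed = impute_svd(X, rank=1)
```

Because the observed entries are consistent with a rank-1 matrix, the iterations recover the missing cell; on real data one would choose the rank by cross-validation.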
I have been working for >10 years on missing data.
My passion for data science mainly comes from its transversality: as a statistician, we can interact with so many scientific fields!
With missing data the same is true but within statistics, as it can pop up in all its branches.
When I first meet a scientist for a new project, I always start the conversation by asking “Show me the data!” to understand the underlying challenges.
So far, I have never been shown a complete dataset... (of course there might be some bias!).
With @imkemay, Aude Sportisse, @nj_tierney and @Natty_V2, we created the R-miss-tastic platform rmisstastic.netlify.app to organize all the resources (courses, tutorials, articles, software, etc.) and to implement analysis pipelines with missing data in R/Python.
I’m going to begin today with a bold claim: Being an applied statistician is a lot like being an ethnographer.
I say this both based upon years of experience working in collaborative projects and consulting and based on my experience studying ethnography. (Recall: before my PhD in statistics, I started and quit a PhD in sociology).
Very often a question asked is not the ‘real’ question at hand. Typically, the person asking has a sense of the problem, but may not know exactly how to ask the question.
Yesterday I tweeted about nested data, with multilevel models (MLM) versus OLS + cluster-robust variance estimation (CRVE). This made me think about another confusion that arises, between what are called fixed versus random effects.
Let’s begin with a simple relationship between a covariate X and Y in nested data, e.g. students i nested in school j. We are interested in understanding the relationship between X and Y at the student level.
Approach 1: Assume the schools are fixed, but that students are a random sample within these schools. Assume the relationship between X and Y is the same in all schools. This often amounts to including a dummy variable for each school in the model. Here I use OLS to estimate β_1.