👋 @LucyStats here! It's been a very exciting week for folks in Causal Inference with the Nobel Prize announcements, so I thought it'd be neat to dive back into history and hear about a previous Nobel winner, Ronald Ross
1/
This topic is fun because it spans a whole myriad of my interests!
✔️We've got stats!
✔️We've got poetry!
✔️We've got infectious disease epidemiology!
Ronald Ross won the Nobel Prize for Physiology or Medicine in 1902 "for his work on malaria, by which he has shown how it enters the organism and thereby has laid the foundation for successful research on this disease and methods of combating it."
This was based on a discovery about malaria and mosquitoes. Somewhat famously, after making this discovery this polymath wrote a poem!
I love hearing about quantitative folks living out loud with their humanities talents!
4/
Then Ross turned to trying to predict the magnitude of malaria outbreaks. He came up with a mechanistic model that predicted the number of new infections per month based on current epi parameters 👇
This led Ross to come up with the "critical mosquito density" to show when the malaria epidemic would die out. Previously, folks thought that only *complete* elimination of mosquitoes would stop the spread.
Turns out his theory seemed to work! If they could get the mosquito density down to fewer than 40 mosquitoes per person in the population, malaria cases decreased
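To make the threshold idea concrete, here is a minimal sketch using the textbook Ross-Macdonald form of the model, not Ross's original equations (those are not reproduced in this thread). Every parameter name and value below is an illustrative assumption, not Ross's data, so the resulting number is not meant to match the 40 mosquitoes per person mentioned above.

```python
def basic_reproduction_number(m, a, b, c, r, mu):
    """R0 for the textbook Ross-Macdonald malaria model.

    m  : mosquitoes per person
    a  : bites per mosquito per day
    b  : probability an infectious bite infects a human
    c  : probability a bite on an infectious human infects the mosquito
    r  : human recovery rate per day
    mu : mosquito death rate per day
    """
    return m * a**2 * b * c / (r * mu)


def critical_mosquito_density(a, b, c, r, mu):
    """Mosquitoes per person at which R0 equals exactly 1."""
    return r * mu / (a**2 * b * c)


# Illustrative values only (assumed for this sketch).
params = dict(a=0.3, b=0.5, c=0.5, r=0.01, mu=0.1)
m_star = critical_mosquito_density(**params)
r0_below = basic_reproduction_number(m=0.5 * m_star, **params)
r0_above = basic_reproduction_number(m=2.0 * m_star, **params)
print(f"critical density: {m_star:.4f} mosquitoes per person")
print(f"R0 at half / double that density: {r0_below:.2f} / {r0_above:.2f}")
```

The point of the sketch: the density only has to fall below the critical value, not to zero, for R0 to drop under 1.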
Something I ❤️ about this example is the combination of mechanistic modeling and statistics! Ross hypothesized a mechanistic relationship (via the original equations) and used these to inform "policy". He then used statistics to collect data & confirm whether it worked
7/
This led to using differential equations to represent disease dynamics.
Does this sound familiar?! The concept of the basic reproductive number is equivalent to Ross’s "critical mosquito density"!
8/
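As a rough illustration of disease dynamics written as differential equations, the sketch below integrates the textbook Ross-Macdonald system with a simple forward-Euler step. This is an assumed, simplified form (not Ross's own notation), and all rates are made-up illustrative values. It runs the same model once below and once above the critical mosquito density.

```python
def simulate(m, a, b, c, r, mu, x0=0.1, y0=0.0, days=5000, dt=0.5):
    """Forward-Euler integration of the textbook Ross-Macdonald equations.

    x is the infected fraction of humans, y the infected fraction of mosquitoes.
    Returns (x, y) at the end of the simulated period.
    """
    x, y = x0, y0
    for _ in range(int(days / dt)):
        dx = m * a * b * y * (1 - x) - r * x   # new human infections minus recoveries
        dy = a * c * x * (1 - y) - mu * y      # new mosquito infections minus deaths
        x, y = x + dt * dx, y + dt * dy
    return x, y


# Illustrative values only (assumed for this sketch).
rates = dict(a=0.3, b=0.5, c=0.5, r=0.01, mu=0.1)
m_crit = rates["r"] * rates["mu"] / (rates["a"] ** 2 * rates["b"] * rates["c"])

x_low, _ = simulate(m=0.5 * m_crit, **rates)   # R0 = 0.5
x_high, _ = simulate(m=2.0 * m_crit, **rates)  # R0 = 2
print(f"final human prevalence, R0 = 0.5: {x_low:.6f}")
print(f"final human prevalence, R0 = 2.0: {x_high:.4f}")
```

Below the threshold the infection fades out; above it, prevalence settles at an endemic level. That is the sense in which "R0 below 1" and "mosquito density below the critical value" say the same thing in this model.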
If you'd like to know more about mechanistic models (and how to quantify uncertainty in them!), check out this paper @khgrantz, @EpiEllie, & I wrote:
So there you have it! A bit of history about a previous Nobel winner. You could maybe even consider him a *biostatistician* 🏆; his biographers did (see next tweet!)
10/
"If Ross had been born 100 years later he could have become an eminent biostatistician. His thinking in this area was well ahead of his time and perhaps did not achieve the recognition it deserved until much later." -Nye and Gibson (1997, p. 279)
[pubmed.ncbi.nlm.nih.gov/12762435/]
11/
Today, I would like to share some resources on causal inference. - a thread ⬇️
I came to this topic while working with clinicians who use IPW and matching on a daily basis (they are not familiar with doubly robust approaches). I don't know about you, but I admire them so much, as they combine their work with patients with research to advance knowledge
Now, I would like to mention an R package, FactoMineR, that I use on a daily basis to explore and visualize heterogeneous data: quantitative, categorical, with group structures, (multiple) contingency tables.
At its core, SVD! (I am also an SVD fan, @daniela_witten ;-).
@daniela_witten Note it was also the case for the famous @SherlockpHolmes, a role model for reproducibility, who I admire both from a scientific and personal point of view.
Hello!
So today, I will share a few thoughts and pieces of advice I usually give to my PhD students. I hope this might be helpful for a wider audience, even if it is obvious and already stated by others. Anyway, as teachers, we know repetition is important ;) - a thread ⬇️
1) Ask questions
Ask questions
Ask questions
….
Ask questions!
I mean that: don't hesitate to ask questions in seminars (in France in particular, we don't dare enough). Be curious, don’t be shy.
2) If you are tired and can't work, just don’t. Take a break, take a walk if you can. I've never regretted it, although I've often regretted staying in front of my computer all day because I couldn't get anything done.
Missing Data, a thread ⬇️
Missing values are everywhere! We have listed more than 150 R packages in cran.r-project.org/web/views/Miss…
So let us give a few pointers:
The method of handling missing data depends on the purpose of the analysis: estimation, completion, prediction, etc.
1) For inference with missing values (estimating a parameter as well as possible and giving a confidence interval), consider likelihood approaches (using EM algorithms) or multiple imputation.
2) Single imputation/matrix completion aims at completing a dataset (predicting the missing entries) as well as possible. Multiple imputation aims at estimating parameters and their variability, taking into account the uncertainty due to missing values.
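To illustrate the difference between the two goals, here is a minimal sketch on simulated data (the data, the 30% missingness rate, and the number of imputations are all assumptions for illustration). It uses a simplified bootstrap-based multiple imputation pooled with Rubin's rules, written by hand with the Python standard library; it is not the procedure of any particular package.

```python
import random
import statistics

random.seed(0)

# Simulated data (an assumption for illustration): 200 draws, ~30% missing at random.
full = [random.gauss(10.0, 2.0) for _ in range(200)]
data = [v if random.random() > 0.3 else None for v in full]
observed = [v for v in data if v is not None]
n = len(data)

# Single imputation: fill every gap with the observed mean, then analyse
# the completed data as if it had been fully observed.
mean_obs = statistics.fmean(observed)
single = [mean_obs if v is None else v for v in data]
naive_var = statistics.variance(single) / n

# Multiple imputation (simplified): M completed datasets, each with fresh draws.
M = 20
estimates, within = [], []
for _ in range(M):
    # Bootstrap the observed values so parameter uncertainty is propagated.
    boot = random.choices(observed, k=len(observed))
    mu_b, sd_b = statistics.fmean(boot), statistics.stdev(boot)
    completed = [random.gauss(mu_b, sd_b) if v is None else v for v in data]
    estimates.append(statistics.fmean(completed))
    within.append(statistics.variance(completed) / n)

# Rubin's rules: pool the M estimates and their variances.
pooled = statistics.fmean(estimates)
W = statistics.fmean(within)          # average within-imputation variance
B = statistics.variance(estimates)    # between-imputation variance
total_var = W + (1 + 1 / M) * B

print(f"single imputation: mean={mean_obs:.3f}, naive SE={naive_var ** 0.5:.3f}")
print(f"multiple imputation: mean={pooled:.3f}, pooled SE={total_var ** 0.5:.3f}")
```

Mean imputation gives a reasonable completed dataset but a standard error that is too small, because the filled-in values are treated as real observations. The pooled variance from multiple imputation adds the between-imputation spread back in.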
I have been working for >10 years on missing data.
My passion for data science mainly comes from its transversality: as statisticians, we can interact with so many scientific fields!
With missing data the same is true but within statistics, as it can pop up in all its branches.
When I first meet a scientist for a new project, I always start the conversation by asking “Show me the data!” to understand the underlying challenges.
So far, I have never been shown a complete dataset... (of course there might be some bias!).
With @imkemay, Aude Sportisse, @nj_tierney and @Natty_V2, we created the R-miss-tastic platform, rmisstastic.netlify.app, to organize all the resources (courses, tutorials, articles, software, etc.) and implement analysis pipelines with missing data in R/Python.