Today, I would like to share some resources on causal inference - a thread ⬇️
I came to this topic while working with clinicians who use IPW and matching on a daily basis (they are not familiar with doubly robust approaches). I don't know about you, but I so admire them: they combine their work with patients with research to advance knowledge.
@Susan_Athey @heinzedeml I can also share the Google Doc "Getting started with causal inference", with many pointers: tinyurl.com/du69bt3m. It was put together mainly by @imkemay for our Causal Inference and Missing Data (CIMD) group at @Inria. I share it with anyone trying to navigate this productive field.
@Susan_Athey @heinzedeml @imkemay @Inria Note that this is a field where women are well represented and where there are many inspiring figures! See for instance the speakers at the Online Causal Inference Seminar.
Now, I would like to mention an R package, FactoMineR, that I use on a daily basis to explore and visualize heterogeneous data: quantitative and categorical variables, group structures, and (multiple) contingency tables.
At its core, SVD! (I am also an SVD fan, @daniela_witten ;-).
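To see why SVD sits at the core of this kind of exploratory analysis, here is a minimal sketch of PCA via SVD in numpy. The data values are made up for illustration; centering and scaling the columns mirrors the usual default for PCA on quantitative variables.

```python
import numpy as np

# Toy data: 6 observations x 4 quantitative variables (hypothetical values).
X = np.array([
    [5.1, 3.5, 1.4, 0.2],
    [4.9, 3.0, 1.4, 0.2],
    [6.3, 3.3, 6.0, 2.5],
    [5.8, 2.7, 5.1, 1.9],
    [7.0, 3.2, 4.7, 1.4],
    [6.4, 3.2, 4.5, 1.5],
])

# Center and scale the columns, as is standard for PCA on quantitative data.
Z = (X - X.mean(axis=0)) / X.std(axis=0)

# SVD: Z = U S V^T. Coordinates of observations on principal axes are U * S;
# the columns of V carry the variable loadings.
U, S, Vt = np.linalg.svd(Z, full_matrices=False)
scores = U * S

# Share of variance carried by each principal axis.
explained = S**2 / np.sum(S**2)
```

Truncating U, S, V to the first few axes gives the low-rank view of the data that the factor maps visualize.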
@daniela_witten Note it was also the case for the famous @SherlockpHolmes, a role model for reproducibility, who I admire both from a scientific and personal point of view.
Hello!
So today, I will share a few thoughts and advice I usually give to my PhD students. I hope this might be helpful to a wider audience, even if it is obvious and has already been said by others. Anyway, as teachers, we know repetition is important ;) - a thread ⬇️
1) Ask questions
Ask questions
Ask questions
….
Ask questions!
I mean it: don't hesitate to ask questions in seminars (in France in particular, we don't dare enough). Be curious, don't be shy.
2) If you are tired and can't work, just don't. Take a break, take a walk if you can. I've never regretted doing that, but I've often regretted staying in front of my computer all day because I couldn't get anything done.
Missing Data, a thread ⬇️
Missing values are everywhere! We have listed more than 150 R packages in cran.r-project.org/web/views/Miss…
So let us give a few pointers:
The method of handling missing data depends on the purpose of the analysis: estimation, completion, prediction, etc.
1) For inference with missing values (estimating a parameter as well as possible and giving a confidence interval), consider likelihood approaches (using EM algorithms) or multiple imputation.
2) Single imputation/matrix completion aims at completing a dataset (predicting the missing entries) as well as possible. Multiple imputation aims at estimating parameters and their variability, taking into account the uncertainty due to missing values.
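To make the matrix-completion idea concrete, here is a minimal sketch of iterative low-rank (SVD) imputation in numpy. This is one standard flavor of matrix completion, not a specific package's implementation; the toy matrix, the chosen rank, and the iteration count are all assumptions for illustration.

```python
import numpy as np

def impute_svd(X, rank=1, n_iter=100):
    """Iterative SVD imputation: fill missing cells with column means,
    then alternate between a rank-`rank` SVD fit and re-filling only
    the missing cells, keeping observed entries fixed."""
    X = X.astype(float)
    mask = np.isnan(X)
    col_means = np.nanmean(X, axis=0)
    X_hat = np.where(mask, col_means, X)
    for _ in range(n_iter):
        U, S, Vt = np.linalg.svd(X_hat, full_matrices=False)
        low_rank = (U[:, :rank] * S[:rank]) @ Vt[:rank]
        X_hat = np.where(mask, low_rank, X)  # observed entries stay untouched
    return X_hat

# Toy rank-1 matrix with one missing entry.
X = np.outer([1.0, 2.0, 3.0], [1.0, 2.0])  # [[1,2],[2,4],[3,6]]
X[2, 1] = np.nan
completed = impute_svd(X, rank=1)
```

Because the observed entries are consistent with a rank-1 matrix, the iterations recover the missing cell; on real data one would choose the rank by cross-validation.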
I have been working for >10 years on missing data.
My passion for data science mainly comes from its transversality: as a statistician, we can interact with so many scientific fields!
With missing data the same is true but within statistics, as it can pop up in all its branches.
When I first meet a scientist for a new project, I always start the conversation by asking “Show me the data!” to understand the underlying challenges.
So far, I have never been shown a complete dataset... (of course there might be some bias!).
With @imkemay, Aude Sportisse, @nj_tierney and @Natty_V2, we created the R-miss-tastic platform rmisstastic.netlify.app to organize all the resources (courses, tutorials, articles, software, etc.) and to implement analysis pipelines with missing data in R/Python.
I’m going to begin today with a bold claim: Being an applied statistician is a lot like being an ethnographer.
I say this both based upon years of experience working in collaborative projects and consulting and based on my experience studying ethnography. (Recall: before my PhD in statistics, I started and quit a PhD in sociology).
Very often a question asked is not the ‘real’ question at hand. Typically, the person asking has a sense of the problem, but may not know exactly how to ask the question.
Yesterday I tweeted about nested data, with multilevel models (MLM) versus OLS + cluster-robust variance estimation (CRVE). This made me think about another confusion that arises, between what are called fixed versus random effects.
Let’s begin with a simple relationship between a covariate X and Y in nested data, e.g. students i nested in school j. We are interested in understanding the relationship between X and Y at the student level.
Approach 1: Assume the schools are fixed, but that students are a random sample within these schools. Assume the relationship between X and Y is the same in all schools. This often amounts to including a dummy variable for each school in the model. Here I use OLS to estimate β_1.