Women in Statistics and Data Science

27 Apr, 19 tweets, 4 min read

I work primarily with nested data. One example is in experiments, with students nested in schools. Another is meta-analysis, with effect sizes nested in studies. In this thread, I’ll focus on students nested in schools, but this applies more generally.

Question 1: Do you need to take nesting into account in your analysis? Our world is naturally nested – students in classrooms with teachers in schools in districts and so on. Does this mean we need to take all of these levels into account? No.

Nesting only needs to be accounted for if it is part of how our sample of data is generated – either how the data are selected (sampling) or how it is decided who gets the intervention being studied (assignment).

Question 2: What does nesting do? It creates dependence in our data. Treating dependent observations as independent can lead to an inflated Type I error rate.
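To make the Type I error point concrete, here is a minimal simulation sketch. Everything in it is an assumption for illustration: 20 schools of 20 students, an intraclass correlation of 0.2, treatment assigned to whole schools, and no true treatment effect. It compares a naive test that treats students as independent with a test on school means.

```python
import random


def mean(values):
    return sum(values) / len(values)


def sample_var(values):
    m = mean(values)
    return sum((v - m) ** 2 for v in values) / (len(values) - 1)


def rejection_rates(n_schools=20, n_students=20, icc=0.2,
                    n_reps=500, t_crit=2.101, seed=1):
    """Simulate school-randomized trials with NO true treatment effect.

    Returns the share of replications in which the null is rejected by
    (a) a naive test treating students as independent and
    (b) a test on school means (one observation per school).
    t_crit=2.101 is the two-sided 5% t critical value for 18 df,
    which matches the default of 20 schools.
    """
    rng = random.Random(seed)
    tau, sigma = icc ** 0.5, (1 - icc) ** 0.5  # school SD, student SD
    half = n_schools // 2
    naive_hits = 0
    school_hits = 0
    for _ in range(n_reps):
        students = {0: [], 1: []}  # student outcomes by arm
        schools = {0: [], 1: []}   # school means by arm
        for j in range(n_schools):
            arm = 1 if j < half else 0   # whole schools are assigned
            u_j = rng.gauss(0, tau)      # effect shared within a school
            ys = [u_j + rng.gauss(0, sigma) for _ in range(n_students)]
            students[arm].extend(ys)
            schools[arm].append(mean(ys))
        # With equal school sizes this equals the difference in school means.
        diff = mean(students[1]) - mean(students[0])
        se_naive = (sample_var(students[1]) / len(students[1])
                    + sample_var(students[0]) / len(students[0])) ** 0.5
        se_school = (sample_var(schools[1]) / len(schools[1])
                     + sample_var(schools[0]) / len(schools[0])) ** 0.5
        naive_hits += abs(diff / se_naive) > 1.96
        school_hits += abs(diff / se_school) > t_crit
    return naive_hits / n_reps, school_hits / n_reps


naive_rate, school_rate = rejection_rates()
print(f"naive: {naive_rate:.2f}, school means: {school_rate:.2f}")
```

Under these assumptions the design effect is 1 + (20 - 1) × 0.2 = 4.8, so the naive standard error is too small by a factor of roughly 2.2, and the naive test should reject far more often than the nominal 5%.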

Question 3: How should you take this nesting into account? This is a big one and where things can get complicated. I’ll talk about two approaches here.

Option 1. Use a multi-level model: Y_ij = [fixed part] + [random part], where the [fixed part] includes a regression model and the [random part] includes both student- and school-level residuals. This model assumes that schools are a random sample from some population.
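As a rough illustration of the two residual levels, the sketch below simulates the simplest version of this model, Y_ij = β_0 + u_j + e_ij, and recovers the school-level and student-level variances. The function names and parameter values are hypothetical. The estimator is a method-of-moments (one-way ANOVA) calculation for balanced data, a stand-in for the likelihood-based fitting that multi-level software normally uses.

```python
import random


def simulate_schools(n_schools, n_students, beta0, tau2, sigma2, seed=0):
    """Draw Y_ij = beta0 + u_j + e_ij with u_j ~ N(0, tau2), e_ij ~ N(0, sigma2)."""
    rng = random.Random(seed)
    data = []
    for _ in range(n_schools):
        u_j = rng.gauss(0, tau2 ** 0.5)  # school-level residual
        data.append([beta0 + u_j + rng.gauss(0, sigma2 ** 0.5)
                     for _ in range(n_students)])  # student-level residuals
    return data


def anova_variance_components(data):
    """Method-of-moments estimates; assumes every school has the same size."""
    n_schools, n = len(data), len(data[0])
    school_means = [sum(ys) / n for ys in data]
    grand_mean = sum(school_means) / n_schools
    # Mean square within schools estimates sigma2.
    msw = sum((y - m) ** 2
              for ys, m in zip(data, school_means)
              for y in ys) / (n_schools * (n - 1))
    # Mean square between schools estimates sigma2 + n * tau2.
    msb = n * sum((m - grand_mean) ** 2 for m in school_means) / (n_schools - 1)
    tau2_hat = max((msb - msw) / n, 0.0)  # truncate at zero
    return grand_mean, tau2_hat, msw


data = simulate_schools(200, 30, beta0=2.0, tau2=0.25, sigma2=1.0, seed=42)
beta0_hat, tau2_hat, sigma2_hat = anova_variance_components(data)
icc_hat = tau2_hat / (tau2_hat + sigma2_hat)
print(beta0_hat, tau2_hat, sigma2_hat, icc_hat)
```

The ratio tau2 / (tau2 + sigma2) is the intraclass correlation, which summarizes how much of the outcome variation sits at the school level.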

Extension 1. This model can accommodate interactions between covariates and random effects. For example, this model allows us to interact the treatment indicator with the school random effect. We can now estimate the distribution of treatment effects across schools.

Extension 2. By centering covariates, this model lets you separate the effects of covariates at the student level and the school level and, if they differ, estimate the contextual effect. (See Raudenbush & Bryk for some nice examples.)
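A small sketch of the centering idea, using simulated data with made-up slopes (0.5 within schools, 1.5 between schools). Each student's X is split into the school mean and the deviation from that mean. Because the deviation is orthogonal to both the intercept and the school mean, the two slopes can be computed separately. These are plain OLS point estimates, a simplification of what a multi-level model would fit.

```python
import random


def within_between_slopes(x_by_school, y_by_school):
    """Return (within slope, between slope, contextual effect)."""
    x_dev, x_mean, y = [], [], []
    for xs, ys in zip(x_by_school, y_by_school):
        m = sum(xs) / len(xs)
        for xi, yi in zip(xs, ys):
            x_dev.append(xi - m)  # student-level part: deviation from school mean
            x_mean.append(m)      # school-level part: the school mean itself
            y.append(yi)
    within = (sum(d * yi for d, yi in zip(x_dev, y))
              / sum(d * d for d in x_dev))
    xm_bar = sum(x_mean) / len(x_mean)
    y_bar = sum(y) / len(y)
    between = (sum((m - xm_bar) * (yi - y_bar) for m, yi in zip(x_mean, y))
               / sum((m - xm_bar) ** 2 for m in x_mean))
    return within, between, between - within


rng = random.Random(7)
x_by_school, y_by_school = [], []
for _ in range(100):                      # 100 hypothetical schools
    school_level = rng.gauss(0, 1)
    xs = [school_level + rng.gauss(0, 1) for _ in range(20)]
    x_bar = sum(xs) / len(xs)
    u_j = rng.gauss(0, 0.5)               # school residual
    ys = [0.5 * (xi - x_bar) + 1.5 * x_bar + u_j + rng.gauss(0, 1)
          for xi in xs]
    x_by_school.append(xs)
    y_by_school.append(ys)

within, between, contextual = within_between_slopes(x_by_school, y_by_school)
print(within, between, contextual)
```

In this setup the contextual effect is the between slope minus the within slope, so a value near 1.0 would be expected given the assumed slopes.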

Option 2. Use cluster-robust variance estimation (CRVE): fit Y_i = [fixed part] + [generic residual] and then empirically estimate the standard errors using a sandwich estimator, called something like Huber-White or Eicker and so on, depending upon your field.
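Here is a hand-rolled sketch of the sandwich idea for a single-covariate regression, written out so the pieces are visible. It uses the basic cluster-robust form with no small-sample correction (often labeled CR0). The data, function name, and parameter values are all hypothetical.

```python
import random


def ols_slope_with_ses(x, y, cluster):
    """OLS slope with a conventional SE and a cluster-robust (CR0) SE."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    slope = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sxx
    intercept = y_bar - slope * x_bar
    resid = [yi - intercept - slope * xi for xi, yi in zip(x, y)]

    # Conventional OLS variance: assumes independent, equal-variance errors.
    naive_var = sum(e * e for e in resid) / (n - 2) / sxx

    # Sandwich: sum (x - x_bar) * residual within each cluster, then square.
    # Residuals in the same cluster are allowed to be correlated.
    scores = {}
    for xi, ei, g in zip(x, resid, cluster):
        scores[g] = scores.get(g, 0.0) + (xi - x_bar) * ei
    robust_var = sum(s * s for s in scores.values()) / sxx ** 2

    return slope, naive_var ** 0.5, robust_var ** 0.5


rng = random.Random(3)
x, y, cluster = [], [], []
for j in range(40):                 # 40 hypothetical schools
    treat = j % 2                   # treatment assigned to whole schools
    u_j = rng.gauss(0, 0.5)         # shared school effect
    for _ in range(20):
        x.append(treat)
        y.append(0.3 * treat + u_j + rng.gauss(0, 1))
        cluster.append(j)

slope, naive_se, robust_se = ols_slope_with_ses(x, y, cluster)
print(slope, naive_se, robust_se)
```

With a school-level covariate and a nonzero school effect, the cluster-robust standard error should come out noticeably larger than the conventional one.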

Extensions. There aren’t really extensions. This model really allows you to focus on estimation of the effects of covariates at the student level on outcomes. The nesting is more noise than signal.

Question 4: Why one or the other? I hear this question a lot. The simplest answer is that your choice is determined by your field. Economics-adjacent fields use CRVE, while sociology-adjacent fields use multi-level models. So, norms are at play.

But I think the root of this difference is in regard to what is considered signal and noise in these fields. Economics – like psychology – historically has focused on the behavior of individuals.

Thus, the fact that the data is nested is not interesting – it is simply error or noise. Sociology, however, focuses on the role of context and groups – in which case, the nesting is interesting – it is signal.

Another answer, though, may have to do with assumptions. The multi-level model makes several modeling assumptions – e.g., normality and that the correlation structure is correctly specified.

The CRVE approach doesn’t require these assumptions – it depends upon asymptotics, so if you get your structure wrong, no problem.

Question 5: Do I have to choose? Ok, no one really ever asks me this. But @jepusto and I have been working on this for some time and the short answer is: use both‼ That is, first use a multi-level model with an assumed structure, then use CRVE to get your standard errors.

Result 1. For those used to using a multi-level model – you get all the benefits of the multi-level model PLUS the safety of your inferences being robust to misspecification of the error structure.

Result 2. For those used to using CRVE – you get better small-sample performance. Here I mean the number of clusters and the types of covariates you’re studying. The multi-level model becomes a ‘working model’, and this working model improves performance, even when it’s wrong.
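To sketch what "use both" can look like in the simplest possible case, the example below estimates an overall mean from schools of unequal size. A random-intercept working model supplies the weights, and a sandwich estimator supplies the standard error. The variance components tau2 and sigma2 are treated as given working values (an assumption; in practice they would be estimated), the function name is hypothetical, and no small-sample correction is applied.

```python
def working_model_mean(clusters, tau2, sigma2):
    """Weighted mean under a random-intercept working model.

    clusters: list of lists of outcomes, one inner list per school.
    tau2, sigma2: working values for the school- and student-level variances.
    Returns (estimate, model-based SE, cluster-robust SE).
    """
    weights, means = [], []
    for ys in clusters:
        n_j = len(ys)
        means.append(sum(ys) / n_j)
        # Inverse of the working variance of a school mean.
        weights.append(1.0 / (tau2 + sigma2 / n_j))
    total_w = sum(weights)
    estimate = sum(w * m for w, m in zip(weights, means)) / total_w

    # Model-based SE: only trustworthy if the working model is right.
    model_se = (1.0 / total_w) ** 0.5

    # Sandwich SE: uses observed school-level residuals, so it does not
    # rely on the working variances being correct.
    meat = sum((w * (m - estimate)) ** 2 for w, m in zip(weights, means))
    robust_se = (meat / total_w ** 2) ** 0.5
    return estimate, model_se, robust_se


# Tiny worked example: two schools with two students each.
est, model_se, robust_se = working_model_mean(
    [[1.0, 3.0], [5.0, 7.0]], tau2=1.0, sigma2=1.0)
print(est, model_se, robust_se)
```

In the two-school example, each school mean gets weight 1 / (1 + 1/2) = 2/3, the estimate is 4, the model-based variance is 1 / (4/3) = 0.75, and the sandwich variance is (32/9) / (16/9) = 2.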

If you’re interested in learning more, see these papers: doi.org/10.1080/073500… and osf.io/preprints/meta…
