This is my favorite teaching example for showing the importance of #CausalInference: @Google conducts an annual pay equity analysis in which they use fairly advanced statistical techniques. In 2019 they found that they were actually underpaying MEN?! npr.org/2019/03/05/700… 1/
What do they do specifically? They collect a lot of data (as Google does) and then run OLS regressions of annual compensation on demographic variables (gender, race) and other explanatory variables such as tenure, location, and performance. services.google.com/fh/files/blogs… 2/
If they find statistically meaningful differences, @Google is actually committed to making upward adjustments for the disadvantaged groups. In this case it was male, level-4 software engineers who got a raise. 3/
But here comes the problem: Google runs these regressions separately for specific groups of employees, based on their job level and function. They do this to avoid comparing 🍎 with 🍐. And why wouldn't you? 4/
Well, we know that adjusting for a third variable can sometimes do funny things to the sign of a statistical relationship. This is the famous Simpson's paradox, named after the British statistician Edward Simpson (another white dude). everydayconcepts.io/simpsons-parad… 5/
It could very well be that women are overall paid less at an organization like Google, but if you adjust for a third variable like job level or function, the sign flips and suddenly you get the exact opposite direction for the relationship. 6/
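To make the sign flip concrete, here is a minimal sketch with made-up numbers. The headcounts and pay figures are purely hypothetical and are not taken from Google's analysis; they are only chosen so that women earn slightly more than men within each job level, yet less overall, because most women in this toy example sit at the junior level.

```python
# Hypothetical cells: (headcount, average pay in $1,000) by gender and job level.
# These numbers are invented for illustration only.
data = {
    ("women", "junior"): (80, 102),
    ("women", "senior"): (20, 202),
    ("men", "junior"): (20, 100),
    ("men", "senior"): (80, 200),
}


def mean_pay(gender, level=None):
    """Headcount-weighted average pay for a gender, optionally within one level."""
    cells = [
        (n, pay)
        for (g, lvl), (n, pay) in data.items()
        if g == gender and (level is None or lvl == level)
    ]
    total = sum(n for n, _ in cells)
    return sum(n * pay for n, pay in cells) / total


# Pooled comparison: women minus men, ignoring job level.
overall_gap = mean_pay("women") - mean_pay("men")

# Stratified comparison: women minus men within each job level.
gap_by_level = {
    lvl: mean_pay("women", lvl) - mean_pay("men", lvl)
    for lvl in ("junior", "senior")
}

print("overall gap:", overall_gap)
print("gap by level:", gap_by_level)
```

The pooled gap is negative while both within-level gaps are positive. Which of the two comparisons answers the pay equity question cannot be read off these numbers alone.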
To find the right answer, we cannot simply look at the data, because there is nothing in it that can tell us how to properly analyze it — no matter how large it is and how finely we can slice it. We need to make causal assumptions! 7/
Variables such as job level and function are likely affected by gender, because we know from prior literature that there are, e.g., child penalties for women and gender-specific occupation choices. This turns them into so-called "post-treatment variables". 8/
At the same time, there might be many determinants of an employee's job level and compensation that even @Google can't observe in their vast data. One prime candidate for such unobservables is personal job-related skills, for which we often only have rough proxies. 9/
But if we now want to estimate the effect of gender on compensation, job level becomes a collider: it is affected by both gender and unobserved skills. If we control for it, by running separate regressions for each job level, we open a non-causal path between gender and skills. This creates a bias that stems from the fact that employees with higher skills receive higher salaries. 10/
The intuition here is that women have more obstacles to overcome to make it to higher-level positions. Those women who make it nonetheless are often a specifically selected group with likely higher skills than average. This higher skill level pushes up their annual pay. 11/
So especially in groups with higher seniority you will find women who consistently overperformed throughout their career to make it this far. It is therefore not surprising that they might also receive, e.g., higher bonuses than their male peers. 12/
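The selection story above can be sketched with a small simulation. Everything in it is an assumption made for illustration: the skill distribution, the size of the promotion obstacle for women, and the pay equation are invented, not estimated from any real data. By construction, pay depends only on job level and unobserved skill, with no direct gender term.

```python
import random
from statistics import mean

random.seed(0)  # fixed seed so the sketch is reproducible

rows = []
for _ in range(20_000):
    female = random.random() < 0.5
    skill = random.gauss(0, 1)  # unobserved by the analyst
    # Assumed obstacle: women need a higher skill level to reach the senior level.
    senior = skill - (1.0 if female else 0.0) > 0.5
    # Pay depends on job level and skill only; there is no direct gender term.
    pay = 100 + 50 * senior + 10 * skill + random.gauss(0, 5)
    rows.append((female, senior, pay))


def gap(subset):
    """Mean pay of women minus mean pay of men within `subset`."""
    women = [pay for female, _, pay in subset if female]
    men = [pay for female, _, pay in subset if not female]
    return mean(women) - mean(men)


overall_gap = gap(rows)                              # pooled comparison
senior_gap = gap([r for r in rows if r[1]])          # within senior level
junior_gap = gap([r for r in rows if not r[1]])      # within junior level

print(f"overall gap: {overall_gap:.1f}")
print(f"senior gap:  {senior_gap:.1f}")
print(f"junior gap:  {junior_gap:.1f}")
```

In this toy setup women are paid less overall, because fewer of them reach the senior level. Within each level, however, they appear to be paid more, simply because the women in each level are more skilled on average than the men they are compared with. A per-level analysis would therefore flag men as the underpaid group.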
More on these causal inference challenges, and on the dangers of estimating the gender wage gap with sophisticated ML methods without a proper theory behind them, can be found in this paper: arxiv.org/abs/2108.11294 Thanks for reading! 13/13
