This is my favorite teaching example for showing the importance of #CausalInference: @Google conducts an annual pay equity analysis in which they use fairly advanced statistical techniques. In 2019 they found that they were actually underpaying MEN?! npr.org/2019/03/05/700… 1/
What do they do specifically? They collect a lot of data (as Google does) and then run OLS regressions of annual compensation on demographic variables (gender, race) and other explanatory variables such as tenure, location, and performance. services.google.com/fh/files/blogs… 2/
If they find statistically meaningful differences, @Google is actually committed to making upward adjustments for the disadvantaged groups. In this case, it was male level-4 software engineers who got a raise. 3/
But here comes the problem: Google runs these regressions separately for specific groups of employees, based on their job level and function. They do this to avoid comparing 🍎 with 🍐. And why wouldn't you? 4/
Well, we know that adjusting for a third variable can sometimes do funny things to the sign of a statistical relationship. This is the famous Simpson's paradox, named after the British statistician Edward Simpson (another white dude). everydayconcepts.io/simpsons-parad… 5/
It could very well be that women are paid less overall at an organization like Google, but if you adjust for a third variable like job level or function, the sign flips and the relationship suddenly points in the opposite direction. 6/
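To make the sign flip concrete, here is a toy example with made-up numbers (hypothetical headcounts and mean pay in $1,000s for two job levels; these are not Google's data). Women earn more than men within each level, yet less overall, because most women in this toy setup sit in the lower-paid level.

```python
# Hypothetical toy data (pay in $1,000s): (job_level, gender, headcount, mean_pay)
cells = [
    ("L3", "men", 10, 100.0),
    ("L3", "women", 40, 102.0),
    ("L5", "men", 40, 200.0),
    ("L5", "women", 10, 205.0),
]


def mean_pay(rows):
    """Headcount-weighted mean pay across the given cells."""
    total = sum(n * pay for _, _, n, pay in rows)
    count = sum(n for _, _, n, _ in rows)
    return total / count


def gap(rows):
    """Women's mean pay minus men's mean pay."""
    women = [r for r in rows if r[1] == "women"]
    men = [r for r in rows if r[1] == "men"]
    return mean_pay(women) - mean_pay(men)


pooled_gap = gap(cells)
within_gaps = {lvl: gap([r for r in cells if r[0] == lvl]) for lvl in ("L3", "L5")}

print(f"pooled: {pooled_gap:+.1f}")          # pooled: -57.4
for lvl, g in within_gaps.items():
    print(f"{lvl}: {g:+.1f}")                 # L3: +2.0, then L5: +5.0
```

Which of the two answers is the "right" one cannot be read off these numbers; that is exactly the point of the next tweet.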
To find the right answer, we cannot simply look at the data, because there is nothing in it that can tell us how to properly analyze it — no matter how large it is and how finely we can slice it. We need to make causal assumptions! 7/
Variables such as job level and function are likely affected by gender, because we know from prior literature that there are, e.g., child penalties for women and gender-specific occupation choices. This turns them into so-called "post-treatment variables". 8/
At the same time, there might be many determinants of an employee's job level and compensation that even @Google can't observe in its vast data. A prime candidate for such unobservables is job-related skill, for which we often have only rough proxies. 9/
But if we now want to estimate the effect of gender on compensation, job level becomes a collider, because it is affected both by gender and by unobserved skill. If we control for it, by running separate regressions for each job level, we open a spurious path from gender to pay via skill, and because employees with higher skills receive higher salaries, this creates a bias. 10/
The intuition here is that women have more obstacles to overcome to make it to higher-level positions. The women who make it nonetheless are a selected group, likely with above-average skills, and this higher skill level pushes up their annual pay. 11/
So especially in groups with higher seniority, you will find women who consistently overperformed throughout their careers to get this far. It is therefore not surprising that they might also receive, e.g., higher bonuses than their male peers. 12/
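A minimal simulation sketch of the collider argument in tweets 10 to 12, under an assumed data-generating process (all coefficients are made up for illustration and are not estimates from Google's data): gender affects job level, unobserved skill affects both job level and pay, and gender has no direct effect on pay at all. Within a level, a regression of pay on a female dummy alone reduces to a difference in means, so the sketch simply compares means.

```python
import random
from statistics import mean


def simulate(n=100_000, seed=42):
    """Hypothetical DGP (an assumption, not Google's data):
    gender -> job level <- skill -> pay, plus job level -> pay.
    Gender has NO direct effect on pay."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        female = rng.random() < 0.5
        skill = rng.gauss(0.0, 1.0)  # unobserved by the analyst
        # Women face extra obstacles (the -1.0 term) on the way to a senior level.
        senior = skill - 1.0 * female + rng.gauss(0.0, 1.0) > 0
        pay = 100.0 + 20.0 * senior + 10.0 * skill + rng.gauss(0.0, 5.0)
        rows.append((female, senior, pay))
    return rows


def gender_gap(rows):
    """Mean pay of women minus mean pay of men (= OLS slope on a female dummy)."""
    women = [pay for female, _, pay in rows if female]
    men = [pay for female, _, pay in rows if not female]
    return mean(women) - mean(men)


rows = simulate()
pooled = gender_gap(rows)
# "Controlling" for job level by running a separate comparison per level.
by_level = {level: gender_gap([r for r in rows if r[1] == level]) for level in (False, True)}

print(f"pooled gap: {pooled:.2f}")
for level, g in by_level.items():
    print(f"{'senior' if level else 'junior'} gap: {g:.2f}")
```

With this setup the pooled gap should come out negative while both within-level gaps come out positive, even though the model contains no direct effect of gender on pay: conditioning on level selects women with higher average skill.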
More on these causal inference challenges, and on the dangers of estimating the gender wage gap with sophisticated ML methods without a proper theory behind them, can be found in this paper: arxiv.org/abs/2108.11294 Thanks for reading! 13/13
Since @Andrew___Baker called for a break day, let's go back to our favorite Twitter activity of 2020... discussing DAGs! I'm very happy that our paper "Causal Inference and Data Fusion in Econometrics" is finally forthcoming in the Econometrics Journal. academic.oup.com/ectj/advance-a… 1/
In this paper, we review the advances that have been made in the causal AI literature in recent years and discuss their value for empirical work in econometrics and adjacent disciplines (such as political science, sociology, and management). 2/
We're not the first to discuss DAGs from an econometric perspective. Several famous scholars, including Jim Heckman, Hal White, and Dan McFadden, have engaged with the topic before. Perhaps most notably, Guido Imbens published his comparison of .. 3/ aeaweb.org/articles?id=10…
We just posted a substantially expanded version of our paper "On the Nuisance of Control Variables in Regression Analysis" (w/ @beyers_louw): arxiv.org/abs/2005.10314
Main message: Don't bother reporting the coefficients of controls, because they are likely to be biased anyway.
Citations for the arXiv version are coming in nicely, so people seem to find the paper useful. The succinct format as a research note seems to be appreciated too. But some of the more intricate aspects of the argument might have been a bit glossed over in the previous version.
In the new version, we have expanded the theory part. We now show more DAGs and simulations that demonstrate under which conditions estimated effect sizes of control variables can be interpreted causally.
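A minimal sketch of the main message, assuming a hypothetical linear data-generating process (not one taken from the paper): an unobserved variable `u` confounds the control `z` and the outcome `y`, but not the treatment `d`. Controlling for `z` is enough to identify the effect of `d`, while the coefficient on `z` absorbs part of `u`'s effect. The two-regressor OLS helper is written out by hand so the sketch needs only the standard library.

```python
import random


def ols_two_regressors(y, x1, x2):
    """OLS slopes (b1, b2) for y = a + b1*x1 + b2*x2, via centered normal equations."""
    n = len(y)
    my, m1, m2 = sum(y) / n, sum(x1) / n, sum(x2) / n
    s11 = sum((a - m1) ** 2 for a in x1)
    s22 = sum((b - m2) ** 2 for b in x2)
    s12 = sum((a - m1) * (b - m2) for a, b in zip(x1, x2))
    s1y = sum((a - m1) * (c - my) for a, c in zip(x1, y))
    s2y = sum((b - m2) * (c - my) for b, c in zip(x2, y))
    det = s11 * s22 - s12 ** 2
    return (s22 * s1y - s12 * s2y) / det, (s11 * s2y - s12 * s1y) / det


rng = random.Random(7)
n = 100_000
u = [rng.gauss(0, 1) for _ in range(n)]              # unobserved confounder of the control
z = [ui + rng.gauss(0, 1) for ui in u]               # control variable
d = [0.5 * zi + rng.gauss(0, 1) for zi in z]         # treatment, depends on z only
y = [1.0 * di + 0.5 * zi + ui + rng.gauss(0, 1)      # outcome
     for di, zi, ui in zip(d, z, u)]

b_d, b_z = ols_two_regressors(y, d, z)
print(f"treatment coefficient: {b_d:.2f} (true effect 1.0)")
print(f"control coefficient:   {b_z:.2f} (true direct effect 0.5)")
```

Under these assumptions the treatment coefficient lands near its true value of 1.0, while the control coefficient lands near 1.0 instead of 0.5, because it also picks up the part of `u` that travels with `z`.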
Now, one can of course take the view that it is not a good thing when professors do so much on the side. For knowledge transfer, however, it need not be so bad. 🧵 1/9
An interesting case study on this is the abolition of the so-called "professor's privilege" (Professorenprivileg) in 2002. My former advisor at KU Leuven, Dirk Czarnitzki, has a paper on it. papers.ssrn.com/sol3/papers.cf… 2/9
Unlike other employees under German employee invention law, the professor's privilege allowed chair holders to decide freely on the commercialization of inventions made in the course of their professional duties. 3/9
Happy to see our paper "The Choice of Control Variables: How Causal Graphs Can Inform the Decision" (w/ @beyers_louw & M. Rönkkö) included in the best paper proceedings of the 82nd Annual Meeting of the Academy of Management. #AOM2022 @AOMConnect journals.aom.org/doi/epdf/10.54… 1/5
We present practical recommendations on how to choose suitable control variables for regression analyses, a topic that seems to cause quite some confusion in the management literature (if you have ever read the phrase "if in doubt, leave out," you know what I'm talking about). 2/5
The best paper proceedings include abridged versions (max. 6 pages) of selected papers that will be presented at #AOM2022. Our session (#1088) is scheduled for Aug 8, 2022, from 8:00AM to 9:30AM local Seattle time. You are all very welcome to join! 3/5