When I see a paper on explainability, the first question I ask is: "What does it explain?" The data-fitting strategy of the fitter, or real-life events such as death or survival? I believe this paper arxiv.org/pdf/2010.10596… is mostly about the former, as can be seen from the equations and from the absence of any world-model. 1/
While it is sometimes useful to explain the data-fitting system (e.g., for debugging), it is also important to distinguish this kind of counterfactual explanation from the kind generated in the causal-inference literature. 2/3
Beware, a model-blind system might conclude that
the rooster crow explains the sunrise. It might also explain that your loan was denied because you are a male, and would also have been denied if you were a female. I wonder how ML folks would debug this system.
1/4 Comments on your Front-Door paper:
* The expression "a single, strictly exogenous mediator variable" is problematic: (1) Causality, p. 82, defines FDC as "A set of variables", not "a single variable". (2) "Exogenous mediator" is an oxymoron. I originally called it (1993):
2/ "Mediating Instrumental Variables" ucla.in/2pJzGNK, best described as an "exogenously-disturbed mediator".
* "The first application of FDC" sounds too pessimistic. Situations involving exogenously-disturbed mediators are at least as plausible as "exclusion-restricted
3/ exogenous variables" (traditional IVs), which were introduced 70 years earlier, when DAGs were not around to invite scrutiny. Imbens's comments reflect that absence: ucla.in/36EoNzO
* Why introduce FDC in the context of linear regression
where ATE is identifiable by
1/ I'm glad, Sean, that our brief exchange has resulted in your great clarification of the issues, from which I have learned a lot. Two thoughts come immediately to mind: (1) It is a blessing that we can enjoy a division of labor between CI and statistics: the former generates
2/ causal estimands, the latter estimates them. Note, though, that the former is not totally oblivious to the type of data available. Different types of data will result in different estimands, e.g., experimental vs. observational, corrupted by missingness or by proxies or by
3/ differential selection, etc. (2) I don't buy the mystification of "collecting adequate data". I am in the business of automating a scientist, so if there is human judgment involved in the data-collection task, we do not stop there and surrender it to humans. We see it as an
1/4 In view of the dominant role that re-weighting plays in extrapolating effects across populations, and the many Twitter requests for a concise graphical criterion that gives re-weighting its legitimacy, I am retweeting the criterion (called "S-admissibility") in the next 4 tweets.
2/ It works on a selection diagram in which S nodes represent disparities between the target (*) population and the study (experimental) population. Z is a set of measurements. To test whether Z is S-admissible: (1) remove all arrows pointing to X; (2) check whether {X, Z} d-separates S from Y.
3/4 If Z passes this test, then the re-weighting formula is valid:
P*(y|do(x)) = SUM_z P(y|do(x),z)P*(z)
In words: the effect in the target population equals the Z-specific effects in the study population, averaged over Z, using the target distribution P*(z) as weights.
Warning, this is merely a sufficient test.
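As a toy illustration of the re-weighting formula (all numbers invented, binary Z for simplicity), the transport step is a one-line computation once the Z-specific effects and the target distribution of Z are in hand:

```python
# Toy illustration of P*(y|do(x)) = SUM_z P(y|do(x), z) P*(z).
# All numbers below are invented for the example.

# Z-specific causal effects P(y=1 | do(x), z), estimated in the study population:
p_y_do_x_z = {0: 0.30, 1: 0.70}

# Distribution of Z in the target (*) population:
p_star_z = {0: 0.2, 1: 0.8}

# Re-weighting formula: average the Z-specific effects with target weights.
p_star_y_do_x = sum(p_y_do_x_z[z] * p_star_z[z] for z in p_star_z)

print(p_star_y_do_x)  # 0.3*0.2 + 0.7*0.8 = 0.62
```

If Z fails the S-admissibility test above, this weighted average has no causal interpretation in the target population, no matter how precisely each term is estimated.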
1/3 This is an excellent paper that every regression analyst should read, primarily to appreciate how problems that have lingered through decades of confusion can be untangled today using CI tools. What I learned from it was that the "suppressor surprise" is surprising even when
2/3 cast in a purely predictive context: "How can adding a second lousy predictor make the first a better predictor?" Evidently, what people expect from predictors clashes with the logic of regression slopes. The explanation I offered here ucla.in/2N8mBMg (Section 3)
3/3 shows how the phenomenon comes about, but the reason for the clash is still puzzling: What exactly do people expect from predictors, and why? #Bookofwhy
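One way to see the "suppressor surprise" numerically (a synthetic construction of mine, not taken from the paper): let the outcome be pure signal, let x1 measure the signal plus noise, and let x2 measure the noise alone. Then x2 is a lousy predictor of y on its own, yet adding it to the regression lets the fit subtract the noise back out of x1:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 10_000
signal = rng.normal(size=n)
noise = rng.normal(size=n)

y = signal            # outcome: pure signal
x1 = signal + noise   # noisy measurement of the signal
x2 = noise            # "lousy" predictor: uncorrelated with y

def r_squared(X, y):
    """R^2 of an OLS fit with intercept."""
    X = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    return 1.0 - resid.var() / y.var()

r2_x1_alone = r_squared(x1, y)                                 # about 0.5
r2_with_suppressor = r_squared(np.column_stack([x1, x2]), y)   # essentially 1.0

print(r2_x1_alone, r2_with_suppressor)
```

Here y = x1 - x2 exactly, so the two-predictor regression is perfect: x2 "suppresses" the noise component of x1 rather than predicting y directly, which is precisely what clashes with intuitions about what a "predictor" should do.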
1/ Hating to insult or disappoint any of my followers, I was seriously considering your suggestion to create a new Twitter handle. But one word you said made me change my mind: "uninformed". I have been reading, writing and researching the Middle East for the past 83 years. I was
2/ there when Azzam Pasha declared (Oct. 11, 1947) "a war of extermination and momentous massacre" on a nation of refugees of which I was a son. And I was here at UCLA (2014) when BDS's Omar Barghouti re-denied my people's right to self-determination: ucla.in/2SxjgWH. A new
3/ Twitter handle would give people of your persuasion the illusion that it is impossible for an "informed" person to disagree with their bubble of self-righteousness, and that "informed" people must be blind to the genocidal aims of BDS and its spokeswomen Rashida and Ilhan.
1/ Continuing our exploration of "Reduced Form Equations" (RFE) and what they mean to economists, let me address some hard questions that CI analysts frequently ask. Q1: Isn't an RFE just a regression equation? A1: Absolutely not! An RFE carries causal information; a regression
2/ equation does not. Q2: Isn't an RFE just a structural equation? A2: No! Although an RFE carries causal information (much like a structural equation), the RFE may not appear as such in the structural model; it is derived from many such equations through functional composition.
3/ (The output-instrument in the IV setting is a typical example.) Q3: One may derive many equations from a structural model; what makes an RFE so special as to deserve its own name? A3: It is exceptional because it comes with a license of identification by OLS. This is not usually
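The output-instrument example mentioned above can be made concrete in a textbook linear IV model (my notation, a standard illustration, not from the thread):

```latex
% Structural model: instrument Z, treatment X, outcome Y
X = \alpha Z + u_X, \qquad Y = \beta X + u_Y .
% Composing the two structural equations eliminates X and yields
% the reduced-form equation of Y on Z:
Y = \beta(\alpha Z + u_X) + u_Y = \alpha\beta\, Z + (\beta u_X + u_Y).
% Because Z is exogenous (independent of u_X and u_Y), the slope
% \alpha\beta carries a license of identification by OLS of Y on Z,
% even though \beta itself is not identifiable by OLS of Y on X.
```

The composed equation does not appear in the structural model, yet it carries causal information: it gives the effect of an intervention on Z.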