1/ Summarizing our discussion of "demand" via "ceteris paribus" (CP), we've seen that, once formalized, CP amounts to comparing Y under two settings of X, say X=x and X=x', while leaving other variables in the structural equation for Y unchanged. The beauty of formal definitions
2/ is that they hold for all models and are independent of the meanings of X, Y, Z, etc.,
or the procedure by which we estimate things. Leveraging these beauties, we come to realize that the resultant CP definition of "demand" is none other than the counterfactual definition of
3/ Causal Effect, namely {Y(x, Z), Y(x', Z)}, where Z is the set of other variables in the equation of Y, both observed and unobserved. Thus, the analysis of "demand" can benefit directly from the literature on causal effects, presenting no peculiarities that demand special treatment.
4/ Of course, the cyclicity of the equations prevents us from leveraging d-separation and do-calculus. But the logic of counterfactuals can still be summoned to advantage, as is done here ucla.in/2NnfGPQ#page=59. From this point on, as they say in economics: Ceteris Paribus.
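To make the CP/counterfactual pairing concrete, here is a minimal Python sketch. The structural equation f_Y, its coefficients, and the variable roles (X = price, Z = observed covariate, U = unobserved disturbance) are illustrative assumptions, not part of the thread; the point is only that {Y(x, Z), Y(x', Z)} is obtained by changing X while leaving Z and U untouched.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical structural equation for Y (quantity demanded):
#   Y = f_Y(X, Z, U) = 10 - 2*X + 0.5*Z + U
# X = price, Z = observed covariate (e.g., income), U = unobserved factors.
def f_Y(x, z, u):
    return 10.0 - 2.0 * x + 0.5 * z + u

n = 5
Z = rng.normal(size=n)   # the "other variables" in the equation of Y
U = rng.normal(size=n)   # unobserved disturbances

x, x_prime = 1.0, 2.0    # the two settings of X being compared

# Ceteris paribus / counterfactual pair {Y(x, Z), Y(x', Z)}:
# only X changes; Z and U are left exactly as they were.
Y_x  = f_Y(x, Z, U)
Y_xp = f_Y(x_prime, Z, U)

print("Y(x , Z):", np.round(Y_x, 2))
print("Y(x', Z):", np.round(Y_xp, 2))
print("Y(x', Z) - Y(x, Z):", np.round(Y_xp - Y_x, 2))
# With this linear f_Y, the unit-level difference is -2*(x' - x) for every unit,
# regardless of Z and U -- the ceteris paribus effect of the price change.
```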
1/ This just in. A new successful paradigm for building AI systems has emerged, called "Foundation Model". According to its inventors crfm-stanford.github.io, it works as follows: "Train one model on a huge amount of data and adapt it to many applications." Not only is it
2/ seen "as the beginnings of a sweeping paradigm shift in AI", but a whole Center has been erected in its honor, dozens of prominent researchers, post-docs and PhD students has joined its staff, and an interdisciplinary symposium has been announced. We, foot-soldiers in the
3/ trenches of AI research are asking, of course: "What is it?" or "What is the scientific principle by which 'Foundation models' can circumvent the theoretical limitations of data-centric methods as we know them, especially those that hinder generalization across environments?"
1/ Can "traditional statistics" handle "effect sizes?" If we include Neyman-Rubin in "traditional statistics" and interpret "Can" to mean "Can, in principle", the answer is Yes. However, if we take "traditional statistics" to be represented by: Pearson, Fisher, Chochran, Tuckey,.
2/ Breiman, Friedman,...+deceased presidents of ASA, RSS...+authors of stat texts+..., and if we interpret "Can" to mean "Capable of handling a simple problem in 2 weeks time," I would bet 100:1 on "NO!". Reason: They lacked a language to articulate the assumptions needed for
3/ estimating effect sizes, and it takes about 2 weeks to learn such a language, be it "potential outcomes", DAGs, SCM, or equivalent. Why haven't the giants bothered to learn any? My 1993-9 email is full of reasoned excuses, but the most common one has been: "It takes us out of
1/ Readers ask: What's the simplest problem in which a combination of experimental and observational studies can be shown to be better than each study alone?
Ans. Consider X ---> Z ---> Y,
with an unobserved confounder between X & Z.
Query Q: Find P(y|do(x)).
We have 2 valid estimands:
2/
ES1 = P(y|do(x)), estimable from the experiment.
ES2 = SUM_z P(z|do(x)) P(y|z), where the first factor is estimable from the experiment and the second from the observational study.
ES2 is better than ES1 for 3 reasons: 1. P(y|z) can rest on a larger sample. 2. ES2 is composite (see
3/ ucla.in/2ocoWqq for the advantage of composite estimators). 3. ES2 does not require measuring Y in the experimental study.
Remark: The validity of ES2 follows from do-calculus. Adding any edge to the graph invalidates ES2 and leaves ES1 as the only estimand.
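A minimal simulation sketch of this comparison (not from the thread): the binary structural equations, their coefficients, and the sample sizes below are illustrative assumptions. It draws a small "experimental" sample under do(X=x) and a much larger observational sample, then compares ES1 with the composite ES2 against the ground truth.

```python
import numpy as np

rng = np.random.default_rng(1)

# Structural model (all variables binary):
#   U -> X, U -> Z   (U is the unobserved confounder of X and Z)
#   X -> Z -> Y      (Y depends on Z only)
def simulate(n, do_x=None):
    U = rng.binomial(1, 0.5, n)
    if do_x is None:                        # observational regime
        X = rng.binomial(1, 0.2 + 0.6 * U)
    else:                                   # experimental regime: X set by intervention
        X = np.full(n, do_x)
    Z = rng.binomial(1, 0.1 + 0.4 * X + 0.4 * U)
    Y = rng.binomial(1, 0.2 + 0.6 * Z)
    return X, Z, Y

n_exp, n_obs = 2_000, 50_000                # observational study is much larger
x = 1

# ES1: P(y=1 | do(x)) taken straight from the experiment (needs Y measured there)
_, Z_e, Y_e = simulate(n_exp, do_x=x)
ES1 = Y_e.mean()

# ES2: SUM_z P(z | do(x)) * P(y=1 | z)
#      first factor from the experiment, second from the observational study
X_o, Z_o, Y_o = simulate(n_obs)
ES2 = sum((Z_e == z).mean() * Y_o[Z_o == z].mean() for z in (0, 1))

# Ground truth by brute-force simulation of do(X=x)
_, _, Y_t = simulate(1_000_000, do_x=x)
print(f"ES1 = {ES1:.3f}   ES2 = {ES2:.3f}   truth = {Y_t.mean():.3f}")
```

Both estimands converge to the same quantity, but ES2's second factor rests on the 50,000-sample observational study, and Y never had to be measured in the experimental arm to compute it.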
1/ It might be useful to look carefully at the logic of proving an "impossibility theorem" for NNs, and why it differs fundamentally from the Minsky/Papert perceptron results. The way we show that a task is impossible in Rung-1 is to present two different data-generating models (DGMs) that
2/ generate the SAME probability distribution (P) but assign two different answers to the research question Q. Thus, the limitation is not in a particular structure of the NN but in ANY method, however sophisticated, that gets its input from a distribution, lacking interventions.
3/ This is demonstrated so convincingly through Simpson's paradox ucla.in/2Jfl2VS, even pictorially, w/o equations, in #Bookofwhy. Thus, it matters not whether you call your method NN, "hierarchical nets", or "representation learning": passive observations => wrong answers.
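Here is a minimal numerical sketch of the two-models argument (the joint distribution table below is an illustrative assumption, not from the thread). Two DGMs, one with Z as a confounder and one with Z as a mediator, can both generate exactly the same distribution P over binary X, Z, Y, yet they return different values for the query Q = P(y|do(x)); no method fed only P can tell them apart.

```python
# A strictly positive joint distribution P(x, z, y) over binary variables,
# given as a table indexed by (x, z, y). The numbers are illustrative.
P = {
    (0, 0, 0): 0.30, (0, 0, 1): 0.05,
    (0, 1, 0): 0.02, (0, 1, 1): 0.03,
    (1, 0, 0): 0.03, (1, 0, 1): 0.02,
    (1, 1, 0): 0.05, (1, 1, 1): 0.50,
}
assert abs(sum(P.values()) - 1.0) < 1e-12

def marg(**fixed):
    """Probability of the event where the named variables take the given values."""
    return sum(p for (x, z, y), p in P.items()
               if all({'x': x, 'z': z, 'y': y}[k] == v for k, v in fixed.items()))

# DGM-A: Z -> X, Z -> Y, X -> Y   (Z is a confounder)
#   P(y | do(x)) = SUM_z P(z) * P(y | x, z)   (back-door adjustment)
effect_A = sum(marg(z=z) * marg(x=1, z=z, y=1) / marg(x=1, z=z) for z in (0, 1))

# DGM-B: X -> Z, Z -> Y, X -> Y   (Z is a mediator; no confounding at all)
#   P(y | do(x)) = P(y | x)
effect_B = marg(x=1, y=1) / marg(x=1)

# Any strictly positive joint factorizes along either graph, so both DGMs
# generate the SAME P -- yet they answer the causal query differently:
print(f"P(y=1 | do(x=1)) under DGM-A: {effect_A:.3f}")   # ~0.71
print(f"P(y=1 | do(x=1)) under DGM-B: {effect_B:.3f}")   # ~0.87
```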
1/ This post reinforces my support of the Myth of the Lone Genius, as tweeted here: 6.12.2021 (1/ ) Why do I refuse to "cancel" or "decolonialize" Euclid's Geometry, Archimedes' Rule and Newton's Law, despite peer pressure? Because, as explained here ucla.in/2Qg0Rfs (p.8),
2/ putting a human face behind theorems and discoveries makes science "not a book of facts and recipes, but a struggle of the human mind to unveil the mysteries of nature." Personalizing science education makes each student "an active participant in, not a passive recipient of,
3/ those theorems and discoveries". Thus, regardless of how many collaborators Newton had, by telling a student that, through an effort of intellect, curiosity, and hard labor, he/she can discover the one thing missing from the puzzle, we are turning that student into a
1/ Sharing an interesting observation from Frank Wilczek's book "Fundamentals." In the 17th Century, while the entire scientific world was preoccupied with planetary motion and other grand questions of philosophy, Galileo made careful studies of simple forms of motion, e.g.,
2/ how balls roll down an inclined plane and how pendulums oscillate. To most of Galileo's contemporaries, such measurements must have appeared trivial, if not irrelevant, to their speculations on how the world works. Yet Galileo aspired to a different kind of understanding.
3/ He wanted to understand something precisely, rather than everything vaguely. Ironically, it was Galileo's type of understanding that enabled Newton's theory of gravitation to explain "how the world works".
Why do I mention it? Because we have had lengthy discussions here