1/ It might be useful to look carefully at the logic of proving an "impossibility theorem" for NNs, and why it differs fundamentally from Minsky/Papert's perceptron results. The way we show that a task is impossible in Rung-1 is to present two different data-generating models (DGMs) that
2/ generate the SAME probability distribution (P) but assign two different answers to the research question Q. Thus, the limitation is not in a particular structure of the NN but in ANY method, however sophisticated, that gets its input from a distribution, lacking interventions.
3/ This is demonstrated so convincingly through Simpson's paradox ucla.in/2Jfl2VS, even pictorially, w/o equations, in #Bookofwhy.
Thus, it matters not if you call your method NN, "hierarchical nets," or "representation learning"; passive observations => wrong answers.
4/ Plus, the interventions needed must be PHYSICAL, not ANALYTICAL. You cannot simulate the former with the latter. Reinforcement Learning deploys physical interventions and can take you to Rung 2, but not to Rung 3. Please read Simpson and have fun playing with the Simpson machine.
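
To make the point in tweet 2/ concrete, here is a minimal sketch (the toy models and all numbers are invented stand-ins, not from the thread): two DGMs that generate the identical joint distribution P(X,Y) yet give different answers to the query P(y|do(x)). No learner that sees only samples from P, however sophisticated, can distinguish them.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000

# Model A: X -> Y.  X is a fair coin; Y copies X.
def model_a(do_x=None):
    x = rng.integers(0, 2, n) if do_x is None else np.full(n, do_x)
    y = x.copy()                      # Y is caused by X
    return x, y

# Model B: U -> X, U -> Y.  A hidden coin U drives both; X does nothing to Y.
def model_b(do_x=None):
    u = rng.integers(0, 2, n)
    x = u.copy() if do_x is None else np.full(n, do_x)
    y = u.copy()                      # Y is caused by U, not by X
    return x, y

# Rung 1 (seeing): the two models generate the SAME joint distribution.
for name, model in [("A", model_a), ("B", model_b)]:
    x, y = model()
    print(name, "P(X=1)=%.2f  P(Y=1|X=1)=%.2f" % (x.mean(), y[x == 1].mean()))

# Rung 2 (doing): a physical intervention do(X=1) separates them.
for name, model in [("A", model_a), ("B", model_b)]:
    _, y = model(do_x=1)
    print(name, "P(Y=1|do(X=1))=%.2f" % y.mean())   # A: 1.00, B: ~0.50
```

Both models print the same Rung-1 numbers, P(X=1)=0.50 and P(Y=1|X=1)=1.00, yet under do(X=1) model A returns 1.00 and model B about 0.50; the distribution alone leaves the query undetermined.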

More from @yudapearl

5 Jul
1/ Readers ask: What's the simplest problem in which a combination of experimental and observational studies can be shown to be better than each study alone?
Ans. Consider X ---> Z ---> Y,
with an unobserved confounder between X & Z.
Query Q: Find P(y|do(x)).
We have 2 valid estimands:
2/
ES1 = P(y|do(x)), estimable from the experiment.
ES2 = SUM_z P(z|do(x)) P(y|z), where the first term is estimable from the experiment, the second from the observational study.
ES2 is better than ES1 for 3 reasons:
1. P(y|z) can rest on a larger sample.
2. ES2 is composite
3/ (see ucla.in/2ocoWqq for the advantage of composite estimators).
3. ES2 does not require measuring Y in the experimental study.
Remark: The validity of ES2 follows from do-calculus. Adding any edge to the graph invalidates ES2 and leaves ES1 as the only estimand.
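
The validity of ES2 can be seen in two steps: Z has no back-door path to Y in this graph, so P(y|z) = P(y|do(z)); and X affects Y only through Z, so P(y|do(x)) = SUM_z P(z|do(x)) P(y|do(z)). The simulation below is a minimal sketch of reason 1; the structural equations and all parameter values are invented stand-ins, not from the thread.

```python
import numpy as np

rng = np.random.default_rng(1)

def experiment(n):
    # Randomized trial: X is assigned by coin flip, cutting the U -> X arrow.
    u = rng.random(n) < 0.5
    x = rng.random(n) < 0.5
    z = rng.random(n) < 0.1 + 0.5 * x + 0.3 * u   # Z listens to X and U
    y = rng.random(n) < 0.2 + 0.6 * z             # Y listens to Z alone
    return x, z, y

def observe(n):
    # Observational study: U confounds X and Z.
    u = rng.random(n) < 0.5
    x = rng.random(n) < 0.2 + 0.6 * u
    z = rng.random(n) < 0.1 + 0.5 * x + 0.3 * u
    y = rng.random(n) < 0.2 + 0.6 * z
    return x, z, y

xe, ze, ye = experiment(2_000)        # small, costly experiment
xo, zo, yo = observe(200_000)         # large, cheap observational study

# ES1: P(Y=1|do(X=1)) read directly off the experimental arm (needs Y there).
es1 = ye[xe].mean()

# ES2: SUM_z P(z|do(x)) P(y|z) -- first factor experimental, second observational.
p_z1 = ze[xe].mean()
p_y_given_z = [yo[zo == z].mean() for z in (0, 1)]
es2 = (1 - p_z1) * p_y_given_z[0] + p_z1 * p_y_given_z[1]

print("truth = 0.650  ES1 = %.3f  ES2 = %.3f" % (es1, es2))
```

With these made-up numbers the true value is 0.65, and ES2 typically lands nearer it because its P(y|z) factor rests on 200,000 observational rows, while ES1 must estimate everything from the 2,000 experimental ones.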
28 Jun
1/ This post reinforces my support of the Myth of the Lone Genius, as tweeted here on 6.12.2021: (1/ ) Why do I refuse to "cancel" or "decolonialize" Euclid's Geometry, Archimedes' Rule, and Newton's Law, despite peer pressure? Because, as explained here ucla.in/2Qg0Rfs (p.8),
2/ putting a human face behind theorems and discoveries makes science "not a book of facts and recipes, but a struggle of the human mind to unveil the mysteries of nature." Personalizing science education makes each student "an active participant in, not a passive recipient of,
3/ those theorems and discoveries". Thus, regardless of how many collaborators Newton had, by telling a student that through an effort of intellect, curiosity, and hard labor he/she can discover the one thing missing from the puzzle, we are turning that student into a
1 Feb
1/ Sharing an interesting observation from Frank Wilczek's book "Fundamentals."
In the 17th century, while the entire scientific world was preoccupied with planetary motion and other grand questions of philosophy, Galileo made careful studies of simple forms of motion, e.g.,
2/ how balls roll down an inclined plane and how pendulums oscillate. To most of Galileo's contemporaries such measurements must have appeared trivial, if not irrelevant, to their speculations on how the world works. Yet Galileo aspired to a different kind of understanding.
3/ He wanted to understand something precisely, rather than everything vaguely. Ironically, it was Galileo's type of understanding that enabled Newton's theory of gravitation to explain "how the world works".

Why do I mention it? Because we have had lengthy discussions here
15 Jan
A letter I wrote to the California Board of Education:

I strongly oppose the 2021 California Ethnic Studies Model Curriculum.

I am particularly alarmed by its attempt to depict inter-ethnic relationships as an irreconcilable struggle between racially-defined “oppressed” 1/4
and "oppressors” and by the way it associates "whiteness" with "oppression" and "colonialism".

I am a "white" Jewish American, and I believe that the history of my people is a model of emancipation from oppression and colonialism, culminating in the State of Israel which is 2/4
an inspirational model of an oppressed ethnic minority lifting itself from the margin of history to become a world center of art, science and entrepreneurship -- a multi-colored lighthouse of free speech and gender equality.

I want my grandchildren to take pride in this 3/4
6 Nov 20
This question annoys ALL students (and professors) of ML, but they are afraid to ask. Thanks for raising it in this "no hand waving" forum. Take two causal diagrams:
X-->Y and X<--Y, and ask a neural network to decide which is more probable, after seeing 10 billion samples. 1/n
The answer will be: No difference; each diagram scores the same fit as the other. Let's be more sophisticated: assign each diagram a prior and run a Bayesian analysis on the samples. Lo and behold, the posteriors will equal the priors no matter how we start. How come? 2/n
Isn't a neural network supposed to learn the truth given enough data? Ans. No! Learning only occurs when the learnable offends the data less than its competitors. Our two diagrams never offend any data, so nothing is learnable. Aha! But what if our data involves interventions? 3/
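
A hedged sketch of why neither diagram ever "offends the data" (my illustration, not from the thread): any joint distribution factorizes both as P(x)P(y|x) and as P(y)P(x|y), so the two graphs attain the identical maximized likelihood on every sample, and with matched parameter priors the Bayes factor is 1.

```python
import numpy as np

rng = np.random.default_rng(2)

# Any observational sample over two binary variables; the true direction
# used to generate it is irrelevant to the point being made.
x = rng.random(10_000) < 0.3
y = rng.random(10_000) < 0.2 + 0.6 * x
counts = np.array([[np.sum((x == i) & (y == j)) for j in (0, 1)]
                   for i in (0, 1)])          # 2x2 contingency table

def score_x_to_y(c):
    # Maximized log-likelihood under the factorization P(x) P(y|x).
    px = c.sum(axis=1) / c.sum()
    py_given_x = c / c.sum(axis=1, keepdims=True)
    return float((c * np.log(px[:, None] * py_given_x)).sum())

def score_y_to_x(c):
    # Maximized log-likelihood under the factorization P(y) P(x|y).
    py = c.sum(axis=0) / c.sum()
    px_given_y = c / c.sum(axis=0, keepdims=True)
    return float((c * np.log(py[None, :] * px_given_y)).sum())

print(score_x_to_y(counts))   # the two scores are identical:
print(score_y_to_x(counts))   # both equal the unrestricted MLE fit of P(x, y)
```

Since the two scores coincide for every dataset, the posterior over {X-->Y, X<--Y} never moves from the prior; only interventional data can break the tie.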
27 Oct 20
When I see a paper on explainability, the first question I ask is: "What does it explain?" The data-fitting strategy of the fitter, or real-life events such as death or survival?
I believe this paper arxiv.org/pdf/2010.10596… is mostly about the former, as can be seen from the equations and from the absence of any world-model. 1/
While it is sometimes useful to explain the data-fitting system (e.g., for debugging), it is also important to distinguish this kind of counterfactual explanation from the kind generated in the causal-inference literature.
2/3
Beware: a model-blind system might conclude that the rooster's crow explains the sunrise. It might also explain that your loan was denied because you are a male, and would also have been denied if you were a female. I wonder how ML folks would debug this system.