I think I had a tough time communicating with @yudapearl today. It’s worth sharing where I think we ended up misunderstanding each other. I don’t think he is likely to agree with me, but it's useful for me to articulate it here.

Here’s the seed tweet:
I shared the Meng paper because it’s a nice discussion of how greater sample size doesn’t solve estimation problems. This is part of a strong opinion I have that collecting adequate data is the key challenge in most empirical problems. Some people will not agree with this.
Most folks thought I was talking about causal inference from the start. I was actually talking about the tool of *randomization*. IMO, Meng’s paper is an example of measuring the value of randomization for an estimation problem. Randomness is a complement to sample size.
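A toy simulation can make the "complement to sample size" point concrete. This is a minimal sketch, not Meng's analysis: the population, the self-selection rule (people with above-average outcomes respond with probability 0.6, others with 0.4), and the function name `simulate` are all assumptions made up for illustration.

```python
import random
import statistics


def simulate(seed=0, population_size=200_000):
    """Compare a large self-selected sample with a small random one."""
    rng = random.Random(seed)

    # Hypothetical population: one numeric outcome per person.
    population = [rng.gauss(50.0, 10.0) for _ in range(population_size)]
    true_mean = statistics.fmean(population)

    # Large non-random sample: the chance of being included depends on
    # the outcome itself (assumed rule: 0.6 if above 50, else 0.4).
    big_biased = [
        y for y in population
        if rng.random() < (0.6 if y > 50.0 else 0.4)
    ]

    # Small simple random sample: every person is equally likely to be drawn.
    small_random = rng.sample(population, 1_000)

    return {
        "true_mean": true_mean,
        "n_biased": len(big_biased),
        "biased_error": abs(statistics.fmean(big_biased) - true_mean),
        "n_random": len(small_random),
        "random_error": abs(statistics.fmean(small_random) - true_mean),
    }


if __name__ == "__main__":
    for key, value in simulate().items():
        print(f"{key}: {value:.3f}")
```

In this setup the self-selected sample is roughly a hundred times larger, yet its error does not shrink as more people are added, because the selection mechanism is correlated with the outcome. The random sample's error is governed by sampling noise alone.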
An underrated aspect of causal inference is designing a data collection process that makes estimation possible. You can’t produce a useful model without data! The practice of experimentation is a special case of collecting adequate data for a variety of causal inference problems.
Ask any empirical researcher: data availability is likely a fundamental constraint for making progress in their field. Data constrains what we can estimate. Because of scarcity, sometimes researchers work backward, finding data and figuring out what they can use it to estimate.
Two key tools of data collection are 1) randomization and 2) intervention. They form the basis for the widespread practice of experimentation. Intervention is what allows us to create effects, and randomization is how we measure those effects with less bias.
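The split between the two tools can be sketched in a small simulation. Everything here is a hypothetical setup chosen for illustration: a treatment with an assumed true effect of 2.0, an unobserved trait `u` that drives the outcome, and a self-selection rule where people with high `u` opt in more often. The intervention creates the effect in both scenarios; only the assignment mechanism differs.

```python
import random
import statistics

TRUE_EFFECT = 2.0  # assumed treatment effect in this toy example


def difference_in_means(treated, outcomes):
    """Naive estimate: mean outcome of treated minus mean outcome of controls."""
    t = [y for d, y in zip(treated, outcomes) if d]
    c = [y for d, y in zip(treated, outcomes) if not d]
    return statistics.fmean(t) - statistics.fmean(c)


def simulate_assignment(randomized, n=20_000, seed=0):
    """Return the difference-in-means estimate under one assignment rule."""
    rng = random.Random(seed)
    treated, outcomes = [], []
    for _ in range(n):
        u = rng.gauss(0.0, 1.0)  # unobserved trait that also affects the outcome
        if randomized:
            # Coin flip: assignment is independent of u.
            d = rng.random() < 0.5
        else:
            # Self-selection (assumed rule): high-u units opt in more often.
            d = rng.random() < (0.8 if u > 0.0 else 0.2)
        y = 3.0 * u + (TRUE_EFFECT if d else 0.0) + rng.gauss(0.0, 1.0)
        treated.append(d)
        outcomes.append(y)
    return difference_in_means(treated, outcomes)


if __name__ == "__main__":
    print("self-selected:", round(simulate_assignment(randomized=False), 2))
    print("randomized:   ", round(simulate_assignment(randomized=True), 2))
```

With self-selection, the same estimator absorbs the difference in `u` between groups and overstates the effect. With the coin flip, the groups are comparable on `u` on average, so the simple difference in means lands near the assumed effect.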
(Sidebar: Many common practices fit under this loose definition of experimentation: A/B tests, bandit algorithms, Bayesian optimization, reinforcement learning. They are dialogs with the real world where we cause some change and then record the consequences.)
From my (obviously strong) experimentalist perspective, the goal of most people practicing causal inference is: *finding or making data* that can let them estimate the models they need, to answer the queries they have, or make the decisions they need to make.
I expect Pearl won't agree because he may see causal inference as orthogonal to the practice of people getting their data. For me, an empirical researcher, I’d rather have good data + a simple causal inference problem than messy data + a complex one (even with all Pearl’s tools).
Randomization is useful for data collection in surveys as well as experiments. It’s a more general tool than simply causal inference. The Meng concepts for survey data are portable to causal inference: you could derive a similar data quality measure given an ATE estimator.
I would love to see Pearl’s perspective applied to improving experimentation and collecting adequate data. I am not well enough versed in it to know if work in this direction exists. Personally, I have turned to the statistics literature to learn about these things.
Parting thought: as a causal inference practitioner it's valuable to invest in understanding sampling, randomization, experimental design, etc. Projects are often limited by what data you can design + you need to critically evaluate data as being suitable for certain tasks.
If you got to this point, I appreciate you reading this. It's been really helpful for me to get these thoughts down, and I'd love to hear any feedback you have on any aspect.
Keep Current with Sean J. Taylor
