Rex
Applied Scientist in Industry. Previously UCSD. Princeton PhD. Follow me for recreational methods trash talk.
Apr 12
Recently had a debate about two-way fixed effects and how they pretend to magically solve causal inference problems by quietly switching to a different DV than the one we actually care about. Here's that explained visually.

Here's raw annual city temperature: [figure: raw annual city temperature by year]. Say my research design is confounded by annual events I don't understand and can't measure well, so I include a time fixed effect to "control" for those unmeasured confounders.
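A minimal sketch of what that fixed effect actually does, with made-up city/year data: partialling out year dummies is algebraically the same as demeaning the outcome (and treatment) within each year, so the variable you end up modeling is "temperature relative to that year's mean," not raw temperature.

```python
# Sketch with synthetic data: a year fixed effect is equivalent (Frisch-Waugh-Lovell)
# to regressing the within-year-demeaned outcome on the within-year-demeaned
# treatment. The DV you actually model is deviations from the yearly mean.
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
n_cities, n_years = 20, 10
df = pd.DataFrame({
    "city": np.repeat(np.arange(n_cities), n_years),
    "year": np.tile(np.arange(n_years), n_cities),
})
year_shock = rng.normal(0, 3, n_years)                 # unmeasured annual events
df["treat"] = rng.normal(0, 1, len(df)) + 0.5 * year_shock[df["year"]]
df["temp"] = 15 + 2.0 * df["treat"] + year_shock[df["year"]] + rng.normal(0, 1, len(df))

# Demean outcome and treatment within year: this is what the FE "control" does.
df["temp_dm"] = df["temp"] - df.groupby("year")["temp"].transform("mean")
df["treat_dm"] = df["treat"] - df.groupby("year")["treat"].transform("mean")

slope = np.polyfit(df["treat_dm"], df["temp_dm"], 1)[0]
print(f"slope on the demeaned DV: {slope:.2f}")        # the estimand is now about deviations
```

The coefficient is now about within-year deviations, which may or may not be the quantity the research question was about.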
Feb 10
Since we're doing the "what if the executive ignores the courts" thought experiment: first off, that's already true, just in degree. Second, the checks and balances on the U.S. federal government come from the states. If federal courts don't have authority, then the states also don't have to follow. The federal government doesn't rule the states; it arbitrates cooperation between the states. That's still true despite a hundred years of everyone's best efforts to capture D.C. and impose their will on the rest of the country.
Nov 16, 2024
"If markets are so dumb why don't you just beat them" is maybe my favorite did not understand the assignment response.

If I think a number is going up and down at best because of a random walk and at worst because of coordinated manipulation, I do not win by giving all my money to that number. There's some pool of people and institutions I have no visibility into, making bets I barely have any visibility into, usually without even an order book.

With Polymarket or a fly-by-night crypto market, even if you win you can still lose when the exchange evaporates.
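A toy simulation of the point, with made-up frictions: buying and holding a driftless random walk has expected profit of roughly zero, and adding a hypothetical fee plus a small chance the venue vanishes pushes the expectation negative.

```python
# Toy simulation: holding a driftless random walk. Expected profit is ~0 before
# frictions; with a made-up fee and a made-up 2% chance the exchange evaporates,
# the expectation goes negative. All numbers are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(1)
n_bettors, n_steps = 100_000, 50
steps = rng.choice([-1.0, 1.0], size=(n_bettors, n_steps))
pnl = steps.sum(axis=1)                          # profit from holding the random walk

fee = 0.5                                        # hypothetical round-trip cost
blow_up = rng.random(n_bettors) < 0.02           # hypothetical venue blow-up risk
net = np.where(blow_up, -10.0, pnl - fee)        # lose an arbitrary stake of 10 if it evaporates

print(f"mean pnl, frictionless:      {pnl.mean():+.3f}")   # ~0
print(f"mean pnl, fees + venue risk: {net.mean():+.3f}")   # < 0
```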
Nov 13, 2024
My F.A.Q. for understanding election forecasting and probabilistic forecasting in general: I'll start with a controversial but useful assumption: we are living in a deterministic universe. It's just a state machine. Every turn of the crank updates where all the energy/matter is, until heat death. Probability and uncertainty are information concepts, not physical ones.
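A tiny sketch of that framing, with an arbitrary made-up update rule: the dynamics are fully deterministic; the only "probability" in sight is the observer's belief over which initial state the machine started in, and turning the crank just maps that belief forward.

```python
# Toy deterministic state machine. Nothing random happens when the crank turns;
# probability lives entirely in the observer's belief over the unknown start state.
def step(state: int) -> int:
    return (state * state) % 7        # arbitrary deterministic update rule

# Observer's information: uniform belief over the 7 possible initial states.
belief = {s: 1 / 7 for s in range(7)}

for _ in range(4):                    # turn the crank: push the belief forward
    new_belief = {s: 0.0 for s in range(7)}
    for s, p in belief.items():
        new_belief[step(s)] += p
    belief = new_belief

print(belief)                         # the uncertainty changed; the physics never did
```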
Oct 14, 2024
If grad students are civilians then don't let them publish. Or make them publish under someone else's license like every other trade does. Can't have it both ways. A junior engineer doesn't design a plane part that kills a bunch of people and then all the engineers in their field take to Twitter to argue how no one is really responsible and to stop talking about it.
Aug 24, 2024
They're not even that. They're a separate statistical model, about a lower bound, for how close one estimand could possibly be to another, and all four of those pieces are usually implicit and unstated because nobody understands what they're doing or why. When I calculate the mean height in my family, that measurement doesn't come with a CI. It's fixed. When I regress it on age in my family, that slope doesn't come with a CI. It's fixed. For either to need a CI you have to explicitly introduce more machinery, not just because lm() reports one.
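The same point lm() makes, sketched here in Python with statsmodels and made-up family data: the mean and the in-sample slope are fixed numbers, and the interval the software prints alongside them only means something once you add a model mapping these six people to some unobserved target.

```python
# Sketch with made-up family data. The mean height and the in-sample slope are
# exact, fixed numbers. OLS software prints a CI for the slope anyway, but that
# interval needs an explicit sampling/error model before it refers to anything.
import numpy as np
import statsmodels.api as sm

age    = np.array([8, 12, 15, 40, 42, 70])
height = np.array([130, 150, 168, 178, 165, 171])

print("family mean height:", height.mean())   # a fixed number, no CI attached

X = sm.add_constant(age)
fit = sm.OLS(height, X).fit()
print("in-sample slope:", fit.params[1])       # also a fixed number
print(fit.conf_int())                          # reported CI: meaningless without extra machinery
```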
Jul 28, 2024
Statistics are exact calculations based on data on hand. They have no uncertainty.

'Confidence' is a scientific model of how a statistic might map to a real, unobserved estimand we care about.

That model usually isn't very good, incorporating only the simplest sources of known error that happen to have convenient closed forms. For example, a sample mean doesn't have a confidence range with respect to the true sample mean. It is the sample mean.

If the estimand you care about is the population mean, which is unobserved, you may propose some mapping between the two. That proposal has to be good, and it has to be argued for.
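A simulation of why the proposal has to be argued, using synthetic data: the textbook 95% interval for a population mean delivers its advertised coverage only under the assumed mapping (simple random sampling); apply the identical formula under a biased sampling scheme and the coverage collapses.

```python
# Simulation: the standard 95% CI for a population mean "works" only under the
# assumed sample-to-population mapping. Same arithmetic, biased sampling: ~0 coverage.
import numpy as np

rng = np.random.default_rng(2)
population = rng.lognormal(mean=3.0, sigma=0.5, size=100_000)
true_mean = population.mean()

def covers(sample):
    m, se = sample.mean(), sample.std(ddof=1) / np.sqrt(len(sample))
    return (m - 1.96 * se) <= true_mean <= (m + 1.96 * se)

n, reps = 50, 2000
random_cov = np.mean([covers(rng.choice(population, n)) for _ in range(reps)])

# Biased scheme: only units below the population median can ever be sampled.
eligible = population[population < np.median(population)]
biased_cov = np.mean([covers(rng.choice(eligible, n)) for _ in range(reps)])

print(f"coverage under random sampling: {random_cov:.2f}")   # ~0.94
print(f"coverage under biased sampling: {biased_cov:.2f}")   # ~0.00
```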
Jun 25, 2024
I'll just throw in my observation that the credibility revolution hasn't been an unambiguous benefit; it has come at the possible expense of other things. IMO a lot of identification strategies are entirely hallucinations, and it's served as a kind of publishing cheat code for trash data. What I've seen over and over again is an author placing the burden on the reader to guess how the strategy fails. It's a game. Anyone with deep knowledge of the data generating process can probably think of 50 ways, but the average reader/reviewer can't.
Jun 8, 2024
-Any science on people immediately drops the IQ 80 points
-Lack of genuine curiosity
-Unjustifiably strong priors based on trash observational work or usually just straight social identity
-Few levers to pull given the correct info
-Little professional reward. Best case you confirm something we already believe. Worst case you debunk something we believe and the mob shows up at your house
-Normative industrial complex that inserts itself as the religious moral gatekeeper between people and the research that could actually help them
Jun 6, 2024
Are you interested in:
Causal inference? Data best practices? Time series? COVID-19? Awkward methods drama?
Then check out:
"How to be a Curious Consumer of COVID-19 Modeling: 7 Data Science Lessons from ... (Feyman et al. 2020)"
rexdouglass.github.io/Douglass_2024_…
Feyman et al. (2020) ask whether COVID-19 shelter-in-place orders actually kept people inside or if they were ignored and people would have stayed inside anyway.

It's a very hard and very important question!
May 29, 2024
The concept of time really throws people in modeling. The DAGs with time indices are gnarly. Often you hear time treated as a cause, time as a treatment assignment, all sorts of weird ideas. In this example, something happened at time t and there's a visible discontinuity in some outcome Y. Cool. Something happened on day t that we want to call a treatment D. Lots of other stuff probably happened too, and lots of stuff Z probably causes both Y and D.
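A toy version of that trap, with made-up numbers: Z jumps on day t and drives both D and Y, the true effect of D on Y is zero, and the naive before/after jump in Y still looks like a clean "treatment effect" of D.

```python
# Toy time series: Z jumps at day t and causes both D and Y. D truly does nothing,
# but the naive before/after jump attributes Z's effect to D. Numbers are made up.
import numpy as np

rng = np.random.default_rng(3)
days = np.arange(200)
t = 100

Z = (days >= t) * 5.0 + rng.normal(0, 0.5, len(days))      # confound jumps at t
D = (Z > 2.5).astype(float)                                 # "treatment" switched on by Z
Y = 10 + 0.0 * D + 1.0 * Z + rng.normal(0, 0.5, len(days))  # zero true effect of D

naive_jump = Y[days >= t].mean() - Y[days < t].mean()
print(f"naive before/after jump attributed to D: {naive_jump:.2f}")  # ~5, entirely Z
```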
Mar 13, 2023
Linear models require actual linearity, not just monotonicity, in every covariate. Really weird things happen when that doesn't hold. And I've never seen a real world DGP where it held.
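A small sketch of "really weird things," using a synthetic monotone-but-nonlinear DGP (y = log x plus noise): the straight-line "effect of x" depends entirely on which region of x you happened to sample, even though the relationship is perfectly monotone.

```python
# Sketch: monotone but nonlinear DGP, y = log(x) + noise. A linear fit yields a
# different coefficient depending on the sampled range of x. Synthetic data.
import numpy as np

rng = np.random.default_rng(4)

def fitted_slope(x_low, x_high, n=500):
    x = rng.uniform(x_low, x_high, n)
    y = np.log(x) + rng.normal(0, 0.1, n)
    return np.polyfit(x, y, 1)[0]

print(f"slope when x in [1, 5]:    {fitted_slope(1, 5):.3f}")      # steep
print(f"slope when x in [50, 500]: {fitted_slope(50, 500):.3f}")   # nearly flat
# Same DGP, same "linear model", wildly different coefficients.
```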
Mar 12, 2023
If your biggest concern with a lit review is that it didn't sufficiently explicate "absence of evidence isn't evidence of absence" for civilians, and not that a weighted mean is an insane way to think about synthesizing a posterior from a dozen different estimands and domains...

Meta-analysis combining different studies does exactly one thing: pool observations from comparable domains to increase statistical power. It doesn't average out the stupidity of a bunch of broken designs, leaving behind a real true estimate. That's not how this works. Not at all.
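A toy simulation of that distinction, with made-up effects and biases: inverse-variance pooling of comparable, unbiased studies shrinks noise around the truth, while pooling studies whose designs are each broken differently just converges on the average of the biases.

```python
# Simulation: pooling comparable unbiased studies helps; pooling broken designs
# converges on the mean of their biases, not the truth. All numbers are made up.
import numpy as np

rng = np.random.default_rng(5)
true_effect, k, se = 1.0, 12, 0.3

def inverse_variance_pool(estimates, ses):
    w = 1 / ses**2
    return (w * estimates).sum() / w.sum()

ses = np.full(k, se)
clean  = true_effect + rng.normal(0, se, k)        # k unbiased designs
biases = rng.uniform(-2, 3, k)                     # each broken design biased differently
broken = true_effect + biases + rng.normal(0, se, k)

print(f"true effect:           {true_effect:.2f}")
print(f"pooled clean studies:  {inverse_variance_pool(clean, ses):.2f}")
print(f"pooled broken studies: {inverse_variance_pool(broken, ses):.2f}  (~ truth + mean bias {biases.mean():.2f})")
```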
Jan 23, 2023
Let's rant about Pandas: it reimplements vectors, poorly; half the time it tells you to cast them to numpy arrays to do something simple, and the other half it wants you to use a pandas function. And if you screw up indexing or hit an edge case, it'll return a Series instead of a DataFrame and break things.
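A few of the indexing edge cases in question, all standard pandas behavior: single versus double brackets, a row lookup silently becoming a Series, and the cast-to-numpy escape hatch quietly upcasting dtypes.

```python
# Standard pandas behaviors behind the rant: column vs list-of-columns selection,
# a row lookup that returns a Series, and the numpy escape hatch upcasting dtypes.
import pandas as pd

df = pd.DataFrame({"a": [1, 2, 3], "b": [4.0, 5.0, 6.0]})

print(type(df["a"]))      # Series      (single column)
print(type(df[["a"]]))    # DataFrame   (list of columns)
print(type(df.loc[0]))    # Series      (single row, dtypes coerced to a common type)
print(type(df.loc[[0]]))  # DataFrame

arr = df.to_numpy()       # the "just cast it to numpy" escape hatch
print(arr.dtype)          # float64: the int column got upcast along the way
```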
Jan 2, 2023
FWIW, linear models with more than a few predictors aren't actually interpretable; people just think they can read them like tea leaves (Achen 2005). Multiple endogenous and highly correlated things will flip signs all over the place.
journals.sagepub.com/doi/10.1080/07… For more context, I used to like this plot (I used it in a course myself once), but have since grown to dislike it more and more. On the high end you have people finding specific mechanisms in NNs. On the low end you have people presenting single parameters from garbage-can regressions.
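A simulation of the sign-flipping, with made-up data: two highly correlated predictors that both truly push y up, yet across repeated samples OLS routinely assigns one of them a negative coefficient.

```python
# Simulation: two highly correlated predictors, both with true positive effects.
# Across repeated samples, OLS regularly gives at least one a wrong-signed coefficient.
import numpy as np

rng = np.random.default_rng(6)

def sign_flipped(n=100, rho=0.98):
    x1 = rng.normal(size=n)
    x2 = rho * x1 + np.sqrt(1 - rho**2) * rng.normal(size=n)   # corr(x1, x2) ~ rho
    y = 1.0 * x1 + 1.0 * x2 + rng.normal(0, 2.0, n)
    X = np.column_stack([np.ones(n), x1, x2])
    b = np.linalg.lstsq(X, y, rcond=None)[0]
    return (b[1] < 0) or (b[2] < 0)

flips = np.mean([sign_flipped() for _ in range(2000)])
print(f"share of fits with a wrong-signed coefficient: {flips:.2f}")   # typically ~0.3
```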
Dec 23, 2022
Jail-breaking YouChat on day 1
Dec 4, 2022
Experiments with ChatGPT on 9-digit addition: this started off as an R emulation experiment but quickly shifted to just basic math. Simulating an R terminal, its addition was correct up to 5 digits and then was hit or miss afterward, sometimes off by small amounts, sometimes by large ones.
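A sketch of the kind of harness one could use to repeat the check, generating n-digit addition problems and scoring the replies. `ask_model` is a hypothetical stand-in for whatever chat interface is being tested, not a real API call.

```python
# Sketch of a digit-addition harness. `ask_model` is a hypothetical placeholder
# for the chat model being tested; plug in your own interface to run it.
import random

def ask_model(prompt: str) -> str:
    raise NotImplementedError("plug in your chat model here")

def score_addition(n_digits: int, trials: int = 20) -> float:
    correct = 0
    for _ in range(trials):
        a = random.randint(10 ** (n_digits - 1), 10 ** n_digits - 1)
        b = random.randint(10 ** (n_digits - 1), 10 ** n_digits - 1)
        reply = ask_model(f"What is {a} + {b}? Answer with only the number.")
        correct += reply.strip() == str(a + b)
    return correct / trials

# for d in range(3, 10):
#     print(d, score_addition(d))
```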
Nov 16, 2022
Some prompt examples with Galactica: A tautology

What does causal identification require?
Nov 15, 2022
FWIW, if you're going to complain about gatekeeping then you'd better have a preprint and a repo. I can't tell if reviewer 2 is being unfair; I have no idea whether there's a real result or not. It's like complaining that the club has a bouncer by insisting you really deserve to be inside. It comes off as entitled manifest destiny, as if the author has the right to have it stamped as truth by a big-tent journal. It's like 50 bad regressions on a ton of problematic datasets, with no data or repo. One dumb reviewer is the least of the things that have gone wrong.
Nov 6, 2022
It is both empirically true and theoretically useful to think about conflict as continuous bargaining. All that's necessary is that a message could be sent, that the receiver has uncertainty, and that there is some pie that could be divided but that we're instead burning over time. It is a fallacy that talking, by itself, conveys information. For a lot of people this is counter-intuitive, but if I flip a coin, there is one bit of entropy (your uncertainty over the true state), and no matter how many tweets I send you I can never transmit more than 1 bit of information about it.
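A numeric sketch of that bound, under an assumed noisy-reporting model: a fair coin carries exactly 1 bit of entropy, and the mutual information between the coin and any stack of messages generated from it, however long, is capped at that 1 bit.

```python
# Numeric sketch: a fair coin has 1 bit of entropy; ten noisy "tweets" about it
# (assumed 90% faithful each, independently) carry at most that 1 bit.
import itertools
import numpy as np

p_coin = np.array([0.5, 0.5])
print(f"H(coin) = {-(p_coin * np.log2(p_coin)).sum():.3f} bits")

def mutual_information(n_tweets=10, p_faithful=0.9):
    mi = 0.0
    for coin in (0, 1):
        for msg in itertools.product((0, 1), repeat=n_tweets):
            p_msg_given_coin = np.prod([p_faithful if m == coin else 1 - p_faithful for m in msg])
            p_msg = 0.5 * np.prod([p_faithful if m == 0 else 1 - p_faithful for m in msg]) \
                  + 0.5 * np.prod([p_faithful if m == 1 else 1 - p_faithful for m in msg])
            mi += 0.5 * p_msg_given_coin * np.log2(p_msg_given_coin / p_msg)
    return mi

print(f"I(coin; 10 noisy tweets) = {mutual_information():.3f} bits  (never exceeds 1)")
```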
Oct 29, 2022
-The only thing that actually matters about a scientific contribution is the new evidence
-Everything else should be considered documentation for an end user to run their own inference
-A prior, updating procedure, and posterior should be provided as an illustration of what to try
There are lots of reasons for this:
-Estimands are usually stated so vaguely that analysis is meaningless anyway
-The real estimands aren't identified most of the time
-The domain is too vague to be meaningful
-The cleaning/modeling steps radically alter the domain without warning