3 Oct, 13 tweets, 2 min read
FOUR CARDINAL RULES OF STATISTICS 📈📊📉

1/🧵
ONE: CORRELATION DOES NOT IMPLY CAUSATION.

Yes, I know you know this, but it’s so easy to forget!

Yeah, YOU OVER THERE, you with the p-value of 0.0000001 — yes, YOU!! That’s not causation.

2/
No matter how small the p-value for a regression of IQ onto shoe size is, that doesn’t mean that big feet cause smarts!!

It just means that grown-ups tend to have bigger feet and higher IQs than kids.
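The shoe-size story is easy to simulate. Here's a minimal sketch (all coefficients and noise levels invented for illustration) in which "age" drives both "shoe size" and "test score", so the two correlate with a tiny p-value even though neither causes the other:

```python
# Confounding in action: "age" drives both "shoe size" and "test score",
# so the two correlate strongly even though neither causes the other.
# All coefficients and noise levels here are invented for illustration.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
n = 5_000
age = rng.uniform(5, 18, size=n)                  # kids and teens
shoe_size = 0.8 * age + rng.normal(0, 1, n)       # feet grow with age
score = 5.0 * age + rng.normal(0, 10, n)          # scores rise with age

r, p = stats.pearsonr(shoe_size, score)
print(f"correlation = {r:.2f}, p-value = {p:.1e}")  # strong correlation, tiny p

# Remove the (known, since we simulated it) age effect from each variable;
# the association disappears, exposing age as the confounder.
r_resid, p_resid = stats.pearsonr(shoe_size - 0.8 * age, score - 5.0 * age)
print(f"after adjusting for age: correlation = {r_resid:.2f}")
```

In real data you'd regress out the confounder rather than subtract known coefficients, but the point is the same: the raw correlation says nothing about feet causing smarts.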

3/
So, unless you can design your study to uncover causation (very hard to do in most practical settings — the field of causal inference is devoted to understanding the settings in which it is possible), the best you can do is to discover correlations.

4/
TWO: A P-VALUE IS JUST A TEST OF SAMPLE SIZE.

Read that again — I mean what I said!

If your null hypothesis doesn’t hold (and null hypotheses never hold IRL) then the larger your sample size, the smaller your p-value will tend to be.

5/
If you’re testing whether mean=0 and actually the truth is that mean=0.000000001, and if you have a large enough sample size, then YOU WILL GET A TINY P-VALUE.
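A quick sketch of this (not from the thread): rather than generating a trillion observations, we can draw the sample mean directly from its exact distribution, Xbar ~ N(mu, 1/n), and watch the p-value collapse as n grows while the truth stays fixed at a negligible mu = 0.001:

```python
# Sketch: one-sample z-test of H0: mu = 0 when the truth is mu = 0.001.
# Rather than generating 10^12 observations, draw the sample mean
# directly from its exact distribution, Xbar ~ N(mu, 1/n).
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
mu = 0.001                                  # a practically negligible effect
for k in [2, 6, 10, 12]:
    n = 10**k
    xbar = mu + rng.normal() / np.sqrt(n)   # one draw of Xbar
    z = np.sqrt(n) * xbar                   # z-statistic (sigma = 1 known)
    p = 2 * stats.norm.sf(abs(z))           # two-sided p-value
    print(f"n = 10^{k:<2}  z = {z:10.2f}  p = {p:.2e}")
```

Nothing about the effect changes between rows; only n does, and by n = 10^12 the p-value is microscopically small.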

6/
Why does this matter?

In many contemporary settings (think: the internet), sample sizes are so huge that we can get TINY p-values even when the deviation from the null hypothesis is negligible.

In other words, we can have STATISTICAL significance w/o PRACTICAL significance.

7/
Often, people focus on that tiny p-value, and the fact that the effect is of **literally no practical relevance** is totally lost.

8/
This also means that with a large enough sample size we can reject basically ANY null hypothesis (since the null hypothesis never exactly holds IRL, but it might be “close enough” that the violation of the null hypothesis is not important).

9/
Want to write a paper saying Lucky Charms consumption is correlated w/blood type? W/a large enough sample size, you can get a small p-value. (Provided there’s some super convoluted mechanism with some teeny effect size… which there probably is, b/c IRL null never holds)

10/
THREE: SEEK AND YOU SHALL FIND.

If you look at your data for long enough, you will find something interesting, even if only by chance!

In principle, we know that we need to perform a correction for multiple testing if we conduct a bunch of tests.

11/
But in practice, what if we decide what test(s) to conduct AFTER we look at data? Our p-value will be misleadingly small because we peeked at the data.

Pre-specifying our analysis plan in advance keeps us honest… but in reality, it’s hard to do!!!
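A minimal simulation of "seek and you shall find" (the test counts and sample sizes are arbitrary choices): run 200 tests on pure noise, report only the smallest p-value, and then apply the simplest multiple-testing correction:

```python
# "Seek and you shall find": 200 one-sample t-tests on pure noise
# (the null is TRUE in every single one), reporting only the smallest
# p-value. Searching guarantees something that looks interesting.
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
n_tests, n_obs = 200, 50
pvals = [stats.ttest_1samp(rng.normal(size=n_obs), 0.0).pvalue
         for _ in range(n_tests)]

p_min = min(pvals)
print(f"smallest of {n_tests} p-values: {p_min:.4f}")   # typically well under 0.05
print(f"Bonferroni-adjusted: {min(1.0, n_tests * p_min):.4f}")
```

The Bonferroni adjustment (multiply the p-value by the number of tests, cap at 1) is the bluntest correction available, and it typically pushes this "discovery" right back to non-significance.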

12/
That’s it for today. Have a great weekend! 🌞⛱️🩴

13/13


# More from @daniela_witten

3 Oct
WHY DID I SAY A P-VALUE IS A TEST OF SAMPLE SIZE?

Suppose we want to test whether the mean of some random variable X is zero. To keep it simple, X_1,...,X_n are i.i.d. N(mu, 1), and we test whether mu=0.

1/🧵
Now suppose that IRL mu=0.001 (remember: IRL the null hypothesis never exactly holds!)

The z-test for testing mu=0 involves computing

Z=sqrt(n)*Xbar/sigma

and comparing to a N(0,1) distribution. And sigma=1 by my earlier assumption that X_1,...,X_n are i.i.d. N(mu,1).

2/
So Z=sqrt(n)*Xbar.

Suppose n=10^12 so sqrt(n)=1000000.

This means Z=1000000*Xbar.

And Xbar should be somewhere in the ballpark of 0.001 since it's the sample mean of a bunch of observations with mean mu=0.001.

3/
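Finishing the arithmetic in a few lines (the thread cuts off here; this just completes the computation it set up):

```python
# With n = 10^12 and Xbar landing near mu = 0.001,
# Z is around 1000, and the two-sided p-value underflows to zero.
import math

n = 10**12
xbar = 0.001                          # Xbar ≈ mu = 0.001
z = math.sqrt(n) * xbar               # sqrt(10^12) = 10^6, so Z ≈ 1000
p = math.erfc(abs(z) / math.sqrt(2))  # two-sided p-value, = 2*(1 - Phi(|Z|))
print(z, p)                           # Z ≈ 1000; p underflows to 0.0
```

A z-statistic of ~1000 against a N(0,1) reference: an absurdly tiny p-value for a mean of 0.001 that nobody should care about.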
2 Oct
Last week, a paper came out that received a lot of very harsh criticism from the scientific community. First things first: I 100% agree with a lot of that criticism.

Criticizing a paper (or agreeing with others' criticisms of a paper) is okay. That's how science works.

1/
HOWEVER, after seeing the paper, I shot off a careless tweet. It was meant to be lighthearted, but it missed the mark!!!! It came across as a personal attack on the authors.

It was a mistake to have posted it, and I have since deleted it.

2/
After I posted it, a lot of people came to my defense and said that my tweet was fine, in light of the issues with the paper in question. I appreciate the support.

But the fact remains: while scientific criticism is fine, an attack on the authors is not. I was in the wrong.

3/
19 Sep
Hi scientists, could you please help me understand why this Harvard site says that the FPR of PCR can be as high as 5%?

Is this just due to the risk of sample/lab contamination (hopefully << 5%), or do they consider it an FP if someone who has recovered still has trace amounts of virus? 1/
By contrast, this website from MIT is in line with what I thought to be true: virtually no FPs from PCR
2/
Also I’m a statistician with an ok understanding of biology so please no lectures on specificity versus sensitivity or explaining what PCR is 🙏 🤣
3/3
11 Sep
today my husband said “in a normal year we’d be in air conditioned offices all day and we wouldn’t even notice the horrible air quality”

and that’s the moment I discovered that my husband thinks my office for the past literally 10 years has air conditioning
even though it doesn’t have A/C, I do miss it so. For one thing, my office has a door that I can close.

also, who wouldn’t love an office in a building that is ambiguously inspired by either a monastery or a prison?
my other office has a window that can only be opened with a screwdriver, so yeah this is my nice office
21 Aug
[Photo tweets: "Pre-covid", "And this with @aristeinberg", "You can see the look on my face" (images not archived)]
9 Aug
The Bias-Variance Trade-Off & "DOUBLE DESCENT" 🧵

Remember the bias-variance trade-off? It says that models perform well for an "intermediate level of flexibility". You've seen the picture of the U-shaped test error curve.

We try to hit the "sweet spot" of flexibility.

1/🧵
This U-shape comes from the fact that

Exp. Pred. Error = Irreducible Error + Bias^2 + Var

As flexibility increases, (squared) bias decreases & variance increases. The "sweet spot" requires trading off bias and variance -- i.e., a model with an intermediate level of flexibility.

2/
In the past few yrs (and particularly in the context of deep learning), ppl have noticed "double descent" -- if you continue to fit increasingly flexible models that interpolate the training data, the test error can start to DECREASE again!!

Check it out:
3/
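Here's a toy sketch of double descent (a standard demonstration, not the thread's own figure; the data, feature bank, and model sizes are all invented): minimum-norm least squares on random ReLU features, sweeping the number of features p past the interpolation threshold p = n:

```python
# Toy double-descent demo: minimum-norm least squares on p random ReLU
# features of a scalar input. np.linalg.lstsq returns the minimum-norm
# solution when the system is underdetermined (p > n), which is the
# interpolating fit that keeps descending.
import numpy as np

rng = np.random.default_rng(4)
n, n_test = 30, 500
x = rng.uniform(-1, 1, n)
y = np.sin(3 * x) + rng.normal(0, 0.1, n)        # noisy training data
x_test = rng.uniform(-1, 1, n_test)
y_test = np.sin(3 * x_test)                      # noise-free test targets

# One fixed bank of random ReLU features; using the first p of them
# gives a nested family of increasingly flexible models.
p_max = 600
w, b = rng.normal(size=p_max), rng.normal(size=p_max)

def feats(xs, p):
    return np.maximum(0.0, np.outer(xs, w[:p]) + b[:p])

results = {}
for p in [5, 15, 30, 60, 300, 600]:
    beta, *_ = np.linalg.lstsq(feats(x, p), y, rcond=None)
    train = float(np.mean((feats(x, p) @ beta - y) ** 2))
    test = float(np.mean((feats(x_test, p) @ beta - y_test) ** 2))
    results[p] = (train, test)
    print(f"p = {p:3d}  train MSE = {train:.4f}  test MSE = {test:.3f}")
# Typical pattern: test error rises toward the interpolation threshold
# (p = n = 30), then falls again as p grows -- the "double descent"
# shape. Exact numbers depend on the random seed.
```

Past p = n the training error is pinned at (numerically) zero, yet making the model MORE flexible tends to help the test error, which is exactly what the classical U-shape picture says shouldn't happen.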