SINCE @kierisi has threatened to sarcastically/chaotically say incorrect things about p-values during #sliced tonight 😱 just to annoy people 😉,
I thought I’d do a quick thread on what a 🚨p-value🚨 actually is.
🧵
(1/n)
Computationally, a p-value is p(data at least as extreme as ours | null). Imagine a 🌎 where the null hypothesis is true (e.g. there is no difference in cat fur shininess for cats eating food A vs. food B for 2 weeks), and see how extreme your observed data would be in that 🌎 (2/n)
So, you can think of the p-value as representing our data’s compatibility with the null hypothesis. Low p-values mean our data are not very likely in a world where the null is true. High p-values mean they are relatively likely. (3/n)
In our cat example, we could measure the difference in mean fur shininess between groups A and B, and observe a difference of 3.5 shine points ✨. (4/n)
A p-value answers the question “if there were no true difference in fur shininess between the groups, how often do we expect to observe a difference between the groups we sampled that’s greater than or equal to 3.5?” (5/n)
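If you like seeing that question as a simulation, here’s a minimal sketch. The group means, SD, and sample sizes are made up for illustration; only the 3.5-point observed difference comes from the example above.

```python
import numpy as np

rng = np.random.default_rng(42)

observed_diff = 3.5   # observed difference in mean fur shininess (shine points)
n_per_group = 30      # hypothetical number of cats per food group
sd = 8.0              # hypothetical spread of shininess scores
n_sims = 100_000

# Simulate a 🌎 where the null is TRUE: both groups share the same mean shininess.
null_diffs = np.empty(n_sims)
for i in range(n_sims):
    a = rng.normal(loc=50, scale=sd, size=n_per_group)  # food A cats
    b = rng.normal(loc=50, scale=sd, size=n_per_group)  # food B cats (same mean!)
    null_diffs[i] = a.mean() - b.mean()

# p-value: how often is a null-world difference at least as extreme as ours?
p_value = np.mean(np.abs(null_diffs) >= observed_diff)  # two-sided
print(f"simulated p-value: {p_value:.4f}")
```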
When the data are pretty extreme/rare/unexpected (low p) in a world where the null is true, we start to get suspicious that the null might not be true…(it still could be tho! and might be a better fit than another hypothesis) (6/n)
In Null Hypothesis Significance Testing (NHST), we use the p-value to make decisions about our data. NHST is a decision-making tool. In this case, we choose a cutoff (usually 0.05 but that’s COMPLETELY ARBITRARY)… (7/n)
…and decide in advance that if our p-value is smaller than the cutoff, we will act as if the null is FALSE (if the p-value is not smaller, we act as if the null could still be true), and we call that test “significant”. (8/n)
When we adhere to this decision-making rule (and a bunch of assumptions about the data/model…etc) the beauty of NHST is that we can CONTROL our Type I error rate. “We shall not be too often wrong.” (9/n)
Our expected Type I error rate (False Positives, aka how likely we are to ACT like the null is FALSE when it’s actually TRUE) will be equal to that cutoff we chose “in the long run” (10/n)
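Here’s a tiny sketch of that “in the long run” claim (the numbers are made up, and the t-test is just a stand-in for whatever test you run): if the null really is true and we reject whenever p < 0.05, we end up rejecting ~5% of the time.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
alpha = 0.05           # our (completely arbitrary) cutoff
n_experiments = 10_000
rejections = 0

for _ in range(n_experiments):
    # The null is TRUE by construction: both groups come from the same distribution.
    a = rng.normal(loc=50, scale=8.0, size=30)
    b = rng.normal(loc=50, scale=8.0, size=30)
    _, p = stats.ttest_ind(a, b)
    if p < alpha:      # decision rule: act as if the null is FALSE
        rejections += 1

print(f"Type I error rate ≈ {rejections / n_experiments:.3f}")  # ≈ 0.05 in the long run
```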
A Type I error can only happen when the null is TRUE. So the Type I error rate (5% if using a 0.05 cutoff) is not our OVERALL error rate. We can also make a Type II error (False Negative), where we act like the null is TRUE but it’s actually FALSE. (11/n)
(aka WE FAIL TO DETECT a real effect. 🚨)
Often we want to balance our error rates rather than just choose a Type I error rate/cutoff. But that’s a thread for a different time. (12/n)
IN SUMMARY:
✅ p-values are a measure of how extreme our observed data (measured through a test statistic) would be in a world where the null hypothesis is TRUE. (13/n)
✅ we often see p-values being used in NHST, which is a decision-making tool. We decide on a cutoff and if a p-value is LESS than the cutoff, we will act like the null is false. (14/n)
✅ If the p-value is > cutoff, we act like the null could be true. This decision-making tool allows us to know (*if* all assumptions are met) what our expected Type I error rate will be. Controlling error rates over repeated experiments is, IMO, the benefit of NHST + p-vals. (15/15)
BONUS:
I like to tell people that p-values + NHST are a weaker form of reductio ad absurdum. RAA tries to disprove things by assuming the opposite and showing that this assumption leads to something impossible. E.g.
Hypothesis: My run today was 100 miles.
Consequence: I run at 6 mph. That means my run would be over 16 hours long.
Impossibility: I only ran for an hour.
Conclusion: My run today was NOT 100 miles.
P-values do something similar.
Hypothesis: there is no difference between fur shininess of cats in groups A and B.
Consequence: then we expect our standardized observed difference (the test statistic) to be ~ N(0,1) due to sampling variation.
Implausibility: our observed test statistic is 3.5, p < 0.05, WOW SO UNLIKELY!
Conclusion: We will ACT as if there is a difference between the fur shininess of cats in groups A and B.
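In code, that “WOW SO UNLIKELY” step is just asking how much of the N(0,1) null distribution sits beyond our test statistic (here I’m treating 3.5 as the standardized statistic, per the consequence step above):

```python
from scipy import stats

z = 3.5  # standardized observed difference (test statistic)
p_two_sided = 2 * stats.norm.sf(abs(z))  # probability mass in both tails beyond |z|
print(f"p = {p_two_sided:.5f}")  # ≈ 0.00047, well below a 0.05 cutoff
```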
ALRIGHT fellow statisticians, tell me what nuance I got wrong!
Step 1: Think about becoming a lawyer but ditch that because you can’t stand foreign political history classes. Add a philosophy double major because sureee that’ll help🙄. Then switch to psychology, meet an awesome statistician, and decide you love statistics in your last semester of college.
Graduate. Work in a cognitive neuroscience lab while living at home, apply to data science grad programs, get rejected by Berkeley, get into Chapman University, find an advisor who needs someone with stats AND psych expertise,
Moving from psych to stats/DS is totally doable. Depending on your training, there may be some gaps you need to fill, content-wise, but those gaps 1) aren't insurmountable + 2) will not automatically make you a bad data person just because you're working on filling them.
Doing good DS requires hard work/rigor but it’s not exclusive to “math” people. You can do it.
2/8
Personally, I had gaps in math + comp sci. I learned to code (Python, R, C++, SQL) and took/audited a bunch of probability, stats, and linear algebra classes. Those classes CERTAINLY helped me, but I could've learned the content w/o them; they just made it easier/more structured.