DEFINITION OF A P-VALUE. Assume your theory is false. The P-VALUE is the probability of getting an outcome as extreme as, or more extreme than, the one you got in your experiment.
THE LOGIC OF THE P-VALUE. Assume my theory is false. The probability of getting extreme results should be very small, but I got an extreme result in my experiment. Therefore, I conclude that this is strong evidence that my theory is true. That's the logic of the p-value.
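A minimal sketch of that logic in Python, using a coin-flip experiment (the numbers and function names here are my own, purely for illustration):

```python
# Toy p-value: assume my theory ("the coin is biased") is false,
# i.e. the coin is fair. How surprising is the result I got?
from math import comb

def binom_pmf(k, n, p=0.5):
    # Probability of exactly k heads in n flips of a p-coin.
    return comb(n, k) * p**k * (1 - p)**(n - k)

def p_value(heads, n):
    # Two-sided: probability, under a fair coin, of an outcome
    # at least as far from n/2 as the one observed.
    observed_dev = abs(heads - n / 2)
    return sum(binom_pmf(k, n)
               for k in range(n + 1)
               if abs(k - n / 2) >= observed_dev)

# 60 heads in 100 flips of a supposedly fair coin:
print(p_value(60, 100))  # small-ish, so the "coin is fair" world
                         # starts to look doubtful
```

The smaller this number, the harder it is to keep believing the "theory is false" world that generated it.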
THE P-VALUE IS REASONABLE IN THEORY BUT TRICKY IN PRACTICE. In my opinion, the p-value is just a mathematical version of the way humans think. If we see something that seems unlikely given our beliefs, we often doubt those beliefs. In practice, the p-value can be tricky to use.
THE P-VALUE REQUIRES A GOOD DEFINITION OF WHEN YOUR THEORY IS FALSE. There are usually an infinite number of ways to define a world where your theory is false. P-values often fail when people use overly simplistic mathematical models of the processes that created their data.
If the mismatch between their mathematical models of the world and the actual world is too large, then the probabilities we compute can become completely disconnected from reality.
THE P-VALUE MAY REQUIRE AN ACCURATE MODEL OF YOU (THE OBSERVER). The probability of getting the result you got depends on many things. If you sometimes do things like throw out data or repeat measurements then you're part of the system.
Your behavior affects the probability of getting your experimental results. Therefore, to be completely realistic, you need an ACCURATE model of your own behavior when you gather and analyze data. This is hard, and it's a big part of why the p-value often fails as a tool.
BY DEFINITION, P-VALUES MUST SOMETIMES BE WRONG. When using p-values, we're working with probabilities. By the logic of the p-value itself, even with perfect use, some of your decisions will be wrong. You have to embrace this if you're going to use p-values.
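A quick simulation of that claim, assuming a simple fair-coin null and the usual 0.05 cutoff (the setup and numbers are mine, for illustration):

```python
# Even with perfect use, a "reject when p < 0.05" rule is wrong
# roughly 5% of the time when the null world is actually real.
import random
from statistics import NormalDist

random.seed(0)

def experiment(n=100):
    # Flip a genuinely fair coin n times and compute a crude
    # two-sided p-value via the normal approximation.
    heads = sum(random.random() < 0.5 for _ in range(n))
    z = abs(heads - n / 2) / (n / 4) ** 0.5
    return 2 * (1 - NormalDist().cdf(z))

rejections = sum(experiment() < 0.05 for _ in range(10_000))
print(rejections / 10_000)  # near the nominal 5% (slightly above
                            # here, because coin flips are discrete)
```

Those ~5% of rejections are all wrong by construction; the coin really is fair. That's the built-in error rate you sign up for.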
Badly defining what it means for your theory to be false. Inaccurately modeling the chances of getting your data, including your own behavior. Not treating the p-value as a decision rule that can sometimes be wrong. These factors all contribute to misuse of the p-value in practice.
Hope this cleared some things up for you. Thanks for coming to my p-value TED talk!
• • •
You may have heard hallucinations are a big problem in AI, that they make stuff up that sounds very convincing, but isn't real.
Hallucinations aren't the real issue. The real issue is Exact vs Approximate, and it's a much, much bigger problem.
When you fit a curve to data, you have choices.
You can force it to pass through every point, or you can approximate the overall shape of the points without hitting any single point exactly.
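A toy sketch of those two choices in plain Python (the data points and helper names are made up for illustration): exact interpolation through every point versus a least-squares line that only follows the shape.

```python
# Five noisy points that roughly follow the line y = x.
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [0.1, 0.9, 2.2, 2.8, 4.1]

def interpolate(x, xs, ys):
    # "Exact": the Lagrange polynomial passes through every point.
    total = 0.0
    for i, (xi, yi) in enumerate(zip(xs, ys)):
        term = yi
        for j, xj in enumerate(xs):
            if j != i:
                term *= (x - xj) / (xi - xj)
        total += term
    return total

def fit_line(xs, ys):
    # "Approximate": a least-squares line captures the overall
    # shape without hitting any single point exactly.
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    slope = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
             / sum((x - mx) ** 2 for x in xs))
    return slope, my - slope * mx

slope, intercept = fit_line(xs, ys)
print([round(interpolate(x, xs, ys), 3) for x in xs])  # hits every y
print([round(slope * x + intercept, 3) for x in xs])   # close, not exact
```

The exact curve memorizes the data; the line summarizes it. Neither choice is free.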
When it comes to AI, there's a similar choice.
These models are built to match the shape of language. In any given context, the model can either produce exactly the text it was trained on, or it can produce text that's close but not identical.
• • •
I’m deeply skeptical of the AI hype because I’ve seen this all before. I’ve watched Silicon Valley chase the dream of easy money from data over and over again, and they always hit a wall.
Story time.
First it was big data. The claim was that if you just piled up enough data, the answers would be so obvious that even the dumbest algorithm or biggest idiot could see them.
Models were an afterthought. People laughed at you if you said the details mattered.
Unsurprisingly, it didn't work out.
Next came data scientists. The idea was simple: hire smart science PhDs, point them at your pile of data, and wait for the monetizable insights to roll in.
As a statistician, this is extremely alarming. I’ve spent years thinking about the ethical principles that guide data analysis. Here are a few that feel most urgent:
RESPECT AUTONOMY
Collect data only with meaningful consent. People deserve control over how their information is used.
Example: If you're studying mobile app behavior, don’t log GPS location unless users explicitly opt in and understand the implications.
DO NO HARM
Anticipate and prevent harm, including breaches of privacy and stigmatization.
Example: If 100% of a small town tests positive for HIV, reporting that stat would violate privacy. Aggregating to the county level protects individuals while keeping the data useful.
• • •
Hot take: Students using ChatGPT to cheat are just following the system’s logic to its natural conclusion, a system that treats learning as a series of hoops to jump through, not a path to becoming more fully oneself.
The tragedy is that teachers and students actually want the same thing, for the student to grow in capability and agency, but school pits them against each other, turning learning into compliance and grading into surveillance.
Properly understood, passing up a real chance to learn is like skipping out on great sex or premium ice cream. One could, but why would one want to?
• • •
If you think about how statistics works, it’s extremely obvious why a model built on purely statistical patterns would “hallucinate”. Explanation in next tweet.
Very simply, statistics is about taking two points you know exist and drawing a line between them, basically completing patterns.
Sometimes that middle point is something that exists in the physical world, sometimes it’s something that could potentially exist, but doesn’t.
Imagine an algorithm that could predict what a couple’s kids might look like. How’s the algorithm supposed to know if one of those kids it predicted actually exists or not?
The child’s existence has no logical relationship to the genomics data the algorithm has available.
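The pattern-completion idea is easy to sketch. A toy version in Python, with made-up numbers standing in for the kids example above:

```python
# Interpolation happily produces a point that "could exist"
# but was never observed. All values here are hypothetical.
def midpoint(a, b):
    # Draw a line between two known points and read off the middle.
    return tuple((x + y) / 2 for x, y in zip(a, b))

parent_a = (170.0, 70.0)   # (height cm, weight kg) — invented
parent_b = (180.0, 90.0)

predicted = midpoint(parent_a, parent_b)
print(predicted)  # (175.0, 80.0): a perfectly plausible person,
                  # but nothing in the math says they exist
```

The function completes the pattern flawlessly; whether the completed point corresponds to anything real is a question the math never sees.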