How did I get this poll with almost 29k responses to balance perfectly? A thread. 👇
Assuming most people didn't secretly flip a coin, where's the randomness in the poll coming from? I think it comes from three sources:
1. Some folks were genuinely picking randomly
2. Based on the comments, even folks who used a system used a method that was unique to them, and therefore effectively random relative to everyone else
3. From the perspective of the Twitter algorithm, each new person who gets shown the poll is a toss-up in terms of whether they favor heads or tails, much like flipping a coin. It doesn't matter if they picked non-randomly; from the perspective of the poll, they appear random.
So, now that we've established that the people answering the poll are probably going to act a little bit like a flipped coin, what does statistics have to say about flipping a coin 29k times?
Law of Large Numbers
The average of a large number of observations tends to get closer to a particular value as more observations are collected. This value is called the "expected value". If we code heads as 1 and tails as 0, the average is just the proportion of heads, and the expected value for a fair coin is 0.5.
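A quick simulation makes the Law of Large Numbers concrete. This is a minimal sketch, assuming a fair coin simulated with Python's standard `random` module; the function name `running_averages` and the seed are just for illustration. It prints the running average after more and more flips, which should drift toward 0.5.

```python
import random

def running_averages(n_flips, seed=0):
    """Running proportion of heads (1 = heads, 0 = tails) after each flip."""
    rng = random.Random(seed)
    heads = 0
    averages = []
    for i in range(1, n_flips + 1):
        heads += rng.randint(0, 1)   # one fair coin flip
        averages.append(heads / i)
    return averages

avgs = running_averages(100_000)
for n in (10, 100, 1_000, 10_000, 100_000):
    print(f"after {n:>7,} flips: average = {avgs[n - 1]:.4f}")
```

Early averages can wander far from 0.5; the later ones should sit much closer to it.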
Central Limit Theorem
The averages from many repeated experiments cluster around the "expected value" in a pattern that resembles a bell curve, and the cluster gets tighter as each experiment collects more observations. We can't see the pattern with just one experiment, but we do see it with lots of experiments.
You might be wondering: what's a bell curve? It's a symmetric, hump-shaped curve that is tallest in the middle and tapers off on both sides. The previous tweet is saying that most of the experiments will have averages clustered around the center, with fewer and fewer as the averages get more extreme.
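To see the bell curve appear, we can repeat the coin-flipping experiment many times and look at the spread of the averages. This sketch is illustrative only: the helper names, the 2,000 repetitions, and the histogram range are arbitrary choices. It assumes a fair coin and uses `random.getrandbits` to flip many coins at once (each bit is one flip, 1 = heads). The standard deviation of the averages should land near 0.5 / sqrt(n), so it shrinks as each experiment gets more flips.

```python
import random
import statistics

def experiment_averages(n_flips, n_experiments, seed=0):
    """Proportion of heads in each of n_experiments runs of n_flips fair-coin flips."""
    rng = random.Random(seed)
    # getrandbits(n) yields n independent fair bits; counting the 1s counts the heads.
    return [bin(rng.getrandbits(n_flips)).count("1") / n_flips
            for _ in range(n_experiments)]

def text_histogram(values, lo, hi, bins=12, width=40):
    """Print a sideways histogram; out-of-range values are clipped into the end bins."""
    counts = [0] * bins
    for v in values:
        i = int((v - lo) / (hi - lo) * bins)
        counts[min(max(i, 0), bins - 1)] += 1
    peak = max(counts)
    for i, c in enumerate(counts):
        left = lo + i * (hi - lo) / bins
        print(f"{left:.3f} | {'#' * round(c / peak * width)}")

spreads = {}
for n in (100, 1_600):
    avgs = experiment_averages(n, 2_000)
    spreads[n] = statistics.stdev(avgs)
    # Observed spread vs. the bell-curve prediction 0.5 / sqrt(n)
    print(f"n={n}: mean={statistics.mean(avgs):.4f} "
          f"sd={spreads[n]:.4f} predicted sd={0.5 / n ** 0.5:.4f}")

text_histogram(experiment_averages(100, 2_000), lo=0.3, hi=0.7)
```

Sixteen times as many flips per experiment should make the spread roughly four times smaller.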
If the bell curve feels a little abstract, don't worry. It's a lot more familiar than you might think. Men's heights are roughly distributed like a bell curve and so are the heights of women. So we've actually all been experiencing bell curves our whole lives.
You also might be wondering why I'm talking about "experiments". We only did one poll. Often statistics means thinking about the multiverse. We don't just think about our universe but every other universe where randomness would have caused our experiment to turn out differently.
Looking at our experiment in the context of the "multiverse" is what allows us to see that the results become more predictable as we get more observations.
If we assume our poll behaves like a fair coin, then using the math of the bell curve we can figure out what kind of results to expect after 29k answers. There was about a 95% chance that the proportion of heads would land between 0.494 and 0.506.
The precise proportion of heads in this poll was 0.504, which was well within the realm of possibility!
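Here is the arithmetic behind the 0.494 to 0.506 range, plus a "multiverse" check that replays the poll many times. This is a sketch under stated assumptions: it treats the poll as exactly 29,000 fair coin flips (the real count was slightly under 29k), uses the usual normal approximation with z = 1.96 for a 95% range, and the number of simulated universes and the seed are arbitrary.

```python
import math
import random

N = 29_000     # assumption: "almost 29k" responses rounded to 29,000
P = 0.5        # fair-coin assumption
Z95 = 1.96     # normal quantile for a central 95% range

se = math.sqrt(P * (1 - P) / N)     # standard error of the proportion of heads
low, high = P - Z95 * se, P + Z95 * se
print(f"95% range: {low:.3f} to {high:.3f}")   # 95% range: 0.494 to 0.506

observed = 0.504
print(low <= observed <= high)                 # True

# "Multiverse" check: rerun the 29k-flip poll in many simulated universes
# and count how often the proportion of heads lands inside the range.
rng = random.Random(1)
universes = 2_000
inside = 0
for _ in range(universes):
    heads = bin(rng.getrandbits(N)).count("1")   # N fair flips at once
    if low <= heads / N <= high:
        inside += 1
print(f"fraction of universes inside the range: {inside / universes:.3f}")
```

The simulated fraction should come out close to 0.95, which is what the bell-curve formula predicts.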
The one thing I did get lucky on is that the preference for heads and tails seems to be symmetric in the polling population. So for every person that prefers heads, there's an equal and opposite person that likes tails, and vice versa.
This didn't have to be true, but when it is, picking people at random will tend to give a close to 50-50 split, even if each person's choice isn't random.
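The symmetry argument can also be simulated. In this hypothetical sketch, nobody flips a coin: every person in the population always gives the same answer, and the only randomness is which people happen to be shown the poll. The population size, the seed, and the lopsided 45/55 split used for contrast are made-up values for illustration, not estimates of the real audience.

```python
import random

def poll_share_heads(share_prefer_heads, n_voters=29_000, population_size=100_000, seed=0):
    """Proportion of heads when deterministic voters are drawn at random."""
    rng = random.Random(seed)
    n_heads_fans = round(share_prefer_heads * population_size)
    # Every person answers deterministically: no one is flipping a coin.
    population = ["heads"] * n_heads_fans + ["tails"] * (population_size - n_heads_fans)
    # The only randomness is *which* people get shown the poll.
    voters = rng.choices(population, k=n_voters)
    return voters.count("heads") / n_voters

symmetric = poll_share_heads(0.50)   # equal and opposite preferences
lopsided = poll_share_heads(0.45)    # hypothetical asymmetric audience
print(f"symmetric population: {symmetric:.3f}")
print(f"lopsided population:  {lopsided:.3f}")
```

The symmetric population lands near 0.5, while the lopsided one lands near 0.45, far outside the 95% range for a fair coin.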
The first time I tried this poll experiment, the result was pretty biased. I think that was because of a lack of symmetry in beliefs: my statistics-savvy audience thought people would be more likely to select the first option, so they tried to "unbias" the poll by selecting the second one.
My solution was simply to tell them they were biased. That left them unsure what would happen on this poll, which unbiased them.
So there you go. That's the magic trick. I'm not a wizard. I'm just a statistician. 😏
Hope you enjoyed the thread. If you like this content and want to support it, please like and retweet the thread so others can enjoy it as well, and follow me to get more threads like this one in the future.
• • •
One of the things I hate most about the cult of IQ is that it leads to a lot of magical thinking about how the brain works. There’s absolutely nothing shameful about relearning things you used to know.
Research shows forgetting is a normal part of human cognition.
The way to combat the natural tendency to forget is to relearn or retrieve the memory at regular intervals, a practice known as “spaced repetition”.
Relearning takes the strength of the memory back to 100% and the rate of forgetting is slower the next time.
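One common way to picture this is an exponential forgetting curve, R = exp(-t / S), where R is how much of the memory is retained after t days and S is the memory's "stability". The sketch below is a simplified, hypothetical model, not necessarily the one used in the research mentioned above: the starting stability of 2 days, the 5-day review interval, and the rule that each relearning doubles the stability are all assumed values. It reports how much is still remembered just before each review.

```python
import math

def retention(days_since_review, stability):
    """Simplified exponential forgetting curve: R = exp(-t / S)."""
    return math.exp(-days_since_review / stability)

def retention_before_each_review(n_reviews, interval_days=5.0,
                                 initial_stability=2.0, growth=2.0):
    """Retention just before each review, assuming each review multiplies stability by `growth`."""
    stability = initial_stability
    before = []
    for _ in range(n_reviews):
        before.append(retention(interval_days, stability))  # what's left just before relearning
        # Relearning resets retention to 100%, and (hypothetically) slows the next round of forgetting.
        stability *= growth
    return before

r = retention_before_each_review(4)
print([round(x, 3) for x in r])   # [0.082, 0.287, 0.535, 0.732]
```

Even with the same gap between reviews, less is forgotten each round because the forgetting slows down after every relearning.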
Infographics of this dataset have been kicking around the internet for years. They are an insult to real scientists everywhere. For every 10 likes, I will post a new fact about how fake and ridiculous this "data" is.
They report data on 185 countries, but *104* of those numbers (more than half!) are based on *zero* data collected from people in that country. ZERO.
Rather than acknowledge this lack of data, they decided to guesstimate based on surrounding countries.
The IQ estimate for Equatorial Guinea was based on children in a home for developmentally disabled children in Spain. Not even their home country. Spain.
People are getting thousands of likes for spreading this misinformation about sex differences. Let me explain why this interpretation of the data is wrong. 🧵
If you think 100% accuracy is too good to be true, trust your instincts.
The version of the model shown in the plot was basically fed the sex of the participants. That’s why it’s achieving 100% accuracy.
When the model was tested on a subset of people from the same dataset that it had *not* seen previously, the accuracy fell to 90%.
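Why does testing on unseen data matter? Here is a toy example using a 1-nearest-neighbor classifier, a model that simply memorizes its training data. This is a hypothetical sketch with synthetic, overlapping data, not the model or dataset from the study; the helper names, sample sizes, and class means are assumptions. It only shows that scoring a model on data it has already seen can report 100% accuracy even when its accuracy on new data is much lower.

```python
import random

def make_data(n, rng):
    """Synthetic 1-D data: class 0 ~ N(0, 1), class 1 ~ N(2, 1), so the classes overlap."""
    data = []
    for _ in range(n):
        label = rng.randint(0, 1)
        x = rng.gauss(2.0 * label, 1.0)
        data.append((x, label))
    return data

def predict_1nn(train, x):
    """Return the label of the closest training point."""
    return min(train, key=lambda point: abs(point[0] - x))[1]

def accuracy(train, test):
    correct = sum(predict_1nn(train, x) == y for x, y in test)
    return correct / len(test)

rng = random.Random(0)
train = make_data(200, rng)
held_out = make_data(200, rng)

train_acc = accuracy(train, train)     # every point's nearest neighbor is itself
test_acc = accuracy(train, held_out)   # points the model has never seen
print(f"accuracy on training data: {train_acc:.2f}")
print(f"accuracy on held-out data: {test_acc:.2f}")
```

The training-set score is perfect by construction; only the held-out score says anything about how well the model generalizes.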
I keep seeing this Huberman clip all over my timeline, so let’s use it as a teachable moment to learn some statistics.
The basic mistake is not taking the people who are already pregnant out of the pool of people who could be pregnant the next month. Of the starting 100, fewer and fewer will remain each month.
It’s a little tedious to keep track of how many people aren’t yet pregnant in each round, take 20% of that, and then add up all the pregnant people across rounds.
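That bookkeeping is exactly what a short loop is good at. A minimal sketch, assuming 100 people and a 20% chance per month for anyone not yet pregnant (the numbers from the thread), with five months chosen arbitrarily; the function name is hypothetical. It checks the loop against the closed form 100 × (1 − 0.8^months) and contrasts it with the naive 20% × months.

```python
def cumulative_pregnant(start=100, monthly_rate=0.20, months=5):
    """Cumulative number pregnant after each month, removing them from the pool."""
    not_yet = start    # people who could still become pregnant
    total = 0.0        # everyone who has become pregnant so far
    history = []
    for _ in range(months):
        new = not_yet * monthly_rate   # 20% of the *remaining* pool, not of the original 100
        not_yet -= new                 # already-pregnant people leave the pool
        total += new
        history.append(total)
    return history

history = cumulative_pregnant()
for month, total in enumerate(history, start=1):
    closed_form = 100 * (1 - 0.8 ** month)
    naive = 100 * 0.20 * month
    print(f"month {month}: {total:.2f} pregnant "
          f"(closed form {closed_form:.2f}, naive {naive:.0f})")
```

After five months the loop gives about 67 people, not the 100 you would get by simply adding 20% per month.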