Latest Twitter Threads by @stat_sherpa on Thread Reader App

Feb 23, 2025 • 7 tweets • 5 min read

Data Literacy – Part 3: Hypothesis Testing

HOW THE HELL DO DATA NERDS ANSWER QUESTIONS WITH DATA???

Data nerds tend to say stuff like, “The data show a statistically significant difference…” but what does that mean, and how do they decide that?

Below we’ll go through a quick example of how a statistician might answer a question about age differences between Democrats and Republicans (with real data).
🧵(1/7)

Suppose you wanted to test if there is a difference in age between Democrats and Republicans. There are only two possible answers to this initial question:

1. There IS NOT a difference in age between Democrats and Republicans.
2. There IS a difference in age between Democrats and Republicans.

It might seem basic, but in the world of data, your question can have a large effect on the type of analysis you do. For example, the following three questions may require different methods or tests to answer:

1. “On average, is there a difference in age between Democrats and Republicans?”
2. “Are Democrats typically younger than Republicans?”
3. “Do Republicans tend to be older than Democrats?”

For this thread, we’ll focus on the first question and its two possible outcomes (our hypotheses), using data from the 2018 General Social Survey.
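To make the two hypotheses concrete, here is a minimal sketch of one way to test for a difference in average age: a permutation test. This is not necessarily the method used later in the thread, and the ages below are randomly generated stand-ins, not the General Social Survey data. The names `permutation_test`, `democrat_ages`, and `republican_ages` are made up for this example.

```python
import random

def mean(values):
    return sum(values) / len(values)

def permutation_test(group_a, group_b, n_permutations=2000, seed=0):
    """Two-sided permutation test for a difference in means.

    If there is NO real difference (hypothesis 1), the group labels are
    interchangeable, so we shuffle them and count how often a shuffled
    difference is at least as large as the one we observed.
    """
    rng = random.Random(seed)
    observed = mean(group_a) - mean(group_b)
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    at_least_as_extreme = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)
        diff = mean(pooled[:n_a]) - mean(pooled[n_a:])
        if abs(diff) >= abs(observed):
            at_least_as_extreme += 1
    return observed, at_least_as_extreme / n_permutations

# ASSUMPTION: simulated ages for illustration only, not real survey data.
data_rng = random.Random(42)
democrat_ages = [data_rng.gauss(48, 17) for _ in range(200)]
republican_ages = [data_rng.gauss(52, 17) for _ in range(200)]

observed_diff, p_value = permutation_test(democrat_ages, republican_ages)
print(f"Observed difference in mean age: {observed_diff:.2f} years")
print(f"Share of shuffles at least that extreme (p-value): {p_value:.4f}")
```

A small p-value says that shuffled labels rarely produce a gap as large as the observed one, which counts as evidence against "there IS NOT a difference."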

Oct 12, 2024 • 6 tweets • 5 min read

Data Literacy Basics – Part 2
Here are five more foundational concepts that EVERYONE should be aware of (again, in no particular order). 🧵

1. Not all differences are significant
When we look at data, it’s easy to think that any change or difference is important. In reality, data fluctuates all the time and it’s difficult to determine if a small change is natural variation or a true difference.

Take the weather as an example. Say the average temperature in a location last June was 75.1°F, but the previous year it was 73.6°F. Sure, the average temperature went up 1.5°F, but is this significant? How does this temp compare to all the previous years? Is this unusual or unexpected? These are good questions to keep in mind that will raise your data literacy.

A lot of statistics deals with hypothesis testing, which aims to answer the question “how likely would a difference this large be if only natural variation were at work?”
(1/6)
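One rough way to ask "is this unusual?" is to compare the new value with the spread of past values. The sketch below does that with a z-score. The ten historical June averages are invented for illustration (only 75.1°F and 73.6°F come from the example above), and `z_score` is a hypothetical helper name.

```python
import statistics

def z_score(value, history):
    """How many standard deviations `value` sits from the historical mean."""
    return (value - statistics.mean(history)) / statistics.stdev(history)

# ASSUMPTION: made-up June averages (°F) for ten earlier years.
# The last entry, 73.6, is the "previous year" from the example.
past_june_averages = [73.9, 74.6, 72.8, 75.3, 73.6, 74.1, 75.0, 73.2, 74.8, 73.6]
latest_june_average = 75.1

z = z_score(latest_june_average, past_june_averages)
print(f"Last June sits {z:.2f} standard deviations above the historical mean")
```

With these made-up numbers the new value lands a little over one standard deviation above the historical mean, which is well within ordinary year-to-year variation. Real data could tell a different story.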

2. Regression to the mean
One tricky aspect of data is that, even when something seems to show a big difference, it might not actually be significant. Extreme values do occur, but the observations that follow them tend to land closer to the average, a phenomenon known as regression to the mean.

When people spot an outlier—especially if it's the most recent data point—their immediate reaction is often to assume it signals a change in trends.

You see this often in sports. An athlete might have an amazing game, scoring far beyond their typical performance. Commentators might rave about this "breakthrough." However, in the next few games, their performance often returns to what's more typical for them. This doesn't mean they've lost skill, or something has gone wrong; it's just regression to the mean. The exceptional game was an outlier, and over time, their performance will even out to what it usually is. Keep this in mind when looking at single data points.
(2/6)
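A quick simulation can show the effect. The sketch below assumes an athlete whose true skill never changes (a fixed average plus random game-to-game noise), finds their best game in each simulated season, and then looks at the very next game. All numbers and names are hypothetical.

```python
import random

rng = random.Random(7)

# ASSUMPTION: a made-up player with constant skill and random noise.
TRUE_AVERAGE = 20.0  # points per game
GAME_NOISE = 6.0     # game-to-game standard deviation

def simulate_season(n_games=82):
    return [rng.gauss(TRUE_AVERAGE, GAME_NOISE) for _ in range(n_games)]

best_games = []
following_games = []
for _ in range(2000):
    season = simulate_season()
    # Best game among all but the last, so a "next game" always exists.
    best_index = max(range(len(season) - 1), key=season.__getitem__)
    best_games.append(season[best_index])
    following_games.append(season[best_index + 1])

avg_best = sum(best_games) / len(best_games)
avg_following = sum(following_games) / len(following_games)
print(f"Average best game:       {avg_best:.1f} points")
print(f"Average game right after: {avg_following:.1f} points")
```

Nothing about the player changes between the two games. The game after the outlier is simply an ordinary draw, so on average it lands back near the player's usual level.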

Oct 5, 2024 • 6 tweets • 5 min read

Data Literacy Basics - Part 1
Below are five foundational concepts that EVERYONE should understand (in no particular order). Also, let me know what you would add.

1. Outliers rarely disprove trends.
I see this a lot. People, when presented with a statistic, will often try to discredit it by bringing up edge cases, or outliers. The reality is that data, in general, has natural variation, even within a distribution or trend.

We all know this. If I were to say “The average height for an American male is about 5 feet 9 inches,” but my friend chimed in with “That can’t be true! My uncle is 6 feet 8 inches,” you surely wouldn’t agree that a single data point disproves my statistic. That's an easy example, as we are all familiar with people's heights, but for data we aren’t accustomed to, this becomes very important to keep in mind.
🧵 (1/6)
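The arithmetic behind this is easy to check. The sketch below assumes a simulated sample of 1,000 heights centered on 69 inches (5 feet 9 inches), then adds one 80-inch (6 feet 8 inches) uncle and measures how far the average moves. The sample is randomly generated, not real height data.

```python
import random

rng = random.Random(1)

# ASSUMPTION: simulated heights in inches, centered on 69 in (5 ft 9 in).
heights = [rng.gauss(69, 3) for _ in range(1000)]
mean_before = sum(heights) / len(heights)

# Add the one tall uncle: 6 ft 8 in = 80 in.
heights_with_uncle = heights + [80]
mean_after = sum(heights_with_uncle) / len(heights_with_uncle)

shift = mean_after - mean_before
print(f"Mean before: {mean_before:.2f} in")
print(f"Mean after:  {mean_after:.2f} in")
print(f"Shift from one outlier: {shift:.3f} in")
```

One extreme value among a thousand moves the average by only a tiny fraction of an inch, which is why a single counterexample says very little about the overall statistic.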

2. Correlation does not imply causation
I’m sure we’ve all heard this ~1000 times, but for good reason. When you see variables, data points, trends, distributions, etc. that are related or move together, this doesn’t necessarily mean one is causing a direct change in the other(s). In general, causal analysis is difficult. There might be other variables not accounted for (called confounding variables) explaining the correlation.

Textbook example: When ice cream sales increase, drowning incidents also tend to increase. However, this does not mean that eating ice cream causes drowning or vice-versa. The real reason for this correlation is that both ice cream sales and drownings increase during the summer, when warmer weather is the underlying cause of both.

Additionally, a correlation could be a coincidence made to look strong through visualization, like the correlation between the consumption of margarine and the divorce rate in Maine.
(2/6)
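Here is a small simulation of the ice cream example. It assumes temperature drives both ice cream sales and drownings, with no direct link between the two, and then checks the correlation before and after removing the effect of temperature. Every number is invented, and `pearson` and `residuals` are helper names made up for this sketch.

```python
import math
import random

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

def residuals(ys, xs):
    """What is left of ys after removing a straight-line fit on xs."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    slope = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
             / sum((x - mx) ** 2 for x in xs))
    return [(y - my) - slope * (x - mx) for x, y in zip(xs, ys)]

rng = random.Random(3)

# ASSUMPTION: made-up daily data where temperature (°F) causes BOTH outcomes.
temperature = [rng.uniform(40, 95) for _ in range(365)]
ice_cream_sales = [2.0 * t + rng.gauss(0, 15) for t in temperature]
drownings = [0.1 * t + rng.gauss(0, 1.5) for t in temperature]

raw_r = pearson(ice_cream_sales, drownings)
adjusted_r = pearson(residuals(ice_cream_sales, temperature),
                     residuals(drownings, temperature))
print(f"Correlation, ignoring temperature:       {raw_r:.2f}")
print(f"Correlation, after removing temperature: {adjusted_r:.2f}")
```

The raw correlation is clearly positive even though neither variable affects the other in this simulation. Once the confounder is accounted for, the relationship mostly disappears.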
