Data Literacy Basics - Part 1
Below are five foundational concepts that EVERYONE should understand (in no particular order). Also, let me know what you would add.
1. Outliers rarely disprove trends.
I see this a lot. People presented with a statistic will often try to discredit it by bringing up edge cases, or outliers. In reality, data has natural variation, even within a distribution or trend.
We all know this. If I were to say “The average height for an American male is about 5 feet 9 inches,” but my friend chimed in with “That can’t be true! My uncle is 6 feet 8 inches,” you surely wouldn’t agree that single data point disproves my statistic. That's an easy example because we are all familiar with people's heights, but for data we aren’t accustomed to, this becomes very important to keep in mind.
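If you want to see this for yourself, here is a minimal Python sketch. It simulates 10,000 heights around 69 inches (5'9") and then adds one 80-inch (6'8") uncle. The 3-inch standard deviation, the sample size, and the function name are my assumptions for illustration, not real survey figures.

```python
import random
import statistics


def mean_shift_from_one_outlier(n=10_000, mean_in=69.0, sd_in=3.0,
                                outlier_in=80.0, seed=0):
    """Return (average without the outlier, average with it), in inches.

    The heights are simulated, not real data; sd_in=3.0 is an assumed spread.
    """
    rng = random.Random(seed)
    heights = [rng.gauss(mean_in, sd_in) for _ in range(n)]
    before = statistics.fmean(heights)
    # Add the single tall uncle and recompute the average.
    after = statistics.fmean(heights + [outlier_in])
    return before, after


before, after = mean_shift_from_one_outlier()
print(f"average without the uncle: {before:.3f} in")
print(f"average with the uncle:    {after:.3f} in")
```

One extra person out of 10,001 can only move the average by roughly (80 - 69) / 10,001 inches, which is about a thousandth of an inch.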
🧵 (1/6)
2. Correlation does not imply causation.
I’m sure we’ve all heard this ~1000 times, but for good reason. When variables, data points, trends, or distributions are related or move together, it doesn’t necessarily mean one is directly causing a change in the other(s). In general, causal analysis is difficult. There might be other variables not accounted for (called confounding variables) that explain the correlation.
Textbook example: When ice cream sales increase, drowning incidents also tend to increase. However, this does not mean that eating ice cream causes drowning or vice-versa. The real reason for this correlation is that both ice cream sales and drownings increase during the summer, when warmer weather is the underlying cause of both.
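Here is a rough Python sketch of that textbook example. Temperature drives both ice cream sales and drownings, and the two never influence each other. All the numbers (temperature range, slopes, noise levels) and function names are made up for illustration. The sketch compares the raw correlation with the correlation left over after removing each variable's straight-line dependence on temperature (a partial correlation).

```python
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5


def residuals(ys, xs):
    """What is left of ys after a least-squares straight-line fit on xs."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    slope = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
             / sum((x - mx) ** 2 for x in xs))
    return [(y - my) - slope * (x - mx) for x, y in zip(xs, ys)]


def simulate(n=1000, seed=0):
    """Return (raw correlation, correlation after controlling for temperature)."""
    rng = random.Random(seed)
    # Hypothetical daily temperatures; the confounder behind both series.
    temp = [rng.uniform(0, 35) for _ in range(n)]
    # Each outcome depends on temperature plus its own independent noise.
    ice_cream = [10 * t + rng.gauss(0, 30) for t in temp]
    drownings = [0.2 * t + rng.gauss(0, 1) for t in temp]
    raw = pearson(ice_cream, drownings)
    adjusted = pearson(residuals(ice_cream, temp), residuals(drownings, temp))
    return raw, adjusted


raw, adjusted = simulate()
print(f"ice cream vs. drownings, raw:              {raw:.2f}")
print(f"ice cream vs. drownings, temperature held: {adjusted:.2f}")
```

In this setup the raw correlation comes out strong, while the adjusted one sits near zero, because temperature was the only link between the two series. Real data is rarely this clean, and you usually don't know which confounders to adjust for.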
Additionally, a correlation could be a coincidence made to look strong through visualization, like the correlation between the consumption of margarine and the divorce rate in Maine.
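To get a feel for how easily coincidences like this show up, here is a small Python sketch. It generates 200 short series of pure random noise (10 points each, like 10 years of annual data) and looks for the most strongly correlated pair. The series count, series length, and function names are arbitrary choices of mine.

```python
import random
from itertools import combinations


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5


def strongest_chance_correlation(n_series=200, n_points=10, seed=0):
    """Largest |correlation| found among pairs of unrelated random series."""
    rng = random.Random(seed)
    series = [[rng.gauss(0, 1) for _ in range(n_points)]
              for _ in range(n_series)]
    # Check every pair; none of these series has anything to do with another.
    return max(abs(pearson(a, b)) for a, b in combinations(series, 2))


best = strongest_chance_correlation()
print(f"strongest correlation found by chance: {best:.2f}")
```

With nearly 20,000 pairs to choose from, some pair will line up very well purely by luck. Search enough unrelated datasets and you will find your own margarine-and-divorce chart.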
(2/6)