Dr Kareem Carr, Statistics Person
May 22, 2023 · 15 tweets · 5 min read
WHY do we divide by n-1 when computing the sample variance?

I've never seen this way of explaining this concept anywhere else.

Read on if you want a completely new way of looking at this.
BACKGROUND

This explanation is going to be confusing if you're rusty on summation notation. So here is a quick review.

If you're comfortable with this concept, skip to the next tweet.

Summation notation is a compact way of talking about adding up n values.
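In symbols, assuming the n values are labelled x_1 through x_n (the labels are an assumption, not taken from the thread), the notation reads roughly as:

```latex
\sum_{i=1}^{n} x_i = x_1 + x_2 + \cdots + x_n
```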
We should also quickly review the "sample mean" or "sample average".

If you are comfortable with this concept, skip ahead to the next tweet.

We compute the sample mean by adding up all our observations and then dividing by the total number of observations.
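Written out with the same assumed labels x_1, ..., x_n, and using \bar{x} as the usual symbol for the sample mean:

```latex
\bar{x} = \frac{1}{n} \sum_{i=1}^{n} x_i
```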
Here are two key insights which will be important later.

INSIGHT 1: Notice that in the formula for the sample variance, we are subtracting the sample mean from each observation.
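For reference, the standard formula for the sample variance, written here as s^2 (the symbol is an assumption), is:

```latex
s^2 = \frac{1}{n-1} \sum_{i=1}^{n} (x_i - \bar{x})^2
```

The term x_i - \bar{x} inside the sum is the subtraction that Insight 1 points at.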
INSIGHT 2: We can think of the sample variance as computing the average squared distance to the sample mean but with an extra correction factor.

Our question then changes from "Why divide by n-1?" to "Where did the correction factor come from?"
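A minimal Python sketch of Insight 2, assuming a small made-up data set and hypothetical helper names. It computes the plain average of squared deviations first, then multiplies by the correction factor n/(n-1) (Bessel's correction). Exact fractions are used so the comparison is not blurred by floating-point rounding.

```python
from fractions import Fraction


def mean_squared_deviation(xs):
    """Average squared distance to the sample mean (divides by n)."""
    n = len(xs)
    mean = sum(xs) / n
    return sum((x - mean) ** 2 for x in xs) / n


def sample_variance(xs):
    """Average squared deviation times the correction factor n / (n - 1)."""
    n = len(xs)
    correction = Fraction(n, n - 1)
    return correction * mean_squared_deviation(xs)


# Made-up illustrative data, not from the thread.
data = [Fraction(v) for v in (2, 4, 4, 5, 10)]

print(mean_squared_deviation(data))  # divides by n
print(sample_variance(data))         # equivalent to dividing by n - 1
```

Multiplying an average over n terms by n/(n-1) is the same as dividing the raw sum by n-1, which is why the two questions in the tweet are interchangeable.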
IDEA: THE SAMPLE MEAN IS NOT INDEPENDENT OF OUR OBSERVATIONS

Each observation and the sample mean are slightly correlated because the sample mean is computed using all the observations.
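To put a number on "slightly correlated", here is a short calculation under an added assumption that the thread does not state: the observations are independent with a common variance \sigma^2.

```latex
\operatorname{Cov}(x_i, \bar{x})
  = \operatorname{Cov}\Bigl(x_i, \tfrac{1}{n} \sum_{j=1}^{n} x_j\Bigr)
  = \frac{1}{n} \operatorname{Var}(x_i)
  = \frac{\sigma^2}{n},
\qquad
\operatorname{Corr}(x_i, \bar{x}) = \frac{\sigma^2 / n}{\sigma \cdot \sigma / \sqrt{n}} = \frac{1}{\sqrt{n}}
```

Under that assumption the correlation is positive and shrinks as n grows.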
The way I like to think about it is that we're subtracting 1/n of each observation from itself when we subtract the sample mean. Since we do this once for each of the n observations, and n times 1/n equals 1, we are effectively subtracting one whole observation. This is why we effectively have n-1 observations.
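One way to see the 1/n explicitly is to split the sample mean into the part that contains x_i and the part that does not:

```latex
x_i - \bar{x}
  = x_i - \frac{1}{n} x_i - \frac{1}{n} \sum_{j \neq i} x_j
  = \Bigl(1 - \frac{1}{n}\Bigr) x_i - \frac{1}{n} \sum_{j \neq i} x_j
```

Each of the n deviations loses 1/n of its own observation, and n \cdot \tfrac{1}{n} = 1.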
This way of thinking about it is not mathematically rigorous but we can make it more rigorous.

What if we try to decorrelate the sample mean and each observation?
IDEA: DECORRELATING THE VALUES WITH ALGEBRA

I will use the first observation as an example.

STEP 1: We rearrange the terms so the mean no longer contains the first observation.

STEP 2: We rearrange the remaining expression to involve the average of the remaining n-1 values.
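The two steps can be sketched as follows. The symbol \bar{x}_{(-1)} for the average of the n-1 values other than x_1 is notation introduced here for illustration, not taken from the thread.

```latex
% STEP 1: pull the first observation out of the mean
x_1 - \bar{x}
  = x_1 - \frac{1}{n} x_1 - \frac{1}{n} \sum_{j=2}^{n} x_j
  = \frac{n-1}{n} x_1 - \frac{1}{n} \sum_{j=2}^{n} x_j

% STEP 2: rewrite the leftover sum as an average of n-1 values
\frac{1}{n} \sum_{j=2}^{n} x_j
  = \frac{n-1}{n} \cdot \frac{1}{n-1} \sum_{j=2}^{n} x_j
  = \frac{n-1}{n} \, \bar{x}_{(-1)}

% Combined
x_1 - \bar{x} = \frac{n-1}{n} \bigl( x_1 - \bar{x}_{(-1)} \bigr)
```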
The decorrelation procedure makes intuitive sense.

The average of the n-1 remaining values is uncorrelated with the first observation, and since it's just a sample containing n-1 values, it's also a reasonable estimate of the average of the total population.
IDEA: WE DON'T NEED TO ACTUALLY DECORRELATE. WE CAN JUST USE A CORRECTION FACTOR

Subtracting the sample mean from the first observation is identical to subtracting the average of all the other values from the first observation and then multiplying the result by an extra correlation factor of (n-1)/n.
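A quick numerical check of this identity for the first observation, as a sketch with made-up data and hypothetical function names. Exact fractions keep the two sides comparable with `==`.

```python
from fractions import Fraction


def deviation_from_full_mean(xs, i):
    """x_i minus the mean of all n observations."""
    return xs[i] - sum(xs) / len(xs)


def deviation_from_leave_one_out_mean(xs, i):
    """x_i minus the mean of the other n - 1 observations."""
    others = xs[:i] + xs[i + 1:]
    return xs[i] - sum(others) / len(others)


# Made-up illustrative data, not from the thread.
data = [Fraction(v) for v in (2, 4, 4, 5, 10)]
n = len(data)
factor = Fraction(n - 1, n)  # the extra factor (n - 1) / n

lhs = deviation_from_full_mean(data, 0)
rhs = factor * deviation_from_leave_one_out_mean(data, 0)
print(lhs, rhs)
```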
This applies to every other observation, not just the first.

As you can imagine, recomputing the average of the n-1 remaining observations for each observation is tedious. It's much easier to subtract the same sample mean each time and then account for the correlation afterwards.
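The sketch below compares the two routes on made-up data: the tedious one (a separate leave-one-out mean per observation) and the easy one (a single sample mean). The helper name is hypothetical. Because each deviation picks up the factor (n-1)/n, the sums of squares differ by that factor squared.

```python
from fractions import Fraction


def leave_one_out_means(xs):
    """Mean of the other n - 1 values, computed once per observation."""
    total = sum(xs)
    n = len(xs)
    return [(total - x) / (n - 1) for x in xs]


# Made-up illustrative data, not from the thread.
data = [Fraction(v) for v in (2, 4, 4, 5, 10)]
n = len(data)
factor = Fraction(n - 1, n)

# Easy route: subtract the same sample mean every time.
full_mean = sum(data) / n
easy_sum_sq = sum((x - full_mean) ** 2 for x in data)

# Tedious route: subtract a different leave-one-out mean for each observation.
tedious_sum_sq = sum(
    (x - m) ** 2 for x, m in zip(data, leave_one_out_means(data))
)

print(easy_sum_sq, factor ** 2 * tedious_sum_sq)
```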
IDEA: BESSEL'S CORRECTION CANCELS THE CORRELATION FACTOR

Notice that the correlation factor and Bessel's correction cancel each other out when multiplied.

So that's the story of where Bessel's correction comes from and why we divide by n-1.
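In symbols, the cancellation between the correlation factor (n-1)/n and Bessel's correction n/(n-1) is simply:

```latex
\frac{n-1}{n} \cdot \frac{n}{n-1} = 1
```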
This isn't the whole story. There is one more twist of mathematical luck that makes the algebra work out.

But this is the main idea.
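The thread does not spell out the remaining twist. One plausible way to complete the argument, assuming independent observations with common variance \sigma^2 and writing \bar{x}_{(-i)} for the average of the values other than x_i (both assumptions introduced here), is:

```latex
% The factor appears squared inside the sum of squares
\sum_{i=1}^{n} (x_i - \bar{x})^2
  = \Bigl(\frac{n-1}{n}\Bigr)^{2} \sum_{i=1}^{n} \bigl(x_i - \bar{x}_{(-i)}\bigr)^2

% x_i and \bar{x}_{(-i)} are independent, so their variances add
\mathbb{E}\bigl[(x_i - \bar{x}_{(-i)})^2\bigr]
  = \sigma^2 + \frac{\sigma^2}{n-1}
  = \frac{n}{n-1} \, \sigma^2

% Putting the pieces together
\mathbb{E}\Bigl[\sum_{i=1}^{n} (x_i - \bar{x})^2\Bigr]
  = \Bigl(\frac{n-1}{n}\Bigr)^{2} \cdot n \cdot \frac{n}{n-1} \, \sigma^2
  = (n-1) \, \sigma^2
```

On this reading, one copy of (n-1)/n is absorbed by the extra variance of the leave-one-out mean, and the other is the one that dividing by n-1 instead of n cancels.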

I hope this makes the appearance of n-1 feel less mysterious.
I enjoy explaining math and statistics ideas. Follow me for more content like this, and don't forget to click the little notification bell so you don't miss out on future threads.
