Kareem Carr | Data Scientist | 📊📈

@kareem_carr

Aug 4 • 18 tweets • 6 min read

TEN types of statistical averages

THREE simple frameworks for thinking about measures of central tendency.

This thread has it all!

Warning: You may have heard people say there's only one thing called "the average" or "the mean". In this thread, we're going to use the word "average" or "mean" to apply to any one of a large family of measures of central tendency.

1. Mode

(Let's start slow. Feel free to skip the stuff you already know!)

This is the value that occurs most frequently in your data.

2. Median

If you line your data up from largest to smallest, then this is the value at the center of your data. (If you have an even number of data points, there are two central values and the median is the number halfway between them.)

3. Arithmetic mean

This is what people usually mean by "the mean" or "the average". It's the gold standard. You add up all your data and divide by the number of observations.

4. Midrange

The value in the exact middle of the range of your data. It's halfway between the maximum and minimum value.
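
A minimal sketch of these first four averages in Python, using the standard library's statistics module. The data values are made up for illustration.

```python
from statistics import mean, median, multimode

data = [2, 3, 3, 5, 7, 10]  # made-up example values

modes = multimode(data)                  # most frequent value(s); a list, because ties are possible
med = median(data)                       # even count: halfway between the two central values (3 and 5)
avg = mean(data)                         # add everything up, divide by the number of observations
midrange = (min(data) + max(data)) / 2   # halfway between the minimum and the maximum

print(modes, med, avg, midrange)
```

Note that the four answers differ even on this tiny dataset, which is why it matters which "average" you mean.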

FRAMEWORK: Distance to Data

The mode, median, midrange and arithmetic mean might seem disconnected but there's a single mathematical idea that ties them together.

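They all minimize the same distance measure, the sum over the data points x_i of |x_i - c|^p for a candidate center c, for specific values of p: the mode for p = 0 (where the sum just counts the points that differ from c), the median for p = 1, the arithmetic mean for p = 2 and the midrange as p goes to infinity (where only the largest gap matters).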

They're the "closest" point to your data.

The idea is that the "center" of our data is the point that's closest to all the data points simultaneously.

The mode, median, midrange and arithmetic mean are at the center of our data according to four different definitions of distance.
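
This can be checked numerically. The sketch below assumes the distance measure is the usual sum of |x_i - c|^p, with p = 0 counting mismatches and p = infinity taking the largest gap. The helper names total_distance and best_center are hypothetical, the data is made up, and a brute-force grid search stands in for a proper optimizer.

```python
import numpy as np

data = np.array([1.0, 2.0, 2.0, 4.0, 11.0])  # made-up example values

def total_distance(c, p):
    """Distance from a candidate center c to all the data, for exponent p."""
    gaps = np.abs(data - c)
    if p == 0:
        return np.sum(~np.isclose(data, c))  # how many points differ from c
    if p == np.inf:
        return gaps.max()                    # only the largest gap counts
    return np.sum(gaps ** p)                 # the 1/p root is omitted; it doesn't move the minimizer

# Candidate centers on a grid with step 0.01 between the min and the max
candidates = np.linspace(data.min(), data.max(), 1001)

def best_center(p):
    """Grid point with the smallest total distance to the data."""
    costs = [total_distance(c, p) for c in candidates]
    return candidates[int(np.argmin(costs))]

for p, name in [(0, "mode"), (1, "median"), (2, "arithmetic mean"), (np.inf, "midrange")]:
    print(f"p = {p}: closest point is about {best_center(p):.2f} ({name})")
```

For this data the mode and median are both 2, the arithmetic mean is 4 and the midrange is 6, and the grid search lands on those values.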

5. Weighted arithmetic mean

In physics, the center of mass is the point where an object perfectly balances.

The weighted mean is kind of like the center of mass of your data when weighted according to your chosen weights. The formula is basically the same as the physics version: multiply each value by its weight, add those up, and divide by the sum of the weights.
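
A minimal sketch of the weighted arithmetic mean, with made-up values and weights. The function name weighted_mean is just for illustration.

```python
def weighted_mean(xs, ws):
    """Sum of weight * value, divided by the sum of the weights."""
    return sum(w * x for x, w in zip(xs, ws)) / sum(ws)

values = [70.0, 80.0, 90.0]   # made-up scores
weights = [1.0, 1.0, 2.0]     # the last score counts double

result = weighted_mean(values, weights)
print(result)
```

With equal weights this reduces to the ordinary arithmetic mean.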

6. Geometric mean

To compute this mean, we multiply all the values together and take the nth root.

If your investments grew by a factor of x in the first year and y in the second, then the average yearly growth factor of your investments is the geometric mean of x and y.
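
A small sketch of the investment example, with made-up growth factors. The point it illustrates: growing by the geometric mean every year gives the same final value as the actual sequence of yearly growth factors.

```python
import math

def geometric_mean(xs):
    """Multiply all the values together and take the nth root."""
    return math.prod(xs) ** (1 / len(xs))

growth_factors = [1.25, 1.80]   # made up: +25% in year one, +80% in year two

g = geometric_mean(growth_factors)           # about 1.5, i.e. +50% per year
final_actual = math.prod(growth_factors)     # what the investment actually grew by
final_if_steady = g ** len(growth_factors)   # growing by g every single year instead

print(g, final_actual, final_if_steady)
```

The arithmetic mean of the two factors would be 1.525, which overstates the steady yearly growth.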

7. Harmonic Mean

You might be wondering when would anybody ever use this crazy mean?

It actually has plenty of real-world relevance. For example, if you drive to work at speed x and return home along the same route at speed y, the average speed of your round trip is the harmonic mean of x and y.
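
Here is the round-trip example worked out in Python, assuming the same distance in both directions. The distance and speeds are made up.

```python
from statistics import harmonic_mean

distance = 30.0                        # one-way distance (made up)
speed_out, speed_back = 30.0, 60.0     # speed to work and speed back home

# Average speed is total distance divided by total time
total_time = distance / speed_out + distance / speed_back
average_speed = 2 * distance / total_time

# The harmonic mean gives the same answer without needing the distance
hm = harmonic_mean([speed_out, speed_back])

print(average_speed, hm)
```

Both come out to 40, not the 45 you would get from the arithmetic mean, because more time is spent driving at the slower speed.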

8. Root mean square

This one shows up in physics class as a measure of the power of waves. Waves vary in time and this is the right way of averaging over that variation.

This mean also shows up in a slightly modified form as the root mean squared error (RMSE), a measure of average error in machine learning models.
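
A minimal sketch of the root mean square, applied first to a sampled sine wave and then to prediction errors. The amplitude, sample count and the prediction values are all made up.

```python
import numpy as np

def rms(x):
    """Square the values, take the arithmetic mean, then take the square root."""
    x = np.asarray(x, dtype=float)
    return np.sqrt(np.mean(x ** 2))

# One full period of a sine wave with amplitude 3, sampled evenly
t = np.linspace(0.0, 2 * np.pi, 1000, endpoint=False)
wave = 3.0 * np.sin(t)

print(np.mean(wave))   # the plain arithmetic mean is essentially zero
print(rms(wave))       # the RMS is the amplitude divided by sqrt(2)

# Same recipe applied to the errors of some made-up predictions
y_true = np.array([3.0, 5.0, 2.0])
y_pred = np.array([2.0, 5.0, 4.0])
rmse = rms(y_pred - y_true)
print(rmse)
```

The positive and negative halves of the wave cancel in the arithmetic mean, while squaring first keeps them from cancelling.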

FRAMEWORK: The Algebraic Perspective

9. Power Mean

The root mean square and also the arithmetic, geometric and harmonic means probably seem disconnected as well but they have their own unifying principle.

They are specific examples of the power mean!
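
A sketch of the power mean for positive data, using its standard definition: raise each value to the power p, take the arithmetic mean, then take the pth root. The exponents p = -1, 1 and 2 give the harmonic mean, arithmetic mean and root mean square, and the geometric mean is the limiting case as p goes to 0. The data is made up.

```python
import math

def power_mean(xs, p):
    """Power mean of positive numbers xs with exponent p."""
    n = len(xs)
    if p == 0:
        # Limiting case as p -> 0: the geometric mean
        return math.exp(sum(math.log(x) for x in xs) / n)
    return (sum(x ** p for x in xs) / n) ** (1 / p)

data = [1.0, 2.0, 4.0]   # made-up positive values

for p, name in [(-1, "harmonic"), (0, "geometric"), (1, "arithmetic"), (2, "root mean square")]:
    print(f"p = {p}: {power_mean(data, p):.4f} ({name})")
```

For data that isn't all identical, the power mean grows as p grows, so these four means always come out in the same order.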

10. F-Mean

The power mean itself is just a specific example of a more general concept, the F-mean!

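If there's a continuous function f that is strictly increasing (or strictly decreasing) over the range of our data, so that it can be inverted, we can use it to define our own mean: apply f to each value, take the arithmetic mean, then apply the inverse of f to the result.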

(You'll probably never use this but it's still fun to know.)
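
A minimal sketch of the F-mean, assuming the standard recipe: apply f to every value, take the arithmetic mean, then undo f with its inverse. The function name f_mean and the data are made up, and the caller has to supply the inverse by hand.

```python
import math

def f_mean(xs, f, f_inverse):
    """Apply f, take the arithmetic mean, then map back with the inverse of f."""
    return f_inverse(sum(f(x) for x in xs) / len(xs))

data = [1.0, 2.0, 4.0]   # made-up positive values

arithmetic = f_mean(data, lambda x: x, lambda y: y)   # f(x) = x
geometric = f_mean(data, math.log, math.exp)          # f(x) = log x
quadratic = f_mean(data, lambda x: x ** 2, math.sqrt) # f(x) = x^2 gives the root mean square

print(arithmetic, geometric, quadratic)
```

Choosing f(x) = x^p recovers the power mean, which is why the power mean is a special case.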

FRAMEWORK: A Shopping List of Desirable Criteria

We can further unify the concept of an average by thinking of averages as a collection of procedures that usually have most of the following properties.

(Don't worry. I will explain these in plain English in the next tweet.)

Homogeneity: mean of k times the data is k times the mean
Symmetry: order of the data doesn't matter
Monotonicity: increasing any of the values never decreases the mean
Idempotence: mean of identical values is the value itself
Boundedness: mean is always between the min and max
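
These five properties can be spot-checked on the arithmetic mean. This is only a check on one made-up dataset, not a proof, and the variable names are just for illustration.

```python
import math
from statistics import mean

data = [2.0, 5.0, 11.0]   # made-up example values
k = 3.0

# Homogeneity: mean of k times the data is k times the mean
homogeneity = math.isclose(mean([k * x for x in data]), k * mean(data))

# Symmetry: reordering the data doesn't change the mean
symmetry = mean(data) == mean(list(reversed(data)))

# Monotonicity: raising one value (11 -> 12) never lowers the mean
monotonicity = mean([2.0, 5.0, 12.0]) >= mean(data)

# Idempotence: the mean of identical values is that value
idempotence = mean([7.0, 7.0, 7.0]) == 7.0

# Boundedness: the mean lies between the min and the max
boundedness = min(data) <= mean(data) <= max(data)

print(homogeneity, symmetry, monotonicity, idempotence, boundedness)
```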

SUMMARY:

Averages arise in diverse ways:
- measures of distance to our data
- analogies to physical properties (center of mass)
- summarizers of physical and real world processes like average speeds, interest rates and waves

Despite that diversity, they aren't disconnected concepts. There are several intriguing, unifying themes in their mathematical properties.

If you liked this thread and want more stuff like this on your timeline, give me a follow and don't forget to click the notification bell!
