1/

Get a cup of coffee.

In this thread, I'll walk you through 2 probability concepts: Standard Deviation (SD) and Mean Absolute Deviation (MAD).

This will give you insight into Fat Tails -- which are super useful in investing and in many other fields.
2/

Recently, I watched 2 probability "mini-lectures" on YouTube by Nassim Taleb.

One ~10 min lecture covered SD and MAD. The other ~6 min lecture covered Fat Tails.

In these ~16 mins, @nntaleb shared so many useful nuggets that I had to write this thread to unpack them.
3/

For those curious, here are the YouTube links to the lectures:

SD and MAD (~10 min):

Fat Tails (~6 min):
4/

The first thing to understand is the concept of a Random Variable.

In essence, a Random Variable is a number that depends on a random event.

For example, when we roll a die, we get a Random Variable -- a number from the set {1, 2, 3, 4, 5, 6}.
5/

Every Random Variable has a Probability Distribution.

This tells us all the possible values the Random Variable can take, and their respective probabilities.

For example, when we roll a fair die, we get a Random Variable with this Probability Distribution:
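The original tweet showed this distribution as a picture. As a stand-in, here's a minimal Python sketch that tabulates it. The name `die_distribution` is just an illustrative choice, not something from the thread.

```python
from fractions import Fraction

# Probability distribution of a fair six-sided die:
# each outcome 1..6 maps to probability 1/6.
die_distribution = {face: Fraction(1, 6) for face in range(1, 7)}

for face, prob in die_distribution.items():
    print(f"P(X = {face}) = {prob}")

# The probabilities of all possible outcomes must add up to 1.
total_probability = sum(die_distribution.values())
print("Total:", total_probability)
```

Exact fractions are used instead of floats so that the total comes out as exactly 1.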
6/

A fair die has a very simple Probability Distribution: there are only 6 possibilities, and all of them are equally likely (with probability 1/6 each).

What if our Random Variable is something more complicated?

For example, the height of a randomly chosen US adult.
7/

Now, we have more than 6 possibilities.

Any number between, say, 4 ft and 8 ft, is a possibility. There are infinitely many such numbers.

And they're not all equally likely. The extremes (4 ft, 8 ft) are rare. Most people fall in a narrow range around, say, 5.5 ft.
8/

The Gaussian Distribution is usually a good fit for this.

The average height (say, 5.5 ft) is the most likely scenario.

As we go further from the average on either side, the probabilities drop off sharply.

Like so:
9/

More precisely, the Y-axis above shows probability *densities*, not actual probabilities.

That is, we only look at height *ranges* (eg, 5 ft to 6 ft), not individual heights.

For every such range, the area under the curve is the probability of lying in the range.

Like so:
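To make the "area under the curve" idea concrete, here's a rough Python sketch. The mean of 5.5 ft comes from the thread; the SD of 0.3 ft is a made-up value chosen purely for illustration, and `prob_between` is a hypothetical helper name.

```python
from statistics import NormalDist

# Assumed parameters: mean 5.5 ft (from the thread) and an SD of 0.3 ft
# (a made-up value, chosen only to make the example concrete).
heights = NormalDist(mu=5.5, sigma=0.3)

def prob_between(dist, low, high):
    """Area under the density curve between low and high."""
    return dist.cdf(high) - dist.cdf(low)

p_5_to_6 = prob_between(heights, 5.0, 6.0)
print(f"P(5 ft <= height <= 6 ft) = {p_5_to_6:.4f}")

# A single exact height has zero width, hence zero area and zero probability.
p_exact = prob_between(heights, 5.5, 5.5)
print(f"P(height is exactly 5.5 ft) = {p_exact}")
```

This is why the Y-axis has to be read as a density: only ranges carry probability.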
10/

This is a wonderful connection between probability and geometry: we calculate probabilities by measuring areas under curves.

The total area under a probability density curve has to be 1, reflecting a 100% chance of an outcome between -infinity and +infinity.
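In symbols, writing f for the probability density function (my notation here, not the thread's), these two statements read:

```latex
\[
P(a \le X \le b) = \int_a^b f(x)\,dx,
\qquad
\int_{-\infty}^{\infty} f(x)\,dx = 1
\]
```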
11/

Given a distribution like this, we can ask 2 questions to try to understand it better:

First, what's the population *average*? (Also called *mean* or *expectation*, here 5.5 ft).

Second, by how much do individuals tend to *deviate* from this population mean?
12/

The first question -- calculating the population's mean -- is reasonably straightforward.

For a discrete distribution (like a die roll), we simply multiply every possible outcome by its probability, and add them all up:
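The original tweet showed this as an image. A standard way to write it, with x_i for the possible outcomes and p_i for their probabilities (symbols assumed here), is:

```latex
\[
\mathbb{E}[X] = \sum_i x_i \, p_i
\]
For a fair die:
\[
\mathbb{E}[X] = \tfrac{1}{6}\,(1 + 2 + 3 + 4 + 5 + 6) = 3.5
\]
```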
13/

For a continuous distribution (like our Gaussian heights), we achieve the same thing with an integral: we weight every possible value by its probability density, and integrate over all values.

(Don't worry too much if you don't get this math.)
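For reference, here's the standard formula, again writing f for the density (assumed notation). For a Gaussian with parameters mu and sigma, the integral works out to mu itself:

```latex
\[
\mathbb{E}[X] = \int_{-\infty}^{\infty} x \, f(x)\, dx
\]
With the Gaussian density
\[
f(x) = \frac{1}{\sigma\sqrt{2\pi}} \exp\!\left(-\frac{(x - \mu)^2}{2\sigma^2}\right),
\]
this gives $\mathbb{E}[X] = \mu$.
```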
14/

Now for the *deviations* from the mean.

The difficulty here is that individual samples can deviate both *positively* and *negatively* from the mean.

And the way we've defined the mean guarantees that these deviations, once weighted by their probabilities, will *exactly* cancel each other out.
15/

For example, with a die roll, a 4 (deviation +0.5) exactly cancels out a 3 (deviation -0.5).

Similarly, an abnormally tall person (height 6.4 ft, deviation +0.9 ft) cancels out an abnormally short person (height 4.6 ft, deviation -0.9 ft).
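Here's a small Python sketch that checks this cancellation for the fair die. Variable names are illustrative only.

```python
from fractions import Fraction

faces = range(1, 7)
prob = Fraction(1, 6)  # fair die: every face is equally likely

mean = sum(face * prob for face in faces)

# Signed deviation of each face from the mean.
deviations = {face: face - mean for face in faces}

# Probability-weighted average of the signed deviations.
mean_deviation = sum(dev * prob for dev in deviations.values())

print("Mean:", mean)
print("Deviations:", {face: str(dev) for face, dev in deviations.items()})
print("Average signed deviation:", mean_deviation)
```

The faces pair up (1 with 6, 2 with 5, 3 with 4), and each pair's deviations sum to zero.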
16/

So, averaging these deviations over the whole population gets us to a grand total of exactly zero -- no matter what the underlying probability distribution (as long as its mean exists).

To get around this positive/negative cancellation, we eliminate all negative deviations -- by transforming them into positive ones.
17/

Math offers 2 simple ways to transform a negative quantity into a positive one:

- Take the *absolute value* of the negative quantity, or
- *Square* the negative quantity.

Averaging the *absolute values* gives us MAD. Averaging the *squares*, and then taking the square root, gives us SD.

Formulas:
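The formulas were in an image in the original tweet. The standard definitions, in notation I'm assuming here (mu for the mean of the Random Variable X), are:

```latex
\[
\mu = \mathbb{E}[X], \qquad
\mathrm{MAD} = \mathbb{E}\big[\,|X - \mu|\,\big], \qquad
\mathrm{SD} = \sqrt{\mathbb{E}\big[(X - \mu)^2\big]}
\]
```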
18/

For the Gaussian Distribution, MAD and SD are given by the formulas below.

The ratio of SD to MAD is about 1.25 -- independent of the parameters mu and sigma of the distribution.
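The tweet's image isn't reproduced here, but these are the standard results for a Gaussian with parameters mu and sigma:

```latex
\[
\mathrm{SD} = \sigma, \qquad
\mathrm{MAD} = \sigma\sqrt{\frac{2}{\pi}} \approx 0.798\,\sigma, \qquad
\frac{\mathrm{SD}}{\mathrm{MAD}} = \sqrt{\frac{\pi}{2}} \approx 1.2533
\]
```

Note that mu drops out entirely, and sigma cancels in the ratio.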
19/

So, for the Gaussian Distribution, MAD never exceeds SD.

In fact, this is a general result that follows directly from the convexity of *squaring*.

No matter what distribution we have, its MAD can never exceed its SD.

Proof:
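The proof was an image in the original tweet. Here's one standard argument via Jensen's inequality, which may differ in presentation from the one originally shown:

```latex
Let $Y = |X - \mu|$, so that $\mathrm{MAD} = \mathbb{E}[Y]$ and $\mathrm{SD}^2 = \mathbb{E}[Y^2]$.
Since $g(y) = y^2$ is convex, Jensen's inequality gives
\[
\mathrm{MAD}^2 = \big(\mathbb{E}[Y]\big)^2 \le \mathbb{E}\big[Y^2\big] = \mathrm{SD}^2 .
\]
Both quantities are non-negative, so taking square roots yields $\mathrm{MAD} \le \mathrm{SD}$.
```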
20/

The convexity of squaring also has deeper consequences.

This is a key insight from @nntaleb's mini-lectures.

Because squaring is convex, SD gives greater emphasis to large departures from the mean.

Such large departures, by definition, are at the tails.
21/

Therefore, SD will be much larger than MAD for Fat Tailed distributions!

In fact, the ratio of SD to MAD is an indicator of how Fat Tailed the distribution is.

For the Gaussian, this ratio is ~1.25. Not very Fat Tailed.
22/

@nntaleb goes on to present a super illuminating Fat Tailed example: a population of N entries, where one entry equals N and the other N-1 entries are all zeros.

The mean is 1. As N gets larger, the distribution gets more and more Fat Tailed. Its SD is sqrt(N-1), which goes to infinity. But its MAD is 2(N-1)/N, which never exceeds 2. So the SD to MAD ratio grows without bound.
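Here's a quick Python sketch of this example, assuming every entry in the population is equally likely. The helper name `mad_and_sd` is my own.

```python
import math

def mad_and_sd(values):
    """Return (MAD, SD) of a finite population with equally likely entries."""
    n = len(values)
    mean = sum(values) / n
    mad = sum(abs(v - mean) for v in values) / n
    sd = math.sqrt(sum((v - mean) ** 2 for v in values) / n)
    return mad, sd

for n in (10, 1_000, 100_000):
    population = [n] + [0] * (n - 1)  # one entry equal to N, and N-1 zeros
    mad, sd = mad_and_sd(population)
    print(f"N = {n:>7}: MAD = {mad:.4f}, SD = {sd:.2f}, SD/MAD = {sd / mad:.2f}")
```

Running it shows MAD creeping up towards 2 while SD and the SD/MAD ratio keep growing with N.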
23/

I had never thought about Fat Tails in terms of the "SD to MAD ratio" before. But as Taleb says, this view makes a lot of sense.

Another example of a Fat Tail is a Pareto Distribution (aka, a Power Law) -- with applications from wealth inequality to social network analysis.
24/

These distributions have 2 parameters: L and alpha.

The Random Variable can take any value greater than or equal to L, with probability densities dropping off as 1/x^(1 + alpha).

As alpha becomes larger, the Tail becomes less Fat.

The famous 80/20 Principle belongs to this class:
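The image from this tweet isn't reproduced here. For reference, the standard Pareto density and tail probability, using the thread's parameters L and alpha, are:

```latex
\[
f(x) = \frac{\alpha L^{\alpha}}{x^{\alpha + 1}} \quad (x \ge L), \qquad
P(X > x) = \left(\frac{L}{x}\right)^{\alpha} \quad (x \ge L)
\]
```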
25/

Pareto Distributions are super interesting.

For alpha <= 2, they have SD = infinity. And as long as alpha > 1, the mean and MAD are still finite, so SD/MAD = infinity! That's heavily Fat Tailed.

In particular, this applies when alpha = ~1.161, the 80/20 Principle.

Thus, SD often doesn't apply to Fat Tails. It may not even exist!
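Here's a sketch of where the ~1.161 comes from. It relies on a standard Pareto result (valid for alpha > 1): the top fraction p of the population holds p^(1 - 1/alpha) of the total. The function names are my own, and the SD formula is the textbook one for a Pareto distribution.

```python
import math

def pareto_alpha_for_share(top_fraction, share):
    """Alpha for which the top `top_fraction` of the population holds `share` of the total.

    Uses the Pareto result that the top fraction p holds p ** (1 - 1/alpha)
    of the total (valid for alpha > 1, where the mean is finite).
    """
    exponent = math.log(share) / math.log(top_fraction)  # equals 1 - 1/alpha
    return 1.0 / (1.0 - exponent)

def pareto_sd(L, alpha):
    """SD of a Pareto(L, alpha) distribution; infinite when alpha <= 2."""
    if alpha <= 2:
        return math.inf
    return (L / (alpha - 1)) * math.sqrt(alpha / (alpha - 2))

alpha_80_20 = pareto_alpha_for_share(top_fraction=0.20, share=0.80)
print(f"alpha for 80/20: {alpha_80_20:.4f}")
print("SD at the 80/20 alpha:", pareto_sd(1.0, alpha_80_20))
print("SD at alpha = 3, L = 1:", round(pareto_sd(1.0, 3.0), 4))
```

The 80/20 alpha works out to log(5)/log(4), well below the threshold of 2 needed for a finite SD.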
26/

There are 2 additional lenses through which we can view MAD and SD:

Lens 1. What % of the population lies within k deviations of the mean?

Lens 2. (Invert, Always Invert!). How many deviations are required to capture T% of the population?
27/

For example, with a Gaussian Distribution, ~57.5% of the population falls within 1 MAD of the mean. ~68.3% of the population falls within 1 SD.

Similarly, if we want to cover 90% of the population, we need a window of ~2.06 MADs (~1.64 SDs) on either side of the mean.
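Both lenses are easy to check numerically for the Gaussian. Here's a minimal sketch using Python's standard library; it works in units of SD (a standard Gaussian), which is enough because these percentages don't depend on mu or sigma.

```python
import math
from statistics import NormalDist

z = NormalDist()                # standard Gaussian: mean 0, SD 1
mad = math.sqrt(2 / math.pi)    # MAD of a standard Gaussian, in SD units

def coverage(k_sd):
    """Fraction of a Gaussian population within k_sd SDs of the mean."""
    return z.cdf(k_sd) - z.cdf(-k_sd)

# Lens 1: what % lies within 1 MAD, and within 1 SD, of the mean?
within_1_mad = coverage(mad)
within_1_sd = coverage(1.0)

# Lens 2: how wide a window (on either side of the mean) captures 90%?
k_sd_90 = z.inv_cdf(0.95)       # 5% is left out in each tail
k_mad_90 = k_sd_90 / mad

print(f"Within 1 MAD: {within_1_mad:.1%}")
print(f"Within 1 SD:  {within_1_sd:.1%}")
print(f"90% window:   {k_mad_90:.2f} MADs = {k_sd_90:.2f} SDs")
```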
28/

The Fatter the Tail, the greater the % of the population that lies within k SDs of the mean.

For example, ~68.27% of a Gaussian population lies within 1 SD of its mean.

But ~92.45% of a Pareto population (L = 1, alpha = 3) lies within 1 SD of its mean.
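The Pareto figure can be reproduced from the textbook formulas for the Pareto mean, SD, and cumulative distribution. Here's a sketch; the function names are hypothetical.

```python
import math

def pareto_cdf(x, L, alpha):
    """P(X <= x) for a Pareto distribution with minimum L and tail index alpha."""
    return 0.0 if x < L else 1.0 - (L / x) ** alpha

def pareto_fraction_within_k_sd(L, alpha, k=1.0):
    """Fraction of a Pareto population within k SDs of the mean (needs alpha > 2)."""
    mean = alpha * L / (alpha - 1)
    sd = (L / (alpha - 1)) * math.sqrt(alpha / (alpha - 2))
    return pareto_cdf(mean + k * sd, L, alpha) - pareto_cdf(mean - k * sd, L, alpha)

pareto_within_1_sd = pareto_fraction_within_k_sd(L=1.0, alpha=3.0)
gaussian_within_1_sd = math.erf(1 / math.sqrt(2))  # Gaussian: P(|Z| <= 1)

print(f"Gaussian within 1 SD: {gaussian_within_1_sd:.2%}")
print(f"Pareto   within 1 SD: {pareto_within_1_sd:.2%}")
```

With L = 1 and alpha = 3, the mean is 1.5 and the SD is about 0.866. The lower end of the window (about 0.634) falls below L, so the whole left side of the distribution is captured, and only the far right tail is left out.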
29/

This is another key insight from Taleb's lectures.

Fatter Tails lead to a *larger* (NOT smaller) % of the population within an SD of the mean.

The convexity of squaring increases SD, and hence the fraction of the population within an SD's throw of the mean.
30/

Key lesson: Just knowing the mean and SD of a distribution may not give us enough information to reason about it intelligently.

Depending on the underlying distribution, ~68% or ~92% of the population may lie within an SD of the mean. Very different scenarios!
31/

If you're still with me, thank you very much!

I know this thread has involved somewhat more math than usual.

But I hope you got *some* feeling for MADs, SDs, and Fat Tails out of it.

Please stay safe. Enjoy your weekend!

/End
