10-K Diver
14 Nov, 48 tweets, 14 min read

Get a cup of coffee.

In this thread, I'll help you understand the basics of Estimation Theory.

Life is a series of random events.

Estimation theory is the science of figuring out the *hypotheses* that best explain these events.

For example, suppose I have 2 coins: F and B.

F is a Fair coin. When you toss it, it has a 50/50 chance of coming up heads or tails.

B, on the other hand, is a Biased coin. It favors heads 60/40 over tails.

The coins look, feel, and weigh exactly the same. There's no physical test that can tell them apart.

Now, I give you one of the coins.

Your job is to try and figure out which coin I've given you.

Is it F or is it B?

Say you toss the coin 10 times.

You get heads 6 times and tails the other 4 times.

Does that automatically mean that you have the biased coin (B)?

Not necessarily. After all, the fair coin (F) can also give you 6 heads and 4 tails.

But if you think about it, the *likelihood* of B giving you 6 heads and 4 tails is higher than the likelihood of F giving you the same 6 heads and 4 tails.

B has a ~25.1% chance of giving you this outcome. F only has a ~20.5% chance of giving you this outcome.
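Those two percentages come straight from the binomial formula. Here's a minimal Python sketch that reproduces them. The function name `prob_heads` is my own label for illustration, not something from the thread.

```python
from math import comb

def prob_heads(heads: int, tosses: int, p_heads: float) -> float:
    """Probability of exactly `heads` heads in `tosses` independent tosses."""
    tails = tosses - heads
    return comb(tosses, heads) * p_heads**heads * (1 - p_heads)**tails

p_biased = prob_heads(6, 10, 0.6)  # coin B: heads 60% of the time
p_fair = prob_heads(6, 10, 0.5)    # coin F: heads 50% of the time
print(f"B: {p_biased:.1%}, F: {p_fair:.1%}")  # B: 25.1%, F: 20.5%
```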

So we have:

The Facts: 6 heads, 4 tails
Hypothesis 1: You got the biased coin B
Hypothesis 2: You got the fair coin F

Clearly, *both* hypotheses can explain the facts.

But Hypothesis 1 is a slightly *better* explanation than Hypothesis 2.

What if you tossed the coin *100* times, and got 62 heads, 38 tails? Not exactly 60/40, but close.

Now, it turns out -- you can be *even more confident* that you have the biased coin.


Because the biased coin has a *much greater probability* of giving you 62 heads and 38 tails compared to the fair coin.

The ratio of the two probabilities is about 17 to 1.

What if you tossed the coin *1000* times, and got 592 heads, 408 tails?

Now you can be *super* confident that you have the biased coin.

Its probability of giving you the observed outcome is a whopping *21.7 million* times the fair coin's probability!
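To see how these ratios grow with the number of tosses, here's a small Python sketch. It assumes the same two coins (60% and 50% heads); `likelihood_ratio` is a name I made up. It works in logarithms because the raw probabilities get extremely small at 1000 tosses.

```python
from math import exp, log

def likelihood_ratio(heads: int, tails: int,
                     p_biased: float = 0.6, p_fair: float = 0.5) -> float:
    """P(outcome | biased coin) / P(outcome | fair coin).

    The "n choose k" factor is the same for both coins, so it cancels.
    """
    log_ratio = (heads * log(p_biased / p_fair)
                 + tails * log((1 - p_biased) / (1 - p_fair)))
    return exp(log_ratio)

for heads, tails in [(6, 4), (62, 38), (592, 408)]:
    print(f"{heads} heads, {tails} tails -> {likelihood_ratio(heads, tails):,.2f} to 1")
```

The three ratios come out to roughly 1.2, 17, and 21.7 million.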

So, the more *data* you gather (here, each coin toss is a data point), the more *confident* you can be that you're picking the right hypothesis.

This seems intuitive -- more data, higher confidence.

And indeed, it's an important result in estimation theory.

The strategy we've followed so far is called the Maximum Likelihood Estimator (MLE).

We start with a) some competing hypotheses, and b) an experimental outcome.

For each hypothesis, we find the probability that we'd get this outcome -- *if* the hypothesis were indeed true.

This way, each hypothesis gets a "probability score".

Now we simply select the hypothesis with the highest score.

In the MLE philosophy, that's the hypothesis that "best" explains the outcome we got.

For example, for our "6 heads and 4 tails" outcome, MLE assigned a score of ~25.1% to the "biased coin" hypothesis and ~20.5% to the "fair coin" hypothesis.

MLE therefore chose "biased coin" as the hypothesis that best explained the observed outcome.
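The MLE recipe (score every hypothesis, pick the top score) fits in a few lines of Python. This is only a sketch for the coin example. The names `binomial_likelihood` and `mle_pick` are hypothetical helpers of mine.

```python
from math import comb

def binomial_likelihood(heads, tosses, p_heads):
    """Probability of the observed outcome if the coin lands heads with p_heads."""
    return comb(tosses, heads) * p_heads**heads * (1 - p_heads)**(tosses - heads)

def mle_pick(heads, tosses, hypotheses):
    """Score each hypothesis by its likelihood and return (best_name, scores).

    `hypotheses` maps a hypothesis name to that coin's probability of heads.
    """
    scores = {name: binomial_likelihood(heads, tosses, p)
              for name, p in hypotheses.items()}
    best = max(scores, key=scores.get)
    return best, scores

coins = {"biased": 0.6, "fair": 0.5}
best, scores = mle_pick(6, 10, coins)
print(best, scores)  # "biased" wins, with scores near 0.251 and 0.205
```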

MLE is a pretty good heuristic. Given enough data, it usually picks a reasonable hypothesis.

But it has one important drawback.

It only looks at how likely the *outcome* is under each hypothesis.

It doesn't consider whether the *hypothesis* itself is likely or not.

So, MLE can end up choosing hypotheses that are highly unlikely to be true.

For example, when I had just 2 coins (F and B), MLE reasonably chose B for the "6 heads, 4 tails" outcome.

But suppose I had a purse of 1000 coins -- with 999 of them fair and exactly 1 biased?

Let's say, just like before, I gave you 1 coin from my purse.

And just like before, let's say you tossed it 10 times and got 6 heads and 4 tails.

Now, is it more likely that you have a fair coin, or that you have the lone biased coin?

Notice that MLE doesn't care *how* the coin got to you -- whether it was picked from a set of 2, or a set of 1000.

MLE's logic is still *exactly* the same. A biased coin is more likely to produce "6 heads, 4 tails" than a fair coin. So the biased hypothesis wins.

It doesn't matter to MLE that the biased hypothesis is a 999-to-1 shot (only 1 out of 1000 coins is biased).

MLE will happily conclude that we hit a 1-in-1000 jackpot -- based *solely* on the 6/4 outcome.

It could be a 1-in-a-quintillion jackpot -- for all MLE cares.

That doesn't quite sound right!

Clearly, we shouldn't pick the biased hypothesis solely based on the 6/4 *outcome*.

We should also consider the 999/1 *prior* probabilities.

Enter the Bayes' Estimator (BE).

BE considers *both* -- the likelihood of the outcome under each hypothesis, *and* the prior probability of each hypothesis being true in the first place.

These two numbers are *multiplied* together -- and that's the score each hypothesis gets.

Then the process is the same as MLE.

The hypothesis with the highest score is deemed to be the *best*.

For example, with our 999-to-1 prior probabilities favoring fair coins, BE picks "fair" over "biased" for both "6 heads, 4 tails" and "62 heads, 38 tails".

A biased coin is more likely to yield these outcomes, but the fair coins' 999-to-1 advantage wins out.

But for the "592 heads, 408 tails" outcome, BE chooses "biased" over "fair". That's because, in this case, a fair coin has such a low chance of producing the observed outcome that even with a 999-to-1 advantage, it loses to the biased coin.
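Here's a Python sketch of that comparison for the 1000-coin purse. It multiplies prior by likelihood, but does so in log space, which is my choice to keep the tiny 1000-toss probabilities manageable. `log_score` and `bayes_pick` are made-up names.

```python
from math import log

def log_score(heads, tails, p_heads, prior):
    """log(prior * likelihood). The "n choose k" factor is dropped because
    it is identical for every hypothesis and cannot change the winner."""
    return log(prior) + heads * log(p_heads) + tails * log(1 - p_heads)

def bayes_pick(heads, tails, hypotheses):
    """`hypotheses` maps name -> (probability of heads, prior probability)."""
    scores = {name: log_score(heads, tails, p, prior)
              for name, (p, prior) in hypotheses.items()}
    return max(scores, key=scores.get)

# 999 fair coins and 1 biased coin in the purse.
purse = {"fair": (0.5, 999 / 1000), "biased": (0.6, 1 / 1000)}

for heads, tails in [(6, 4), (62, 38), (592, 408)]:
    print(f"{heads} heads, {tails} tails -> {bayes_pick(heads, tails, purse)}")
# 6/4 and 62/38 -> fair; 592/408 -> biased
```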

Clearly, BE is superior to MLE. It takes more things into account.

Why even bother with MLE then?

Well, for one thing, BE requires knowledge (or estimates) of prior probabilities. This isn't always possible.

Most of the time, we have a coin and we want to find out if it's fair. We don't really know where the coin came from.

Second, with enough data points, MLE usually reaches the same conclusions as BE.

Large sample sizes *will* eventually overcome even heavily lopsided prior probabilities (like 999-to-1) -- as we saw with the "592 heads, 408 tails" example.
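How large is "large"? As a rough back-of-the-envelope check, suppose the tosses come up heads *exactly* 60% of the time. That's an idealization I'm adding here, since real samples wobble around 60%. The sketch below finds the first sample size (in steps of 10 tosses) at which the likelihood ratio beats the 999-to-1 prior. The function name and the step size are my own choices.

```python
from math import log

def min_tosses_to_flip(prior_odds_fair=999, p_biased=0.6, p_fair=0.5, step=10):
    """Smallest multiple of `step` tosses at which BE picks "biased",
    assuming exactly a `p_biased` fraction of the tosses are heads."""
    # Log likelihood ratio (biased vs. fair) contributed by an average toss.
    per_toss = (p_biased * log(p_biased / p_fair)
                + (1 - p_biased) * log((1 - p_biased) / (1 - p_fair)))
    tosses = step
    # Keep adding tosses until the evidence outweighs the prior odds.
    while tosses * per_toss <= log(prior_odds_fair):
        tosses += step
    return tosses

print(min_tosses_to_flip())  # 350
```

So under this idealization, a few hundred tosses are enough to overturn a 999-to-1 prior.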

Let's do another example.

Imagine that you're Larry David -- the co-creator of Seinfeld and Curb Your Enthusiasm. (Wonderful shows if you haven't watched them!)

Recently, you noticed a couple of instances where the weatherman predicted rain, but the skies were clear.

On one such occasion, you saw the weatherman playing golf.

Because he had predicted rain, everyone else had canceled their golf plans.

And so, the weatherman and his buddies had the whole course to themselves!

So, you have a sneaking suspicion that the weatherman is deliberately calling rain on sunny days -- so he can hit the empty links with his friends. This is your hypothesis.

Well, you can use estimation theory to test this hypothesis!

Let's say about 80% of the days are sunny where you live, and 20% are rainy.

And let's say *unmanipulated* weather forecasts have a 90% accuracy. That is, if a day is sunny, the forecast will predict sun 90% of the time and rain the other 10%. And vice-versa for rainy days.

So, if the weatherman is being honest, your data points will come from the distribution below.

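For example, ~72% of the time, you'll get a sunny forecast and a sunny day. About 8% of the time, the forecast will call for rain but it'll actually be a sunny day. About 2% of the time, the forecast will call for sun but it'll actually rain. And the remaining ~18% of the time, you'll get a rainy forecast and a rainy day.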
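The honest distribution follows directly from the two assumptions above (80% sunny days, 90% forecast accuracy). Here's a short Python sketch. The constant and function names are mine.

```python
P_SUNNY = 0.8    # share of days that are actually sunny
ACCURACY = 0.9   # chance an honest forecast matches the actual weather

def honest_distribution(p_sunny=P_SUNNY, accuracy=ACCURACY):
    """Joint probabilities keyed by (forecast, actual); "S" = sunny, "R" = rainy."""
    p_rainy = 1 - p_sunny
    return {
        ("S", "S"): p_sunny * accuracy,        # sunny forecast, sunny day
        ("R", "S"): p_sunny * (1 - accuracy),  # rain forecast, sunny day
        ("S", "R"): p_rainy * (1 - accuracy),  # sunny forecast, rainy day
        ("R", "R"): p_rainy * accuracy,        # rain forecast, rainy day
    }

for (forecast, actual), p in honest_distribution().items():
    print(f"forecast={forecast}, actual={actual}: {p:.1%}")
```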

But what if the weatherman is deliberately calling rain on (expected) sunny days -- as you suspect?

The weatherman can't call rain on *all* such days -- or he'll quickly be caught.

So let's say he calls rain on expected sunny days about 25% of the time.

Here's an Algebraic Decision Diagram showing the various possible outcomes and their probabilities -- assuming the weatherman is lying.

The numbers on each branch and in the terminal nodes are cumulative probabilities. (S_F, R_A) means (sunny forecast, rainy day) and so on.

And here's the resulting *manipulated* probability distribution.

As you can see, the percentage of days forecast to be sunny and also actually sunny drops from 72% to 54%, while rain forecasts on days that turn out sunny jump from 8% to 26%. And so on.
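Here's one way to compute the manipulated distribution in Python. I'm assuming "expected sunny days" means days on which an honest forecast would have said sun, and that the weatherman flips 25% of those forecasts to rain regardless of what the weather then does. That reading reproduces the 72% to 54% drop. The function name is hypothetical.

```python
def manipulated_distribution(p_sunny=0.8, accuracy=0.9, lie_rate=0.25):
    """Joint probabilities keyed by (forecast, actual); "S" = sunny, "R" = rainy."""
    p_rainy = 1 - p_sunny
    # What an honest forecaster would produce.
    honest = {
        ("S", "S"): p_sunny * accuracy,
        ("R", "S"): p_sunny * (1 - accuracy),
        ("S", "R"): p_rainy * (1 - accuracy),
        ("R", "R"): p_rainy * accuracy,
    }
    # A fraction `lie_rate` of honest sunny forecasts gets switched to rain.
    return {
        ("S", "S"): honest[("S", "S")] * (1 - lie_rate),
        ("S", "R"): honest[("S", "R")] * (1 - lie_rate),
        ("R", "S"): honest[("R", "S")] + honest[("S", "S")] * lie_rate,
        ("R", "R"): honest[("R", "R")] + honest[("S", "R")] * lie_rate,
    }

for (forecast, actual), p in manipulated_distribution().items():
    print(f"forecast={forecast}, actual={actual}: {p:.1%}")
```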

So, all you need to do is keep records of what was forecast and what actually happened.

Over time, you'll be able to determine with high confidence whether the weatherman is lying.

For example, suppose you keep records for 30 days, and observe the following:

From this data, MLE predicts that the weatherman is lying (see probability calculations below) -- but not with any great degree of confidence. The ratio of probabilities is only ~1.32 to 1.

So you need more data.

Let's say you keep records for 365 days, and they look like this:

Now, it's virtually a slam dunk.

MLE can predict with very high confidence that the weatherman is lying. The ratio of probabilities is about 12.77 trillion!
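A ratio like this compares the probability of the whole set of records under the "lying" distribution versus the "honest" one. Here's a Python sketch of the calculation. The two distributions are derived from the thread's assumptions (80% sunny days, 90% accuracy, 25% of sunny forecasts flipped to rain). The day counts at the bottom are made up purely for illustration and are *not* the records from the thread. The names are mine.

```python
from math import exp, log

# Keys read as forecast_actual: "SF_RA" = sunny forecast, rainy day, etc.
HONEST = {"SF_SA": 0.72, "RF_SA": 0.08, "SF_RA": 0.02, "RF_RA": 0.18}
LYING = {"SF_SA": 0.54, "RF_SA": 0.26, "SF_RA": 0.015, "RF_RA": 0.185}

def log_likelihood_ratio(counts):
    """log[P(records | lying) / P(records | honest)].

    `counts` maps each outcome key to the number of days it was observed.
    The multinomial coefficient is the same under both hypotheses and cancels.
    """
    return sum(n * log(LYING[k] / HONEST[k]) for k, n in counts.items())

# Hypothetical 30-day tally, invented for this example only.
example_counts = {"SF_SA": 16, "RF_SA": 8, "SF_RA": 1, "RF_RA": 5}
ratio = exp(log_likelihood_ratio(example_counts))
print(f"lying vs. honest: {ratio:,.2f} to 1")
```

Every "rain forecast, sunny day" record multiplies the ratio by 0.26 / 0.08 = 3.25, while every "sunny forecast, sunny day" record multiplies it by 0.54 / 0.72 = 0.75. That's why a long enough record can push the ratio into the trillions.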

That's pretty, pretty, pretty, pretty certain.

Key lesson: Think probabilistically, and be willing to challenge your own hypotheses.

When new data arrives, ask yourself: if my opinion is right, what are the chances of this outcome? And if I'm wrong, what are the chances then?

That's the essence of estimation theory.

This is especially relevant to investing and analyzing companies.

Often, companies don't report the numbers we really want -- owner earnings, growth vs maintenance capex, unit economics, etc.

We have to tease out these *unreported* numbers from the reported numbers.

One way out may be to entertain plausible *guesses* regarding the unreported numbers (including their prior probabilities) -- and then try to use the tools of estimation theory (MLE, BE, etc.) to zero in on the guesses that best explain the reported numbers.

So: given some hypotheses and some data, estimation theory figures out which hypothesis is most likely to be true.

A less quantitative rule is Occam's Razor: it says that the simplest hypothesis (perhaps the one that makes the fewest assumptions) is usually the right one.

But I prefer to say that the *most likely* hypothesis is most likely to be the right one. That's a tautology right there!

For more on Occam's Razor, here's a thread written by my friend @SahilBloom:

I also recommend watching this superb ~16 min video by @3blue1brown about Bayes' Rule.

Bayes' Rule is a fundamental cornerstone of probability -- and it's where the Bayes' Estimator (BE) in this thread comes from.

If you're still with me, I salute you. Your enthusiasm cannot be curbed!

Thanks for reading. Enjoy your weekend. Happy Diwali!

