Tivadar Danka · Jul 18
The single biggest argument about statistics: is probability frequentist or Bayesian?

It's neither, and I'll explain why.

Buckle up. Deep-dive explanation incoming.
First, let's look at what probability is.

Probability quantitatively measures the likelihood of events, like rolling a six with a die. It's a number between zero and one. This much is independent of interpretation; it's a rule set in stone.
In the language of probability theory, events are formalized as sets within an event space.

(The event space is also a set, usually denoted by Ω.)
The union and intersection of sets can be translated into the language of events.

The union of two events expresses an outcome where at least one of them happens. The intersection expresses an outcome where both happen simultaneously. (For instance, a die roll can be both less than 4 and an odd number.)
We can also take the complement of an event, expressing when it does NOT happen.
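If you prefer code to notation, here is a minimal Python sketch of these set operations for a six-sided die; the variable names and the specific events are my choices, purely for illustration.

```python
# Events as sets for a six-sided die. Illustrative names, not from the thread.
omega = {1, 2, 3, 4, 5, 6}                   # the event space

less_than_4 = {x for x in omega if x < 4}    # {1, 2, 3}
odd = {x for x in omega if x % 2 == 1}       # {1, 3, 5}

both = less_than_4 & odd      # intersection: both events happen -> {1, 3}
either = less_than_4 | odd    # union: at least one happens -> {1, 2, 3, 5}
not_odd = omega - odd         # complement: odd does NOT happen -> {2, 4, 6}

print(both, either, not_odd)
```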
So, probability is a function that takes a set and returns a number between 0 and 1.

There are two fundamental properties we expect from probability. First, the probability of the entire event space must be 1: P(Ω) = 1.
Second, the probability of a union of mutually exclusive (that is, disjoint) events must be the sum of their probabilities: P(A ∪ B) = P(A) + P(B) whenever A ∩ B = ∅.

Intuitively, this is clear.

In fact, this is true for any countable collection of mutually exclusive events: P(A₁ ∪ A₂ ∪ …) = P(A₁) + P(A₂) + …
These two properties can be used to define probability!

Mathematically speaking, any measure that satisfies these two axioms is a probability measure.

These are called Kolmogorov's axioms. Every result in probability theory and statistics is a consequence of them.
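To make the axioms tangible, here is a toy probability measure in Python, built for a fair die; this construction is my sketch, not anything from the thread.

```python
# A toy probability measure on a fair die: P assigns each event
# (a subset of omega) the ratio of favorable outcomes.
from fractions import Fraction

omega = {1, 2, 3, 4, 5, 6}

def P(event):
    return Fraction(len(event), len(omega))

# Axiom 1: the probability of the entire event space is 1.
assert P(omega) == 1

# Axiom 2: for mutually exclusive (disjoint) events, probabilities add up.
A, B = {1, 2}, {5, 6}
assert A & B == set()
assert P(A | B) == P(A) + P(B)
```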
Let's see some probabilistic models!

1. Tossing a fair coin. This is the simplest possible example. There are two possible outcomes, heads and tails, both having probability 1/2.
2. Throwing darts. Suppose that we are throwing darts at a large wall in front of us, which is our event space. (We'll always hit the wall.)

If we throw the dart randomly, the probability of hitting a certain shape is proportional to the shape's area.
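A quick way to see this model in action is Monte Carlo simulation; in this sketch I picked a quarter-disk target inside the unit square, so the hit probability should approach its area, π/4.

```python
# Throw random darts at the unit square and estimate the probability
# of hitting a quarter-disk by the fraction of darts that land in it.
import random

def hits_quarter_disk(x, y):
    return x * x + y * y <= 1.0   # quarter of the unit disk, area pi/4

n = 100_000
hits = sum(hits_quarter_disk(random.random(), random.random()) for _ in range(n))
print(hits / n)   # close to pi/4 ~ 0.785
```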
Note that at this point, there is no frequentist or Bayesian interpretation yet!

Probability is a well-defined mathematical object. This concept is separate from how probabilities are assigned.
Now comes the part that has been fueling debates for decades.

How can we assign probabilities? There are (at least) two schools of thought, constantly in conflict with each other.

Let's start with the frequentist school.
Suppose that we repeatedly perform a single experiment, counting the number of occurrences of the possible events. Say, we toss a coin and count the number of times it turns up heads.

The ratio of heads to tosses is called “the relative frequency of heads”.
As the number of observations grows, the relative frequency will converge to the true probability.

This is not an interpretation of probability. This is a mathematically provable fact, independent of interpretations. (A special case of the famous Law of Large Numbers.)
Frequentists leverage this to build probabilistic models. For example, if we toss a coin n times and heads come up exactly k times, then the probability of heads is estimated to be k/n.
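Here is a small simulation of that convergence; the true heads probability and the sample sizes below are arbitrary choices of mine.

```python
# Frequentist estimation: simulate n tosses of a coin with true heads
# probability p_true, then estimate it by the relative frequency k/n.
import random

p_true = 0.7
for n in (10, 100, 10_000, 1_000_000):
    k = sum(random.random() < p_true for _ in range(n))
    print(n, k / n)   # the relative frequency approaches p_true as n grows
```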
On the other hand, the Bayesian school argues that such estimates are wrong, because probabilities are not absolute, but a measure of our current beliefs.

This is way too abstract, so let's elaborate.
In probabilistic models, observing certain events can influence our beliefs about others. For instance, if the sky is clear, the probability of rain goes down. If it’s cloudy, the same probability goes up.

This is expressed in terms of conditional probabilities: P(B | A), the probability of B given that A has occurred, defined as P(A ∩ B) / P(A).
With conditional probabilities, we can quantify our intuition about the relation of rain and the clouds in the sky.
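For instance, with made-up numbers (mine, not the thread's), the definition above pins the intuition down:

```python
# Conditional probability from (made-up) joint probabilities:
# P(rain | cloudy) = P(rain and cloudy) / P(cloudy).
P_rain = 0.25
P_cloudy = 0.4
P_rain_and_cloudy = 0.2

print(P_rain_and_cloudy / P_cloudy)   # 0.5: clouds raise P(rain) from 0.25
```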
Conditional probabilities allow us to update our probabilistic model in light of new information. The tool for this is the Bayes formula, hence the terminology "Bayesian statistics": P(A | B) = P(B | A) P(A) / P(B).

Again, this is a mathematically provable fact, not an interpretation.
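Reusing the numbers from the previous snippet, the Bayes formula gives the same answer by a different route; again, all figures are illustrative.

```python
# Bayes formula: P(rain | cloudy) = P(cloudy | rain) * P(rain) / P(cloudy).
P_rain = 0.25
P_cloudy = 0.4
P_cloudy_given_rain = 0.8   # = P(rain and cloudy) / P(rain) = 0.2 / 0.25

print(P_cloudy_given_rain * P_rain / P_cloudy)   # 0.5, as before
```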
Let's stick to our coin-tossing example to show how this works in practice. Regardless of the actual probabilities, 90 heads from 100 tosses is a possible outcome in (almost) every case.

Is the coin biased, or were we just lucky? How can we tell?
In Bayesian statistics, we treat our probability-to-be-estimated as a random variable. Thus, we are working with probability distributions or densities.

Yes, I know. The probability of probability. It's kind of an Inception moment, but you'll get used to it.
Our prior assumption about the probability is called, well, the prior.

For instance, if we know absolutely nothing about our coin, we assume the prior to be uniform.
What we want is to include the experimental observations in our estimation, which is expressed in terms of conditional probabilities.

This is called posterior estimation.
The Bayes formula connects the prior and the likelihood to the posterior: P(p | data) = P(data | p) P(p) / P(data).
Don't worry if this seems complex! We'll unravel it term by term.

There are three terms on the right side: the likelihood, the prior, and the evidence.
In plain English,

• the likelihood describes the probability of the observation given the model parameter,
• the prior describes our assumptions about the parameter before the observation,
• and the evidence is the total probability of our observation.
Bad news: the evidence can be impossible to evaluate. Good news: we don't have to! We find the parameter estimate by maximizing the posterior (the maximum a posteriori, or MAP, estimate), and as the evidence doesn't depend on the parameter at all, we can simply omit it.
Back to our coin-tossing example. Given the probability of heads, the likelihood can be computed using simple combinatorics.
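Concretely, the likelihood of k heads in n tosses is the binomial formula C(n, k) · pᵏ(1 − p)ⁿ⁻ᵏ; here is a sketch in Python, with k = 90 and n = 100 from the example above.

```python
# Likelihood of observing k heads in n tosses, given heads probability p.
from math import comb

def likelihood(p, k=90, n=100):
    return comb(n, k) * p**k * (1 - p) ** (n - k)

# A fair coin makes 90 heads out of 100 astronomically unlikely,
# while p = 0.9 maximizes the likelihood of that observation.
print(likelihood(0.5), likelihood(0.9))
```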
After this, we get a concrete formula for the posterior density: with a uniform prior, it is proportional to pᵏ(1 − p)ⁿ⁻ᵏ.

(The symbol ∝ reads as “proportional to”, and we write this instead of equality because of the omitted denominator.)
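Putting it all together, a brute-force grid search over the unnormalized posterior recovers the MAP estimate; the grid resolution is an arbitrary choice of mine.

```python
# Unnormalized posterior with a uniform prior: p**k * (1 - p)**(n - k).
# Maximizing it on a grid gives the MAP estimate.
k, n = 90, 100
grid = [i / 1000 for i in range(1, 1000)]
posterior = [p**k * (1 - p) ** (n - k) for p in grid]

map_estimate = grid[posterior.index(max(posterior))]
print(map_estimate)   # 0.9 = k/n: with a uniform prior, MAP matches the frequentist estimate
```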
To sum up: as a mathematical concept, probability is independent of interpretation. The question of frequentist vs. Bayesian comes up when we are building probabilistic models from data.
Is the Bayesian viewpoint better than the frequentist one?

No. It's just different. In certain situations, frequentist estimates are perfectly sufficient. In others, Bayesian methods have the advantage. Use the right tool for the task, and don't worry about the rest.
If you liked this thread, you will love The Palindrome, my weekly newsletter on Mathematics and Machine Learning.

Join 19,000+ curious readers here: thepalindrome.org
