Tivadar Danka
Jul 18, 2025
The single biggest argument about statistics: is probability frequentist or Bayesian?

It's neither, and I'll explain why.

Buckle up. Deep-dive explanation incoming.
First, let's look at what probability is.

Probability quantitatively measures the likelihood of events, like rolling a six with a die. It's a number between zero and one. This is independent of interpretation; it's a rule set in stone.
In the language of probability theory, the events are formalized by sets within an event space.

The event space is also a set, usually denoted by Ω.
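For a concrete example (mine, reconstructing what the image likely illustrated), take a standard six-sided die:

```latex
\Omega = \{1, 2, 3, 4, 5, 6\}, \qquad
A = \{\text{the roll is odd}\} = \{1, 3, 5\} \subseteq \Omega
```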
The union and intersection of sets can be translated into the language of events.

The intersection of two events expresses an outcome where both events happen simultaneously. (For instance, a die roll can be both less than 4 and an odd number.)
We can also take the complement of an event, expressing when it does NOT happen.
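Continuing the die example (again an illustration of mine): with A = "less than 4" and B = "odd",

```latex
A \cap B = \{1, 3\}, \qquad
A \cup B = \{1, 2, 3, 5\}, \qquad
B^c = \{2, 4, 6\}
```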
So, probability is a function that takes a set and returns a number between 0 and 1.

There are two fundamental properties we expect from probability. First, the probability of the entire event space must be 1.
Second, the probability of the union of mutually exclusive events is the sum of their probabilities.

Intuitively, this is clear.
In fact, this is true for any countable collection of mutually exclusive events.
These two properties can be used to define probability!

Mathematically speaking, any measure that satisfies these two axioms is a probability measure.

These are called Kolmogorov's axioms. Every result in probability theory and statistics is a consequence of them.
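In symbols, this is the standard statement of the two axioms (my reconstruction of the image):

```latex
P(\Omega) = 1, \qquad
P\Big( \bigcup_{i=1}^{\infty} A_i \Big) = \sum_{i=1}^{\infty} P(A_i)
\quad \text{for pairwise disjoint } A_1, A_2, \dots
```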
Let's see some probabilistic models!

1. Tossing a fair coin. This is the simplest possible example. There are two possible outcomes, both having the same probability.
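In the formalism above (a standard way to write it, reconstructing the image):

```latex
\Omega = \{H, T\}, \qquad P(\{H\}) = P(\{T\}) = \tfrac{1}{2}
```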
2. Throwing darts. Suppose that we are throwing darts at a large wall in front of us, which is our event space. (We'll always hit the wall.)

If we throw the dart randomly, the probability of hitting a certain shape is proportional to the shape's area.
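Here is a minimal simulation of this model (a sketch of mine; taking the wall to be the unit square and the shape to be a disk are both my assumptions):

```python
import random

def estimate_hit_probability(n_throws: int = 1_000_000) -> float:
    """Throw darts uniformly at the wall and count the hits inside the shape.

    Assumptions (mine, for illustration): the wall is the unit square,
    and the shape is a disk of radius 0.25 centered at (0.5, 0.5).
    """
    hits = 0
    for _ in range(n_throws):
        x, y = random.random(), random.random()
        if (x - 0.5) ** 2 + (y - 0.5) ** 2 <= 0.25 ** 2:
            hits += 1
    return hits / n_throws

# The disk's area is pi * 0.25**2 ≈ 0.196, and the estimate matches it.
print(estimate_hit_probability())
```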
Note that at this point, there is no frequentist or Bayesian interpretation yet!

Probability is a well-defined mathematical object. This concept is separate from how probabilities are assigned.
Now comes the part that has been fueling debates for decades.

How can we assign probabilities? There are (at least) two schools of thought, constantly in conflict with each other.

Let's start with the frequentist school.
Suppose that we repeatedly perform a single experiment, counting the number of occurrences of the possible events. Say, we are tossing a coin and counting the number of times it turns up heads.

The ratio of heads to the total number of tosses is called “the relative frequency of heads”.
As the number of observations grows, the relative frequency will converge to the true probability.

This is not an interpretation of probability. This is a mathematically provable fact, independent of interpretations. (A special case of the famous Law of Large Numbers.)
Frequentists leverage this to build probabilistic models. For example, if we toss a coin n times and heads come up exactly k times, then the probability of heads is estimated to be k/n.
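A quick simulation makes both points tangible (a sketch of mine; the true probability 0.7 is an arbitrary choice):

```python
import random

def relative_frequency(p_true: float, n_tosses: int) -> float:
    """Toss a biased coin n times and return k/n, the relative
    frequency of heads -- the frequentist estimate of p_true."""
    heads = sum(random.random() < p_true for _ in range(n_tosses))
    return heads / n_tosses

# As the number of tosses grows, k/n converges to the true
# probability, as the Law of Large Numbers promises.
for n in (10, 100, 10_000, 1_000_000):
    print(n, relative_frequency(p_true=0.7, n_tosses=n))
```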
On the other hand, the Bayesian school argues that such estimates are wrong, because probabilities are not absolute, but a measure of our current beliefs.

This is way too abstract, so let's elaborate.
In probabilistic models, observing certain events can influence our beliefs about others. For instance, if the sky is clear, the probability of rain goes down. If it’s cloudy, the same probability goes up.

This is expressed in terms of conditional probabilities.
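The standard definition (my reconstruction of the image):

```latex
P(A \mid B) = \frac{P(A \cap B)}{P(B)}, \qquad P(B) > 0
```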
With conditional probabilities, we can quantify our intuition about the relation between rain and the clouds in the sky.
Conditional probabilities allow us to update our probabilistic model in light of new information. The rule governing this update is the Bayes formula, hence the terminology "Bayesian statistics".

Again, this is a mathematically provable fact, not an interpretation.
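In its event form (the standard statement, reconstructing the image):

```latex
P(A \mid B) = \frac{P(B \mid A)\, P(A)}{P(B)}
```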
Let's stick to our coin-tossing example to show how this works in practice. Regardless of the actual probabilities, 90 heads from 100 tosses is a possible outcome in (almost) every case.

Is the coin biased, or were we just lucky? How can we tell?
In Bayesian statistics, we treat our probability-to-be-estimated as a random variable. Thus, we are working with probability distributions or densities.

Yes, I know. The probability of probability. It's kind of an Inception moment, but you'll get used to it.
Our prior assumption about the probability is called, well, the prior.

For instance, if we know absolutely nothing about our coin, we assume this to be uniform.
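Writing p for the unknown probability of heads, the uniform prior is simply (standard notation, my reconstruction):

```latex
\pi(p) = 1 \quad \text{for } 0 \le p \le 1
```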
What we want is to include the experimental observations in our estimation, which is expressed in terms of conditional probabilities.

This is called posterior estimation.
The Bayes formula connects the prior and the likelihood to the posterior.
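In parameter form, with θ for the parameter and X for the observation (standard notation):

```latex
P(\theta \mid X) = \frac{P(X \mid \theta)\, P(\theta)}{P(X)}
```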
Don't worry if this seems complex! We'll unravel it term by term.

There are three terms on the right side: the likelihood, the prior, and the evidence.
In plain English,

• the likelihood describes the probability of the observation given the model parameter,
• the prior describes our assumptions about the parameter before the observation,
• and the evidence is the total probability of our observation.
Bad news: the evidence can be impossible to evaluate. Good news: we don’t have to! We find the parameter estimate by maximizing the posterior, and as the evidence doesn’t depend on the parameter at all, we can simply omit it.
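In symbols, this is the standard maximum a posteriori (MAP) estimate:

```latex
\hat{\theta} = \arg\max_{\theta} \, P(X \mid \theta)\, P(\theta)
```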
Back to our coin-tossing example. Given the probability of heads, the likelihood can be computed using simple combinatorics.
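With n tosses and k heads, this is the binomial likelihood (a standard result, reconstructing the image):

```latex
P(k \text{ heads} \mid p) = \binom{n}{k} p^k (1 - p)^{n - k}
```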
After this, we get a concrete formula for the posterior density.

(The symbol ∝ reads as “proportional to”, and we write this instead of equality because of the omitted denominator.)
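With the uniform prior, the posterior is therefore

```latex
\pi(p \mid k) \propto p^k (1 - p)^{n - k}
```

A minimal numerical sketch (mine) evaluates this posterior on a grid for the 90-heads-out-of-100-tosses example and reads off its maximum:

```python
from math import comb

n, k = 100, 90  # the example from above: 90 heads out of 100 tosses

# Evaluate the unnormalized posterior on a grid of candidate probabilities.
# With a uniform prior, the posterior is proportional to the likelihood.
grid = [i / 1000 for i in range(1001)]
posterior = [comb(n, k) * p ** k * (1 - p) ** (n - k) for p in grid]

# The grid point with the largest posterior value is the MAP estimate.
p_map = grid[posterior.index(max(posterior))]
print(p_map)  # 0.9
```

Note that with a uniform prior, the MAP estimate coincides with the frequentist estimate k/n; the two schools diverge once the prior carries real information.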
To sum up: as a mathematical concept, probability is independent of interpretation. The question of frequentist vs. Bayesian comes up when we are building probabilistic models from data.
Is the Bayesian viewpoint better than the frequentist one?

No. It's just different. In certain situations, frequentist estimation is perfectly sufficient. In others, Bayesian methods have the advantage. Use the right tool for the task, and don't worry about the rest.
If you liked this thread, you will love The Palindrome, my weekly newsletter on Mathematics and Machine Learning.

Join 19,000+ curious readers here: thepalindrome.org
