Tivadar Danka · Jul 18
The single biggest argument about statistics: is probability frequentist or Bayesian?

It's neither, and I'll explain why.

Buckle up. Deep-dive explanation incoming.
First, let's look at what probability is.

Probability quantitatively measures the likelihood of events, like rolling a six with a die. It's a number between zero and one. This much is independent of interpretation; it's a rule set in stone.
In the language of probability theory, events are formalized as sets within an event space.

(The event space is also a set, usually denoted by Ω.)
The union and intersection of sets can be translated into the language of events.

The union of two events expresses an outcome where at least one of them happens. The intersection expresses an outcome where both happen simultaneously. (For instance, a die roll can be both less than 4 and an odd number.)
We can also take the complement of an event, expressing when it does NOT happen.
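If you prefer code to notation, here is a minimal Python sketch of these set operations for a six-sided die; the variable names and the specific events are my choices, purely for illustration.

```python
# Events as sets for a six-sided die. Illustrative names, not from the thread.
omega = {1, 2, 3, 4, 5, 6}                   # the event space

less_than_4 = {x for x in omega if x < 4}    # {1, 2, 3}
odd = {x for x in omega if x % 2 == 1}       # {1, 3, 5}

both = less_than_4 & odd      # intersection: both events happen -> {1, 3}
either = less_than_4 | odd    # union: at least one happens -> {1, 2, 3, 5}
not_odd = omega - odd         # complement: odd does NOT happen -> {2, 4, 6}

print(both, either, not_odd)
```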
So, probability is a function that takes a set and returns a number between 0 and 1.

There are two fundamental properties we expect from probability. First, the probability of the entire event space must be 1: P(Ω) = 1.
Second, the probability of a union of mutually exclusive (that is, disjoint) events must be the sum of their probabilities: P(A ∪ B) = P(A) + P(B) whenever A ∩ B = ∅.

Intuitively, this is clear.

In fact, this is true for any countable collection of mutually exclusive events: P(A₁ ∪ A₂ ∪ …) = P(A₁) + P(A₂) + …
These two properties can be used to define probability!

Mathematically speaking, any measure that satisfies these two axioms is a probability measure.

These are called Kolmogorov's axioms. Every result in probability theory and statistics is a consequence of them.
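To make the axioms tangible, here is a toy probability measure in Python, built for a fair die; this construction is my sketch, not anything from the thread.

```python
# A toy probability measure on a fair die: P assigns each event
# (a subset of omega) the ratio of favorable outcomes.
from fractions import Fraction

omega = {1, 2, 3, 4, 5, 6}

def P(event):
    return Fraction(len(event), len(omega))

# Axiom 1: the probability of the entire event space is 1.
assert P(omega) == 1

# Axiom 2: for mutually exclusive (disjoint) events, probabilities add up.
A, B = {1, 2}, {5, 6}
assert A & B == set()
assert P(A | B) == P(A) + P(B)
```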
Let's see some probabilistic models!

1. Tossing a fair coin. This is the simplest possible example. There are two possible outcomes, heads and tails, both having probability 1/2.
2. Throwing darts. Suppose that we are throwing darts at a large wall in front of us, which is our event space. (We'll always hit the wall.)

If we throw the dart randomly, the probability of hitting a certain shape is proportional to the shape's area.
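A quick way to see this model in action is Monte Carlo simulation; in this sketch I picked a quarter-disk target inside the unit square, so the hit probability should approach its area, π/4.

```python
# Throw random darts at the unit square and estimate the probability
# of hitting a quarter-disk by the fraction of darts that land in it.
import random

def hits_quarter_disk(x, y):
    return x * x + y * y <= 1.0   # quarter of the unit disk, area pi/4

n = 100_000
hits = sum(hits_quarter_disk(random.random(), random.random()) for _ in range(n))
print(hits / n)   # close to pi/4 ~ 0.785
```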
Note that at this point, there is no frequentist or Bayesian interpretation yet!

Probability is a well-defined mathematical object. This concept is separate from how probabilities are assigned.
Now comes the part that has been fueling debates for decades.

How can we assign probabilities? There are (at least) two schools of thought, constantly in conflict with each other.

Let's start with the frequentist school.
Suppose that we repeatedly perform a single experiment, counting the number of occurrences of the possible events. Say, we toss a coin and count the number of times it turns up heads.

The ratio of heads to tosses is called “the relative frequency of heads”.
As the number of observations grows, the relative frequency will converge to the true probability.

This is not an interpretation of probability. This is a mathematically provable fact, independent of interpretations. (A special case of the famous Law of Large Numbers.)
Frequentists leverage this to build probabilistic models. For example, if we toss a coin n times and heads come up exactly k times, then the probability of heads is estimated to be k/n.
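Here is a small simulation of that convergence; the true heads probability and the sample sizes below are arbitrary choices of mine.

```python
# Frequentist estimation: simulate n tosses of a coin with true heads
# probability p_true, then estimate it by the relative frequency k/n.
import random

p_true = 0.7
for n in (10, 100, 10_000, 1_000_000):
    k = sum(random.random() < p_true for _ in range(n))
    print(n, k / n)   # the relative frequency approaches p_true as n grows
```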
On the other hand, the Bayesian school argues that such estimates are wrong, because probabilities are not absolute, but a measure of our current beliefs.

This is way too abstract, so let's elaborate.
In probabilistic models, observing certain events can influence our beliefs about others. For instance, if the sky is clear, the probability of rain goes down. If it’s cloudy, the same probability goes up.

This is expressed in terms of conditional probabilities: P(B | A), the probability of B given that A has occurred, defined as P(A ∩ B) / P(A).
With conditional probabilities, we can quantify our intuition about the relation of rain and the clouds in the sky.
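For instance, with made-up numbers (mine, not the thread's), the definition above pins the intuition down:

```python
# Conditional probability from (made-up) joint probabilities:
# P(rain | cloudy) = P(rain and cloudy) / P(cloudy).
P_rain = 0.25
P_cloudy = 0.4
P_rain_and_cloudy = 0.2

print(P_rain_and_cloudy / P_cloudy)   # 0.5: clouds raise P(rain) from 0.25
```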
Conditional probabilities allow us to update our probabilistic model in light of new information. The tool for this is the Bayes formula, hence the terminology "Bayesian statistics": P(A | B) = P(B | A) P(A) / P(B).

Again, this is a mathematically provable fact, not an interpretation.
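Reusing the numbers from the previous snippet, the Bayes formula gives the same answer by a different route; again, all figures are illustrative.

```python
# Bayes formula: P(rain | cloudy) = P(cloudy | rain) * P(rain) / P(cloudy).
P_rain = 0.25
P_cloudy = 0.4
P_cloudy_given_rain = 0.8   # = P(rain and cloudy) / P(rain) = 0.2 / 0.25

print(P_cloudy_given_rain * P_rain / P_cloudy)   # 0.5, as before
```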
Let's stick to our coin-tossing example to show how this works in practice. Regardless of the actual probabilities, 90 heads from 100 tosses is a possible outcome in (almost) every case.

Is the coin biased, or were we just lucky? How can we tell?
In Bayesian statistics, we treat our probability-to-be-estimated as a random variable. Thus, we are working with probability distributions or densities.

Yes, I know. The probability of probability. It's kind of an Inception moment, but you'll get used to it.
Our prior assumption about the probability is called, well, the prior.

For instance, if we know absolutely nothing about our coin, we assume the prior to be uniform.
What we want is to include the experimental observations in our estimation, which is expressed in terms of conditional probabilities.

This is called posterior estimation.
The Bayes formula connects the prior and the likelihood to the posterior: P(p | data) = P(data | p) P(p) / P(data).
Don't worry if this seems complex! We'll unravel it term by term.

There are three terms on the right side: the likelihood, the prior, and the evidence.
In plain English,

• the likelihood describes the probability of the observation given the model parameter,
• the prior describes our assumptions about the parameter before the observation,
• and the evidence is the total probability of our observation.
Bad news: the evidence can be impossible to evaluate. Good news: we don't have to! We find the parameter estimate by maximizing the posterior (the maximum a posteriori, or MAP, estimate), and as the evidence doesn't depend on the parameter at all, we can simply omit it.
Back to our coin-tossing example. Given the probability of heads, the likelihood can be computed using simple combinatorics.
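Concretely, the likelihood of k heads in n tosses is the binomial formula C(n, k) · pᵏ(1 − p)ⁿ⁻ᵏ; here is a sketch in Python, with k = 90 and n = 100 from the example above.

```python
# Likelihood of observing k heads in n tosses, given heads probability p.
from math import comb

def likelihood(p, k=90, n=100):
    return comb(n, k) * p**k * (1 - p) ** (n - k)

# A fair coin makes 90 heads out of 100 astronomically unlikely,
# while p = 0.9 maximizes the likelihood of that observation.
print(likelihood(0.5), likelihood(0.9))
```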
After this, we get a concrete formula for the posterior density: with a uniform prior, it is proportional to pᵏ(1 − p)ⁿ⁻ᵏ.

(The symbol ∝ reads as “proportional to”, and we write this instead of equality because of the omitted denominator.)
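Putting it all together, a brute-force grid search over the unnormalized posterior recovers the MAP estimate; the grid resolution is an arbitrary choice of mine.

```python
# Unnormalized posterior with a uniform prior: p**k * (1 - p)**(n - k).
# Maximizing it on a grid gives the MAP estimate.
k, n = 90, 100
grid = [i / 1000 for i in range(1, 1000)]
posterior = [p**k * (1 - p) ** (n - k) for p in grid]

map_estimate = grid[posterior.index(max(posterior))]
print(map_estimate)   # 0.9 = k/n: with a uniform prior, MAP matches the frequentist estimate
```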
To sum up: as a mathematical concept, probability is independent of interpretation. The question of frequentist vs. Bayesian comes up when we are building probabilistic models from data.
Is the Bayesian viewpoint better than the frequentist one?

No. It's just different. In certain situations, frequentist estimates are perfectly sufficient. In others, Bayesian methods have the advantage. Use the right tool for the task, and don't worry about the rest.
If you liked this thread, you will love The Palindrome, my weekly newsletter on Mathematics and Machine Learning.

Join 19,000+ curious readers here: thepalindrome.org
