You can explain the Bayes formula in pure English.

Despite being overloaded with seemingly complex concepts, it conveys an important lesson about how observations change our beliefs about the world.

Let's take it apart!

↓ A thread. ↓
Essentially, the Bayes formula describes how to update our models, given new information.

To understand why, we will look at a simple example with a twist: tossing a biased coin.
Suppose that we have a magical coin!

When tossed, it can come up heads or tails, but not necessarily with equal chance.

The catch is, we don't know the exact probabilities. So, we have to perform some experiments to find them out.
To mathematically formulate the problem, we denote the probability of heads with 𝑥.

What do we know about 𝑥?

At this point, nothing. It can be any number between 0 and 1.
Instead of looking at 𝑥 as a fixed number, let's think about it as a realization of a random variable 𝑋.

To model our (lack of) knowledge about 𝑋, we assume that each value between 0 and 1 is equally likely.

This distribution is called the prior.
(Note that we are working with probability density functions, not discrete probabilities.

The values of a density function are not probabilities, despite the notation!)
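
Written out, the uniform prior looks like this. The symbol p_X for the prior density is an assumption of this sketch, not notation taken from the thread:

```latex
p_X(x) =
\begin{cases}
1 & \text{if } 0 \le x \le 1, \\
0 & \text{otherwise.}
\end{cases}
```

The density is constant, so no value of 𝑥 is favored over any other, and it integrates to 1 over [0, 1].
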
So, suppose that we tossed our magical coin, and it landed on tails.

How does it influence our model?

We can tell that if the probability of heads is some 𝑥, then the likelihood of our experiment resulting in tails is 1-𝑥.
However, we want the conditional distribution the other way around: we are curious about the distribution of the parameter 𝑋, given the result of our experiment.

This is called the posterior distribution.
Now let's put everything together!

The Bayes formula is exactly what we need, as it expresses the posterior in terms of the prior and the likelihood.
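
For the coin example, the formula can be sketched as follows. The notation is assumed here: p_X is the prior density, P(tails | X = x) is the likelihood, and p_{X | tails} is the posterior density.

```latex
p_{X \mid \text{tails}}(x)
  = \frac{P(\text{tails} \mid X = x) \, p_X(x)}{P(\text{tails})},
\qquad
P(\text{tails}) = \int_0^1 P(\text{tails} \mid X = t) \, p_X(t) \, dt.
```

The numerator is likelihood times prior. The denominator is a single number that does not change with 𝑥.
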
It might be surprising, but the overall probability of the experiment resulting in tails (the denominator in the Bayes formula) is irrelevant.

Why? Because it does not depend on 𝑥. It is just a normalizing constant that makes the posterior integrate to 1.

Here, the probability of tails is 0.5, but it can be hard to evaluate analytically in the general case.
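
When the integral has no closed form, the normalizing constant can be approximated numerically. Here is a minimal sketch, assuming a uniform prior and a single observed tails. The helper name `grid_posterior` and the grid size are illustrative choices, not something from the thread.

```python
def grid_posterior(likelihood, prior, n=1000):
    """Approximate the posterior density on an evenly spaced grid over [0, 1]."""
    xs = [i / n for i in range(n + 1)]
    # Unnormalized posterior: likelihood times prior at each grid point.
    unnorm = [likelihood(x) * prior(x) for x in xs]
    # Trapezoidal rule for the normalizing constant (the probability of the data).
    evidence = sum((a + b) / 2 for a, b in zip(unnorm, unnorm[1:])) / n
    posterior = [u / evidence for u in unnorm]
    return xs, posterior, evidence


# One toss that landed on tails: likelihood is 1 - x, prior is uniform.
xs, posterior, evidence = grid_posterior(lambda x: 1 - x, lambda x: 1.0)
print(round(evidence, 6))  # approximately 0.5
```

The same function works for any likelihood and prior you can evaluate pointwise, which is the point: only the product needs to be known, and the constant falls out of the normalization.
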
So, we have our posterior. Notice that it is more concentrated around 𝑥 = 0. (Recall that 𝑥 is the probability of heads.)

This means that if we only saw a single coin toss and it resulted in tails, we guess that the coin is biased towards tails.
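
For this one-toss case, the posterior can be written out explicitly. A sketch in assumed notation (p_X for the uniform prior density, p_{X | tails} for the posterior), using the likelihood 1 - 𝑥 of tails:

```latex
p_{X \mid \text{tails}}(x)
  = \frac{(1 - x) \cdot 1}{\int_0^1 (1 - t) \, dt}
  = \frac{1 - x}{1/2}
  = 2 (1 - x), \qquad 0 \le x \le 1.
```

The density equals 2 at 𝑥 = 0 and falls linearly to 0 at 𝑥 = 1, so small probabilities of heads are now the most plausible.
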
Of course, we can do more and more coin tosses to refine the posterior further.

(After 𝑘 heads and 𝑛 - 𝑘 tails, starting from the uniform prior, the posterior will be the so-called Beta distribution with parameters 𝑘 + 1 and 𝑛 - 𝑘 + 1, more info here: en.wikipedia.org/wiki/Beta_dist…)
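
To see how more tosses sharpen the picture, here is a small sketch of that Beta posterior. It assumes the uniform prior used above; the function names `beta_posterior_pdf` and `posterior_mean` are hypothetical helpers made up for this example.

```python
import math


def beta_posterior_pdf(x, heads, tails):
    """Density of Beta(heads + 1, tails + 1) at x (posterior under a uniform prior)."""
    a, b = heads + 1, tails + 1
    # Log of the Beta function B(a, b) = Gamma(a) * Gamma(b) / Gamma(a + b).
    log_norm = math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)
    return x ** (a - 1) * (1 - x) ** (b - 1) / math.exp(log_norm)


def posterior_mean(heads, tails):
    """Mean of Beta(heads + 1, tails + 1), a point estimate for the probability of heads."""
    return (heads + 1) / (heads + tails + 2)


# A single tails reproduces the posterior 2 * (1 - x) from the one-toss example.
print(beta_posterior_pdf(0.25, heads=0, tails=1))
# After 7 heads and 3 tails, the estimate moves towards heads.
print(posterior_mean(heads=7, tails=3))
```

Only the counts of heads and tails matter here, not the order of the tosses.
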
To summarize, here is the Bayes formula in pure English. (Well, sort of.)

posterior ∝ likelihood × prior

Or, in other words, the Bayes formula just describes how to update our models given new information!
Having a deep understanding of math will make you a better engineer. I want to help you with this, so I am writing a comprehensive book about the subject.

If you are interested in the details and beauty of mathematics, check out the early access!

tivadardanka.com/book/
