You can explain the Bayes formula in pure English.

Even without using any mathematical terminology.

Although it may seem overloaded with complex concepts, it conveys an important lesson about how observations change our beliefs about the world.

Let's take it apart!
Essentially, the Bayes formula describes how to update our models, given new information.

To see how, we will look at a simple example with a twist: coin tossing with an unfair coin.
Let's suppose that we have a magical coin! It can come up with heads or tails when tossed, but not necessarily with equal probability.

The catch is, we don't know the exact probability. So, we have to perform some experiments and statistical estimation to find that out.
To mathematically formulate the problem, we denote the probability of heads with 𝑥.

What do we know about 𝑥? 🤔

At this point, nothing. It can be any number between 0 and 1.
Instead of looking at 𝑥 as a fixed number, let's think of it as a single observation of a random variable 𝑋.

To model our (lack of) knowledge about 𝑋, we select the uniform distribution on [0, 1].

This is called the 𝑝𝑟𝑖𝑜𝑟, as it expresses our knowledge before the experiment.
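In symbols (an aside added here for clarity, not from the original thread), the uniform prior has the constant density

p(x) = 1 for 0 ≤ x ≤ 1,

so before any tosses, every possible bias is equally plausible.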
So, suppose that we have tossed our magical coin and it landed on tails.

How does it influence our model about the coin? 🤔

What we can tell is that if the probability of heads is some 𝑥, then the 𝑙𝑖𝑘𝑒𝑙𝑖ℎ𝑜𝑜𝑑 of our experiment resulting in tails is 1-𝑥.
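Spelled out (again, an aside for clarity):

P(tails | X = x) = 1 - x.

Here the observed tails is fixed and 𝑥 is the variable, so this is a function of 𝑥, not a probability distribution over it.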
Notice that this is the opposite of what we want: we need the condition and the event the other way around. That is, the distribution of the parameter, given the result of our experiment.

This is called the 𝑝𝑜𝑠𝑡𝑒𝑟𝑖𝑜𝑟 distribution.
Now let's put everything together!

The Bayes formula is exactly what we need, as it expresses the posterior in terms of the prior and the likelihood.
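For our coin, with the prior density p(x) and the observed tails, the formula reads (written out here for clarity):

p(x | tails) = P(tails | X = x) · p(x) / P(tails).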
It might be surprising, but the exact value of P(tails), the denominator in the formula, barely matters. 🤔

Why? Because it does not depend on 𝑥: it is just a normalizing constant, there to make sure that the integral of the posterior evaluates to 1.

Here it is 0.5 (computed below), but in the general case, it can be hard to evaluate analytically.
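Here is the computation for our coin. The denominator is obtained by averaging the likelihood over the prior:

P(tails) = ∫₀¹ (1 - x) · 1 dx = 1/2,

so the posterior density is

p(x | tails) = (1 - x) · 1 / (1/2) = 2(1 - x).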
So, we have our posterior. Notice that it is more concentrated around 𝑥 = 0. (Recall that 𝑥 is the probability of heads.)

This means that if we only saw a single coin toss and it resulted in tails, our best guess is that the coin is biased towards tails.
Of course, we can do more and more coin tosses, which can be used to refine the posterior even further.

(After 𝑘 heads and 𝑛 - 𝑘 tails, the posterior will be the so-called Beta distribution; more info here: en.wikipedia.org/wiki/Beta_dist…)
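If you want to see this refinement in action, here is a minimal sketch in Python. It assumes NumPy and SciPy are available; the true bias of 0.3 is just an illustrative choice, unknown to the "experimenter" in the story. With a uniform prior, 𝑘 heads out of 𝑛 tosses yield a Beta(𝑘 + 1, 𝑛 - 𝑘 + 1) posterior.

```python
# A minimal sketch, assuming NumPy and SciPy are installed.
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
true_p_heads = 0.3                         # illustrative, hidden from the "experimenter"

tosses = rng.random(1000) < true_p_heads   # True means heads
for n in [1, 10, 100, 1000]:
    k = int(tosses[:n].sum())              # heads among the first n tosses
    posterior = stats.beta(k + 1, n - k + 1)
    low, high = posterior.ppf(0.025), posterior.ppf(0.975)
    print(f"n={n:4d}  heads={k:3d}  mean={posterior.mean():.3f}  "
          f"95% interval=({low:.3f}, {high:.3f})")
```

As 𝑛 grows, the interval shrinks around the true bias: exactly the refinement described above.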
To summarize, here is the Bayes formula in pure English. (Well, sort of.)

posterior ∝ likelihood times prior

Or, in other words, the Bayes formula describes how to update our models, given new information.
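To make the summary concrete, here is a quick numeric sanity check of "posterior ∝ likelihood times prior", reproducing the single-tails example from above. Only NumPy is assumed, and the grid is an illustrative discretization, not part of the original thread.

```python
import numpy as np

x = np.linspace(0, 1, 10001)   # candidate values for the probability of heads
prior = np.ones_like(x)        # uniform prior on [0, 1]
likelihood = 1 - x             # likelihood of observing tails, given X = x

unnormalized = likelihood * prior
posterior = unnormalized / np.trapz(unnormalized, x)  # rescale to integrate to 1

# Matches 2(1 - x): about 2.0 near x = 0, about 0.0 near x = 1
print(posterior[0], posterior[-1])
```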
Bonus!

One of the most surprising applications of the Bayes formula is in connection with medical tests.

Check out this recent thread by @svpino!

More bonus!

It might be surprising, but Bayes' theorem also plays a part in the definition of the Mean Square Error.

I gave a short explanation of this in a recent thread; you can find it here:
