Mean Square Error is one of the most ubiquitous error functions in machine learning.

Did you know that it arises naturally from Bayesian estimation? That seemingly rigid formula has a deep probabilistic meaning.

💡 Let's unravel it! 💡
If you are not familiar with the MSE, first check out this awesome explanation by @haltakov!

In the following, we are going to dig deep into the Bayesian roots of the formula!

Suppose that you have a regression problem, like predicting apartment prices from square footage.

The data seems to follow a clear trend, although the variance is large. Fitting a single function could work, but it would ignore that spread.
We need a model that can explain the variance of the data, not just its mean. (A plain linear regression only captures the mean.)

💡 Let's model it with probability distributions instead of just a function! 💡

Suppose that the two variables we observe are random variables 𝑋 and 𝑌.
What we are looking for is the conditional distribution of 𝑌, given 𝑋!

This would provide complete information about our data.

How can we find this? 🤔
Now it is finally time to do some modelling!

Let's assume that for each 𝑥, the conditional distribution we are looking for is a Gaussian whose mean is some function 𝑓(𝑥) of the observed 𝑥!
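
One way to write this assumption down. The notation is a sketch: θ denotes the parameters of 𝑓, and the variance σ² is assumed to be the same constant for every 𝑥, which the text does not state explicitly.

```latex
Y \mid X = x \;\sim\; \mathcal{N}\bigl(f_\theta(x),\, \sigma^2\bigr),
\qquad
p(y \mid x, \theta)
  = \frac{1}{\sqrt{2\pi\sigma^2}}
    \exp\!\left(-\frac{\bigl(y - f_\theta(x)\bigr)^2}{2\sigma^2}\right)
```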
How do we fit our model? 🤔

Think about this. Given a parametrized function 𝑓 and an input 𝑥, how likely is it that we observe 𝑦?

💡 We want to find the estimator 𝑓 that maximizes the chance! 💡

This quantity is described by the so-called likelihood function.
Let's see how we can maximize the likelihood function!
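
To get a feel for the likelihood before doing any math, here is a minimal Python sketch. Everything in it is made up for illustration: the four data points, the two candidate functions `good_fit` and `poor_fit`, and the fixed `sigma=1.0`. It also assumes the observations are independent, so the likelihood is a product of Gaussian densities.

```python
import math

def gaussian_pdf(y, mean, sigma):
    """Density of a Gaussian with the given mean and standard deviation at y."""
    return math.exp(-((y - mean) ** 2) / (2 * sigma**2)) / math.sqrt(2 * math.pi * sigma**2)

def likelihood(f, xs, ys, sigma):
    """Likelihood of f: product of the densities of each y around f(x)."""
    return math.prod(gaussian_pdf(y, f(x), sigma) for x, y in zip(xs, ys))

# Made-up observations that roughly follow y = 2x.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.1, 3.9, 6.2, 7.8]

def good_fit(x):
    return 2.0 * x

def poor_fit(x):
    return 0.5 * x + 1.0

L_good = likelihood(good_fit, xs, ys, sigma=1.0)
L_poor = likelihood(poor_fit, xs, ys, sigma=1.0)

print("likelihood of good_fit:", L_good)
print("likelihood of poor_fit:", L_poor)
```

The candidate that passes closer to the points gets the larger likelihood, which is exactly the quantity we want to maximize.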

In the first step, we simply write out the Gaussian density function by hand.

(I encourage you to follow along in the calculations! They might be scary, but you'll get it, I am sure.)
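
A sketch of what this step could look like, under assumptions the text does not spell out: 𝑛 independent observations (x_i, y_i), parameters θ for 𝑓, and a constant variance σ².

```latex
L(\theta)
  = \prod_{i=1}^{n} p(y_i \mid x_i, \theta)
  = \prod_{i=1}^{n} \frac{1}{\sqrt{2\pi\sigma^2}}
    \exp\!\left(-\frac{\bigl(y_i - f_\theta(x_i)\bigr)^2}{2\sigma^2}\right)
```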
Now we do a neat trick: maximizing a function is the same as maximizing its logarithm. (Since the logarithm is monotone increasing.)

We do this because taking the logarithm turns the product into a nice sum!
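
For the Gaussian model, the logarithm of the likelihood could be written as follows. This is a sketch assuming 𝑛 independent observations (x_i, y_i), parameters θ for 𝑓, and a constant variance σ².

```latex
\log L(\theta)
  = \sum_{i=1}^{n} \log p(y_i \mid x_i, \theta)
  = -\frac{n}{2}\log\bigl(2\pi\sigma^2\bigr)
    - \frac{1}{2\sigma^2}\sum_{i=1}^{n}\bigl(y_i - f_\theta(x_i)\bigr)^2
```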
Recall that we want to maximize the above formula with respect to 𝑓.

This means that we can discard a bunch of terms! Practically anything that doesn't contain 𝑓.

In the end, we are left with the one below.
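
A sketch of this last step, assuming 𝑛 independent observations (x_i, y_i), parameters θ for 𝑓, and a constant variance σ². The additive term −(n/2) log(2πσ²) and the positive factor 1/(2σ²) do not depend on θ, so they drop out, and maximizing a negated sum is the same as minimizing the sum.

```latex
\operatorname*{arg\,max}_{\theta} \, \log L(\theta)
  = \operatorname*{arg\,max}_{\theta}
    \left[ -\sum_{i=1}^{n}\bigl(y_i - f_\theta(x_i)\bigr)^2 \right]
  = \operatorname*{arg\,min}_{\theta}
    \sum_{i=1}^{n}\bigl(y_i - f_\theta(x_i)\bigr)^2
```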
Is the formula on the right familiar?

That is essentially the Mean Square Error!

💡 Thus, fitting a Gaussian model by maximizing the likelihood is the same as minimizing MSE! 💡
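
You can check this numerically. The sketch below is only an illustration under stated assumptions: the data is synthetic (a made-up line plus Gaussian noise), 𝑓 is restricted to lines a·𝑥 + b, σ is treated as known, and the maximum is found by a brute-force grid search around the least-squares fit rather than by a proper optimizer.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic "apartment" data, invented for this sketch:
# price = 2 * area + 50 + Gaussian noise with standard deviation 25.
n = 200
sigma = 25.0
x = rng.uniform(30.0, 120.0, size=n)
y = 2.0 * x + 50.0 + rng.normal(0.0, sigma, size=n)

# Minimizing the MSE of a line a * x + b is ordinary least squares.
a_ls, b_ls = np.polyfit(x, y, deg=1)

# Candidate parameters on a grid centered at the least-squares fit.
a_grid = np.linspace(a_ls - 0.5, a_ls + 0.5, 101)
b_grid = np.linspace(b_ls - 20.0, b_ls + 20.0, 101)
A, B = np.meshgrid(a_grid, b_grid, indexing="ij")

# Residuals for every (a, b) pair; shape (101, 101, n).
resid = y - (A[..., None] * x + B[..., None])

# MSE and Gaussian negative log-likelihood for every candidate.
mse = np.mean(resid**2, axis=-1)
neg_log_lik = (
    0.5 * n * np.log(2 * np.pi * sigma**2)
    + np.sum(resid**2, axis=-1) / (2 * sigma**2)
)

best_mse = np.unravel_index(np.argmin(mse), mse.shape)
best_lik = np.unravel_index(np.argmin(neg_log_lik), neg_log_lik.shape)

print("MSE minimizer:        a=%.3f, b=%.3f" % (a_grid[best_mse[0]], b_grid[best_mse[1]]))
print("Likelihood maximizer: a=%.3f, b=%.3f" % (a_grid[best_lik[0]], b_grid[best_lik[1]]))
```

Both searches land on the same grid point. That is no accident: the negative log-likelihood is just the MSE multiplied by the positive constant n/(2σ²) plus a term that does not depend on the parameters. This is also why the sum of squares and its mean have the same minimizer.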

In machine learning, probability theory is present everywhere behind the scenes.
Update! I'll add a small note here to explain why I said that this estimation is Bayesian. I sort of skimmed over that, so here is a short explanation!

Here, we actually want to estimate the posterior distribution of the model parameters, given all the data.
According to Bayes' theorem,

posterior distribution ∝ (likelihood function) × (prior distribution).

In this case, if the prior is flat (uniform over the parameters), maximizing the likelihood also maximizes the posterior. (The result is called the Maximum A Posteriori estimate, or MAP for short.)
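
In symbols, this could be sketched as below. The notation is an assumption: θ denotes the model parameters, 𝐷 all the observed data, and the last line only holds when the prior p(θ) is constant (a flat prior).

```latex
\begin{align*}
p(\theta \mid D) &\propto L(\theta)\, p(\theta), \\
\hat{\theta}_{\mathrm{MAP}}
  &= \operatorname*{arg\,max}_{\theta}
     \bigl[ \log L(\theta) + \log p(\theta) \bigr], \\
p(\theta) \propto 1 \;\Longrightarrow\;
\hat{\theta}_{\mathrm{MAP}}
  &= \operatorname*{arg\,max}_{\theta} \log L(\theta)
   = \hat{\theta}_{\mathrm{MLE}}.
\end{align*}
```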

Thread by Tivadar Danka (@TivadarDanka)

