In the last 24 hours, more than 400 of you decided to follow me. Thank you, I am honored!
As you probably know, I love explaining complex machine learning concepts simply. I have collected some of my past threads here so you don't miss out on them.
Convolution is not the easiest operation to understand: it involves functions, sums, and two moving parts.
However, there is an illuminating explanation — with probability theory!
There is a whole new aspect of convolution that you (probably) haven't seen before.
🧵 👇🏽
In machine learning, convolutions are most often applied to images, but to make our job easier, we'll take a step back and work in one dimension.
There, convolution is defined as below.
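For two discrete sequences f and g, it reads as follows (in the continuous case, the sum becomes an integral):

```latex
(f * g)[n] = \sum_{k=-\infty}^{\infty} f[k] \, g[n - k]
```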
Now, let's forget about these formulas for a while and talk about a simple probability distribution: we toss two 6-sided dice and study the resulting values.
To formalize the problem, let 𝑋 and 𝑌 be two random variables, describing the outcome of the first and second toss.
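This is where convolution enters the picture: the distribution of the sum 𝑋 + 𝑌 satisfies P(𝑋 + 𝑌 = s) = Σₖ P(𝑋 = k) P(𝑌 = s − k), which is exactly the convolution of the two probability mass functions. Here is a minimal NumPy sketch of this idea, assuming fair dice:

```python
import numpy as np

# Probability mass function of a single fair 6-sided die: P(X = k) = 1/6 for k = 1..6
pmf = np.full(6, 1 / 6)

# The PMF of the sum X + Y is the convolution of the two individual PMFs
sum_pmf = np.convolve(pmf, pmf)

# sum_pmf[i] corresponds to P(X + Y = i + 2), since the smallest possible sum is 2
for s, p in enumerate(sum_pmf, start=2):
    print(f"P(X + Y = {s}) = {p:.4f}")
```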
One of my favorite convolutional network architectures is the U-Net.
It solves a hard problem in such an elegant way that it became one of the most performant and popular choices for semantic segmentation tasks.
How does it work?
🧵 👇🏽
Let's quickly recap what semantic segmentation is: a common computer vision task where we want to predict which class each pixel belongs to.
Because we want to provide a prediction at the pixel level, this task is much harder than plain image classification.
Since the absolutely classic paper Fully Convolutional Networks for Semantic Segmentation by Jonathan Long, Evan Shelhamer, and Trevor Darrell, fully convolutional, end-to-end autoencoder architectures have been the most common choice for this task.
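To make the idea concrete, here is a minimal PyTorch sketch of such an encoder-decoder (a toy illustration with my own choice of layer sizes, not the actual FCN or U-Net, which adds skip connections between the encoder and the decoder):

```python
import torch
import torch.nn as nn

# Toy fully convolutional encoder-decoder for semantic segmentation:
# the encoder downsamples the image, the decoder upsamples it back to the
# input resolution, and the output has one score per class for every pixel.
class TinyEncoderDecoder(nn.Module):
    def __init__(self, in_channels=3, num_classes=5):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(in_channels, 16, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.MaxPool2d(2),  # halve the spatial resolution
            nn.Conv2d(16, 32, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.MaxPool2d(2),  # halve it again
        )
        self.decoder = nn.Sequential(
            nn.ConvTranspose2d(32, 16, kernel_size=2, stride=2),  # upsample
            nn.ReLU(),
            nn.ConvTranspose2d(16, num_classes, kernel_size=2, stride=2),
        )

    def forward(self, x):
        return self.decoder(self.encoder(x))

# One 3-channel 64x64 image in, one class score per pixel out: (1, 5, 64, 64)
model = TinyEncoderDecoder()
print(model(torch.randn(1, 3, 64, 64)).shape)
```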
There is a common misconception that all probability distributions are like a Gaussian.
Often, the reasoning involves the Central Limit Theorem.
This is not exactly right: they resemble a Gaussian only from a certain perspective.
🧵 👇🏽
Let's state the CLT first. If we have 𝑋₁, 𝑋₂, ..., 𝑋ₙ independent and identically distributed random variables with finite mean and variance, their properly centered and scaled sum converges to a Gaussian distribution in the limit.
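In symbols, with μ = E[𝑋₁] and σ² denoting the variance, one common way to state it (the Lindeberg–Lévy form) is:

```latex
\frac{X_1 + X_2 + \cdots + X_n - n\mu}{\sigma \sqrt{n}}
\;\xrightarrow{d}\;
\mathcal{N}(0, 1)
\qquad \text{as } n \to \infty.
```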
The surprising thing here is that the limiting distribution does not depend on the variables' original distribution.
Note that the random variables undergo a significant transformation: the sum is centered with the mean and scaled with the standard deviation and √𝑛.
(The scaling transformation is the "certain perspective" I mentioned in the first tweet.)
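To see this transformation in action, here is a minimal NumPy sketch, reusing die rolls (an arbitrary choice) as the i.i.d. variables:

```python
import numpy as np

rng = np.random.default_rng(0)

n = 500           # number of i.i.d. variables in each sum
trials = 20_000   # number of standardized sums to sample

# i.i.d. rolls of a fair 6-sided die: mean 3.5, variance 35/12
rolls = rng.integers(1, 7, size=(trials, n))
mu, sigma = 3.5, np.sqrt(35 / 12)

# The transformation from above: center with the mean, scale with sigma * sqrt(n)
z = (rolls.sum(axis=1) - n * mu) / (sigma * np.sqrt(n))

# The standardized sums should look like a standard Gaussian
print(f"mean ≈ {z.mean():.3f}, std ≈ {z.std():.3f}")
print(f"P(|Z| < 1.96) ≈ {np.mean(np.abs(z) < 1.96):.3f}  (standard Gaussian: ~0.95)")
```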