One of my favorite convolutional network architectures is the U-Net.
It solves a hard problem in such an elegant way that it became one of the most performant and popular choices for semantic segmentation tasks.
How does it work?
🧵 👇🏽
Let's quickly recap what semantic segmentation is: a common computer vision task, where we want to predict the class of each individual pixel.
Because we want to provide a prediction on a pixel level, this task is much harder than classification.
Since the absolutely classic paper Fully Convolutional Networks for Semantic Segmentation by Jonathan Long, Evan Shelhamer, and Trevor Darrell, fully convolutional encoder-decoder architectures trained end to end have been the most common choice for this.
One of the huge advantages of the fully convolutional architecture is that it eliminates the need for hand-engineered post-processing.
Thanks to end-to-end training, the post-processing is learned as well!
However, this is not without new complications.
These networks first downsample the image, learning a feature representation. This feature representation is then upsampled to predict class labels per pixel.
There is a huge problem: information is lost during downsampling. Deeper architecture means more information loss.
In certain fields, this is a big issue.
For instance, in cell microscopy, cells can grow really close to each other, even as close as 1-2 pixels. Downsampling destroys these small margins.
This is demonstrated in the U-Net paper, as you can see below.
In their paper, Olaf Ronneberger, Philipp Fischer, and Thomas Brox introduce U-Net to solve the problem (arxiv.org/abs/1505.04597).
The solution is elegant and simple: save each downsampling layer's input, then feed it back to the corresponding upsampling step through a skip connection.
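Here is a tiny, framework-free NumPy sketch of that idea (the pooling/upsampling helpers and shapes are my own illustrative choices, not the paper's exact layers):

```python
import numpy as np

def max_pool2x2(x):
    """Downsample a (C, H, W) feature map by taking 2x2 maxima."""
    c, h, w = x.shape
    return x.reshape(c, h // 2, 2, w // 2, 2).max(axis=(2, 4))

def upsample2x2(x):
    """Upsample a (C, H, W) feature map by nearest-neighbor repetition."""
    return x.repeat(2, axis=1).repeat(2, axis=2)

# Toy encoder feature map: 4 channels at 8x8 spatial resolution
features = np.random.default_rng(42).standard_normal((4, 8, 8))

skip = features                  # save the pre-downsampling activations
encoded = max_pool2x2(features)  # (4, 4, 4): fine spatial detail is lost here
decoded = upsample2x2(encoded)   # (4, 8, 8): resolution restored, detail isn't

# The U-Net trick: concatenate the saved features along the channel axis,
# so the upsampling path sees the full-resolution information again
merged = np.concatenate([skip, decoded], axis=0)
print(merged.shape)  # (8, 8, 8)
```

In the real network, the merged tensor is then passed through further convolutions, so the model learns how to combine coarse semantics with fine detail.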
U-Net not only solved the information loss problem but knocked all other semantic segmentation architectures out of the park as well.
Even half a decade later, U-Net is often the go-to model for the task.
Personally, this is the first thing I try for a new dataset.
Its popularity is reflected by its 24,842 citations to date, catapulting the paper into the machine learning hall of fame.
By the time you read this, that number has probably grown even further.
Image sources.
1st image: Fully Convolutional Networks for Semantic Segmentation by Jonathan Long et al., arxiv.org/abs/1411.4038v2
The rest: U-Net: Convolutional Networks for Biomedical Image Segmentation by Olaf Ronneberger et al., arxiv.org/abs/1505.04597
There is a common misconception that all probability distributions are like a Gaussian.
Often, the reasoning involves the Central Limit Theorem.
This is not exactly right: they resemble a Gaussian only from a certain perspective.
🧵 👇🏽
Let's state the CLT first. If 𝑋₁, 𝑋₂, ..., 𝑋ₙ are independent and identically distributed random variables with mean μ and standard deviation σ, then the standardized average √𝑛 ((𝑋₁ + ... + 𝑋ₙ)/𝑛 − μ)/σ converges to the standard Gaussian distribution in the limit.
The surprising thing here is that the limit does not depend on the distribution of the 𝑋ᵢ themselves.
Note that the random variables undergo a significant transformation before the limit kicks in: averaging, centering with the mean, and scaling with the standard deviation and √𝑛.
(The scaling transformation is the "certain perspective" I mentioned in the first tweet.)
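A quick simulation makes this concrete. This is a sketch using NumPy; the choice of the exponential distribution (decidedly non-Gaussian) and the sample sizes are mine:

```python
import numpy as np

rng = np.random.default_rng(0)

n, trials = 1_000, 100_000
# i.i.d. draws from Exp(1), which is skewed and nothing like a Gaussian
x = rng.exponential(scale=1.0, size=(trials, n))

# The CLT transformation: average, center with the mean mu,
# then scale with sigma / sqrt(n)
mu, sigma = 1.0, 1.0  # mean and standard deviation of Exp(1)
z = (x.mean(axis=1) - mu) / (sigma / np.sqrt(n))

print(z.mean(), z.std())  # close to 0 and 1, as for a standard Gaussian
```

The raw exponential samples stay skewed no matter how many you draw; it is only the transformed quantity 𝑧 whose distribution approaches the standard Gaussian.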