This is the Huber loss - another complicated-looking formula...
Yet again, if you break it down and understand the individual parts, it becomes really easy.
Let me show you 👇
Background
The Huber loss is a loss function similar to the Mean Squared Error (MSE), but designed to be more robust to outliers.
MSE suffers from the problem that even a small number of severe outliers can dominate the whole loss.
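For reference, this is the standard form of the Huber loss:

L(y, f(x)) = 0.5 * (y - f(x))^2           if |y - f(x)| ≤ δ
L(y, f(x)) = δ * (|y - f(x)| - 0.5 * δ)   otherwise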
How does it work? 👇
The key to understanding math formulas is not to try to understand everything at the same time.
Try looking at the terms inside the formula. Try to understand them from the inside to the outside...
Here, we can quickly see that one term is repeated several times...
👇
Here y is the ground truth value we compare against, and f(x) is the prediction produced by our model.
You can think, for example, about estimating house prices. Then y is the real price and f(x) is the price our machine learning model predicted.
We can then simplify the formula a bit 👇
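Writing α = y - f(x) for the error, the formula becomes:

L(α) = 0.5 * α^2             if |α| ≤ δ
L(α) = δ * (|α| - 0.5 * δ)   otherwise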
The next thing we see is that the formula has two parts.
The first part is a simple quadratic term α^2 (with a constant of 0.5).
The second part is a bit convoluted with a couple of constants, but it is an absolute (linear) term - |α|.
Let's simplify further...
The parameter δ determines when we choose one part and when the other. Let's set it to a fixed value for now, so that we can simplify things. Setting δ = 1 gives us the simplest form.
OK, now if you ignore the constants (I'll come back to them later), it is quite simple
👇
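Ignoring the constants, with δ = 1 it reads roughly as:

L(α) ≈ α^2   close to 0
L(α) ≈ |α|   otherwise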
What the formula now tells us is that we take the square of α when it is close to 0, and the absolute value otherwise.
Let's quickly implement the function in Python and plot it for δ = 1.
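Here's a minimal sketch of that implementation (the original code was shared as an image, so variable names and plotting details here are my own):

import numpy as np
import matplotlib.pyplot as plt

def huber(alpha, delta=1.0):
    # Quadratic close to 0, linear (absolute value) further away
    return np.where(
        np.abs(alpha) <= delta,
        0.5 * alpha ** 2,
        delta * (np.abs(alpha) - 0.5 * delta),
    )

alpha = np.linspace(-3, 3, 301)
plt.plot(alpha, huber(alpha, delta=1.0))
plt.xlabel("α = y - f(x)")
plt.ylabel("Huber loss")
plt.show()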
Take a look at the plot - do you see the quadratic and the linear part?
👇
Alright, let me annotate the image a little bit.
You can clearly see how the Huber loss behaves like a quadratic function close to 0 and like the absolute value further away.
OK, now we understand the core of the formula. Let's go back and undo the simplifications...
👇
First, what's with these deltas?
We want our loss function to be continuous, so at the border between the two parts (when α = δ) they need to have the same value.
What the constants in the linear term do is just make sure that it equals the quadratic term when α = δ!
👇
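You can check it by plugging α = δ into both parts:

quadratic part: 0.5 * δ^2
linear part: δ * (δ - 0.5 * δ) = 0.5 * δ^2

Both give the same value, so the two parts meet exactly at the border.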
Finally, why is this constant 0.5 everywhere? Do we really need it?
The thing is that we typically use a loss function to compute its derivative and optimize our weights. And the derivative of 0.5*α^2 is... simply α.
We use the 0.5 just to make the derivative simpler 🤷‍♂️
👇
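Compare the two derivatives:

d/dα (0.5 * α^2) = α
d/dα (α^2) = 2 * α

Without the 0.5, we'd drag a factor of 2 through every gradient computation.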
And if you want to use the Huber loss, you probably don't need to implement it yourself - popular ML libraries already have it implemented:
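For example, in PyTorch (just one option - other libraries like TensorFlow have equivalents; the numbers below are made up):

import torch

# delta is the δ parameter from the formula above
loss_fn = torch.nn.HuberLoss(delta=1.0)

y_true = torch.tensor([3.0, -0.5, 2.0])
y_pred = torch.tensor([2.5, 0.0, 2.0])

# Mean Huber loss over all samples
print(loss_fn(y_pred, y_true))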
The Huber loss takes the form of a quadratic function (like MSE) close to 0 and of a linear function (like MAE) away from zero. This makes it more robust to outliers while keeping it smooth around 0. You control the balance with the parameter δ.
Simple, right? 😉
I regularly write threads to explain complex concepts in machine learning and web3 in a simple manner.
How can I prove to you that I know a secret, without revealing any information about the secret itself?
This is called a zero-knowledge proof and it is a super interesting area of cryptography! But how does it work?
Thread 🧵
Let's start with an example
Peggie and Victor travel between cities A and B. There are two paths - a long path and a short path. The problem is that there is a gate on the short path for which you need a password.
Peggie knows the password, but Victor doesn't.
👇
Victor wants to buy the password from Peggie so he can use the short path.
But what if Victor pays Peggie and it turns out she lied and doesn't actually know the password? How can Peggie prove to Victor that she knows the password, without actually revealing it?
Rescue Toadz looks like a regular NFT collection at first - you can mint a toad and you get an NFT in your wallet.
100% of the mint fee is directly sent to @Unchainfund - an organization that provides humanitarian aid to Ukraine and that has already raised $9M!
👇
The process is completely trustless and automatic! All the logic is coded in the smart contract, which cannot be changed and which everybody can inspect.
You trust the code, not us! We have no way to steal the funds even if we wanted to (we don't 😉).
Principal Component Analysis is a commonly used method for dimensionality reduction.
It's a good example of how fairly complex math can have an intuitive explanation and be easy to use in practice.
Let's start with the application of PCA 👇
Dimensionality Reduction
This is one of the common uses of PCA in machine learning.
Imagine you want to predict house prices. You get a large table of many houses and different features for them like size, number of rooms, location, age, etc.
Some features seem correlated 👇
Correlated features
For example, the size of the house is correlated with the number of rooms. Bigger houses tend to have more rooms.
Another example could be the age and the year the house was built - they give us pretty much the same information.
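To put a number on that correlation, here is a toy sketch (the data is made up purely for illustration):

import numpy as np

# Made-up house data: size in square meters and number of rooms
size = np.array([50.0, 80.0, 120.0, 160.0, 200.0])
rooms = np.array([2, 3, 4, 5, 6])

# Pearson correlation coefficient: values close to 1 mean
# the two features carry largely redundant information
print(np.corrcoef(size, rooms)[0, 1])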
For regression problems you can use one of several loss functions:
▪️ MSE
▪️ MAE
▪️ Huber loss
But which one is best? When should you prefer one instead of the other?
Thread 🧵
Let's first quickly recap what each of the loss functions does. After that, we can compare them and see the differences based on some examples.
👇
Mean Squared Error (MSE)
For every sample, MSE takes the difference between the ground truth and the model's prediction and squares it. Then, the average over all samples is computed.
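As a formula (y_i is the ground truth, f(x_i) the model's prediction, n the number of samples):

MSE = (1/n) * Σ (y_i - f(x_i))^2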