Data similarity has such a simple visual interpretation that it will light up all the bulbs in your head.
The mathematical magic tells you that similarity is given by the inner product. Have you thought about why?
This is how elementary geometry explains it all.
↓ A thread. ↓
Let's start at the beginning!
In machine learning, data is represented by vectors. So, instead of observations and features, we talk about tuples of (real) numbers.
Vectors have two special functions defined on them: their norms and inner products. Norms simply describe their magnitude, while inner products describe
.
.
.
well, a 𝐥𝐨𝐭 of things.
Let's start with the fundamentals!
First of all, the norm can be expressed in terms of the inner product.
Moreover, the inner product is linear in both variables. (Check these by hand if you don't believe me.)
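Written out in the usual notation (α and β are scalars), these two facts look like this:

```latex
% Norm from the inner product:
\[ \lVert x \rVert = \sqrt{\langle x, x \rangle} \]

% Bilinearity, shown in the first argument (the second is analogous):
\[ \langle \alpha x + \beta y, z \rangle
     = \alpha \langle x, z \rangle + \beta \langle y, z \rangle \]
```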
Bilinearity gives rise to a geometric interpretation of the inner product.
If we form an imaginary triangle from 𝑥, 𝑦, and 𝑥+𝑦, we can express the inner product in terms of the sides' lengths.
(We can form this triangle even in higher dimensions; it'll just lie in a two-dimensional subspace.)
On the other hand, applying the law of cosines, we obtain yet another way of expressing the length of 𝑥+𝑦, this time in terms of the other two sides and the angle they enclose.
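Side by side, the two expansions of the squared length of 𝑥+𝑦 look like this (θ denotes the angle enclosed by 𝑥 and 𝑦):

```latex
% Expanding with bilinearity:
\[ \lVert x + y \rVert^2
     = \langle x + y,\, x + y \rangle
     = \lVert x \rVert^2 + 2\langle x, y \rangle + \lVert y \rVert^2 \]

% Expanding with the law of cosines (the triangle's angle at x is \pi - \theta):
\[ \lVert x + y \rVert^2
     = \lVert x \rVert^2 + \lVert y \rVert^2
       + 2 \lVert x \rVert \, \lVert y \rVert \cos\theta \]
```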
Putting these together, we see that the inner product of 𝑥 and 𝑦 is the product of
• the norm of 𝑥,
• the norm of 𝑦,
• and the cosine of their enclosed angle!
If we scale down 𝑥 and 𝑦 to unit lengths, their inner product simply gives the cosine of the angle.
You might know this as cosine similarity.
For data points, the closer the cosine similarity is to 1, the more their features move together.
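If you want to check this on actual numbers, here is a minimal NumPy sketch (the feature vectors are made up for illustration):

```python
import numpy as np

def cosine_similarity(x, y):
    """Inner product of x and y, scaled down to unit lengths."""
    return np.dot(x, y) / (np.linalg.norm(x) * np.linalg.norm(y))

# Two made-up feature vectors whose features "move together":
x = np.array([1.0, 2.0, 3.0])
y = np.array([2.0, 4.1, 5.9])

print(cosine_similarity(x, y))  # close to 1
```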
Inner products play an essential part in data science and machine learning.
Because of this, they are the main topic of the newest chapter of my book, The Mathematics of Machine Learning. I release a new chapter each week, as I write them.
Problem-solving is at least 50% of every job in tech and science.
Mastering problem-solving will make your technical skill level shoot up like a hockey stick. Yet, we are rarely taught how to do so.
Here are my favorite techniques that'll loosen even the most complex knots:
0. Is the problem solved yet?
The simplest way to solve a problem is to look for the solution elsewhere. This is not cheating; this is pragmatism. (Except if it is a practice problem. Then, it is cheating.)
When your objective is to move fast, this should be the first thing you attempt.
This is why Stack Overflow (and its kind) is every programmer's best friend.
There is a deep truth behind this conventional wisdom: probability is the mathematical extension of logic, augmenting our reasoning toolkit with the concept of uncertainty.
In-depth exploration of probabilistic thinking incoming.
Our journey ahead has three stops:
1. an introduction to mathematical logic,
2. a touch of elementary set theory,
3. and finally, understanding probabilistic thinking.
First things first: mathematical logic.
In logic, we work with propositions.
A proposition is a statement that is either true or false, like
• "it's raining outside",
• "the sidewalk is wet".
These are often abbreviated as variables, such as A = "it's raining outside".
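As a tiny illustration (the names and values are made up), propositions behave exactly like booleans in code, and compound statements are built from the basic connectives:

```python
# Propositions as boolean variables:
A = True   # "it's raining outside"
B = False  # "the sidewalk is wet"

# Compound propositions via the basic connectives:
print(A and B)  # conjunction: "it's raining AND the sidewalk is wet" -> False
print(A or B)   # disjunction -> True
print(not A)    # negation -> False
```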
In machine learning, we take gradient descent for granted.
We rarely question why it works.
The usual explanation is the mountain-climbing analogy: to find the valley, step in the direction of steepest descent.
But why does this work so well? Read on.
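Before the deep dive, here is the bare-bones version of that idea in code, a minimal sketch on a toy one-dimensional function (the function and the learning rate are made up for illustration):

```python
# Gradient descent on the toy function f(x) = (x - 3)^2,
# whose derivative is f'(x) = 2 * (x - 3).
def grad(x):
    return 2 * (x - 3)

x = 0.0              # starting point
learning_rate = 0.1  # hand-picked step size

for _ in range(100):
    x -= learning_rate * grad(x)  # step towards the steepest descent

print(x)  # converges to the minimum at x = 3
```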
Our journey leads through
• differentiation, as the rate of change,
• the basics of differential equations,
• and equilibrium states.
Buckle up! Deep dive into the beautiful world of dynamical systems incoming. (Full post link at the end.)
First, let's talk about derivatives and their mechanical interpretation!
Suppose that the position of an object at time t is given by the function x(t), and for simplicity, assume that it is moving along a straight line — as the distance-time plot illustrates below.
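In formulas, the object's velocity at time t is the derivative of its position: the limit of average velocities over shrinking time windows.

```latex
\[ v(t) = x'(t) = \lim_{h \to 0} \frac{x(t + h) - x(t)}{h} \]
```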
Matrix factorizations are the pinnacle results of linear algebra.
From theory to applications, they are behind many theorems, algorithms, and methods. However, it is easy to get lost in the vast jungle of decompositions.
This is how to make sense of them.
We are going to study three matrix factorizations:
1. the LU decomposition,
2. the QR decomposition,
3. and the Singular Value Decomposition (SVD).
First, we'll take a look at LU.
1. The LU decomposition.
Let's start at the very beginning: linear equation systems.
Linear equations are surprisingly effective in modeling real-life phenomena: economic processes, biochemical systems, etc.
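As a quick preview, here is a minimal SciPy sketch of the LU decomposition itself (the matrix is made up; scipy.linalg.lu factors A into a permutation matrix P and triangular factors L and U):

```python
import numpy as np
from scipy.linalg import lu

# A made-up 3x3 coefficient matrix of a linear system.
A = np.array([[2.0, 1.0, 1.0],
              [4.0, 3.0, 3.0],
              [8.0, 7.0, 9.0]])

P, L, U = lu(A)  # A = P @ L @ U, with L lower and U upper triangular

print(np.allclose(A, P @ L @ U))  # True
```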