In essence, a machine learning model works by doing the following two things.
1. Find an alternative representation of the data.
2. Make decisions based on this representation.
Linear algebra plays a key role in the first step: finding the representations.
Regardless of the features, data points are represented by vectors.
Finding more descriptive representations is the same as finding functions f(x) that map between vector spaces.
The simplest ones are the linear transformations given by matrices.
Why do we love linear transformations? There are two reasons.
• They are easy to work with and fast to compute.
• Combined with simple nonlinear functions, they can create expressive models.
What is their effect on the data? We'll see this next.
Linearity means that the order of addition, scalar multiplication, and function application can be interchanged: f(ax + by) = af(x) + bf(y).
So, a linear transformation is determined by the images of the basis vectors: if x = x₁e₁ + … + xₙeₙ, then f(x) = x₁f(e₁) + … + xₙf(eₙ).
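A minimal NumPy sketch of both claims, assuming an arbitrary example matrix `A` (the values are made up for illustration): linearity lets sums and scalars pass through f, so f is fully determined by f(e₁) and f(e₂), which are exactly the columns of `A`.

```python
import numpy as np

# A linear transformation of the plane, given by a 2x2 matrix (example values, chosen arbitrarily).
A = np.array([[2.0, 1.0],
              [0.0, 1.0]])

def f(x):
    """Apply the linear transformation x -> A x."""
    return A @ x

# Standard basis vectors of the plane.
e1 = np.array([1.0, 0.0])
e2 = np.array([0.0, 1.0])

# Linearity: f(a*x + b*y) equals a*f(x) + b*f(y).
x, y = np.array([3.0, -1.0]), np.array([0.5, 2.0])
a, b = 2.0, -3.0
print(np.allclose(f(a * x + b * y), a * f(x) + b * f(y)))  # True

# Hence f(x) = x1*f(e1) + x2*f(e2): the images of the basis vectors determine f.
print(np.allclose(f(x), x[0] * f(e1) + x[1] * f(e2)))  # True

# The images of the basis vectors are the columns of A.
print(f(e1), f(e2))  # [2. 0.] [1. 1.]
```

This is also why matrices and linear transformations are interchangeable: writing down a matrix is just listing where the basis vectors go.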
We can visualize this for linear transformations on the two-dimensional plane.
As you can see, the images of the basis vectors span a parallelogram. (Its sides can also collapse onto a single line, which happens when the two images are linearly dependent.)
From yet another perspective, this is the same as distorting the grid determined by the basis vectors.
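To make the picture concrete, here is a small sketch, assuming two example matrices `A` and `B` chosen for illustration: the unit square spanned by e₁ and e₂ is mapped to a parallelogram with sides f(e₁) and f(e₂), its area is |det A|, and when the images are parallel (as for `B`) the parallelogram collapses onto a line and the determinant is zero.

```python
import numpy as np

# Corners of the unit square spanned by e1 and e2, stored as columns: shape (2, 4).
square = np.array([[0.0, 0.0], [1.0, 0.0], [1.0, 1.0], [0.0, 1.0]]).T

A = np.array([[2.0, 1.0],
              [0.0, 1.0]])   # example transformation (arbitrary choice)
B = np.array([[1.0, 2.0],
              [2.0, 4.0]])   # second column = 2 * first column, so f(e1) and f(e2) are parallel

# The corners map to 0, f(e1), f(e1) + f(e2), f(e2): a parallelogram.
parallelogram = A @ square
print(parallelogram.T)

# The area of the image of the unit square is |det|; zero means the sides fall onto one line.
print(abs(np.linalg.det(A)))  # approximately 2
print(abs(np.linalg.det(B)))  # approximately 0
```

Applying the same matrix to every point of the integer grid shows the "distorted grid" view: grid lines stay straight, parallel, and evenly spaced.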
How does this help to find good representations of the data?
Think about PCA, which finds uncorrelated features, removing the redundancy between them. This is done by a simple linear transformation. (If you are not familiar with how PCA works, here is a thread I posted earlier.)
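A minimal PCA sketch in NumPy, assuming synthetic 2D data with two strongly correlated features (the data and its parameters are invented for illustration): diagonalizing the covariance matrix gives an orthogonal matrix `W`, and multiplying the centered data by `W` is the linear transformation that produces uncorrelated features.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic data: the second feature is mostly a copy of the first (redundant).
n = 500
x1 = rng.normal(size=n)
x2 = 0.8 * x1 + 0.2 * rng.normal(size=n)
X = np.column_stack([x1, x2])  # shape (n, 2)

# Center the data and diagonalize its covariance matrix.
Xc = X - X.mean(axis=0)
cov = np.cov(Xc, rowvar=False)
eigvals, eigvecs = np.linalg.eigh(cov)  # eigh: symmetric input, eigenvalues in ascending order
order = np.argsort(eigvals)[::-1]       # sort components by explained variance, largest first
W = eigvecs[:, order]                   # columns are the principal directions

# The new features are a linear transformation of the centered data.
Z = Xc @ W

print(np.round(np.cov(X, rowvar=False), 3))  # large off-diagonal terms: correlated features
print(np.round(np.cov(Z, rowvar=False), 3))  # off-diagonal terms near zero: uncorrelated
```

The covariance of `Z` equals WᵀΣW, which is diagonal because the columns of `W` are eigenvectors of Σ.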
So, linear transformations give rise to new features. How descriptive can these be?
For instance, in classification tasks, we want each high-level feature to represent the probability of belonging to a given class. Are linear transformations enough to express this?
Almost.
Any continuous underlying relationship between data and class label can be approximated arbitrarily well (on a bounded input region) by composing linear transformations with certain nonlinear functions (such as the sigmoid or ReLU).
This is formally expressed by the Universal Approximation Theorem.
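The theorem itself only guarantees that an approximation exists. For functions of one variable there is a simple constructive special case, sketched below under that assumption: a linear map, a ReLU, and another linear map together reproduce the piecewise-linear interpolation of a continuous function g on [a, b]. The helper `fit_relu_network` is a hypothetical name for this illustration, not a library function and not the theorem's proof.

```python
import numpy as np

def relu(z):
    return np.maximum(z, 0.0)

def fit_relu_network(g, a, b, n_knots):
    """Hypothetical helper: a one-hidden-layer ReLU network that linearly interpolates g on [a, b].

    Hidden layer: h_k(x) = relu(x - t_k), i.e. weight 1 and bias -t_k, followed by ReLU.
    Output layer: g(t_0) + sum_k c_k * h_k(x), where c_k is the change of slope at knot t_k.
    """
    t = np.linspace(a, b, n_knots + 1)
    slopes = np.diff(g(t)) / np.diff(t)                  # slope on each sub-interval
    c = np.concatenate([[slopes[0]], np.diff(slopes)])  # slope changes at t_0, ..., t_{n-1}
    knots = t[:-1]

    def network(x):
        x = np.asarray(x, dtype=float)
        hidden = relu(x[..., None] - knots)  # linear transformation followed by ReLU
        return g(t[0]) + hidden @ c          # linear output layer
    return network

# Usage: approximate sin on [0, 2*pi]; the maximum error shrinks as n_knots grows.
net = fit_relu_network(np.sin, 0.0, 2 * np.pi, n_knots=50)
xs = np.linspace(0.0, 2 * np.pi, 1000)
print(np.max(np.abs(net(xs) - np.sin(xs))))
```

Without the ReLU, the composition of the two linear maps would itself be linear and could only fit a straight line; the nonlinearity is what lets the slope change at each knot.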
How to build a good understanding of math for machine learning?
I get this question a lot, so I decided to make a complete roadmap for you. In essence, it rests on three fields: calculus, linear algebra, and probability theory.
Let's take a quick look at them!
🧵 👇
1. Linear algebra.
In machine learning, data is represented by vectors. Essentially, training a learning algorithm amounts to finding more descriptive representations of data through a series of transformations.
Linear algebra is the study of vector spaces and their transformations.
Simply speaking, a neural network is just a function mapping the data to a high-level representation.
Linear transformations are the fundamental building blocks of these functions. Developing a good understanding of them will go a long way, as they are everywhere in machine learning.
You might be surprised, but I gained a lot from playing games. Board games, video games, all of them. Playing is a free-time activity, but it can teach a lot about life and work.
This thread is about the most important lessons I learned.
1. Taking responsibility for your mistakes.
Mistakes are the best way to learn, but you only learn from them if you take responsibility instead of looking for excuses. Stop blaming bad luck, lag, teammates, or anything else.
Be your own critic and identify where you can improve.
2. Actively focus on improvement.
Contrary to popular belief, "just doing it" is not an effective way to learn. Relentlessly identifying flaws in your game, setting progressive goals, and holding yourself accountable supercharges the process. Play (work) with purpose.
Because humans can't see in more than three dimensions, higher-dimensional cubes are challenging to make sense of at first. However, there is a simple yet beautiful pattern behind them.
This is how the magic is done!
What is a cube in one dimension?
It is simply two vertices connected by a line segment of unit length.
To move beyond and construct a cube in two dimensions, also known as a square, we simply copy a one-dimensional cube and connect each original vertex with its copy.
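The copy-and-connect step can be repeated in any dimension. The sketch below (function name `hypercube` chosen for illustration) starts from a single point, so its first step produces the one-dimensional cube described above. Each step doubles the vertices, duplicates the existing edges, and adds one edge between every original vertex and its copy.

```python
def hypercube(n):
    """Return (vertices, edges) of the n-dimensional unit cube, built by copy-and-connect.

    Vertices are tuples of 0/1 coordinates; edges are pairs of vertex indices.
    """
    vertices = [()]  # the 0-dimensional cube: a single point
    edges = []
    for _ in range(n):
        m = len(vertices)
        # The original gets a new coordinate 0, its copy is shifted by one unit (coordinate 1).
        new_vertices = [v + (0,) for v in vertices] + [v + (1,) for v in vertices]
        # Keep the edges of the original and of the copy...
        new_edges = edges + [(i + m, j + m) for i, j in edges]
        # ...and connect each original vertex with its copy.
        new_edges += [(i, i + m) for i in range(m)]
        vertices, edges = new_vertices, new_edges
    return vertices, edges

for n in range(1, 5):
    v, e = hypercube(n)
    print(n, len(v), len(e))
# 1 2 1
# 2 4 4
# 3 8 12
# 4 16 32
```

The counts follow the pattern: the n-dimensional cube has 2ⁿ vertices and n·2ⁿ⁻¹ edges, since each step doubles the edges and adds 2ⁿ⁻¹ new ones.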
I learned it the hard way. After reading hundreds of articles, I figured out the simplest way to learn from them and extract information.
Here is how.
🧵 👇🏽
Regardless of the field, most well-written papers have a similar structure:
What is the problem?
🠓
What are the previous works?
🠓
What did previous works miss?
🠓
What is the main result?
🠓
Why does it work?
🠓
How does it compare to others?
🠓
What are its limitations?
However, research papers are not meant to be read linearly.
There are several levels of understanding:
knowing 1. how to use the result, 2. when to use it, 3. why and how it works, and 4. how to improve it.
Depending on your goal, the reading paths might differ.
What you see below is a 2D representation of the MNIST dataset.
It was produced by t-SNE, a completely unsupervised algorithm. The labels were unknown to it, yet it almost perfectly separates the classes. The result is amazing.
This is how the magic is done!
🧵 👇🏽
Even though real-life datasets can have several thousand features, often the data itself lies on a lower-dimensional manifold.
Dimensionality reduction aims to find these manifolds to simplify data processing down the line.
So, we have data points 𝑥ᵢ in a high-dimensional space, and we are looking for lower-dimensional representations 𝑦ᵢ.
We want the 𝑦ᵢ-s to preserve as many properties of the original as possible.
For instance, if 𝑥ᵢ is close to 𝑥ⱼ, we want 𝑦ᵢ to be close to 𝑦ⱼ as well.
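One way to check the "close points stay close" requirement is to compare nearest neighbors before and after the reduction. The helper `neighborhood_preservation` below is a hypothetical score made up for this illustration (it is not the objective t-SNE optimizes): it measures the average fraction of each 𝑥ᵢ's k nearest neighbors that remain among 𝑦ᵢ's k nearest neighbors. The test data is assumed to lie on a 2D plane inside a 10D space, the simplest kind of lower-dimensional manifold.

```python
import numpy as np

def knn_indices(points, k):
    """Indices of the k nearest neighbors of every point, excluding the point itself."""
    d = np.linalg.norm(points[:, None, :] - points[None, :, :], axis=-1)
    np.fill_diagonal(d, np.inf)
    return np.argsort(d, axis=1)[:, :k]

def neighborhood_preservation(X, Y, k=5):
    """Hypothetical score: average fraction of x_i's k nearest neighbors that are also y_i's."""
    nx, ny = knn_indices(X, k), knn_indices(Y, k)
    overlaps = [len(set(a) & set(b)) / k for a, b in zip(nx, ny)]
    return float(np.mean(overlaps))

rng = np.random.default_rng(0)

# 2D points embedded into 10D by a matrix with orthonormal columns (distances are preserved).
Y_true = rng.normal(size=(200, 2))
Q, _ = np.linalg.qr(rng.normal(size=(10, 2)))  # Q has shape (10, 2)
X = Y_true @ Q.T                                # shape (200, 10)

print(neighborhood_preservation(X, Y_true))     # close to 1: neighborhoods are preserved
Y_random = rng.normal(size=(200, 2))
print(neighborhood_preservation(X, Y_random))   # close to 0: neighborhoods are destroyed
```

A good dimensionality reduction method should score high on data like this even when the manifold is curved, which is where nonlinear methods such as t-SNE come in.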