How to build a good understanding of math for machine learning?
I get this question a lot, so I decided to make a complete roadmap for you. In essence, three fields make this up: calculus, linear algebra, and probability theory.
Let's take a quick look at them!
🧵 👇
1. Linear algebra.
In machine learning, data is represented by vectors. Essentially, training a learning algorithm is finding more descriptive representations of data through a series of transformations.
Linear algebra is the study of vector spaces and their transformations.
Simply speaking, a neural network is just a function mapping the data to a high-level representation.
Linear transformations are the fundamental building blocks of these functions. Developing a good understanding of them will go a long way, as they are everywhere in machine learning.
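As a quick illustration (a hypothetical toy example, not an actual model from this thread), a single layer of a neural network is just a linear transformation followed by a simple nonlinearity:

```python
import numpy as np

rng = np.random.default_rng(0)

X = rng.normal(size=(4, 3))   # 4 data points, each a 3-dimensional vector
W = rng.normal(size=(3, 5))   # a linear transformation from 3D to 5D
b = np.zeros(5)               # bias term

def layer(X, W, b):
    """Map the data to a new, more descriptive representation: linear map + nonlinearity."""
    return np.maximum(X @ W + b, 0.0)   # ReLU

print(layer(X, W, b).shape)   # (4, 5): same points, new representation
```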
While linear algebra shows how to describe predictive models, calculus has the tools to fit them to the data.
If you train a neural network, you are almost certainly using gradient descent, which is rooted in calculus and the study of differentiation.
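Here is a minimal sketch of the idea, minimizing a toy one-variable function with plain gradient descent (the function and learning rate are arbitrary choices for illustration):

```python
# Minimize f(x) = (x - 3)^2 with gradient descent, using its derivative f'(x) = 2 * (x - 3).
def f_prime(x):
    return 2.0 * (x - 3.0)

x = 0.0                # arbitrary starting point
learning_rate = 0.1    # arbitrary step size
for _ in range(100):
    x -= learning_rate * f_prime(x)   # step in the direction of steepest descent

print(round(x, 4))     # ≈ 3.0, the minimum of f
```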
Besides differentiation, its "inverse" is also a central part of calculus: integration.
Integrals are used to express essential quantities such as expected value, entropy, mean squared error, and many more. They provide the foundations for probability and statistics.
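As a small illustration (my own toy example, not from the thread), the expected value and entropy of a standard normal distribution can be approximated by turning the integrals into sums on a grid:

```python
import numpy as np

# Density of the standard normal distribution on a fine grid.
x = np.linspace(-10, 10, 100_001)
dx = x[1] - x[0]
p = np.exp(-x**2 / 2) / np.sqrt(2 * np.pi)

expected_value = np.sum(x * p) * dx     # ∫ x p(x) dx ≈ 0
entropy = np.sum(-p * np.log(p)) * dx   # ∫ -p(x) log p(x) dx ≈ 1.4189

print(expected_value, entropy)
```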
When doing machine learning, we are dealing with functions of millions of variables.
In higher dimensions, things work differently. This is where multivariable calculus comes in, where differentiation and integration are adapted to these spaces.
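A tiny sketch of what differentiation in higher dimensions means in practice: the gradient of a two-variable function, approximated by central differences (the function here is an arbitrary example):

```python
import numpy as np

def f(x):
    """A function of several variables: f(x) = x0^2 + 3 * x1^2."""
    return x[0] ** 2 + 3 * x[1] ** 2

def numerical_gradient(f, x, eps=1e-6):
    """Approximate the gradient with central differences, one coordinate at a time."""
    grad = np.zeros_like(x)
    for i in range(len(x)):
        step = np.zeros_like(x)
        step[i] = eps
        grad[i] = (f(x + step) - f(x - step)) / (2 * eps)
    return grad

print(numerical_gradient(f, np.array([1.0, 2.0])))   # ≈ [2.0, 12.0], matching (2*x0, 6*x1)
```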
If you are looking for a book that covers these topics in a machine learning context, a classic is:
• The Elements of Statistical Learning by Trevor Hastie, Robert Tibshirani, and Jerome Friedman (web.stanford.edu/~hastie/ElemSt…)
These fields form the foundations of mathematics in machine learning.
This is just the starting point. The most exciting stuff comes after these milestones! Advanced statistics, optimization techniques, backpropagation, the internals of neural networks.
So, how much math do you need to work in machine learning?
You can get started with high school math and pick everything up as you go. Advanced math is NOT a prerequisite.
Here is a recent thread about this by @svpino that sums up my thoughts.
You might be surprised, but I gained a lot from playing games. Board games, video games, all of them. Playing is a free-time activity, but it can teach a lot about life and work.
This thread is about the most important lessons I learned.
1. Taking responsibility for your mistakes.
Mistakes are the best way to learn, but only if you take responsibility for them instead of looking for excuses. Stop blaming bad luck, lag, teammates, or anything else.
Be your own critic and identify where you can improve.
2. Actively focus on improvement.
Contrary to popular belief, "just doing it" is not an effective way to learn. Identifying flaws in your game, setting progressive goals, and keeping yourself accountable relentlessly supercharges the process. Play (work) with purpose.
Because humans can't see in more than three dimensions, it is challenging to make sense of higher-dimensional cubes at first. However, there is a simple yet beautiful pattern behind them.
This is how the magic is done!
What is a cube in one dimension?
It is simply two vertices connected with a line of unit length.
To move beyond and construct a cube in two dimensions, also known as a square, we simply copy a one-dimensional cube and connect each original vertex with its copy.
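This copy-and-connect pattern is easy to turn into code. Here is a short sketch (my own illustration) that builds the vertices and edges of an n-dimensional cube with exactly that recursion:

```python
def hypercube(n):
    """Build the n-dimensional cube by the copy-and-connect recursion."""
    if n == 0:
        return [()], []                       # a single vertex, no edges
    verts, edges = hypercube(n - 1)
    lower = [v + (0,) for v in verts]         # the original copy
    upper = [v + (1,) for v in verts]         # the shifted copy
    new_edges = (
        [(v + (0,), w + (0,)) for v, w in edges]    # edges inside the original
        + [(v + (1,), w + (1,)) for v, w in edges]  # edges inside the copy
        + [(v + (0,), v + (1,)) for v in verts]     # connect each vertex to its copy
    )
    return lower + upper, new_edges

verts, edges = hypercube(4)
print(len(verts), len(edges))   # 16 vertices, 32 edges for the 4-dimensional cube
```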
I learned this the hard way. After reading hundreds of papers, I figured out the simplest methods for learning and extracting information from them.
Here is how.
🧵 👇🏽
Regardless of the field, most well-written papers share a similar structure:
What is the problem?
🠓
What are the previous works?
🠓
What did previous works miss?
🠓
What is the main result?
🠓
Why does it work?
🠓
How does it compare to others?
🠓
What are its limitations?
However, research papers are not meant to be read linearly.
There are several levels of understanding:
knowing (1) how to use the result, (2) when to use it, (3) why and how it works, and (4) how to improve it.
Depending on your goal, the reading paths might differ.
What you see below is a 2D representation of the MNIST dataset.
It was produced by t-SNE, a completely unsupervised algorithm. The labels were unknown to it, yet it almost perfectly separates the classes. The result is amazing.
This is how the magic is done!
🧵 👇🏽
Even though real-life datasets can have several thousand features, the data itself often lies on a lower-dimensional manifold.
Dimensionality reduction aims to find these manifolds to simplify data processing down the line.
So, we have data points 𝑥ᵢ in a high-dimensional space, and we are looking for lower-dimensional representations 𝑦ᵢ.
We want the 𝑦ᵢ-s to preserve as many properties of the original as possible.
For instance, if 𝑥ᵢ is close to 𝑥ⱼ, we want 𝑦ᵢ to be close to 𝑦ⱼ as well.
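Here is a minimal sketch of this kind of experiment using scikit-learn's TSNE; as an assumption for brevity, it uses the small digits dataset as a stand-in for MNIST:

```python
from sklearn.datasets import load_digits
from sklearn.manifold import TSNE

X, y = load_digits(return_X_y=True)        # 64-dimensional points x_i
tsne = TSNE(n_components=2, random_state=0)
Y = tsne.fit_transform(X)                  # 2-dimensional representations y_i

print(X.shape, Y.shape)                    # (1797, 64) -> (1797, 2)
# The labels y were never shown to t-SNE; they are only used afterwards
# to check how well the classes separate in the 2D embedding.
```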