How to build a good understanding of math for machine learning?

I get this question a lot, so I decided to make a complete roadmap for you. In essence, it is built on three fields: linear algebra, calculus, and probability theory.

Let's take a quick look at them!

🧵 👇
1. Linear algebra.

In machine learning, data is represented by vectors. Essentially, training a learning algorithm is finding more descriptive representations of data through a series of transformations.

Linear algebra is the study of vector spaces and their transformations.
Simply put, a neural network is just a function mapping the data to a high-level representation.

Linear transformations are the fundamental building blocks of such functions. Developing a good understanding of them will go a long way, as they are everywhere in machine learning.
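To make this concrete, here is a minimal NumPy sketch of a single layer. The shapes, the random weights, and the ReLU nonlinearity are my own illustrative choices, not something from the thread:

```python
import numpy as np

# A single neural network layer is a linear transformation (a matrix)
# followed by a simple nonlinearity. All values here are illustrative.
rng = np.random.default_rng(42)

x = rng.normal(size=3)        # a data point, represented as a vector in R^3
W = rng.normal(size=(2, 3))   # a linear transformation from R^3 to R^2
b = rng.normal(size=2)        # a bias (translation) vector

h = np.maximum(0, W @ x + b)  # the new, transformed representation of x
print(h)
```

Stacking many such transformations is what turns raw vectors into increasingly descriptive representations.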
My favorite learning resources:

• Linear algebra university lectures by Gilbert Strang, taught at MIT (youtube.com/playlist?list=…)

• Linear Algebra Done Right by Sheldon Axler (linear.axler.net)
2. Calculus.

While linear algebra shows how to describe predictive models, calculus has the tools to fit them to the data.

If you train a neural network, you are almost certainly using gradient descent, which is rooted in calculus and the study of differentiation.
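Here is a minimal sketch of the idea, minimizing the toy function f(x) = (x - 3)², whose minimum we know is at x = 3. The function, learning rate, and step count are illustrative choices:

```python
# Gradient descent on f(x) = (x - 3)^2, whose derivative is f'(x) = 2(x - 3).
def grad(x):
    return 2 * (x - 3)

x = 0.0            # starting point
lr = 0.1           # learning rate (step size)
for _ in range(100):
    x -= lr * grad(x)   # step against the gradient

print(x)  # converges to the minimum at x = 3
```

In a real network, the single variable x is replaced by millions of weights, and the gradient is computed by backpropagation.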
Besides differentiation, its "inverse" operation is also a central part of calculus: integration.

Integrals are used to express essential quantities such as expected value, entropy, and mean squared error. They provide the foundations for probability and statistics.
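As a small illustration, here is one way to approximate such an integral numerically. The grid bounds and resolution are arbitrary choices for the sketch:

```python
import numpy as np

# Expected value is an integral: E[f(X)] = ∫ f(x) p(x) dx.
# Here we approximate E[X²] for a standard normal X, which is exactly 1.
x = np.linspace(-10, 10, 100_001)
p = np.exp(-x**2 / 2) / np.sqrt(2 * np.pi)  # standard normal density

dx = x[1] - x[0]
estimate = np.sum(x**2 * p) * dx  # a simple Riemann-sum approximation
print(estimate)                   # ≈ 1.0
```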
When doing machine learning, we are dealing with functions of millions of variables.

In higher dimensions, things work differently. This is where multivariable calculus comes in, where differentiation and integration are adapted to these spaces.
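A minimal sketch of what differentiation looks like in several variables: the derivative becomes the gradient, the vector of partial derivatives. The test function and the step size eps below are my own illustrative choices:

```python
import numpy as np

# Finite-difference gradient of f(x) = x_1² + x_2² + ... + x_n²,
# whose true gradient is 2x.
def f(x):
    return np.sum(x**2)

def numerical_gradient(f, x, eps=1e-6):
    grad = np.zeros_like(x)
    for i in range(len(x)):
        step = np.zeros_like(x)
        step[i] = eps
        grad[i] = (f(x + step) - f(x - step)) / (2 * eps)  # central difference
    return grad

x = np.array([1.0, -2.0, 3.0])
print(numerical_gradient(f, x))  # ≈ [2, -4, 6], matching 2x
```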
My favorite learning resources:

• Single Variable Calculus at MIT (youtube.com/playlist?list=…)

• Khan Academy on Multivariable Calculus (youtube.com/playlist?list=…)

• Multivariable Calculus at MIT (youtube.com/playlist?list=…)
3. Probability theory.

How do we draw conclusions from experiments and observations? How do we describe and discover patterns in them?

These questions are answered by probability theory and statistics, the logic of scientific thinking.
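As a tiny illustration of drawing conclusions from data, here is a sketch that estimates the unknown bias of a coin from simulated flips. The true bias and the sample size are made up for the example:

```python
import numpy as np

# Estimating an unknown quantity from observations, in miniature:
# simulate flips of a biased coin, then recover the bias from the data.
rng = np.random.default_rng(0)
true_bias = 0.7                         # hidden "ground truth"
flips = rng.random(1_000) < true_bias   # 1000 Bernoulli trials

estimate = flips.mean()  # the sample mean estimates the true bias
print(estimate)          # ≈ 0.7
```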
My favorite learning resources:

• Pattern Recognition and Machine Learning by Christopher Bishop (springer.com/gp/book/978038…)

• The Elements of Statistical Learning by Trevor Hastie, Robert Tibshirani, and Jerome Friedman (web.stanford.edu/~hastie/ElemSt…)
These fields form the foundations of mathematics in machine learning.

This is just the starting point. The most exciting stuff comes after these milestones: advanced statistics, optimization techniques, backpropagation, and the internals of neural networks.
So, how much math do you need to work in machine learning?

You can get started with high school math and pick everything up as you go. Advanced math is NOT a prerequisite.

Here is a recent thread about this by @svpino that sums up my thoughts.

If you would like to dig deeper, I have two recommendations for you.

First, I have written a long, detailed post where I cover each topic and subtopic. Use it as a guide for your studies.

Check it out!

tivadardanka.com/blog/roadmap-o…
Second, I am writing a complete book about this, where I explain every concept as clearly and intuitively as possible.

The early access program will launch in September, with chapters released as I write them.

This is where you can join: tivadar.gumroad.com/l/mathematics-…
My goal is to bring the theory and math behind machine learning closer to everyone while eliminating all gatekeeping.

If you would like to join me on this journey, consider giving me a follow and retweeting the first tweet of this thread!
