I make math and machine learning accessible to everyone. Mathematician with an INTJ personality. Chaotic good.
Jun 30 • 16 tweets • 5 min read
In calculus, going from a single variable to millions of variables is hard.
Understanding the three main types of functions helps make sense of multivariable calculus.
Surprisingly, they share a deep connection. Let's see why!
In general, a function assigns elements of one set to elements of another.
This is too abstract for most engineering applications. Let's zoom in a little!
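To make this concrete before zooming in, here is a minimal sketch (my example, not from the thread) of a function as an assignment between two finite sets, modeled as a Python dict:

```python
# A function f: A -> B assigns to each element of A an element of B.
A = {"red", "green", "blue"}
B = {0, 1}

# Model f as a dict: every element of A gets exactly one value in B.
f = {"red": 1, "green": 0, "blue": 1}

assert set(f.keys()) == A    # f is defined on all of A
assert set(f.values()) <= B  # all values land in B

print(f["red"])  # 1
```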
Jun 30 • 19 tweets • 6 min read
Neural networks are stunningly powerful.
This is old news: deep learning is state-of-the-art in many fields, like computer vision and natural language processing. (But not everywhere.)
Why are neural networks so effective? I'll explain.
First, let's formulate the classical supervised learning task!
Suppose that we have a dataset D, where xₖ is a data point and yₖ is the ground truth.
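As a minimal sketch of this setup (the names xₖ and yₖ follow the thread; the toy data is mine):

```python
# Supervised learning: a dataset D of (data point, ground truth) pairs.
D = [
    ((1.0, 2.0), 0),
    ((3.0, 0.5), 1),
    ((2.5, 1.8), 1),
]

for x_k, y_k in D:
    print(f"data point {x_k} -> ground truth {y_k}")
```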
Jun 28 • 19 tweets • 6 min read
One major reason why mathematics is considered difficult: proofs.
Reading and writing proofs are hard, but you cannot get away without them. The best way to learn is to do.
So, let's deconstruct the proof of the most famous mathematical result: the Pythagorean theorem.
Here it is in its full glory.
Theorem. (The Pythagorean theorem.) Let ABC be a right triangle, let a and b be the lengths of its two legs, and let c be the length of its hypotenuse.
Then a² + b² = c².
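Before dissecting the proof, a quick sanity check on the classic 3-4-5 right triangle (my example, not part of the thread):

```python
import math

a, b = 3.0, 4.0
c = math.hypot(a, b)        # length of the hypotenuse, sqrt(a² + b²)
assert a**2 + b**2 == c**2  # 9 + 16 == 25
print(c)  # 5.0
```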
Jun 26 • 18 tweets • 5 min read
Problem-solving is at least 50% of every job in tech and science.
Mastering problem-solving will make your technical skill level shoot up like a hockey stick. Yet, we are rarely taught how to do so.
Here are my favorite techniques that'll loosen even the most complex knots:
0. Is the problem solved yet?
The simplest way to solve a problem is to look for the solution elsewhere. This is not cheating; this is pragmatism. (Except if it is a practice problem. Then, it is cheating.)
Jun 25 • 20 tweets • 7 min read
What you see below is one of the most beautiful formulas in mathematics.
A single equation, establishing a relation between 𝑒, π, the imaginary unit 𝑖, and 1. It is mind-blowing.
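The identity e^(iπ) + 1 = 0 is easy to check numerically; a minimal sketch with Python's complex math (mine, not the thread's):

```python
import cmath
import math

z = cmath.exp(1j * math.pi) + 1  # e^(iπ) + 1
print(z)                         # ≈ 0, up to floating-point error
assert abs(z) < 1e-15
```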
This is what's behind the sorcery:
First, let's go back to square one: differentiation.
The derivative of a function at a given point describes the slope of its tangent line.
Jun 24 • 29 tweets • 8 min read
"Probability is the logic of science."
There is a deep truth behind this conventional wisdom: probability is the mathematical extension of logic, augmenting our reasoning toolkit with the concept of uncertainty.
In-depth exploration of probabilistic thinking incoming.
Our journey ahead has three stops:
1. an introduction to mathematical logic,
2. a touch of elementary set theory,
3. and finally, understanding probabilistic thinking.
First things first: mathematical logic.
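As a taste of stop #1, here is a minimal truth table for implication, the workhorse connective of logical reasoning (sketch mine):

```python
import itertools

# A -> B is false only when A is true and B is false.
for A, B in itertools.product([True, False], repeat=2):
    implies = (not A) or B
    print(f"A={A!s:5}  B={B!s:5}  A -> B = {implies}")
```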
Jun 23 • 15 tweets • 6 min read
Understanding graph theory will seriously enhance your engineering skills; you absolutely must be familiar with it.
Here's a graph theory quickstart, in collaboration with @alepiad.
Read on:
What do the internet, your brain, the entire list of people you’ve ever met, and the city you live in have in common?
These are all radically different concepts, but they share a common trait.
They are all networks that establish relationships between objects.
Jun 22 • 27 tweets • 8 min read
In machine learning, we take gradient descent for granted.
We rarely question why it works.
What we're usually told is the mountain-climbing analogy: to find the valley, step in the direction of steepest descent.
But why does this work so well? Read on.
Our journey leads through
• differentiation, as the rate of change,
• the basics of differential equations,
• and equilibrium states.
Buckle up! Deep dive into the beautiful world of dynamical systems incoming. (Full post link at the end.)
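For reference, a minimal gradient descent sketch (mine, not the post's code) on f(x) = x², whose valley sits at x = 0:

```python
def grad(x):
    return 2 * x  # derivative of f(x) = x²

x = 5.0    # starting point on the "mountain"
lr = 0.1   # learning rate, i.e., the step size

for _ in range(100):
    x -= lr * grad(x)  # step in the direction of steepest descent

print(x)  # ≈ 0, the minimum of f
```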
Jun 21 • 19 tweets • 6 min read
Matrix factorizations are the pinnacle results of linear algebra.
From theory to applications, they are behind many theorems, algorithms, and methods. However, it is easy to get lost in the vast jungle of decompositions.
This is how to make sense of them.
We are going to study three matrix factorizations:
1. the LU decomposition,
2. the QR decomposition,
3. and the Singular Value Decomposition (SVD).
First, we'll take a look at LU.
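A minimal sketch computing all three with NumPy and SciPy (library choice mine; the thread's own examples may differ):

```python
import numpy as np
from scipy.linalg import lu

A = np.array([[4.0, 3.0],
              [6.0, 3.0]])

P, L, U = lu(A)               # LU with pivoting: A = P L U
Q, R = np.linalg.qr(A)        # QR: Q orthogonal, R upper triangular
U2, S, Vt = np.linalg.svd(A)  # SVD: A = U Σ Vᵀ

assert np.allclose(P @ L @ U, A)
assert np.allclose(Q @ R, A)
assert np.allclose(U2 @ np.diag(S) @ Vt, A)
```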
Jun 20 • 17 tweets • 5 min read
Matrix multiplication is not easy to understand.
Even looking at the definition used to make me sweat, let alone trying to comprehend the pattern. Yet, there is a stunningly simple explanation behind it.
Let's pull back the curtain!
First, the raw definition.
This is how the product of A and B is given. Not the easiest (or most pleasant) to look at.
We are going to unwrap this.
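Translated into code, the raw definition is the familiar triple loop (sketch mine):

```python
# The definition: C[i][j] is the sum of A[i][k] * B[k][j] over k.
def matmul(A, B):
    n, m, p = len(A), len(B), len(B[0])
    C = [[0.0] * p for _ in range(n)]
    for i in range(n):
        for j in range(p):
            for k in range(m):
                C[i][j] += A[i][k] * B[k][j]
    return C

A = [[1, 2], [3, 4]]
B = [[5, 6], [7, 8]]
print(matmul(A, B))  # [[19.0, 22.0], [43.0, 50.0]]
```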
Jun 19 • 18 tweets • 5 min read
The single most undervalued fact of linear algebra: matrices are graphs, and graphs are matrices.
Encoding matrices as graphs is a cheat code, making complex behavior simple to study.
Let me show you how!
If you looked at the example above, you probably figured out the rule.
Each row corresponds to a node, and each nonzero element represents a directed, weighted edge; zero elements mean there is no edge, so they are omitted.
The element in the 𝑖-th row and 𝑗-th column corresponds to an edge going from 𝑖 to 𝑗.
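Here is that rule as a minimal sketch (example matrix mine): reading a matrix as a directed, weighted edge list.

```python
import numpy as np

A = np.array([[0, 2, 0],
              [1, 0, 3],
              [0, 4, 0]])

# Nonzero A[i, j] is a directed edge i -> j with weight A[i, j];
# zero entries mean "no edge" and are omitted.
edges = [(i, j, int(A[i, j]))
         for i in range(A.shape[0])
         for j in range(A.shape[1])
         if A[i, j] != 0]

print(edges)  # [(0, 1, 2), (1, 0, 1), (1, 2, 3), (2, 1, 4)]
```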
Jun 17 • 19 tweets • 6 min read
Behold one of the mightiest tools in mathematics: the camel principle.
I am dead serious. Deep down, this tiny rule is a cog in many methods, ones that you use every day.
Here is what it is, how it works, and why it is essential.
First, the story.
An old man passes away, leaving half of his fortune to his eldest son, a third to his middle son, and a ninth to his youngest.
Upon opening the stable, they realize that the old man had 17 camels.
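For readers who don't know the classic resolution (a quick spoiler, with the arithmetic checked in code): 17 is divisible by neither 2, 3, nor 9, but borrowing one extra camel makes it 18, and the shares then work out exactly.

```python
camels = 17 + 1  # borrow one camel, so the herd divides evenly
shares = [camels // k for k in (2, 3, 9)]  # half, third, ninth

print(shares)       # [9, 6, 2]
print(sum(shares))  # 17, so the borrowed camel goes back
```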
Jun 14 • 34 tweets • 9 min read
The single biggest argument about statistics: is probability frequentist or Bayesian?
It's neither, and I'll explain why.
Buckle up. Deep-dive explanation incoming.
First, let's look at what probability is.
Probability quantitatively measures the likelihood of events, like rolling a six with a die. It's a number between zero and one. This is independent of interpretation; it's a rule set in stone.
Jun 13 • 16 tweets • 5 min read
The single most important "side-effect" of solving linear equation systems: the LU decomposition.
Why? Because in practice, it is the engine behind inverting matrices and computing their determinants.
Here is how it works.
Why is the LU decomposition useful? There are two main applications:
• computing determinants,
• and inverting matrices.
Check out how the LU decomposition simplifies the determinant. (The determinant of a triangular matrix is just the product of its diagonal elements.)
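A minimal sketch of that shortcut (example mine; SciPy's LU also returns a permutation matrix, whose determinant contributes a sign):

```python
import numpy as np
from scipy.linalg import lu

A = np.array([[2.0, 1.0],
              [4.0, 5.0]])

P, L, U = lu(A)  # A = P L U, with L and U triangular

# det(A) = det(P) · det(L) · det(U), and for triangular factors
# the determinant is just the product of the diagonal.
det = np.linalg.det(P) * np.prod(np.diag(L)) * np.prod(np.diag(U))
print(det, np.linalg.det(A))  # both ≈ 6.0
```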
Jun 12 • 22 tweets • 7 min read
You have probably seen the famous bell curve hundreds of times before.
It is often referred to as some sort of "probability". Contrary to popular belief, this is NOT a probability, but a probability density.
What are densities and why do we need them?
First, let's talk about probability.
The gist: probability is a function P(A) that takes an event (that is, a set) and returns a real number between 0 and 1.
The event is a subset of the so-called sample space, a set often denoted with the capital Greek omega (Ω).
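As a minimal sketch of this setup for a fair die (example mine): Ω is the sample space, events are subsets of it, and P maps events into [0, 1].

```python
from fractions import Fraction

omega = {1, 2, 3, 4, 5, 6}  # sample space Ω of a fair die

def P(A):
    """Probability of the event A ⊆ Ω under the uniform model."""
    return Fraction(len(A & omega), len(omega))

print(P({6}))        # 1/6, rolling a six
print(P({2, 4, 6}))  # 1/2, rolling an even number
print(P(omega))      # 1, something always happens
```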
Jun 11 • 18 tweets • 5 min read
The way you think about the exponential function is (probably) wrong.
Don't think so? I'll convince you. Did you realize that multiplying e by itself π times doesn't make sense?
Here is what's really behind the most important function of all time.
First things first: terminology. The expression aᵇ is read "a raised to the power of b." (Or "a to the b" for short.)
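If, as I expect the thread to argue, eˣ is defined by its power series rather than by repeated multiplication, then e^π makes perfect sense. A sketch under that assumption:

```python
import math

def exp_series(x, terms=30):
    """Partial sum of Σ xⁿ/n!, the power series defining eˣ."""
    return sum(x**n / math.factorial(n) for n in range(terms))

print(exp_series(math.pi))  # ≈ 23.1407
print(math.exp(math.pi))    # ≈ 23.1407, matching the library value
```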
Jun 10 • 15 tweets • 5 min read
Logistic regression is one of the simplest models in machine learning, and one of the most revealing.
It shows us how to move from geometric intuition to probabilistic reasoning. Mastering it sets the foundation for everything else.
Let’s dissect it step by step!
Let’s start with the most basic setup possible: one feature, two classes.
You’re predicting if a student passes or fails based on hours studied.
Your input x is a number, and your output y is either 0 or 1.
Let's build a predictive model!
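A minimal sketch of where this is headed (toy weights mine, purely hypothetical): a sigmoid turns a linear score into a pass probability.

```python
import math

def sigmoid(z):
    return 1 / (1 + math.exp(-z))

# Toy logistic regression: P(pass | hours) = sigmoid(w * hours + b).
w, b = 1.2, -4.0  # hypothetical parameters, not fitted to real data

for hours in (1, 3, 5):
    p = sigmoid(w * hours + b)
    print(f"{hours}h studied -> P(pass) ≈ {p:.2f}")
```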
Jun 8 • 12 tweets • 5 min read
Differentiation reveals much more than the slope of the tangent line.
We like to think about it that way, but from a different angle, differentiation is the same as an approximation with a linear function. This allows us to greatly generalize the concept.
Let's see why!
The derivative of a function at a point 𝑎 is defined as the limit of the difference quotient, which represents the rate of change.
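Numerically, the difference quotient (f(a + h) − f(a)) / h already tells the story (example mine):

```python
def difference_quotient(f, a, h):
    """Average rate of change of f over [a, a + h]."""
    return (f(a + h) - f(a)) / h

f = lambda x: x**2

# As h shrinks, the quotient approaches f'(2) = 4.
for h in (1.0, 0.1, 0.001):
    print(h, difference_quotient(f, 2.0, h))
```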
Jun 6 • 6 tweets • 2 min read
Most people see neural networks as magic.
But at their core, they’re just graphs. And those are built from math so simple, you learned it in high school.
Here’s how computational graphs make deep learning possible, and why they’re the real MVP of machine learning.
Representing graphs as matrices unlocked new discoveries in both CS and math.
Similarly, viewing neural networks as computational graphs unlocked modern ML.
The magic is in the representation.
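A minimal sketch of the idea (mine): a two-node computational graph for f = (wx + b)², with the gradient flowing backward via the chain rule.

```python
# Forward pass: x -> z = w*x + b -> y = z²
w, b, x = 2.0, 1.0, 3.0
z = w * x + b  # node 1: linear
y = z ** 2     # node 2: square

# Backward pass: multiply local derivatives along the graph (chain rule).
dy_dz = 2 * z          # ∂y/∂z
dz_dw = x              # ∂z/∂w
dy_dw = dy_dz * dz_dw  # ∂y/∂w = 2z · x

print(y, dy_dw)  # 49.0 42.0
```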
Jun 4 • 18 tweets • 6 min read
This will surprise you: sine and cosine are orthogonal to each other.
What does orthogonality even mean for functions? In this thread, we'll use the superpower of abstraction to go far beyond our intuition.
We'll also revolutionize science on the way.
Our journey ahead has three milestones. We'll
1. generalize the concept of a vector,
2. show what angles really are,
3. and see what functions have to do with all this.
Here we go!
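The headline claim can be checked numerically right away: the inner product of sine and cosine over a full period, ∫₀^{2π} sin(x)cos(x) dx, vanishes (numeric check mine).

```python
import math

# Approximate the inner product of sin and cos over [0, 2π]
# with a Riemann sum.
n = 100_000
dx = 2 * math.pi / n
inner = sum(math.sin(k * dx) * math.cos(k * dx) for k in range(n)) * dx

print(inner)  # ≈ 0: sine and cosine are orthogonal
```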
Jun 3 • 15 tweets • 5 min read
In machine learning, we use the dot product every day.
However, its definition is far from revealing. For instance, what does it have to do with similarity?
There is a beautiful geometric explanation behind it.
The dot product (or inner product) of two vectors is defined as the sum of the products of their coordinates.
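In code, the definition is one line; the geometric link to similarity, presumably where the thread is headed, is cos θ = ⟨x, y⟩ / (‖x‖ ‖y‖). Example mine:

```python
import math

def dot(x, y):
    return sum(a * b for a, b in zip(x, y))  # sum of coordinate products

x, y = [1.0, 2.0, 3.0], [4.0, 5.0, 6.0]

norm = lambda v: math.sqrt(dot(v, v))
cos_theta = dot(x, y) / (norm(x) * norm(y))  # cosine similarity

print(dot(x, y), cos_theta)  # 32.0, ≈ 0.9746
```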