I make math and machine learning accessible to everyone. Mathematician with an INTJ personality. Chaotic good.
Jul 30 • 14 tweets • 4 min read
One of the coolest ideas in mathematics is the estimation of a shape's area by throwing random points at it.
Don't believe this works? Check out the animation below, where I show the method on the unit circle (whose area equals π).
Here is what's behind the magic:
Let's make this method precise!
The first step is to enclose our shape S in a square.
You can imagine this as a rectangular dartboard.
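Here's a minimal Python sketch of the method, assuming we sample uniformly from the enclosing square [-1, 1] × [-1, 1] (the sample count is an arbitrary choice of mine):

```python
import random

def estimate_circle_area(n_samples=100_000):
    """Monte Carlo estimate of the unit circle's area (which equals pi).

    Throw n_samples uniform random points at the enclosing square
    [-1, 1] x [-1, 1] and count the fraction landing inside the circle.
    area(S) ~ (fraction inside) * area(square) = fraction * 4.
    """
    inside = 0
    for _ in range(n_samples):
        x, y = random.uniform(-1, 1), random.uniform(-1, 1)
        if x * x + y * y <= 1:
            inside += 1
    return 4 * inside / n_samples

print(estimate_circle_area())  # ~3.14, approaching pi as n_samples grows
```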
Jul 29 • 15 tweets • 5 min read
What is common between the Fourier series and the Cartesian coordinate system?
More than you think: they are (almost) the same.
Let me explain why!
Let's start with the basics: the inner product.
In the Euclidean plane, it can be calculated using the "magnitude x magnitude x cosine" formula, also known as the geometric definition.
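To see that the geometric and the coordinate definitions agree, here's a quick NumPy check on two arbitrary vectors of my choosing, with the angle measured independently via polar angles:

```python
import numpy as np

u = np.array([3.0, 4.0])
v = np.array([1.0, 2.0])

# Algebraic definition: the sum of coordinate-wise products.
algebraic = u @ v

# Geometric definition: magnitude x magnitude x cosine of the angle,
# with the angle measured independently as a difference of polar angles.
theta = np.arctan2(u[1], u[0]) - np.arctan2(v[1], v[0])
geometric = np.linalg.norm(u) * np.linalg.norm(v) * np.cos(theta)

print(algebraic, geometric)  # both 11.0, up to floating-point error
```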
Jul 27 • 13 tweets • 4 min read
One of my favorite formulas is the closed-form of the geometric series.
I am amazed by its ubiquity: whether we are solving basic problems or pushing the boundaries of science, the geometric series often makes an appearance.
Here is how to derive it from first principles:
Let’s start with the basics: like any other series, the geometric series is the limit of its partial sums.
Our task is to find that limit.
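For reference, here is the partial sum and its limit in standard notation (valid for |r| < 1):

```latex
s_n = \sum_{k=0}^{n} r^k = \frac{1 - r^{n+1}}{1 - r},
\qquad
\sum_{k=0}^{\infty} r^k = \lim_{n \to \infty} s_n = \frac{1}{1 - r}
\quad (|r| < 1).
```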
Jul 26 • 13 tweets • 4 min read
Matrices + the Gram-Schmidt process = magic.
This magic is called the QR decomposition, and it's behind the famous eigenvalue-finding QR algorithm.
Here is how it works:
In essence, the QR decomposition factors an arbitrary matrix into the product of an orthogonal and an upper triangular matrix.
(We’ll illustrate everything with the 3 × 3 case, but it all works the same way in general.)
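A quick NumPy illustration on an arbitrary 3 × 3 example of mine, using numpy.linalg.qr to do the factoring:

```python
import numpy as np

A = np.array([[2.0, 1.0, 0.0],
              [1.0, 3.0, 1.0],
              [0.0, 1.0, 4.0]])

# Factor A into an orthogonal Q and an upper triangular R.
Q, R = np.linalg.qr(A)

print(np.allclose(Q @ R, A))            # True: A = QR
print(np.allclose(Q.T @ Q, np.eye(3)))  # True: Q is orthogonal
print(np.allclose(R, np.triu(R)))       # True: R is upper triangular
```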
Jul 25 • 24 tweets • 8 min read
Summing numbers is more exciting than you think.
For instance, summing the same alternating sequence of 1s and (-1)s can either be zero or one, depending on how we group the terms. What's wrong?
I'll explain. Enter the beautiful world of infinite series.
Let’s go back to square one: the sum of infinitely many terms is called an infinite series. (Or series for short.)
Infinite series form the foundations of mathematics.
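Here is the grouping trick in symbols; this is the classical Grandi's series:

```latex
(1 - 1) + (1 - 1) + (1 - 1) + \dots = 0 + 0 + 0 + \dots = 0,
\qquad
1 + (-1 + 1) + (-1 + 1) + \dots = 1 + 0 + 0 + \dots = 1.
```

The catch: the partial sums alternate 1, 0, 1, 0, …, so the series has no limit at all, and regrouping a divergent series is not a legitimate move.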
Jul 24 • 17 tweets • 5 min read
I have spent at least 50% of my life studying, practicing, and teaching mathematics.
The most common misconceptions I encounter:
• Mathematics is useless
• You must be good with numbers
• You must be talented to do math
These are all wrong. Here's what math is really about:
Let's start with a story.
There’s a reason why the best ideas come during showers or walks. They allow the mind to wander freely, unchained from the restraints of focus.
One particular example is graph theory, born from the regular daily walks of the legendary Leonhard Euler.
Jul 23 • 13 tweets • 4 min read
If you think about sets as "collections of objects", you are painfully wrong.
This naive approach to set theory quickly led to a discovery that upended mathematics and caused an enormous crisis. (At least one.)
Meet Russell's paradox.
In school, we have learned to define sets by specifying their elements.
This can be done in two main ways:
1) explicitly enumerating their elements, or
2) specifying a parent set and a property that filters elements from it.
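In symbols, the two styles look like this; the third set below is the one that breaks the naive picture (standard set-builder notation):

```latex
A = \{1, 2, 3\},
\qquad
B = \{\, n \in \mathbb{N} : n \text{ is even} \,\},
\qquad
R = \{\, x : x \notin x \,\} \;\Rightarrow\; R \in R \iff R \notin R.
```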
Jul 21 • 28 tweets • 8 min read
I am an evangelist for simple ideas.
No matter the field, you can (almost always) find a small set of mind-numbingly simple ideas making the entire thing work.
In machine learning, the maximum likelihood estimation is one of those.
I'll start with a simple example to illustrate a simple idea.
Pick up a coin and toss it a few times, recording each outcome. The question is, once more, simple: what's the probability of heads?
We can't just immediately assume p = 1/2, that is, a fair coin.
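Here's a minimal sketch in Python, with the coin simulated and its bias hidden from the estimator (the true bias 0.7 and the sample size are my own choices):

```python
import random

random.seed(0)
true_p = 0.7  # the coin's hidden bias, unknown to the estimator

# Toss the coin n times, recording 1 for heads and 0 for tails.
n = 1000
tosses = [1 if random.random() < true_p else 0 for _ in range(n)]

# The likelihood p^heads * (1 - p)^tails is maximized at heads / n:
# this is the maximum likelihood estimate for a coin.
p_hat = sum(tosses) / n
print(p_hat)  # close to 0.7, not the naive 1/2
```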
Jul 19 • 15 tweets • 5 min read
A question we never ask:
"How large is that number in the Law of Large Numbers?"
Sometimes, a thousand samples are enough. Sometimes, even ten million fall short.
How do we know? I'll explain.
First things first: the law of large numbers (LLN).
Roughly speaking, it states that the averages of independent, identically distributed samples converge to the expected value as the number of samples grows to infinity.
We are going to dig deeper.
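Before digging deeper, here is the LLN in action on fair die rolls (expected value 3.5; the sample sizes are arbitrary choices of mine):

```python
import random

random.seed(42)

def sample_mean(n):
    """Average of n fair die rolls; the LLN says this approaches 3.5."""
    return sum(random.randint(1, 6) for _ in range(n)) / n

for n in (10, 1_000, 100_000):
    print(n, sample_mean(n))
# The averages drift toward 3.5 as n grows, but *how fast* they get
# there is exactly the question this thread is about.
```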
Jul 18 • 33 tweets • 9 min read
The single biggest argument about statistics: is probability frequentist or Bayesian?
It's neither, and I'll explain why.
Buckle up. Deep-dive explanation incoming.
First, let's look at what probability is.
Probability quantitatively measures the likelihood of events, like rolling a six with a die. It's a number between zero and one. This is independent of interpretation; it’s a rule set in stone.
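That rule set is Kolmogorov's axioms: whatever your interpretation, a probability measure P must satisfy

```latex
P(\Omega) = 1,
\qquad
P(A) \geq 0,
\qquad
P\Big( \bigcup_{i=1}^{\infty} A_i \Big) = \sum_{i=1}^{\infty} P(A_i)
\quad \text{for pairwise disjoint } A_i.
```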
Jul 16 • 21 tweets • 7 min read
You have probably seen the famous bell curve hundreds of times before.
It is often referred to as some sort of “probability”. Contrary to popular belief, this is NOT a probability, but a probability density.
What are densities and why do we need them?
First, let's talk about probability.
The gist is, probability is a function P(A) that takes an event (that is, a set), and returns a real number between 0 and 1.
The event is a subset of the so-called sample space, a set often denoted with the capital Greek omega (Ω).
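One quick numerical check that densities are not probabilities: a density can exceed 1, while the probability of any event (an integral of the density) never does. The narrow Gaussian below is my own example, using scipy.stats.norm:

```python
from scipy.stats import norm

# A normal distribution squeezed to standard deviation 0.1.
dist = norm(loc=0, scale=0.1)

print(dist.pdf(0))                     # ~3.99: a density can exceed 1
print(dist.cdf(0.1) - dist.cdf(-0.1))  # ~0.68: a probability cannot
```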
Jul 15 • 9 tweets • 3 min read
If it is raining, the sidewalk is wet.
If the sidewalk is wet, is it raining? Not necessarily. Yet, we are inclined to think so. This is a preposterously common logical fallacy called "affirming the consequent".
However, it is not totally wrong. Why? Enter Bayes' theorem.
Propositions of the form "if A, then B" are called implications.
They are written as "A → B", and they form the bulk of our scientific knowledge.
Say, "if X is a closed system, then the entropy of X cannot decrease" is the 2nd law of thermodynamics.
Jul 14 • 28 tweets • 7 min read
"Probability is the logic of science."
There is a deep truth behind this conventional wisdom: probability is the mathematical extension of logic, augmenting our reasoning toolkit with the concept of uncertainty.
In-depth exploration of probabilistic thinking incoming.
Our journey ahead has three stops:
1. an introduction to mathematical logic,
2. a touch of elementary set theory,
3. and finally, understanding probabilistic thinking.
First things first: mathematical logic.
Jul 13 • 10 tweets • 3 min read
Conditional probability is the single most important concept in statistics.
Why? Because without accounting for prior information, predictive models are useless.
Here is what conditional probability is, and why it is essential.
Conditional probability allows us to update our models by incorporating new observations.
By definition, P(B | A) describes the probability of an event B, given that A has occurred.
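The definition in code, on made-up counts (P(B | A) = P(A ∩ B) / P(A), i.e., restrict attention to the cases where A happened):

```python
# Out of 1000 hypothetical days: the sidewalk was wet on 270,
# and on 190 of those wet days it had also rained.
n_days = 1000
n_A = 270         # A: the sidewalk is wet
n_A_and_B = 190   # A and B: wet sidewalk AND rain

# P(B | A) = P(A and B) / P(A): the fraction of A-days that are also B-days.
p_B_given_A = (n_A_and_B / n_days) / (n_A / n_days)
print(p_B_given_A)  # ~0.70
```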
Jul 11 • 12 tweets • 3 min read
Most people think math is just numbers.
But after 20 years with it, I see it more like a mirror.
Here are 10 surprising lessons math taught me about life, work, and thinking clearly:
1. Breaking the rules is often the best course of action.
We have modern set theory because Bertrand Russell broke the notion that “sets are just collections of things.”
Jul 8 • 18 tweets • 6 min read
This will surprise you: sine and cosine are orthogonal to each other.
What does orthogonality even mean for functions? In this thread, we'll use the superpower of abstraction to go far beyond our intuition.
We'll also revolutionize science along the way.
Our journey ahead has three milestones. We'll
1. generalize the concept of a vector,
2. show what angles really are,
3. and see what functions have to do with all this.
Here we go!
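You can already verify the punchline numerically. With the inner product of functions defined as the integral of their product over one period (here [0, 2π], computed with scipy.integrate.quad):

```python
import numpy as np
from scipy.integrate import quad

# Inner product of functions: <f, g> = integral of f(x) * g(x) over [0, 2*pi].
inner, _ = quad(lambda x: np.sin(x) * np.cos(x), 0, 2 * np.pi)
print(inner)  # ~0: sine and cosine are orthogonal
```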
Jul 7 • 15 tweets • 5 min read
In machine learning, we use the dot product every day.
However, its definition is far from revealing. For instance, what does it have to do with similarity?
There is a beautiful geometric explanation behind it.
By definition, the dot product (or inner product) of two vectors is the sum of their coordinate-wise products.
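As a preview of the similarity connection: normalizing the dot product by the magnitudes gives cosine similarity. The vectors below are arbitrary examples of mine:

```python
import numpy as np

def cosine_similarity(u, v):
    """Dot product of the unit-normalized vectors:
    1 = same direction, 0 = orthogonal, -1 = opposite."""
    return np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v))

print(cosine_similarity(np.array([1.0, 0.0]), np.array([2.0, 0.0])))  # 1.0
print(cosine_similarity(np.array([1.0, 0.0]), np.array([0.0, 3.0])))  # 0.0
```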
Jul 5 • 6 tweets • 2 min read
If I had to learn Math for Machine Learning from scratch, this is the roadmap I would follow:
1. Linear Algebra
These are non-negotiables:
• Vectors
• Matrices
• Equations
• Factorizations
• Matrices and graphs
• Linear transformations
• Eigenvalues and eigenvectors
Now you've learned how to represent and transform data.
Jul 3 • 18 tweets • 6 min read
Behold one of the mightiest tools in mathematics: the camel principle.
I am dead serious. Deep down, this tiny rule is a cog in many methods, ones that you use every day.
Here is what it is, how it works, and why it is essential.
First, the story.
An old Arab passes away, leaving half of his fortune to his eldest son, a third to his middle son, and a ninth to his youngest.
Upon opening the stable, they realize that the old man had 17 camels.
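The arithmetic behind the trick: the three shares don't add up to the whole estate, which is what leaves room for a borrowed camel.

```latex
\frac{1}{2} + \frac{1}{3} + \frac{1}{9} = \frac{9 + 6 + 2}{18} = \frac{17}{18},
\qquad
\frac{18}{2} + \frac{18}{3} + \frac{18}{9} = 9 + 6 + 2 = 17.
```

Borrow one camel to make 18, hand out 9, 6, and 2, and the borrowed camel is left over to return.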
Jul 2 • 16 tweets • 5 min read
Matrix multiplication is not easy to understand.
Even looking at the definition used to make me sweat, let alone trying to comprehend the pattern. Yet, there is a stunningly simple explanation behind it.
Let's pull back the curtain!
First, the raw definition.
This is how the product of A and B is given. Not the easiest (or most pleasant) to look at.
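For the record, the definition referenced above, in standard notation: the (i, j) entry of AB is the dot product of the i-th row of A with the j-th column of B.

```latex
(AB)_{ij} = \sum_{k=1}^{n} a_{ik} \, b_{kj}.
```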