I make math accessible for everyone. Mathematician with an INTJ personality. Chaotic good.
Feb 28 • 28 tweets • 8 min read
I am an evangelist for simple ideas.
No matter the field, you can (almost always) find a small set of mind-numbingly simple ideas making the entire thing work.
In machine learning, maximum likelihood estimation is one of those.
I'll start with a simple example to illustrate a simple idea.
Pick up a coin and toss it a few times, recording each outcome. The question is, once more, simple: what's the probability of heads?
We can't just immediately assume p = 1/2, that is, a fair coin.
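Here is the idea as a minimal Python sketch (the bias 0.7, the seed, and the sample size are all made up for illustration): with k heads in n tosses, the likelihood-maximizing guess for p is simply the observed frequency k/n.

```python
import numpy as np

rng = np.random.default_rng(42)
tosses = rng.random(1000) < 0.7   # simulated biased coin: P(heads) = 0.7

# With k heads in n tosses, the binomial log-likelihood of p is
#   k*log(p) + (n - k)*log(1 - p),
# and it is maximized at p = k/n: the observed frequency of heads.
k, n = tosses.sum(), tosses.size
p_hat = k / n
print(p_hat)   # close to 0.7, not 1/2
```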
Feb 26 • 17 tweets • 5 min read
The Law of Large Numbers is one of the most frequently misunderstood concepts of probability and statistics.
Just because you lost ten blackjack games in a row doesn't mean you'll be more likely to get lucky next time.
What is the law of large numbers, then?
The strength of probability theory lies in its ability to translate complex random phenomena into coin tosses, dice rolls, and other simple experiments.
So, let’s stick with coin tossing. What will the average number of heads be if we toss a coin, say, a thousand times?
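A quick way to see it, as a minimal Python sketch (fair coin, arbitrary seed and sample size): the running average of heads settles near 1/2.

```python
import numpy as np

rng = np.random.default_rng(0)
tosses = rng.integers(0, 2, size=1000)   # 1000 fair coin tosses, heads = 1

# Running average of heads: the law of large numbers says this
# drifts toward the true mean 1/2 as the number of tosses grows.
running_avg = np.cumsum(tosses) / np.arange(1, tosses.size + 1)
print(running_avg[[9, 99, 999]])   # the average after 10, 100, and 1000 tosses
```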
Feb 24 • 16 tweets • 5 min read
The expected value is one of the most important concepts in probability and statistics.
For instance, all the popular loss functions in machine learning, like cross-entropy, are expected values. However, the definition of expected value is far from intuitive.
Here is what's behind the scenes.
It's better to start with an example.
So, let's play a simple game! The rules: I’ll toss a coin, and if it comes up heads, you win $1. However, if it is tails, you lose $2.
Should you even play this game with me? We’ll find out.
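Here is the spoiler as a one-line computation (assuming the coin is fair):

```latex
E[\text{winnings}] = 1 \cdot \frac{1}{2} + (-2) \cdot \frac{1}{2} = -\frac{1}{2}
```

So, on average, you lose 50 cents per round.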
Feb 21 • 21 tweets • 7 min read
You have probably seen the famous bell curve hundreds of times before.
It is often referred to as some sort of "probability". Contrary to popular belief, this is NOT a probability, but a probability density.
What are densities and why do we need them?
First, let's talk about probability.
The gist is, probability is a function P(A) that takes an event (that is, a set), and returns a real number between 0 and 1.
The event is a subset of the so-called sample space, a set often denoted with the capital Greek omega (Ω).
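For instance, for the standard bell curve, probabilities come from integrating the density:

```latex
f(x) = \frac{1}{\sqrt{2\pi}} e^{-x^2/2},
\qquad
P(a \le X \le b) = \int_a^b f(x) \, dx
```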
Feb 19 • 33 tweets • 9 min read
The single biggest argument about statistics: is probability frequentist or Bayesian?
It's neither, and I'll explain why.
Buckle up. Deep-dive explanation incoming.
First, let's look at what probability is.
Probability quantitatively measures the likelihood of events, like rolling a six with a die. It's a number between zero and one. This is independent of interpretation; it's a rule set in stone.
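Those rules, for the record, are the Kolmogorov axioms:

```latex
0 \le P(A) \le 1,
\qquad
P(\Omega) = 1,
\qquad
P\Big( \bigcup_i A_i \Big) = \sum_i P(A_i) \ \text{ for pairwise disjoint } A_i
```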
Feb 17 • 9 tweets • 3 min read
If it is raining, the sidewalk is wet.
If the sidewalk is wet, is it raining? Not necessarily. Yet, we are inclined to think so. This is a preposterously common logical fallacy called "affirming the consequent".
However, it is not totally wrong. Why? Enter Bayes' theorem.
Propositions of the form "if A, then B" are called implications.
They are written as "A → B", and they form the bulk of our scientific knowledge.
Say, "if X is a closed system, then the entropy of X cannot decrease" is the 2nd law of thermodynamics.
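To preview where this is going, Bayes' theorem is what quantifies how much a wet sidewalk should raise our belief in rain:

```latex
P(\text{rain} \mid \text{wet}) = \frac{P(\text{wet} \mid \text{rain}) \, P(\text{rain})}{P(\text{wet})}
```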
Feb 14 • 28 tweets • 7 min read
"Probability is the logic of science."
There is a deep truth behind this conventional wisdom: probability is the mathematical extension of logic, augmenting our reasoning toolkit with the concept of uncertainty.
In-depth exploration of probabilistic thinking incoming.
Our journey ahead has three stops:
1. an introduction to mathematical logic,
2. a touch of elementary set theory,
3. and finally, understanding probabilistic thinking.
First things first: mathematical logic.
Feb 12 • 10 tweets • 3 min read
Conditional probability is the single most important concept in statistics.
Why? Because without accounting for prior information, predictive models are useless.
Here is what conditional probability is, and why it is essential.
Conditional probability allows us to update our models by incorporating new observations.
By definition, P(B | A) describes the probability of an event B, given that A has occurred.
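In formula form:

```latex
P(B \mid A) = \frac{P(A \cap B)}{P(A)}, \qquad P(A) > 0
```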
Feb 10 • 8 tweets • 3 min read
How to build a good understanding of math for machine learning?
I get this question a lot, so I decided to make a complete roadmap for you. In essence, three fields make this up: calculus, linear algebra, and probability theory.
Let's take a quick look at them!
1. Linear algebra.
In machine learning, data is represented by vectors. Essentially, training a learning algorithm is finding more descriptive representations of data through a series of transformations.
Linear algebra is the study of vector spaces and their transformations.
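As a minimal sketch (the data point and weight matrix below are made up): one transformation step is just a matrix-vector product.

```python
import numpy as np

x = np.array([1.0, 2.0])           # a data point, represented as a vector
W = np.array([[0.5, -1.0],
              [1.5,  0.3]])        # a made-up weight matrix

# A single transformation step: the new, hopefully more descriptive,
# representation of x is the matrix-vector product W @ x.
print(W @ x)   # [-1.5  2.1]
```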
Dec 4, 2023 • 15 tweets • 6 min read
Understanding graph theory will seriously enhance your engineering skills; you absolutely must be familiar with it.
Here's a graph theory quickstart, in collaboration with Alejandro Piad Morffis.
Read on:
What do the internet, your brain, the entire list of people you’ve ever met, and the city you live in have in common?
These are all radically different concepts, but they share a common trait.
They are all networks that establish relationships between objects.
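Here is that idea as a minimal Python sketch (the names are hypothetical, just to make it concrete): a network stored as an adjacency list.

```python
from collections import defaultdict

# A network in code: vertices are objects, edges are relationships.
graph = defaultdict(set)

def add_edge(u, v):
    """Record an undirected relationship between u and v."""
    graph[u].add(v)
    graph[v].add(u)

add_edge("Alice", "Bob")
add_edge("Bob", "Carol")
print(graph["Bob"])   # {'Alice', 'Carol'} (order may vary)
```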
Sep 13, 2023 • 19 tweets • 6 min read
Neural networks are stunningly powerful.
This is old news: deep learning is state-of-the-art in many fields, like computer vision and natural language processing. (But not everywhere.)
Why are neural networks so effective? I'll explain.
First, let's formulate the classical supervised learning task!
Suppose that we have a dataset D, where xₖ is a data point and yₖ is the ground truth.
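In symbols (with n denoting the number of samples, a detail this preview doesn't spell out):

```latex
D = \{ (x_k, y_k) \}_{k=1}^{n},
\qquad
\text{find } f \text{ such that } f(x_k) \approx y_k
```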
Sep 12, 2023 • 15 tweets • 5 min read
A question we never ask:
"How large that number in the Law of Large Numbers is?"
Sometimes, a thousand samples are large enough. Sometimes, even ten million samples fall short.
How do we know? I'll explain.
First things first: the law of large numbers (LLN).
Roughly speaking, it states that the averages of independent, identically distributed samples converge to the expected value as the number of samples grows to infinity.
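In formula form, for independent, identically distributed samples X₁, X₂, …:

```latex
\frac{X_1 + X_2 + \dots + X_n}{n} \to E[X_1]
\quad \text{as } n \to \infty
```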
We are going to dig deeper.
Aug 24, 2023 • 13 tweets • 5 min read
With the power of mathematical induction, I'll prove that everyone has the same eye color.
Don't believe me? Read on.
(And see if you can spot the sleight of hand.)
To formalize the problem, define the proposition Aₙ by
Aₙ = "in a set of n people, everyone has the same eye color".
If n equals the human population of planet Earth, we get the original statement. We'll prove that Aₙ is true via induction.
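For reference, here is the induction principle the proof will (mis)use:

```latex
A_1 \text{ holds}, \quad A_n \Rightarrow A_{n+1} \text{ for all } n
\quad \Longrightarrow \quad
A_n \text{ holds for every } n
```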
Aug 8, 2023 • 11 tweets • 4 min read
The Japanese multiplication method makes everybody say, "I wish they taught math like this in school."
It's not just a cute visual tool: it illuminates how and why long multiplication works.
Here is the full story.
First, the Japanese multiplication method.
The first operand (21 in our case) is represented by two groups of lines: two lines for the first digit (2), and one line for the second digit (1).
One group for each digit.
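The second operand isn't shown in this preview; assuming it is 13 purely for illustration, counting the line intersections by place value reproduces long multiplication:

```latex
21 \times 13:
\quad \underbrace{2 \cdot 1}_{\text{hundreds}} = 2,
\quad \underbrace{2 \cdot 3 + 1 \cdot 1}_{\text{tens}} = 7,
\quad \underbrace{1 \cdot 3}_{\text{ones}} = 3
\quad \Rightarrow \quad 273
```

And indeed, 21 × 13 = 273.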
Jul 26, 2023 • 19 tweets • 6 min read
One major reason why mathematics is considered difficult: proofs.
Reading and writing proofs is hard, but you cannot get away without them. The best way to learn is by doing.
So, let's deconstruct the proof of the most famous mathematical result: the Pythagorean theorem.
Here it is in its full glory.
Theorem. (The Pythagorean theorem.) Let ABC be a right triangle, let a and b be the length of its two legs, and let c be the length of its hypotenuse.
Then a² + b² = c².
Jul 21, 2023 • 22 tweets • 6 min read
Problem-solving is at least 50% of every job in tech and science.
Mastering problem-solving will make your technical skill level shoot up like a hockey stick. Yet, we are rarely taught how to do so.
Here are my favorite techniques that'll loosen even the most complex knots:
0. Is the problem solved yet?
The simplest way to solve a problem is to look for the solution elsewhere. This is not cheating; this is pragmatism. (Except if it is a practice problem. Then, it is cheating.)
Jul 4, 2023 • 7 tweets • 3 min read
Yesterday, I posted the following puzzle.
There is an 80 m long cable, strung out between two 50 m tall poles. The bottom of the hanging cable is 10 m above the ground. How far are the two poles from each other?
Here is the solution:
This problem is a typical instance of missing the forest for the trees.
If you gave it any thought, you probably attempted to apply fundamental physics to model the hanging cable, then calculate the distance using a complex system of equations.
This is not needed at all.
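A quick check of the numbers shows why:

```latex
\frac{80}{2} = 40 \text{ m of cable per side},
\qquad
50 - 10 = 40 \text{ m of vertical drop}
```

Each half of the cable must cover the full 40 m drop with its entire 40 m of length, which is only possible if it hangs straight down. The poles are 0 m apart.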
Jul 3, 2023 • 12 tweets • 3 min read
I've been studying math for 50% of my life.
The single most common question I get: why should I study mathematics as a ____? So, I have collected my thoughts for you.
Here are the most important things that math taught me:
1. The language of thinking.
Contrary to popular belief, math is not (only) about numbers. It's about abstraction and reasoning. This requires clear and concise thinking.
Thus, you can pick up an advanced thinking toolkit even from basic math.
May 25, 2023 • 25 tweets • 8 min read
Summing numbers is more exciting than you think.
For instance, summing the same alternating sequence of 1s and (−1)s can give either zero or one, depending on how we group the terms. What's wrong?
I'll explain. Enter the beautiful world of infinite series.
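Here are the two groupings in question:

```latex
(1 - 1) + (1 - 1) + \dots = 0,
\qquad
1 + (-1 + 1) + (-1 + 1) + \dots = 1
```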
Let’s go back to square one: the sum of infinitely many terms is called an infinite series. (Or series in short.)
Infinite series form the foundations of mathematics.
May 9, 2023 • 14 tweets • 5 min read
Matrices + the Gram-Schmidt process = magic.
This magic is called the QR decomposition, and it's behind the famous eigenvalue-finding QR algorithm.
Here is how it works.
In essence, the QR decomposition factors an arbitrary matrix into the product of an orthogonal and an upper triangular matrix.
(We'll illustrate everything with the 3 × 3 case, but everything works the same way in general.)
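In symbols:

```latex
A = QR, \qquad Q^\top Q = I, \qquad R \text{ upper triangular}
```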