I make math and machine learning accessible to everyone. Mathematician with an INTJ personality. Chaotic good.
29 subscribers
Aug 23 • 16 tweets • 5 min read
In calculus, going from a single variable to millions of variables is hard.
Understanding the three main types of functions helps make sense of multivariable calculus.
Surprisingly, they share a deep connection. Let's see why:
In general, a function assigns elements of one set to another.
This is too abstract for most engineering applications. Let's zoom in a little!
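Though the thread hasn't named them yet, the three types are presumably scalar-valued functions, curves, and vector fields; a minimal sketch:

```python
import math

# Three kinds of functions that show up in multivariable calculus
# (an assumption about the thread's classification):

# 1. Scalar-valued function of several variables: R^2 -> R
def f(x, y):
    return x**2 + y**2

# 2. Curve: one variable mapped to a vector, R -> R^2
def r(t):
    return (math.cos(t), math.sin(t))

# 3. Vector field: several variables mapped to a vector, R^2 -> R^2
def F(x, y):
    return (-y, x)
```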
Aug 22 • 16 tweets • 5 min read
The most important concept in probability and statistics: the expected value
For instance, all the popular loss functions in machine learning, like cross-entropy, are expected values. However, its definition is far from intuitive.
Here is what's behind the scenes:
It's better to start with an example.
So, let's play a simple game! The rules: I’ll toss a coin, and if it comes up heads, you win $1. However, if it is tails, you lose $2.
Should you even play this game with me? We’ll find out.
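The game can be settled with a two-line computation; a sketch, assuming a fair coin:

```python
# Expected value of the coin game: heads wins $1, tails loses $2.
p_heads, p_tails = 0.5, 0.5
expected_value = p_heads * 1 + p_tails * (-2)
print(expected_value)  # -0.5: on average, you lose 50 cents per round
```

Since the expected value is negative, the long-run answer is no.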
Aug 21 • 24 tweets • 8 min read
Adding numbers is more exciting than you think.
For instance, summing the same alternating sequence of 1s and (-1)s can either be zero or one, depending on how we group the terms. What's wrong?
I'll explain. Enter the beautiful world of infinite series:
Let’s go back to square one: the sum of infinitely many terms is called an infinite series. (Or series, for short.)
Infinite series form the foundations of mathematics.
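The paradox above becomes concrete once we compute partial sums; a small sketch:

```python
# Partial sums of the alternating series 1 - 1 + 1 - 1 + ...
terms = [(-1) ** k for k in range(8)]   # 1, -1, 1, -1, ...

partial_sums, total = [], 0
for t in terms:
    total += t
    partial_sums.append(total)

print(partial_sums)  # [1, 0, 1, 0, 1, 0, 1, 0] -- never settles down
```

The partial sums oscillate forever, so the series diverges; regrouping the terms of a divergent series can fake almost any value.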
Aug 20 • 18 tweets • 5 min read
The main reason math is considered difficult: proofs.
Reading and writing proofs is hard, but you cannot get away without them. The best way to learn is by doing.
So, let's deconstruct the proof of the most famous mathematical result: the Pythagorean theorem.
Here it is in its full glory.
Theorem. (The Pythagorean theorem.) Let ABC be a right triangle, let a and b be the lengths of its two legs, and let c be the length of its hypotenuse.
Then a² + b² = c².
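Before dissecting the proof, a quick numeric sanity check (which is not a proof!) on the classic 3-4-5 triangle:

```python
import math

# Checking a² + b² = c² on a 3-4-5 right triangle.
a, b = 3.0, 4.0
c = math.hypot(a, b)   # length of the hypotenuse
print(c)               # 5.0
```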
Aug 19 • 12 tweets • 4 min read
One of my favorite formulas is the closed-form of the geometric series.
I am amazed by its ubiquity: whether we are solving basic problems or pushing the boundaries of science, the geometric series often makes an appearance.
Here is how to derive it from first principles:
Let’s start with the basics: like any other series, the geometric series is the limit of its partial sums.
Our task is to find that limit.
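For a preview of where the derivation lands, the partial sums can be computed directly and compared with the closed form 1/(1 − r); a sketch for |r| < 1:

```python
# Partial sums of the geometric series 1 + r + r^2 + ... for |r| < 1
# approach the closed form 1 / (1 - r).
r = 0.5
partial_sum = sum(r**k for k in range(50))
closed_form = 1 / (1 - r)

print(partial_sum, closed_form)  # both very close to 2.0
```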
Aug 16 • 18 tweets • 5 min read
Problem-solving is at least 50% of every job in tech and science.
Mastering problem-solving will make your technical skill level shoot up like a hockey stick. Yet, we are rarely taught how to do so.
Here are my favorite techniques that'll loosen even the most complex knots:
0. Is the problem solved yet?
The simplest way to solve a problem is to look for the solution elsewhere.
This is not cheating; this is pragmatism. (Except if it is a practice problem. Then, it is cheating.)
Aug 16 • 10 tweets • 3 min read
The following multiplication method makes everybody wish they had been taught math like this in school.
It's not just a cute visual tool: it illuminates how and why long multiplication works.
Here is the full story:
First, the method.
The first operand (21 in our case) is represented by two groups of lines: two lines for the first digit (2), and one line for the second digit (1).
One group for each digit.
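Behind the picture, counting intersections is just computing digit products, collected by place value along the diagonals. A sketch (the function name `line_multiply` is mine):

```python
# The line-counting method in code: intersections between line groups
# are digit products; diagonals collect them by place value.
def line_multiply(x, y):
    xs = [int(d) for d in str(x)]   # line groups of the first operand
    ys = [int(d) for d in str(y)]   # line groups of the second operand
    n = len(xs) + len(ys) - 1
    counts = [0] * n                # intersection counts per diagonal
    for i, dx in enumerate(xs):
        for j, dy in enumerate(ys):
            counts[i + j] += dx * dy   # crossings of two line groups
    # counts[m] sits at place value 10^(n - 1 - m); carries resolve
    # automatically in the sum, just like on paper
    return sum(c * 10 ** (n - 1 - m) for m, c in enumerate(counts))

print(line_multiply(21, 13))  # 273
```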
Aug 15 • 9 tweets • 3 min read
If it is raining, the sidewalk is wet.
If the sidewalk is wet, is it raining? Not necessarily. Yet, we are inclined to think so. This is a common logical fallacy called "affirming the consequent".
However, it is not entirely wrong. Why? Enter Bayes' theorem:
Propositions of the form "if A, then B" are called implications.
They are written as "A → B", and they form the bulk of our scientific knowledge.
For example, "if X is a closed system, then the entropy of X cannot decrease" is the 2nd law of thermodynamics.
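Bayes' theorem quantifies how strongly observing the consequent supports the antecedent, which is why the fallacy is "not entirely wrong." A sketch with made-up illustrative numbers:

```python
# Bayes' theorem: P(A | B) = P(B | A) * P(A) / P(B).
# All probabilities below are made-up illustrative values.
p_rain = 0.2            # prior: P(rain)
p_wet_given_rain = 1.0  # the implication: rain guarantees a wet sidewalk
p_wet = 0.25            # P(wet) from all causes (rain, sprinklers, ...)

p_rain_given_wet = p_wet_given_rain * p_rain / p_wet
print(p_rain_given_wet)  # 0.8: the wet sidewalk is strong (not certain!) evidence
```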
Aug 15 • 24 tweets • 7 min read
There is a non-recursive formula for the Fibonacci numbers, expressing them in terms of the golden ratio and its powers.
Why should you be interested? Because it teaches an extremely valuable lesson about power series.
Read on to find out what it is:
The Fibonacci numbers form one of the most famous integer sequences, known for their intimate connection to the golden ratio, sunflower spirals, mating habits of rabbits, and several other things.
They are defined by a simple second-order recursion:
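The non-recursive formula is Binet's formula, which the thread presumably derives; a quick sketch checking it against the recursion:

```python
from math import sqrt

# Binet's formula: F(n) = (phi^n - psi^n) / sqrt(5),
# where phi is the golden ratio and psi its conjugate.
phi = (1 + sqrt(5)) / 2
psi = (1 - sqrt(5)) / 2

def fib_closed(n):
    return round((phi**n - psi**n) / sqrt(5))

def fib_recursive(n):
    a, b = 0, 1
    for _ in range(n):
        a, b = b, a + b
    return a
```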
Aug 14 • 17 tweets • 5 min read
The Law of Large Numbers is one of the most frequently misunderstood concepts of probability and statistics.
Just because you lost ten blackjack games in a row, it doesn’t mean you’re more likely to get lucky next time.
What is the law of large numbers, then? Read on:
The strength of probability theory lies in its ability to translate complex random phenomena into coin tosses, dice rolls, and other simple experiments.
So, let’s stick with coin tossing.
What will the average number of heads be if we toss a coin, say, a thousand times?
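A simulation answers this concretely; a sketch with a fixed seed for reproducibility:

```python
import random

# Simulate a thousand fair coin tosses and look at the average.
random.seed(42)  # fixed seed, so the run is reproducible
tosses = [random.randint(0, 1) for _ in range(1000)]  # 1 = heads
average = sum(tosses) / len(tosses)
print(average)  # close to 0.5, as the law of large numbers predicts
```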
Aug 13 • 15 tweets • 5 min read
The single most important "side-effect" of solving linear equation systems: the LU decomposition.
Why? Because in practice, it is the engine behind inverting matrices and computing their determinants.
Here is how it works:
Why is the LU decomposition useful?
There are two main applications:
• Computing determinants
• Inverting matrices
Check out how the LU decomposition simplifies the determinant.
(As the determinant of a triangular matrix is the product of its diagonal elements.)
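The determinant shortcut can be seen on a tiny example; a sketch of the Doolittle scheme without pivoting (fine for this well-behaved matrix, not for production):

```python
# LU decomposition (Doolittle, no pivoting) of a small matrix,
# and the determinant as the product of U's diagonal.
def lu(A):
    n = len(A)
    L = [[float(i == j) for j in range(n)] for i in range(n)]  # unit lower
    U = [[0.0] * n for _ in range(n)]                          # upper
    for i in range(n):
        for j in range(i, n):
            U[i][j] = A[i][j] - sum(L[i][k] * U[k][j] for k in range(i))
        for j in range(i + 1, n):
            L[j][i] = (A[j][i] - sum(L[j][k] * U[k][i] for k in range(i))) / U[i][i]
    return L, U

A = [[2.0, 1.0], [4.0, 5.0]]
L, U = lu(A)
det = U[0][0] * U[1][1]   # product of the diagonal of U
print(det)                # 6.0 = det(A) = 2*5 - 1*4
```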
Aug 12 • 25 tweets • 8 min read
In machine learning, we take gradient descent for granted.
We rarely question why it works.
What's usually told is the mountain-climbing analogy: to find the valley, step in the direction of steepest descent.
But why does this work so well? Read on:
Our journey leads through:
• Differentiation, as the rate of change
• The basics of differential equations
• And equilibrium states
Buckle up!
Deep dive into the beautiful world of dynamical systems incoming.
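As a warm-up, the analogy in its simplest form: gradient descent on f(x) = x², whose gradient is 2x, flowing toward the equilibrium at x = 0. A minimal sketch:

```python
# Gradient descent on f(x) = x^2, with gradient f'(x) = 2x.
# Stepping against the gradient flows toward the minimum at x = 0.
x = 5.0
learning_rate = 0.1
for _ in range(100):
    x -= learning_rate * 2 * x   # step in the direction of steepest descent
print(x)  # very close to 0
```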
Aug 10 • 21 tweets • 7 min read
You have seen the famous bell curve hundreds of times before.
Contrary to popular belief, this is NOT a probability, but a probability density.
What are densities, and why do we need them? Read on:
First, let's talk about probability.
The gist: probability is a function P(A) that takes an event (that is, a set) and returns a real number between 0 and 1.
The event is a subset of the so-called sample space, a set often denoted with the capital Greek omega (Ω).
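The punchline about densities is that probabilities come from integrating them, not reading them off. A sketch, integrating the standard normal density with a simple midpoint sum:

```python
import math

# The bell curve is a density: probabilities come from integrating it.
def normal_pdf(x):
    return math.exp(-x**2 / 2) / math.sqrt(2 * math.pi)

# P(-1 <= X <= 1) via a midpoint Riemann sum over the density.
n, a, b = 10_000, -1.0, 1.0
dx = (b - a) / n
prob = sum(normal_pdf(a + (i + 0.5) * dx) * dx for i in range(n))
print(prob)  # ~0.6827, the familiar "68% within one sigma"
```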
Aug 10 • 12 tweets • 3 min read
Most people think math is just numbers.
But after 20 years with it, I see it more like a mirror.
Here are 10 surprising lessons math taught me about life and work:
1. Breaking the rules is often the best course of action.
We have set theory because Bertrand Russell broke the notion that “sets are just collections of things.”
Aug 9 • 12 tweets • 4 min read
Differentiation reveals much more than the slope of the tangent line.
We like to think about it that way, but from a different angle, differentiation is the same as an approximation with a linear function. This allows us to generalize the concept.
Let's see why:
The derivative of a function at a point 𝑎 is defined as the limit of the difference quotient, which represents the rate of change.
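The limit can be watched numerically as h shrinks; a small sketch:

```python
# The derivative as the limit of the difference quotient:
# f'(a) ≈ (f(a + h) - f(a)) / h for small h.
def f(x):
    return x**2

a = 3.0
for h in [0.1, 0.01, 0.001]:
    print((f(a + h) - f(a)) / h)  # approaches f'(3) = 6
```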
Aug 9 • 14 tweets • 5 min read
Graph theory will seriously enhance your engineering skills.
Here's why you must be familiar with graphs:
What do the internet, your brain, the entire list of people you’ve ever met, and the city you live in have in common?
These are all radically different concepts, but they share a common trait.
They are all networks that establish relationships between objects.
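In code, such a network is just objects plus relationships; a minimal sketch of a toy social graph and a basic network question (all names are made up):

```python
# A graph: nodes (objects) plus edges (relationships),
# stored as an adjacency list.
friends = {
    "Alice": {"Bob", "Carol"},
    "Bob": {"Alice"},
    "Carol": {"Alice", "Dave"},
    "Dave": {"Carol"},
}

# A basic network question: who is reachable from a given person?
def reachable(graph, start):
    seen, stack = set(), [start]
    while stack:
        node = stack.pop()
        if node not in seen:
            seen.add(node)
            stack.extend(graph[node])
    return seen
```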
Aug 8 • 15 tweets • 4 min read
I have spent at least 50% of my life studying, practicing, and teaching mathematics.
The most common misconceptions I encounter:
• Mathematics is useless
• You must be good with numbers
• You must be talented to do math
These are all wrong. Here's what math is really about:
Let's start with a story.
There’s a reason why the best ideas come during showers or walks. They allow the mind to wander freely, unchained from the restraints of focus.
One particular example is graph theory, born from the regular daily walks of the legendary Leonhard Euler.
Aug 8 • 10 tweets • 3 min read
Conditional probability is the single most important concept in statistics.
Why? Because without accounting for prior information, predictive models are useless.
Here is what conditional probability is, and why it is essential:
Conditional probability allows us to update our models by incorporating new observations.
By definition, P(B | A) describes the probability of an event B, given that A has occurred.
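The defining formula P(B | A) = P(A ∩ B) / P(A) can be checked by brute force on a die roll; a sketch:

```python
# P(B | A) = P(A and B) / P(A), computed over a fair six-sided die.
outcomes = set(range(1, 7))
A = {2, 4, 6}   # event: the roll is even
B = {4, 5, 6}   # event: the roll is greater than 3

def p(event):
    return len(event) / len(outcomes)

p_B_given_A = p(A & B) / p(A)
print(p_B_given_A)  # 2/3: knowing the roll is even updates P(B) from 1/2
```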
Aug 7 • 18 tweets • 6 min read
Neural networks are stunningly powerful.
This is old news: deep learning is state-of-the-art in many fields, like computer vision and natural language processing. (But not everywhere.)
Why are neural networks so effective? I'll explain:
First, let's formulate the classical supervised learning task!
Suppose that we have a dataset D, where xₖ is a data point and yₖ is the ground truth.
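The setup in miniature: a dataset of (xₖ, yₖ) pairs, a parametric model, and a loss measuring the fit. A sketch with toy data and the simplest possible "network", a single linear unit:

```python
# The supervised learning setup: a dataset of (x_k, y_k) pairs and a
# model whose quality is measured by a loss over the dataset.
D = [(0.0, 1.0), (1.0, 3.0), (2.0, 5.0)]   # toy data: y = 2x + 1

def model(x, w, b):
    return w * x + b   # one linear unit

def mse(w, b):
    return sum((model(x, w, b) - y) ** 2 for x, y in D) / len(D)

print(mse(2.0, 1.0))  # 0.0: the true parameters achieve zero loss
```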
Aug 7 • 18 tweets • 5 min read
Matrix factorizations are the pinnacle results of linear algebra.
From theory to applications, they are behind many theorems, algorithms, and methods. However, it is easy to get lost in the vast jungle of decompositions.
This is how to make sense of them.
We are going to study three matrix factorizations:
1. the LU decomposition,
2. the QR decomposition,
3. and the Singular Value Decomposition (SVD).
First, we'll take a look at LU.
Aug 6 • 15 tweets • 5 min read
Logistic regression is one of the simplest models in machine learning, and one of the most revealing.
It shows us how to move from geometric intuition to probabilistic reasoning. Mastering it sets the foundation for everything else.
Let’s dissect it step by step!
Let’s start with the most basic setup possible: one feature, two classes.
You’re predicting if a student passes or fails based on hours studied.
Your input x is a number, and your output y is either 0 or 1.
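The move from geometry to probability is the sigmoid: a linear score becomes P(y = 1 | x). A sketch with made-up illustrative weights:

```python
import math

# Logistic regression: squash a linear score through the sigmoid
# to get P(y = 1 | x). The weights below are made-up values.
def sigmoid(z):
    return 1 / (1 + math.exp(-z))

w, b = 1.5, -6.0   # hypothetical: ~4 hours of study puts you at 50/50

def p_pass(hours):
    return sigmoid(w * hours + b)

print(p_pass(4.0))  # 0.5: exactly on the decision boundary
print(p_pass(8.0))  # close to 1: almost certain to pass
```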