Why is matrix multiplication defined the way it is?
When I first learned about it, the formula seemed too complicated and counter-intuitive! I wondered, why not just multiply elements at the same position together?
Let me explain why!
↓ A thread. ↓
First, let's see how to make sense of matrix multiplication!
The elements of the product are calculated by multiplying rows of 𝐴 with columns of 𝐵.
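To make the rule concrete, here is a minimal sketch in plain Python. The helper name `matmul` and the example matrices are my own choices for illustration, not something from a particular library.

```python
def matmul(A, B):
    """Multiply two matrices given as lists of rows."""
    n, m, p = len(A), len(B), len(B[0])
    if any(len(row) != m for row in A):
        raise ValueError("the number of columns of A must match the number of rows of B")
    # Entry (i, j) of the product is the i-th row of A times the j-th column of B.
    return [[sum(A[i][k] * B[k][j] for k in range(m)) for j in range(p)]
            for i in range(n)]


A = [[1, 2], [3, 4]]
B = [[5, 6], [7, 8]]

product = matmul(A, B)

# For comparison: the "multiply elements at the same position" idea.
elementwise = [[a * b for a, b in zip(row_a, row_b)] for row_a, row_b in zip(A, B)]
```

The two results differ, so the row-times-column rule really is a different operation from the elementwise one.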
It is not obvious at all why it is done this way. 🤔
To understand, let's talk about what matrices really are!
Matrices are just representations of linear transformations: mappings between vector spaces that preserve addition and scalar multiplication, that is, f(x + y) = f(x) + f(y) and f(cx) = c·f(x).
Let's dig a bit deeper to see why matrices and linear transformations are (almost) the same!
The first thing to note is that every vector space has a basis, which can be used to uniquely express every vector as a linear combination of the basis vectors.
The simplest example is probably the standard basis in the 𝑛-dimensional real Euclidean space.
(Or, with less fancy words, in 𝐑ⁿ, where 𝐑 denotes the set of real numbers.)
Why is this good for us? 🤔
💡 Because a linear transformation is determined by its behavior on basis vectors! 💡
If we know the images of the basis vectors, linearity lets us calculate the image of every vector.
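Written out, the calculation looks like this. The notation is assumed for illustration: f is the linear transformation, e₁, …, eₙ are the basis vectors, and x₁, …, xₙ are the coefficients of x in that basis.

```latex
x = \sum_{j=1}^{n} x_j e_j
\quad\Longrightarrow\quad
f(x) = f\Big(\sum_{j=1}^{n} x_j e_j\Big) = \sum_{j=1}^{n} x_j f(e_j)
```

The second equality uses nothing but linearity, so knowing f(e₁), …, f(eₙ) is enough to know f everywhere.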
Because the image of a basis vector is just another vector in our vector space, it can also be expressed as a linear combination of the basis vectors.
💡 These coefficients are the elements of the transformation's matrix! 💡
(The image of the 𝑗-th basis vector gives the 𝑗-th column.)
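Here is a small sketch of that recipe in plain Python, using the standard basis of 𝐑². The map `f` and the helpers `matrix_of` and `apply` are hypothetical names made up for this example.

```python
def f(v):
    """A hypothetical linear map on R^2: (x, y) -> (2x - y, x + 3y)."""
    x, y = v
    return (2 * x - y, x + 3 * y)


def matrix_of(f, n):
    """Matrix of f in the standard basis of R^n: the j-th column is f(e_j)."""
    columns = [f(tuple(1 if i == j else 0 for i in range(n))) for j in range(n)]
    # Turn the list of columns into a list of rows.
    return [list(row) for row in zip(*columns)]


def apply(M, v):
    """Multiply the matrix M (a list of rows) with the vector v."""
    return tuple(sum(m_ij * v_j for m_ij, v_j in zip(row, v)) for row in M)


M = matrix_of(f, 2)
v = (4, 5)

# Applying f directly and multiplying with its matrix should give the same vector.
direct = f(v)
via_matrix = apply(M, v)
```

The matrix is built only from the two images f(e₁) and f(e₂), yet it reproduces f on a vector that is not a basis vector.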
So, let's recap!
For any linear transformation, there is a matrix such that applying the transformation corresponds to multiplying by that matrix.
What is the equivalent of matrix multiplication in the language of linear transformations?
Function composition!
(Keep in mind that a linear transformation is a function, just mapping vectors to vectors.)
💡 Multiplication of matrices is just the composition of the corresponding linear transforms! 💡
This is why matrix multiplication is defined the way it is.
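A quick sanity check of this claim, as a sketch in plain Python. The helpers `matmul` and `apply` and the two matrices are assumptions picked for the example.

```python
def matmul(A, B):
    """Row-times-column product of two matrices given as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]


def apply(M, v):
    """Multiply the matrix M (a list of rows) with the vector v."""
    return tuple(sum(m_ij * v_j for m_ij, v_j in zip(row, v)) for row in M)


A = [[2, -1], [1, 3]]
B = [[0, 1], [1, 1]]
v = (4, 5)

# Composition: transform with B first, then with A.
composed = apply(A, apply(B, v))

# A single transformation, given by the product matrix AB.
via_product = apply(matmul(A, B), v)

# The elementwise "product" does not represent the composition.
elementwise = [[a * b for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]
via_elementwise = apply(elementwise, v)
```

Here `composed` and `via_product` agree, while `via_elementwise` gives a different vector. One example is not a proof, but it shows what the row-times-column rule is designed to achieve.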
Having a deep understanding of math will make you a better engineer. I want to help you with this, so I am writing a comprehensive book about the subject.
If you are interested in the details and beauties of math, check out the early access!
"How large that number in the Law of Large Numbers is?"
Sometimes, a thousand samples are large enough. Sometimes, even ten million samples fall short.
How do we know? I'll explain.
First things first: the law of large numbers (LLN).
Roughly speaking, it states that the averages of independent, identically distributed samples converge to the expected value as the number of samples grows to infinity.
We are going to dig deeper.
There are two kinds of LLNs: weak and strong.
The weak law makes a probabilistic statement about the sample averages: it states that the probability of "the sample average falling farther from the expected value than ε" goes to zero as the number of samples grows, for any ε > 0.
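We can watch the weak law at work with a small simulation. This is only a sketch under my own assumptions: fair coin flips (expected value 0.5), ε = 0.1, a fixed random seed, and a hypothetical helper called `tail_probability`.

```python
import random


def tail_probability(n, eps, trials, rng):
    """Estimate P(|sample average - 0.5| > eps) for n fair coin flips."""
    misses = 0
    for _ in range(trials):
        # Average of n independent flips, heads counted as 1 and tails as 0.
        average = sum(rng.random() < 0.5 for _ in range(n)) / n
        if abs(average - 0.5) > eps:
            misses += 1
    return misses / trials


rng = random.Random(0)  # fixed seed, so the run is reproducible

# Estimated probability of missing the expected value by more than eps.
estimates = {n: tail_probability(n, eps=0.1, trials=1000, rng=rng)
             for n in (10, 100, 1000)}
```

The estimated probabilities shrink toward zero as n grows, which is what the weak law predicts. How fast they shrink depends on the distribution, and that is the question behind "how large is large?".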
The single biggest argument about statistics: is probability frequentist or Bayesian? It's neither, and I'll explain why.
Buckle up. Deep-dive explanation incoming.
First, let's look at what probability is.
Probability quantitatively measures the likelihood of events, like rolling a six with a die. It's a number between zero and one. This is independent of interpretation; it’s a rule set in stone.
In the language of probability theory, the events are formalized by sets within an event space.
(The event space is also a set, usually denoted by Ω.)
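For the dice example, this can be sketched in a few lines of Python. The names `omega`, `six`, `even`, and `probability` are my own, and the code assumes a fair die, so every outcome is equally likely.

```python
from fractions import Fraction

# The event space Ω for a single roll of a die.
omega = frozenset(range(1, 7))

# Events are subsets of Ω.
six = frozenset({6})          # "rolling a six"
even = frozenset({2, 4, 6})   # "rolling an even number"


def probability(event):
    """Probability of an event, assuming a fair die (uniform distribution on Ω)."""
    if not event <= omega:
        raise ValueError("an event must be a subset of the event space")
    return Fraction(len(event), len(omega))


p_six = probability(six)
p_even = probability(even)
p_six_or_even = probability(six | even)  # set union = "six or even"
```

Set operations match the logical ones: the union is "or", the intersection is "and", and every probability lands between zero and one.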