Conditional probability is the single most important concept in statistics.
Why? Because without accounting for prior information, predictive models are useless.
Here is what conditional probability is, and why it is essential:
Conditional probability allows us to update our models by incorporating new observations.
By definition, P(B | A) describes the probability of an event B, given that A has occurred.
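In formula form, it is the ratio of the joint probability and the probability of the condition:

\[
P(B \mid A) = \frac{P(A \cap B)}{P(A)}
\]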
Here is an example. Suppose that among 100 emails, 30 are spam.
Based only on this information, if we inspect a random email, our best guess is a 30% chance of it being spam.
This is not good enough.
We can build a better model by looking at more information.
What about looking for certain keywords, like "deal"?
It turns out that among the 100 emails, 40 contain this word.
Let's say that an email contains the word "deal".
How does our probabilistic model change?
We can leverage this prior information to get a more precise prediction than the baseline 30% guess.
By taking a more detailed look, we notice that 24 of the 40 emails containing the word "deal" are spam.
Thus, restricting our attention to these 40 emails, the conditional probability of spam is 24/40 = 60%.
Using similar logic, among the 60 emails without the word "deal", only 6 are spam, so the probability drops to 6/60 = 10%!
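To put all of the arithmetic in one place, here is a minimal Python sketch (just an illustration of the counting above; the variable names are my own):

```python
# Counts from the example above.
total = 100         # all emails
spam = 30           # spam emails
deal = 40           # emails containing the word "deal"
spam_and_deal = 24  # spam emails containing "deal"

# P(spam): the baseline guess, with no extra information.
p_spam = spam / total                                            # 0.3

# P(spam | "deal") = P(spam and "deal") / P("deal").
p_spam_given_deal = spam_and_deal / deal                         # 0.6

# P(spam | no "deal"): 6 remaining spam emails among the other 60.
p_spam_given_no_deal = (spam - spam_and_deal) / (total - deal)   # 0.1

print(p_spam, p_spam_given_deal, p_spam_given_no_deal)
```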
Quite a difference from our initial model, which used no prior information.
Conditional probability restricts the event space, thus providing a more refined picture.
This gives better models, leading to better decisions.
Differentiation reveals much more than the slope of the tangent line.
We like to think about it that way, but from a different angle, differentiation is the same as an approximation with a linear function. This allows us to generalize the concept.
Let's see why:
By definition, the derivative of a function at the point 𝑎 is the limit of the difference quotient, representing the instantaneous rate of change.
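In formulas, this reads:

\[
f'(a) = \lim_{x \to a} \frac{f(x) - f(a)}{x - a}
\]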
In geometric terms, the difference quotient represents the slope of the secant line through two points of the function's graph.
As the two points get closer, the secant lines approach the tangent line, and their slopes converge to the derivative.
Matrix factorizations are the pinnacle results of linear algebra.
From theory to applications, they are behind many theorems, algorithms, and methods. However, it is easy to get lost in the vast jungle of decompositions.
This is how to make sense of them.
We are going to study three matrix factorizations:
1. the LU decomposition,
2. the QR decomposition,
3. and the Singular Value Decomposition (SVD).
First, we'll take a look at LU.
1. The LU decomposition.
Let's start at the very beginning: linear equation systems.
Linear equations are surprisingly effective in modeling real-life phenomena: economic processes, biochemical systems, etc.
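Before diving into the math, here is a minimal sketch of what LU looks like in practice, assuming NumPy and SciPy are available (an illustration, not part of the original discussion): we factor a small matrix as A = PLU, with P a permutation matrix, L lower triangular, and U upper triangular, then use the factors to solve Ax = b.

```python
import numpy as np
from scipy.linalg import lu, lu_factor, lu_solve

# A small, invertible example system Ax = b.
A = np.array([[2.0, 1.0, 1.0],
              [4.0, 3.0, 3.0],
              [8.0, 7.0, 9.0]])
b = np.array([1.0, 2.0, 3.0])

# The factorization itself: A = P @ L @ U.
P, L, U = lu(A)
assert np.allclose(A, P @ L @ U)

# Solving the system via the factors: once A is factored,
# each solve reduces to two cheap triangular solves.
x = lu_solve(lu_factor(A), b)
assert np.allclose(A @ x, b)
```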