❓ Today let's look at two fundamental modelling paradigms that are used throughout the whole ML landscape.
Let's dive into Generative vs Discriminative models...
👇🧵 1/20
Say we want to learn to recognize 🐶dogs and 😺cats.
Let's not worry about input format right now and instead think in terms of abstract features, like, does it have pointy ears?
There are at least two ways in which we can do it.
👇 2/20
1️⃣ We can try to learn what a dog is and what a cat is, *independently*.
That is, which fundamental characteristics best *define* each of those classes.
👇 3/20
🐕 For example, we can learn that dogs have (generally) four legs, large noses, cute eyes, round ears, fur, and long tongues.
🐈 Similarly, we can learn that cats have (generally) four legs, small noses, sneaky eyes, pointy ears, fur, and short tongues.
👇 4/20
To classify a new animal 😼 we can then look at its features, and say:
❓ If this is a dog, what are the odds of seeing this fur, these legs, these ears, ...
👇 5/20
Likewise, we can ask:
❓ If this is a cat, what are the odds of seeing this fur, these legs, these ears, ...
Then we compare how surprised we would be to see a 🐕 versus a 🐈 with these specific features.
👇 6/20
🔶 This is a Generative Model.
These models learn the fundamental features of each class. Formally, they estimate the probability P(f1,f2,...|C) of observing the specific features f1, f2, ... in a given class C.
👇 7/20
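To make this concrete, here is a toy sketch of how a generative model could estimate P(f|C) just by counting. The animals, features, and smoothing choice are all invented for illustration, not part of any particular library:

```python
from collections import defaultdict

# Toy dataset: each animal is a dict of binary features plus a label.
# The features and values here are made up for illustration.
animals = [
    ({"pointy_ears": 0, "large_nose": 1, "long_tongue": 1}, "dog"),
    ({"pointy_ears": 0, "large_nose": 1, "long_tongue": 1}, "dog"),
    ({"pointy_ears": 1, "large_nose": 0, "long_tongue": 0}, "cat"),
    ({"pointy_ears": 1, "large_nose": 0, "long_tongue": 1}, "cat"),
]

def estimate_likelihoods(data):
    """Estimate P(feature=1 | class) by counting, with add-one smoothing."""
    counts = defaultdict(lambda: defaultdict(int))
    totals = defaultdict(int)
    for features, label in data:
        totals[label] += 1
        for f, value in features.items():
            counts[label][f] += value
    return {
        label: {f: (counts[label][f] + 1) / (totals[label] + 2)
                for f in counts[label]}
        for label in totals
    }

likelihoods = estimate_likelihoods(animals)
# likelihoods["cat"]["pointy_ears"] is high, likelihoods["dog"]["pointy_ears"] is low.
```

That table of per-class probabilities is the "definition" of each class the model has learned.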
The reason they are called *Generative* is that they try to learn explicitly how an example of a given class is *made*.
💡 You can often use these models to generate random examples of each class, by sampling from P(f1,f2,...|C).
👇 8/20
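Here is a minimal sketch of that "generating" step, assuming we already have per-class feature probabilities (the numbers below are invented) and that features are independent given the class, i.e. the naive assumption:

```python
import random

# Invented per-class probabilities P(feature=1 | class).
likelihoods = {
    "dog": {"pointy_ears": 0.1, "large_nose": 0.9, "long_tongue": 0.8},
    "cat": {"pointy_ears": 0.9, "large_nose": 0.1, "long_tongue": 0.2},
}

def generate(cls, rng=random):
    """Sample a random example of class `cls` by flipping one biased
    coin per feature, i.e. sampling from P(f1,f2,...|C) under the
    assumption that features are independent given the class."""
    return {f: int(rng.random() < p) for f, p in likelihoods[cls].items()}

fake_cat = generate("cat")  # most samples will have pointy ears and a small nose
```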
2️⃣ Alternatively, we can try to learn directly what makes a dog *different* from a cat.
That is, which fundamental characteristics best *discriminate* between those classes.
👇 9/20
We can learn that the larger the nose or the tongue, the more likely it is to be a 🐕, or that the pointier the ears and the sneakier the eyes, the more likely it is to be a 🐈.
And we do not care, for example, about the number of legs or the fur.
👇 10/20
To classify, we look at the features and say:
❓ Given these ears, and this nose, and these eyes, how likely is this to be a dog or a cat?
We compare the two results and answer with the one we are most confident about.
👇 11/20
🔶 This is a Discriminative Model.
These models learn which features separate the different classes. Formally, they explicitly estimate the probability P(C|f1,f2,...) of seeing class C given that we observe the features f1, f2, ...
👇 12/20
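A toy sketch of this direct estimation of P(C|f), in the style of logistic regression. The weights are invented for illustration; a real model would learn them from data:

```python
import math

# Invented weights: positive pushes toward "cat", negative toward "dog".
weights = {"pointy_ears": 2.0, "sneaky_eyes": 1.5, "large_nose": -2.0}
bias = 0.0

def p_cat(features):
    """Directly model P(C=cat | f1, f2, ...) as a logistic function of a
    weighted sum of features -- no model of how a cat is 'made'."""
    score = bias + sum(weights[f] * v for f, v in features.items())
    return 1 / (1 + math.exp(-score))

# Note that legs and fur don't appear at all: the model only keeps
# whatever separates the classes.
p = p_cat({"pointy_ears": 1, "sneaky_eyes": 1, "large_nose": 0})  # close to 1
```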
The reason they're called *Discriminative* is that they try to learn what makes a class *different* from all others.
💡 You can often use these models to compute feature importance by looking at which features best separate different classes.
👇 13/20
🔹A classic example of a generative model is Naive Bayes.
🔹A classic example of a discriminative model is Logistic Regression (and most neural networks).
👇 14/20
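To see how a generative model like Naive Bayes actually classifies, here is a toy sketch using Bayes' rule: P(C|f) is proportional to P(C)·P(f|C). All the probabilities below are invented for illustration:

```python
import math

# Invented numbers: priors P(C) and per-feature likelihoods P(f=1 | C).
priors = {"dog": 0.5, "cat": 0.5}
likelihoods = {
    "dog": {"pointy_ears": 0.1, "large_nose": 0.9},
    "cat": {"pointy_ears": 0.9, "large_nose": 0.1},
}

def classify(features):
    """Naive Bayes: pick the class maximizing P(C) * prod_i P(f_i | C),
    which by Bayes' rule is proportional to P(C | f1, f2, ...)."""
    def log_posterior(cls):
        score = math.log(priors[cls])
        for f, v in features.items():
            p = likelihoods[cls][f]
            score += math.log(p if v else 1 - p)
        return score
    return max(priors, key=log_posterior)

classify({"pointy_ears": 1, "large_nose": 0})  # -> "cat"
```

So both paradigms end up answering the same question, P(C|f); they just get there from opposite directions.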
❓ Does this difference matter?
If you mostly care about performance, then no, there is no intrinsically best modelling paradigm, and only experimentation can tell you what to use.
However, depending on how you want to use the model, it can matter.
👇 15/20
Discriminative models are better than generative models at explaining why (they think) a given answer is correct.
They focus on the features that matter for the task, and disregard anything that doesn't help them score better.
👇 16/20
However, discriminative models learn not what we want, but what's useful, which can often be something completely off-track, like spurious correlations or harmful biases in the training set.
👇 17/20
Generative models often encode stronger inductive biases because they represent a hypothesis (ours) about how the data is created.
This can make them more robust and controllable, but if that hypothesis is too far from reality, they may not learn anything useful.
👇 18/20
⭐ As usual, there is no silver bullet. You need to ask the right questions and be mindful of your assumptions.
❤️ If you liked this thread, please consider liking and retweeting it, and following me, if you think I've earned it. And make sure to read the whole #FoundationsOfML series. It starts here:
The year is 2035. You're sitting comfortably in your L5 self-driven car, zooming across the highway.
Suddenly, the truck in front drops a big boulder. In a split second the car AI has to make a choice: brake or dodge... 🧵👇
⚠️ Even with a full brake you're not guaranteed to survive the crash.
🚙 To your left there's a van driven by a human.
🚴♂️ To your right there's an unprotected biker.
There seems to be a recent surge in the "HTML is/isn't a programming language" discussion.
While there are a lot of honest misconceptions and also outright bullshit, I still think if we allow for some nuance there is a meaningful discussion to have about it.
My two cents 👇
First, to be bluntly clear: if a person is using this argument to make a judgment of character, to imply that someone is lesser because of their knowledge (or lack thereof) about HTML or any other skill, then that person is an asshole.
With that out of the way...
Why is this discussion meaningful at all?
If you are a newcomer to the dev world and you have some misconceptions about it, you can find yourself getting into compromises you're not yet ready for, or letting go of options you could take.
One of the very interesting questions that really got me thinking yesterday (they all did to an important degree) was from @Jeande_d regarding how to balance between learning foundational/transferable skills vs focusing on specific tools.
@Jeande_d My reasoning was that one should try hard not to learn too much of a tool, because any tool will eventually disappear. But tools are crucial to be productive, so one should still learn enough to really take advantage of the unique features of that tool.
@Jeande_d One way I think you can try to hit that sweet spot is to practice some sort of dropout regularization on your common tool set.
In every new project, substitute one of your usual tools for some convenient alternative. It will make you a bit less productive, to be sure...
❓ Today, I want to start discussing the different types of Machine Learning flavors we can find.
This is a very high-level overview. In later threads, we'll dive deeper into each paradigm... 👇🧵
Last time we talked about how Machine Learning works.
Basically, it's about having some source of experience E for solving a given task T, that allows us to find a program P which is (hopefully) optimal w.r.t. some metric M.
According to the nature of that experience, we can define different formulations, or flavors, of the learning process.
A useful distinction is whether we have an explicit goal or desired output, which gives rise to the definitions of 1️⃣ Supervised and 2️⃣ Unsupervised Learning 👇
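A toy sketch of what that distinction means for the experience E (the data and names below are invented for illustration):

```python
# Supervised learning: E is a set of (input, desired output) pairs,
# so the task T has an explicit goal for every example.
labeled_examples = [
    ({"pointy_ears": 1}, "cat"),
    ({"pointy_ears": 0}, "dog"),
]

# Unsupervised learning: E is just the inputs; there is no desired
# output, and the program P must find structure on its own.
unlabeled_examples = [
    {"pointy_ears": 1},
    {"pointy_ears": 0},
]

# The metric M differs too: e.g. accuracy against the labels above,
# versus some internal measure such as cluster compactness.
```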
A big problem with social and political sciences is that they *look* so intuitive and approachable that literally everyone has an opinion.
If I say "this is how quantum entanglement works" almost no one will dare to reply.
But if I say "this is how content moderation works"...
And the thing is, there is a huge amount of actual, solid science on almost any socially relevant topic, and most of us are as uninformed about it as we are about any dark corner of particle physics.
We just believe we can have an opinion, because the topic seems less objective.
So we do a huge disservice to social scientists, who deal every day with the false notion that what they have been researching for years is something anyone can weigh in on after thinking for maybe five minutes. This is, of course, nonsense.