13 Jan, 18 tweets, 4 min read
This is a Twitter series on #FoundationsOfML.

❓ Today, I want to start discussing the different flavors of Machine Learning we can find.

This is a very high-level overview. In later threads, we'll dive deeper into each paradigm... 👇🧵
Last time we talked about how Machine Learning works.

Basically, it's about having some source of experience E for solving a given task T, that allows us to find a program P which is (hopefully) optimal w.r.t. some metric M.

According to the nature of that experience, we can define different formulations, or flavors, of the learning process.

A useful distinction is whether we have an explicit goal or desired output, which gives rise to the definitions of 1️⃣ Supervised and 2️⃣ Unsupervised Learning 👇
1️⃣ Supervised Learning

In this formulation, the experience E is a collection of input/output pairs, and the task T is to find a function that produces the right output for any given input.
👉 The underlying assumption is that there is some correlation (or, in general, a computable relation) between the structure of an input and its corresponding output, and that it is possible to infer that function or mapping from a sufficiently large number of examples.
The output can have any structure, including a simple atomic value.

In this case, there are two special sub-problems:

🅰️ Classification, when the output is a category out of a finite set.
🅱️ Regression, when the output is a continuous value, bounded or not.
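
To make the two sub-problems concrete, here is a minimal sketch in plain Python. The data, the function names, and the choice of methods (a 1-nearest-neighbor rule for classification, a least-squares line for regression) are assumptions picked for illustration, not something the thread prescribes.

```python
# Toy illustration of the two supervised sub-problems (all data is made up).

def nearest_neighbor_classify(pairs, x):
    """Classification: return the label of the training input closest to x."""
    _, label = min(pairs, key=lambda pair: abs(pair[0] - x))
    return label

def fit_line(pairs):
    """Regression: least-squares fit of y = slope * x + intercept."""
    n = len(pairs)
    mean_x = sum(x for x, _ in pairs) / n
    mean_y = sum(y for _, y in pairs) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in pairs)
    variance = sum((x - mean_x) ** 2 for x, _ in pairs)
    slope = covariance / variance
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Experience E: input/output pairs.
labeled_sizes = [(1.0, "small"), (1.5, "small"), (8.0, "large"), (9.0, "large")]
measurements = [(0.0, 1.0), (1.0, 3.0), (2.0, 5.0), (3.0, 7.0)]  # generated from y = 2x + 1

# Classification: the output is one category out of a finite set.
print(nearest_neighbor_classify(labeled_sizes, 2.0))

# Regression: the output is a continuous value.
slope, intercept = fit_line(measurements)
print(slope * 4.0 + intercept)
```

In both cases the "program P" is inferred from examples rather than written by hand; only the type of the output changes.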
2️⃣ Unsupervised Learning

In this formulation, the experience E is just a collection of elements, and the task is defined as finding some hidden structure that explains those elements and/or how they relate to each other.
👉 The underlying assumption is that there is some regularity in the structure of those elements which helps to explain their characteristics with a restricted amount of information, hopefully significantly less than just enumerating all elements.
Two common sub-problems differ in where we want to find that structure, between elements or within each element:

🅰️ Clustering, when we care about the structure relating different elements to each other.
🅱️ Dimensionality reduction, when we care about the structure internal to each element.
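
Here is a rough sketch of both sub-problems in plain Python, using made-up data. The specific techniques are assumptions chosen for brevity: a two-center variant of k-means for clustering, and a projection onto the direction of greatest variance (found by power iteration) for dimensionality reduction. Note that neither function ever sees a label.

```python
import math

def two_means(values, iterations=10):
    """Clustering: split 1-D values into two groups around two moving centers."""
    centers = [min(values), max(values)]
    groups = [[], []]
    for _ in range(iterations):
        groups = [[], []]
        for v in values:
            nearest = 0 if abs(v - centers[0]) <= abs(v - centers[1]) else 1
            groups[nearest].append(v)
        # Move each center to the mean of its group (keep it if the group is empty).
        centers = [sum(g) / len(g) if g else c for g, c in zip(groups, centers)]
    return centers, groups

def principal_direction(points, iterations=50):
    """Dimensionality reduction: unit vector along the greatest variance of 2-D points."""
    n = len(points)
    mean_x = sum(p[0] for p in points) / n
    mean_y = sum(p[1] for p in points) / n
    centered = [(x - mean_x, y - mean_y) for x, y in points]
    # Entries of the (unnormalized) 2x2 covariance matrix.
    sxx = sum(x * x for x, _ in centered)
    sxy = sum(x * y for x, y in centered)
    syy = sum(y * y for _, y in centered)
    # Power iteration: repeatedly multiply by the matrix and renormalize.
    # Assumes the start vector (1, 1) is not orthogonal to the dominant direction.
    dx, dy = 1.0, 1.0
    for _ in range(iterations):
        dx, dy = sxx * dx + sxy * dy, sxy * dx + syy * dy
        norm = math.hypot(dx, dy)
        dx, dy = dx / norm, dy / norm
    return (dx, dy), centered

# Structure between elements: which values belong together?
centers, groups = two_means([1.0, 1.2, 0.8, 9.0, 9.5, 10.0])
print(centers, groups)

# Structure within each element: these 2-D points lie roughly on a line,
# so a single number per point describes them almost as well as two.
points = [(1.0, 2.0), (2.0, 4.1), (3.0, 5.9), (4.0, 8.0)]
(dx, dy), centered = principal_direction(points)
reduced = [x * dx + y * dy for x, y in centered]
print(reduced)
```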
One of the fundamental differences between supervised and unsupervised learning problems is this:

☝️ In supervised problems it is easier to define an objective metric of success, but it is much harder to get data, which almost always implies a manual labeling effort.
Even though the distinction between supervised and unsupervised is kind of straightforward, it is still somewhat fuzzy, and there are other learning paradigms that don't fit neatly into these categories.

Here's a short intro to three of them 👇
3️⃣ Reinforcement Learning

In this formulation, the experience E is not an explicit collection of data. Instead, we define an environment (a simulation of sorts) where an agent (a program) can take actions and observe their effect.
📝 This paradigm is useful when we have to learn to perform a sequence of actions, and there is no obvious way to define the "correct" sequence beforehand, other than trial and error, such as training artificial players for video games, robots, or self-driving cars.
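
As an illustration of the agent/environment loop, here is a small sketch using tabular Q-learning, one common reinforcement learning algorithm (the thread does not name a specific one). The corridor environment, the reward of 1 at the goal, and all parameter values are assumptions invented for this example.

```python
import random

GOAL = 4            # states are positions 0..4 on a line; position 4 is the goal
ACTIONS = (-1, +1)  # step left or step right

def step(state, action):
    """Environment: apply an action and return (next_state, reward)."""
    next_state = min(max(state + action, 0), GOAL)
    reward = 1.0 if next_state == GOAL else 0.0
    return next_state, reward

def q_learning(episodes=500, alpha=0.5, gamma=0.9, seed=0):
    """Agent: estimate the value of each (state, action) pair by trial and error."""
    rng = random.Random(seed)
    q = {(s, a): 0.0 for s in range(GOAL + 1) for a in ACTIONS}
    for _ in range(episodes):
        state = 0
        while state != GOAL:
            action = rng.choice(ACTIONS)  # explore by acting at random
            next_state, reward = step(state, action)
            best_next = max(q[(next_state, a)] for a in ACTIONS)
            # Nudge the estimate toward reward plus discounted future value.
            q[(state, action)] += alpha * (reward + gamma * best_next - q[(state, action)])
            state = next_state
    return q

def greedy_policy(q):
    """For each non-goal state, pick the action with the highest estimated value."""
    return [max(ACTIONS, key=lambda a: q[(s, a)]) for s in range(GOAL)]

print(greedy_policy(q_learning()))
```

Nobody tells the agent that "right" is the correct move. It only observes the effect of its actions, and the preference for moving right emerges from the rewards it collects.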
4️⃣ Semi-supervised Learning

This is kind of a mixture between supervised and unsupervised learning, in which you have explicit output samples for just a few of the inputs, but you have a lot of additional inputs where you can try, at least, to learn some structure.
📝 Examples are almost any supervised learning problem when we hit the point where getting additional *labeled* data (with both inputs and outputs) is too expensive, but it is easy to get lots of *unlabeled* data (just with inputs).
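
One simple way to exploit the extra inputs is self-training: fit a model on the few labeled examples, use it to guess labels for the unlabeled ones, and refit on everything. The sketch below does this with a nearest-centroid classifier on made-up 1-D data. The technique, the function names, and the numbers are all assumptions chosen for illustration; it is only one of many semi-supervised approaches.

```python
def centroids(labeled):
    """Model: the mean input value for each label."""
    sums, counts = {}, {}
    for x, label in labeled:
        sums[label] = sums.get(label, 0.0) + x
        counts[label] = counts.get(label, 0) + 1
    return {label: sums[label] / counts[label] for label in sums}

def predict(model, x):
    """Return the label whose centroid is closest to x."""
    return min(model, key=lambda label: abs(model[label] - x))

def self_train(labeled, unlabeled, rounds=3):
    """Refit the model on labeled data plus its own guesses for unlabeled data."""
    model = centroids(labeled)
    for _ in range(rounds):
        pseudo_labeled = [(x, predict(model, x)) for x in unlabeled]
        model = centroids(list(labeled) + pseudo_labeled)
    return model

labeled = [(1.0, "low"), (6.0, "high")]        # expensive: inputs with outputs
unlabeled = [0.5, 1.5, 2.0, 8.0, 9.0, 10.0]    # cheap: inputs only

supervised_only = centroids(labeled)
semi_supervised = self_train(labeled, unlabeled)

# The unlabeled inputs reveal where the two groups really sit,
# which moves the decision boundary for an input like 4.0.
print(predict(supervised_only, 4.0), predict(semi_supervised, 4.0))
```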
5️⃣ Self-supervised Learning

This is another paradigm that's kind of in-between supervised and unsupervised learning. Here we want to predict an explicit output, but that output is at the same time part of other inputs. So in a sense, the output is also defined implicitly.
📝 A straightforward example is in language models, like BERT and GPT, where the objective is (hugely oversimplifying) to predict the n-th word in a sentence from the surrounding words, a problem for which we have lots of data (i.e., all the text on the Internet).
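
To show how the output can come from the input itself, here is a small sketch that turns raw text into training pairs by hiding one word at a time. The helper name and the `[MASK]` placeholder are assumptions for this example, and it is a heavy simplification in the same spirit as the description above, not the actual training procedure of BERT or GPT.

```python
def masked_word_examples(sentence, mask="[MASK]"):
    """Turn one unlabeled sentence into (context, target word) training pairs."""
    words = sentence.split()
    examples = []
    for i, target in enumerate(words):
        # Hide the i-th word; the hidden word becomes the output to predict.
        context = words[:i] + [mask] + words[i + 1:]
        examples.append((" ".join(context), target))
    return examples

for context, target in masked_word_examples("the cat sat on the mat"):
    print(context, "->", target)
```

No manual labeling is needed: every sentence supplies its own input/output pairs, which is why plain text is enough as experience.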
All of these paradigms deserve a thread of their own, perhaps even more, so stay tuned for that!

⌛ But before getting there, next time we'll talk a bit about the fundamental differences in the kinds of models (or program templates) we can try to train.

12 Jan
A big problem with social and political sciences is that they *look* so intuitive and approachable that literally everyone has an opinion.

If I say "this is how quantum entanglement works" almost no one will dare to reply.

But if I say "this is how content moderation works"...
And the thing is, there is a huge amount of actual, solid science on almost any socially relevant topic, and most of us are as uninformed in that as we are on any dark corner of particle physics.

We just believe we can have an opinion, because the topic seems less objective.
So we are paying a huge disrespect to social scientists, who have to deal every day with the false notion that what they have been researching for years is something that anyone, thinking for maybe five minutes, can weigh in on. This is of course nonsense.
12 Jan
I'm starting a Twitter series on #FoundationsOfML. Today, I want to answer this simple question.

❓ What is Machine Learning?

This is my preferred way of explaining it... 👇🧵
Machine Learning is a computational approach to problem-solving with four key ingredients:

1️⃣ A task to solve T
2️⃣ A performance metric M
3️⃣ A computer program P
4️⃣ A source of experience E
You have a Machine Learning solution when:

🔑 The performance of program P at task T, as measured by M, improves with access to the experience E.

That's it.

Now let's unpack it with a simple example 👇
29 Dec 20
I've been a vocal opponent of the "neural networks are brain simulations" analogy, not because it's *wrong* but because I believe it's harmful for beginners.

I want to propose an alternative analogy for approaching deep learning from a dev background.

👇
Think about detecting a face in an image.

How would you even start to write a program for that?

You know it's gonna have something to do with finding a "nose" and two "eyes", but how can you go from an array of pixels to something that looks like an eye, in whatever position?

How does that change the problem?

Instead of thinking in the problem domain (finding faces) you can now take a leap upwards in abstraction, and think in the meta-problem domain (finding face finders).
21 Sep 20
Hey, today is #MindblowingMonday 🤯!

A day to share with you amazing things from every corner of Computer Science.

🍬 But let's begin with some eye candy.

Take a look at this mind-blowing 2-minute video and, if you like it, then read on, I'll tell you a couple of things about it...

Generative Adversarial Networks (GANs) have taken the machine learning world by surprise with their uncanny ability to generate hyper-realistic examples of human faces, cars, landscapes, and a lot of other stuff, as you just saw.

Want to know how they work? 👇
17 Sep 20
Hey, guess what, today is #TheoryThursday 🧐!

A silly excuse I just invented to share with you random bits of theory from some dark corner of Computer Science and make it as beginner-friendly as possible 👇
Today I want to talk about *Algorithmic Complexity*.

To get started, take a look at the following code. How long do you think it will take to run it?

Let's make that question more precise. How long do you think it will take to run it in the worst-case scenario?
We can see that the code will run slower if:

👉 the array is longer; or
👉 x happens to be further toward the back, or is not present at all.

Can we turn these insights into an actual formula? We will have to get rid of ambiguous stuff like "old computers".
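
The code the thread refers to is not reproduced in this text. Assuming it is a linear search for x in an array, which matches the two observations above, a sketch that counts comparisons instead of measuring seconds could look like this (the function name and the counter are assumptions for illustration):

```python
def linear_search(array, x):
    """Return (index of x or -1, number of comparisons made)."""
    comparisons = 0
    for index, item in enumerate(array):
        comparisons += 1
        if item == x:
            return index, comparisons
    return -1, comparisons

data = [5, 3, 8]
print(linear_search(data, 5))  # x at the front: (0, 1)
print(linear_search(data, 8))  # x at the back: (2, 3)
print(linear_search(data, 7))  # x not present: (-1, 3)
```

Counting comparisons does not depend on how fast the machine is, and in the worst case (x at the very end or missing) the count equals the length of the array.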
11 Sep 20

The thread exploded with hundreds of insights and parallel discussions! Thanks to all who participated.

I'm gonna try to summarize the most interesting takes (from my POV) and thread in my own thoughts. Brace! 👇
First, as some suggested, this does happen in a couple of languages:

JavaScript:

Groovy:

And even Python has optional semicolons:
docs.python.org/3/reference/si…
However, these are not so much "errors" fixed by the compiler, but actual features.

They are accounted for in the grammar, and only in specific constructions where it is mostly unambiguous to do so.

(🤔 Whether this is a good idea or not is a matter for another discussion).