12 Jan, 23 tweets, 4 min read
I'm starting a Twitter series on #FoundationsOfML. Today, I want to answer this simple question.

❓ What is Machine Learning?

This is my preferred way of explaining it... 👇🧵
Machine Learning is a computational approach to problem-solving with four key ingredients:

1️⃣ A task to solve T
2️⃣ A performance metric M
3️⃣ A computer program P
4️⃣ A source of experience E
You have a Machine Learning solution when:

The performance of program P at task T, as measured by M, improves with access to the experience E.

That's it.

Now let's unpack it with a simple example 👇
Let's say the task is to play chess ♟️.

A good performance metric could be, intuitively, the number of games won against a random opponent.

The "classic" approach to solving this problem is to write a computer program that encodes our knowledge of what a "good" chess player is.
This could take the form of a huge number of IF/ELSEs for a bunch of classic openings and endings, plus some heuristics to play the mid-game, possibly based on assigning points to each piece/position and capturing the pieces with the highest points.
And this works, but...

It is tremendously difficult first to build and then to maintain that program as new strategies are discovered. And we'll never know if we're playing the optimal strategy.

Now here is a Machine Learning approach to this problem 👇
You have to ask yourself first: is there a source of experience from which one can reasonably learn to play chess?

For instance, a huge database of world-class chess games?

With that experience at hand, how do we actually code a Machine Learning program?
Details vary, but the bottom line is always the same.

Instead of directly coding the program P that plays chess, what we write is kind of a meta-program, or "trainer", call it Q, that will itself give birth to P, by using that source of experience.
To do that, we have to predefine some sort of "mold" or "template" out of which P will come out.

As a simple example, let's assume there are some scores we can assign to each piece/position so we can compute the "value" of any given board.
So P will be a very simple program:

- Generate every possible board after the current one, applying all valid moves.
- For each board, compute its value using those (still unknown) scores.
- Return the move that leads to the highest valued board.
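A minimal Python sketch of that template, under toy assumptions: a board is just a tuple of piece codes (uppercase ours, lowercase the opponent's), a "move" captures one opponent piece, and the helper names `legal_moves`, `apply_move`, `board_value`, and `make_player` are hypothetical. Real chess rules would replace the two toy helpers; the three steps stay the same.

```python
from typing import Callable, Dict, List, Optional, Tuple

Board = Tuple[str, ...]  # toy stand-in for a chess position
Move = str


def legal_moves(board: Board) -> List[Move]:
    """Toy rule (assumption): a move captures one opponent piece (lowercase code)."""
    return ["x" + piece for piece in sorted(set(board)) if piece.islower()]


def apply_move(board: Board, move: Move) -> Board:
    """Return the board that results from capturing the piece named in the move."""
    remaining = list(board)
    remaining.remove(move[1:])
    return tuple(remaining)


def board_value(board: Board, scores: Dict[str, float]) -> float:
    """Value of a board: the sum of the scores of the pieces on it."""
    return sum(scores.get(piece, 0.0) for piece in board)


def make_player(scores: Dict[str, float]) -> Callable[[Board], Optional[Move]]:
    """Fill the template with concrete scores to obtain one program P."""
    def play(board: Board) -> Optional[Move]:
        moves = legal_moves(board)  # 1. every valid move from the current board
        if not moves:
            return None
        # 2. value each resulting board, 3. return the move with the highest value
        return max(moves, key=lambda m: board_value(apply_move(board, m), scores))
    return play


# Hand-picked scores, only to show the template running; finding good
# scores automatically is the job of the trainer Q.
scores = {"K": 0, "Q": 9, "P": 1, "k": 0, "q": -9, "p": -1}
play = make_player(scores)
best_move = play(("K", "Q", "P", "k", "q", "p"))
print(best_move)
```

Different score tables plugged into `make_player` give different programs P, which is exactly what makes the scores the thing to be learned.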
The question is, of course, how do we actually find the optimal program P? That is, how do we discover the assignment of scores that leads to optimal gameplay?

⭐ We will write another program Q to find them!
❓ How do we know we found the best P?

This is where the metric M comes in handy. The best chess program P is the one whose score assignment makes it win the most games.
❓ And how do we actually find those scores?

The easiest way to do it is to simply enumerate all possible instances of P, by trying all combinations of scores for all possible piece/position configurations.

😩 But this might take forever!
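To get a feel for why, here is a toy brute-force search in Python. Everything in it is an illustrative assumption: two score parameters instead of hundreds, a tiny labeled dataset, and a stand-in metric (how many boards the scores classify correctly) rather than games won. The names `enumerate_scores`, `metric`, and `search_space_size` are hypothetical.

```python
from itertools import product

PIECES = ["pawn", "queen"]

# Toy experience: (material difference per piece, did that side win?)
EXPERIENCE = [
    ({"pawn": 2, "queen": 0}, True),
    ({"pawn": -1, "queen": 1}, True),
    ({"pawn": 3, "queen": -1}, False),
    ({"pawn": -2, "queen": 0}, False),
]


def enumerate_scores(pieces, candidate_values):
    """Yield every possible assignment of a candidate value to each piece."""
    for combo in product(candidate_values, repeat=len(pieces)):
        yield dict(zip(pieces, combo))


def metric(scores):
    """Stand-in for M: number of boards whose value sign matches the outcome."""
    correct = 0
    for board, won in EXPERIENCE:
        value = sum(scores[p] * board[p] for p in PIECES)
        correct += (value > 0) == won
    return correct


def search_space_size(n_parameters, n_candidate_values):
    """How many instances of P a brute-force search has to try."""
    return n_candidate_values ** n_parameters


best = max(enumerate_scores(PIECES, range(10)), key=metric)
print(best, metric(best))

# Two parameters with 10 candidate values each is only 100 programs.
# One score per piece type and square (12 x 64 = 768 parameters, an
# assumed encoding) already gives a number with 769 digits.
print(search_space_size(2, 10), len(str(search_space_size(768, 10))))
```

The search itself is trivial to write; the problem is purely the size of the space it has to walk through.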
A better approach is to use a bit of clever math.

🤯 If we design those scores the right way, we can come up with some sort of equation system, where all those scores are variables, and we can very quickly find the values that give us the optimal P!
And here is where the experience comes into play.

To write that equation system, which is huge, we can use each board in each recorded game as a different equation that basically says "this board is a winning board, its scores should sum to 100" or "... a losing board, its scores should sum to 0".
After this, there is a piece of mathematical magic that tells us how we should assign the scores, such that the vast majority of "winning boards" sum close to 100 and the "losing boards" sum close to 0.
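The thread does not name that piece of magic. One standard candidate is least squares: pick the scores that minimize the squared gap between each board's sum and its target (100 or 0). The sketch below is an assumption-laden toy in pure Python: boards are reduced to two material-difference features, a constant `bias` term is added (not mentioned in the thread), and the minimization uses plain gradient descent. `fit_scores` and `predict` are hypothetical names.

```python
def predict(board, scores, bias):
    """Sum of scores weighted by the board's features, plus the constant term."""
    return bias + sum(s * x for s, x in zip(scores, board))


def fit_scores(boards, targets, learning_rate=0.05, steps=2000):
    """Least-squares fit by gradient descent on the (halved) mean squared error."""
    n_features = len(boards[0])
    scores = [0.0] * n_features
    bias = 0.0
    n = len(boards)
    for _ in range(steps):
        grad_scores = [0.0] * n_features
        grad_bias = 0.0
        for board, target in zip(boards, targets):
            error = predict(board, scores, bias) - target
            grad_bias += error
            for j, x in enumerate(board):
                grad_scores[j] += error * x
        bias -= learning_rate * grad_bias / n
        scores = [s - learning_rate * g / n for s, g in zip(scores, grad_scores)]
    return scores, bias


# Each board: (pawn difference, queen difference). One "equation" per board.
boards = [(2, 0), (-1, 1), (3, -1), (-2, 0), (1, 1), (-1, -1)]
targets = [100, 100, 0, 0, 100, 0]  # 100 = winning board, 0 = losing board

scores, bias = fit_scores(boards, targets)
for board, target in zip(boards, targets):
    print(board, target, round(predict(board, scores, bias), 1))
```

No board hits 100 or 0 exactly, but on this toy data the winning boards land above 50 and the losing ones below it, and the fitted queen score comes out larger than the pawn score without anyone having said so.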
And we just made a machine "learn" how to play chess! To summarize...

In a "classic" approach we would:

- Define the desired output, i.e., the best move.
- Think very hard about the process to compute that output.
- Write the program P that produces that output.
In a Machine Learning approach, instead we:

- Assume there is a "template" that any possible program P follows, parameterized with some unknown values.
- Write a program Q that finds the best values according to some experience E.
- Run Q on E to find the best program P.
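The three steps can be sketched in a few lines of Python. This is a deliberately tiny stand-in, not chess: the template is a one-parameter rule ("call a position winning if its material balance exceeds some threshold"), the experience is a handful of made-up labeled positions, and the trainer simply tries a few candidate thresholds. The names `template`, `metric`, and `trainer_q` are hypothetical.

```python
def template(threshold):
    """The 'mold': every possible program P has this shape, for some threshold."""
    def program_p(material_balance):
        return material_balance > threshold
    return program_p


def metric(program, experience):
    """M: fraction of experienced positions the program labels correctly."""
    hits = sum(program(balance) == won for balance, won in experience)
    return hits / len(experience)


def trainer_q(experience, candidate_thresholds):
    """Q: pick the parameter value whose program scores best on the experience."""
    best = max(candidate_thresholds,
               key=lambda t: metric(template(t), experience))
    return template(best), best


# E: made-up (material balance, did that side win?) pairs.
EXPERIENCE = [(-3, False), (-1, False), (0, False), (1, False),
              (2, True), (3, True), (4, True)]

program_p, best_threshold = trainer_q(EXPERIENCE, range(-5, 6))
print(best_threshold, metric(program_p, EXPERIENCE))
print(program_p(5), program_p(0))
```

Nobody wrote the rule "more than one point ahead means winning" by hand; `trainer_q` found it from the data, and `program_p` is the result.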
In conclusion, there is a BIG paradigm shift in the Machine Learning approach.

Instead of directly writing a program P to solve task T, you actually code a "trainer" program Q that, when run on suitable experience E, finds the best program P (according to some metric M).
🔥 The reason this paradigm is so hot now is that there is an incredible number of tasks for which we don't know how to write P directly, but it's fairly straightforward to write Q, provided we have enough experience (read: data) to train on.
In ML lingo, a "template" for P is called a "model" or a "hypothesis space", and the actual instance of P, after training, is called the "hypothesis".

Q is any one of a large number of Machine Learning algorithms: decision trees, neural networks, naive Bayes...
Next time, we'll talk about the different flavors of "experience" we can have, and how they define what type of "learning" we can actually attempt to do.

13 Jan
This is a Twitter series on #FoundationsOfML.

Today, I want to start discussing the different flavors of Machine Learning we can find.

This is a very high-level overview. In later threads, we'll dive deeper into each paradigm... 👇🧵
Last time we talked about how Machine Learning works.

Basically, it's about having some source of experience E for solving a given task T, that allows us to find a program P which is (hopefully) optimal w.r.t. some metric M.

According to the nature of that experience, we can define different formulations, or flavors, of the learning process.

A useful distinction is whether we have an explicit goal or desired output, which gives rise to the definitions of 1️⃣ Supervised and 2️⃣ Unsupervised Learning 👇
12 Jan
A big problem with social and political sciences is that they *look* so intuitive and approachable that literally everyone has an opinion.

If I say "this is how quantum entanglement works" almost no one will dare to reply.

But if I say "this is how content moderation works"...
And the thing is, there is a huge amount of actual, solid science on almost any socially relevant topic, and most of us are as uninformed about it as we are about any dark corner of particle physics.

We just believe we can have an opinion, because the topic seems less objective.
So we are showing huge disrespect to social scientists, who have to deal every day with the false notion that what they have been researching for years is something anyone can weigh in on after thinking for maybe five minutes. This is of course nonsense.
29 Dec 20
I've been a vocal opponent of the "neural networks are brain simulations" analogy, not because it's *wrong* but because I believe it's harmful for beginners.

I want to propose an alternative analogy for approaching deep learning from a dev background.

👇
Think about detecting a face in an image.

How would you even start to write a program for that?

You know it's gonna have something to do with finding a "nose" and two "eyes", but how can you go from an array of pixels to something that looks like an eye, in whatever position?

How does that change the problem?

Instead of thinking in the problem domain (finding faces) you can now take a leap upwards in abstraction, and think in the meta-problem domain (finding face finders).
21 Sep 20
Hey, today is #MindblowingMonday 🤯!

A day to share with you amazing things from every corner of Computer Science.

🎬 But let's begin with some eye candy.

Take a look at this mind-blowing 2-minute video and, if you like it, then read on, I'll tell you a couple of things about it...

Generative Adversarial Networks (GANs) have taken the machine learning world by surprise with their uncanny ability to generate hyper-realistic examples of human faces, cars, landscapes, and a lot of other stuff, as you just saw.

Want to know how they work? 👇
17 Sep 20
Hey, guess what, today is #TheoryThursday 🧐!

A silly excuse I just invented to share with you random bits of theory from some dark corner of Computer Science and make it as beginner-friendly as possible.
Today I want to talk about *Algorithmic Complexity*.

To get started, take a look at the following code. How long do you think it will take to run it?

Let's make that question more precise. How long do you think it will take to run it in the worst-case scenario?
We can see that the code will run slower if:

- the array is longer; or
- x happens to be further toward the back, or not present at all.
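
The code the tweet refers to is not reproduced in this excerpt. A linear search fits both observations, so here is a hedged Python stand-in (an assumption, not the original snippet) that also counts comparisons; `linear_search` is a hypothetical name.

```python
def linear_search(array, x):
    """Return (index of x or -1, number of comparisons made)."""
    comparisons = 0
    for index, item in enumerate(array):
        comparisons += 1
        if item == x:
            return index, comparisons
    return -1, comparisons


data = list(range(1000))
print(linear_search(data, 0))    # x at the front
print(linear_search(data, 999))  # x at the very back
print(linear_search(data, -5))   # x not present: the worst case
```

Counting comparisons instead of seconds is one way to make "how long" independent of the machine the code runs on.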

Can we turn these insights into an actual formula? We will have to get rid of ambiguous stuff like "old computers".
11 Sep 20

The thread exploded with hundreds of insights and parallel discussions! Thanks to all who participated.

I'm gonna try to summarize the most interesting takes (from my POV) and thread in my own thoughts. Brace! 👇
First, as some suggested, this does happen in a couple of languages:

JavaScript:

Groovy:

And even Python has optional semicolons:
docs.python.org/3/reference/si…
However, these are not so much "errors" fixed by the compiler, but actual features.

They are accounted for in the grammar, and only in specific constructions where it is mostly unambiguous to do so.

(Whether this is a good idea or not is a matter for another discussion.)