Santiago Profile picture
27 Feb, 4 tweets, 1 min read
Imagine your favorite creator on Twitter starts offering the following:

1. A weekly newsletter
2. Deep dives into your favorite topics
3. A look behind the scenes
4. Live discussion invitations
5. Unfiltered exclusive content

$4.99/mo

Would you subscribe?
@AlejandroPiad and @yudivian I know what your vote would be, but let's watch these results and see what the broader community thinks.

In my experience 1,000 answers is usually enough to capture the overall sentiment of my audience.
Who hasn’t voted yet?
8 minutes left!

More from @svpino

1 Mar
Let's talk about learning problems in machine learning:

▫️ Supervised Learning
▫️ Unsupervised Learning
▫️ Reinforcement Learning

And some hybrid approaches:

▫️ Semi-Supervised Learning
▫️ Self-Supervised Learning
▫️ Multi-Instance Learning

Grab your ☕️, and let's do this👇
Supervised Learning is probably the most common class of problems that we have all heard about.

We start with a dataset of examples and their corresponding labels (or answers).

Then we teach a model the mapping between those examples and their labels.

[2 / 19]
The goal of these problems is for a model to generalize from the examples that it sees to later answer similar questions.

There are two main types of Supervised Learning:

▫️ Classification → We predict a class label
▫️ Regression → We predict a numerical label

[3 / 19]
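The two types above are easy to see side by side. Here's a minimal scikit-learn sketch on a toy dataset of my own (not from the thread):

```python
# A minimal sketch of the two types of Supervised Learning,
# using scikit-learn on a tiny toy dataset.
from sklearn.linear_model import LinearRegression, LogisticRegression

X = [[1.0], [2.0], [3.0], [4.0]]  # examples (a single feature)

# Classification → predict a class label.
y_class = [0, 0, 1, 1]
clf = LogisticRegression().fit(X, y_class)
class_pred = clf.predict([[3.5]])[0]  # a class: 0 or 1

# Regression → predict a numerical label.
y_reg = [1.1, 2.0, 2.9, 4.2]
reg = LinearRegression().fit(X, y_reg)
num_pred = reg.predict([[3.5]])[0]  # a continuous number
```

Same workflow in both cases — fit on labeled examples, then predict on unseen ones; only the type of label changes.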
Read 19 tweets
28 Feb
Here are the best 10 machine learning threads I posted in February.

They go all the way from beginner-friendly content to a broader dive into specific machine learning concepts and techniques.

I'd love to hear which one is your favorite!

🧵👇
Having to pick only 10 threads is painful. I always struggle to decide what should stay out of the list.

This, however, is a great incentive when I'm writing the content: I have to compete against myself to make sure what I write ends up being part of the list!

[2 / 13]
[Thread 1]

An explanation about three of the most important metrics we use: accuracy, precision, and recall.

More specifically, this thread shows what happens when we focus on the wrong metric using an imbalanced classification problem.

[3 / 13]
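A quick toy illustration of that point (my own example, not taken from the thread): on a heavily imbalanced dataset, a model that never predicts the minority class still scores high accuracy.

```python
# Why accuracy misleads on imbalanced data: a "model" that always
# predicts the negative class on a 99%-negative dataset.
from sklearn.metrics import accuracy_score, precision_score, recall_score

y_true = [0] * 99 + [1]   # 99 negatives, 1 positive
y_pred = [0] * 100        # never predicts the positive class

acc = accuracy_score(y_true, y_pred)                     # 0.99 — looks great!
prec = precision_score(y_true, y_pred, zero_division=0)  # 0.0
rec = recall_score(y_true, y_pred, zero_division=0)      # 0.0 — catches nothing
```

Precision and recall expose immediately what accuracy hides.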

Read 13 tweets
27 Feb
For the first time yesterday, I set up a project using a Development Container in Visual Studio Code and it immediately hit me:

✨ This is the way going forward! 🤯

If you haven't used this yet, here are some thoughts.

👇
The basic idea: you can run your entire development environment inside a container.

Every time you open your project, @code prepares and runs your container.

[2 / 7]
There are several advantages to this:

First of all, your entire team will run exactly the same environment, regardless of their preferred operating system, folder structure, existing libraries, etc.

Everyone will have a mirrored experience.

[3 / 7]
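As a concrete sketch, a minimal `.devcontainer/devcontainer.json` might look like this (the image tag, extension, and command below are illustrative choices of mine, not from the thread):

```json
{
  "name": "my-project",
  "image": "mcr.microsoft.com/devcontainers/python:3.11",
  "customizations": {
    "vscode": {
      "extensions": ["ms-python.python"]
    }
  },
  "postCreateCommand": "pip install -r requirements.txt"
}
```

When you open the folder, VS Code builds the image and reopens the project inside it, so everyone on the team gets the same runtime, libraries, and editor extensions.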
Read 8 tweets
27 Feb
You want to build a face recognition system for your office, but getting many pictures from your coworkers is not an option.

Also, having to retrain the model for every new employee seems like a burden.

How do we solve this?

Grab your ☕️ and let's do the thing!👇
To solve a standard classification problem, you collect many images representing the different classes you want to classify.

You label the images and train a classification model.

This is all good, but sometimes getting a lot of images is not an option.

[2 / 13]
A face recognition system is one example: getting many images for every person we want to support is impractical.

Another example is a signature verification system: we want a model capable of verifying a signature even when we didn't train it.

[3 / 13]
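One common approach to this family of problems (my sketch here, not necessarily the one this thread develops) is to learn an embedding and compare distances, instead of training one classifier per person. Assuming a hypothetical `embed()` function that maps an image to a vector, verification reduces to a similarity check:

```python
import numpy as np

def cosine_similarity(a, b):
    """Similarity between two embedding vectors (1.0 = same direction)."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def same_person(emb_a, emb_b, threshold=0.8):
    # If the embeddings are close enough, we assume both images show the
    # same person. Adding a new employee needs no retraining: we just
    # store one reference embedding per person.
    return cosine_similarity(emb_a, emb_b) >= threshold

# Toy vectors standing in for real face embeddings.
alice_ref = np.array([0.9, 0.1, 0.2])
alice_new = np.array([0.85, 0.15, 0.25])
bob_ref = np.array([0.1, 0.9, 0.3])

print(same_person(alice_ref, alice_new))  # True
print(same_person(alice_ref, bob_ref))    # False
```

The threshold value here is made up; in practice you'd tune it on a validation set.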
Read 16 tweets
26 Feb
Imagine you have a ton of data, but most of it isn't labeled. Even worse: labeling is very expensive. 😑

How can we get past this problem?

Let's talk about a different—and pretty cool—way to train a machine learning model.

☕️👇
Let's say we want to classify videos in terms of maturity level. We have millions of them, but only a few have labels.

Labeling a video takes a long time (you have to watch it in full!). We also don't know how many videos we need to build a good model.

[2 / 9]
In a traditional supervised approach, we don't have a choice: we need to spend the time and come up with a large dataset of labeled videos to train our model.

But this isn't always an option.

In some cases, this may be the end of the project. 😟

[3 / 9]
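One way out is to let the model label the unlabeled examples itself, keeping only its confident predictions. Here's a self-training sketch with scikit-learn — the synthetic data and estimator choice are mine, not from the thread:

```python
# Self-training: unlabeled examples carry the label -1, and the model
# iteratively adds its own confident predictions as pseudo-labels.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.semi_supervised import SelfTrainingClassifier

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))
y_true = (X[:, 0] > 0).astype(int)  # the "real" labels

# Pretend we could only afford to label the first 20 examples.
y = np.full(200, -1)
y[:20] = y_true[:20]

model = SelfTrainingClassifier(LogisticRegression()).fit(X, y)
pred = model.predict([[2.0, 0.0]])[0]  # a point deep inside class 1
```

With only 20 labels, the model still learns the boundary by pseudo-labeling the remaining 180 examples.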
Read 10 tweets
25 Feb
Today, let's talk about two key data transformations we constantly use in machine learning:

▫️ Label encoding
▫️ One-hot encoding

Let's not just define them; let's build some intuition about why they matter.

Grab a coffee, and let's start! ☕️🧵👇
Imagine we have a dataset with two features:

▫️ "temperature" — a numeric value.
▫️ "weather" — a string value.

You should feel uncomfortable with this dataset right off the bat: machine learning algorithms usually don't like to work with non-numerical data.

[2 / 15]
To set the record straight, some algorithms don't mind non-numerical data.

For example, certain Decision Tree implementations will be fine with the "weather" feature from our example.

But a lot of them can only work with numbers.

[3 / 15]
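The contrast between the two encodings is easy to see in code (a toy "weather" column, sketched with scikit-learn and pandas):

```python
# Label encoding vs. one-hot encoding for a categorical "weather" feature.
import pandas as pd
from sklearn.preprocessing import LabelEncoder

weather = ["sunny", "rainy", "cloudy", "sunny"]

# Label encoding: one integer per category (classes sorted alphabetically).
# Careful: this implies an order (cloudy < rainy < sunny) that doesn't exist!
codes = LabelEncoder().fit_transform(weather)  # [2, 1, 0, 2]

# One-hot encoding: one binary column per category, no artificial order.
onehot = pd.get_dummies(pd.Series(weather))    # columns: cloudy, rainy, sunny
```

That implied order is why one-hot encoding is usually the safer default for nominal features, while integer codes are fine for genuinely ordinal data.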
Read 17 tweets
