Santiago
28 Feb, 13 tweets, 4 min read
Here are the 10 best machine learning threads I posted in February.

They range from beginner-friendly content to deeper dives into specific machine learning concepts and techniques.

I'd love to hear which one is your favorite!

🧵👇
Having to pick only 10 threads is painful. I always struggle to decide what to leave out.

This, however, is a great incentive when I'm writing: I have to compete against myself to make sure everything I publish is good enough to make the list!

[2 / 13]
[Thread 1]

An explanation of three of the most important metrics we use: accuracy, precision, and recall.

More specifically, this thread uses an imbalanced classification problem to show what happens when we focus on the wrong metric.

[3 / 13]
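
To make the pitfall concrete, here's a minimal sketch with hypothetical numbers (not the thread's exact example) showing how accuracy can look great on an imbalanced problem while precision and recall expose a useless model:

```python
from sklearn.metrics import accuracy_score, precision_score, recall_score

# Hypothetical imbalanced problem: 95 negatives, only 5 positives.
y_true = [0] * 95 + [1] * 5

# A "model" that blindly predicts the majority class every time.
y_pred = [0] * 100

print(accuracy_score(y_true, y_pred))                    # 0.95 -- looks great!
print(precision_score(y_true, y_pred, zero_division=0))  # 0.0
print(recall_score(y_true, y_pred))                      # 0.0 -- finds no positives
```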

[Thread 2]

My attempt to introduce how Convolutional Neural Networks work. Well, in reality, this is specifically about convolutions.

I love the idea of relating convolutions to software development to make them more intuitive.

[4 / 13]
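
For intuition, here's a minimal sketch of the sliding-window operation behind a convolutional layer (technically, deep learning frameworks compute this cross-correlation and call it convolution):

```python
import numpy as np

def convolve2d(image, kernel):
    """Slide the kernel over the image, summing elementwise products (no padding)."""
    kh, kw = kernel.shape
    out = np.zeros((image.shape[0] - kh + 1, image.shape[1] - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

# A tiny "image" with a vertical edge, and a classic vertical edge detector.
image = np.array([[0, 0, 0, 9, 9, 9]] * 4, dtype=float)
kernel = np.array([[1, 0, -1]] * 3, dtype=float)
print(convolve2d(image, kernel))  # zeros in flat regions, large values at the edge
```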

[Thread 3]

After going through convolutions, this thread explains how neural networks generalize.

There are many pictures of bunnies involved, but I really like how the storyline explains things as simply as possible.

[5 / 13]

[Thread 4]

Active Learning is an iterative approach to supervised learning that helps when we have a lot of data, but very few labels.

This thread is an overview of this technique and how it works.

[6 / 13]
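
As a rough illustration, here's an active learning loop on simulated data using uncertainty sampling as the query strategy (one of several possible strategies; the sizes and names here are made up):

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

# Simulated situation: 1,000 examples, but we "know" only 10 labels.
X, y = make_classification(n_samples=1000, random_state=0)
labeled = np.concatenate([np.where(y == 0)[0][:5], np.where(y == 1)[0][:5]])
unlabeled = np.setdiff1d(np.arange(len(X)), labeled)

for _ in range(5):
    model = LogisticRegression(max_iter=1000).fit(X[labeled], y[labeled])

    # Query the 10 examples the model is least sure about (proba closest to 0.5).
    proba = model.predict_proba(X[unlabeled])[:, 1]
    query = unlabeled[np.argsort(np.abs(proba - 0.5))[:10]]

    # In practice, a human labels `query`; here the labels conveniently exist.
    labeled = np.concatenate([labeled, query])
    unlabeled = np.setdiff1d(unlabeled, query)
```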

[Thread 5]

If you've heard about one/few-shot learning before but aren't sure exactly what it is, this is a good introduction to it.

Specifically, this thread talks about Siamese Networks and how they work.

[7 / 13]
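
For a rough idea of the architecture, here's a minimal Keras sketch of a Siamese setup: two inputs pass through one shared encoder, and the output is the distance between their embeddings (the layer sizes are arbitrary, not the thread's exact model):

```python
import tensorflow as tf
from tensorflow.keras import layers, Model

# One encoder shared by both inputs: that's the "Siamese" part.
encoder = tf.keras.Sequential([
    layers.Conv2D(32, 3, activation="relu"),
    layers.GlobalAveragePooling2D(),
    layers.Dense(64),
])

input_a = layers.Input(shape=(28, 28, 1))
input_b = layers.Input(shape=(28, 28, 1))
emb_a = encoder(input_a)  # same weights...
emb_b = encoder(input_b)  # ...applied to both inputs

# Squared Euclidean distance between embeddings: small means "same class".
distance = layers.Lambda(
    lambda pair: tf.reduce_sum(tf.square(pair[0] - pair[1]), axis=-1, keepdims=True)
)([emb_a, emb_b])

model = Model(inputs=[input_a, input_b], outputs=distance)
model.summary()
```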

[Thread 6]

Explaining a solution line by line is always fun.

This thread goes into an excruciating amount of detail on a Convolutional Neural Network that solves the MNIST problem.

An explanation for every single line of code.

[8 / 13]
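
The thread explains its own code line by line; as a stand-in, here's a minimal Keras CNN for MNIST (not the thread's exact network):

```python
import tensorflow as tf

# Load MNIST, add a channel dimension, and scale pixels to [0, 1].
(x_train, y_train), (x_test, y_test) = tf.keras.datasets.mnist.load_data()
x_train, x_test = x_train[..., None] / 255.0, x_test[..., None] / 255.0

model = tf.keras.Sequential([
    tf.keras.layers.Conv2D(32, 3, activation="relu", input_shape=(28, 28, 1)),
    tf.keras.layers.MaxPooling2D(),
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(10, activation="softmax"),  # one output per digit
])

model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
model.fit(x_train, y_train, epochs=1, validation_data=(x_test, y_test))
```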

[Thread 7]

When using Gradient Descent, the batch size is one of the most consequential hyperparameters at our disposal.

This thread explains everything you need to know about the batch size.

[9 / 13]
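
As a minimal illustration of where batch size enters the picture, here's mini-batch gradient descent on a toy linear regression (made-up data and learning rate):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 3))
y = X @ np.array([2.0, -1.0, 0.5]) + rng.normal(scale=0.1, size=1000)

def minibatch_gd(batch_size, lr=0.1, epochs=20):
    w = np.zeros(3)
    for _ in range(epochs):
        idx = rng.permutation(len(X))
        for start in range(0, len(X), batch_size):
            batch = idx[start:start + batch_size]
            gradient = 2 * X[batch].T @ (X[batch] @ w - y[batch]) / len(batch)
            w -= lr * gradient
    return w

# batch_size=1 is stochastic GD, batch_size=len(X) is full-batch GD,
# and anything in between is mini-batch GD.
print(minibatch_gd(batch_size=32))  # close to [2.0, -1.0, 0.5]
```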

[Thread 8]

Encoding features from a dataset is a very common transformation that data scientists have to do before running machine learning algorithms.

This thread covers Label and One-Hot encoding: how they work and how to use them.

[10 / 13]
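
A minimal scikit-learn sketch of both encodings, using made-up categories:

```python
from sklearn.preprocessing import LabelEncoder, OneHotEncoder

weather = [["sunny"], ["rainy"], ["cloudy"], ["sunny"]]

# Label encoding: one integer per category (implies an order that may not exist).
print(LabelEncoder().fit_transform([w[0] for w in weather]))  # [2 1 0 2]

# One-hot encoding: one binary column per category (no implied order).
# Note: older scikit-learn versions use `sparse=False` instead.
print(OneHotEncoder(sparse_output=False).fit_transform(weather))
```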

[Thread 9]

Bayes' theorem is very useful, but I didn't want to make it boring, so this thread introduces it through an excellent little (and surprising) problem.

And after you read it, you'll want to use it at your next dinner party.

[11 / 13]
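
As a taste of why the theorem surprises people, here's the classic medical-test calculation (not necessarily the thread's exact problem):

```python
# A test that is 99% accurate for a condition affecting 1 in 1,000 people.
p_condition = 0.001
p_pos_given_condition = 0.99
p_pos_given_healthy = 0.01

# Bayes: P(condition | positive) = P(positive | condition) * P(condition) / P(positive)
p_positive = (p_pos_given_condition * p_condition
              + p_pos_given_healthy * (1 - p_condition))

print(p_pos_given_condition * p_condition / p_positive)  # ~0.09: only a 9% chance!
```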

[Thread 10]

I'm sure you've heard that we split our datasets into different subsets right before running our machine learning algorithms.

This thread explains why we do that: what the goal of each subset is, and how to use them.

[12 / 13]
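
A minimal sketch of a three-way split with scikit-learn (the 60/20/20 proportions are just an example):

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=1000, random_state=0)

# First carve out a held-out test set, then split the rest into train/validation.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0)
X_train, X_val, y_train, y_val = train_test_split(
    X_train, y_train, test_size=0.25, random_state=0)

print(len(X_train), len(X_val), len(X_test))  # 600 200 200
```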

March is about to start, and my challenge is to end the month with 10 threads that completely kick the ass of February's Top 10.

Stay tuned because this is just getting started!

Which of these threads was your favorite one?

🦕

[13 / 13]

More from @svpino

1 Mar
Let's talk about learning problems in machine learning:

▫️ Supervised Learning
▫️ Unsupervised Learning
▫️ Reinforcement Learning

And some hybrid approaches:

▫️ Semi-Supervised Learning
▫️ Self-Supervised Learning
▫️ Multi-Instance Learning

Grab your ☕️, and let's do this👇
Supervised Learning is probably the most common class of problems that we have all heard about.

We start with a dataset of examples and their corresponding labels (or answers.)

Then we teach a model the mapping between those examples and their corresponding labels.

[2 / 19]
The goal of these problems is for a model to generalize from the examples that it sees to later answer similar questions.

There are two main types of Supervised Learning:

▫️ Classification → We predict a class label
▫️ Regression → We predict a numerical label

[3 / 19]
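
A minimal scikit-learn sketch of the two flavors, on simulated data:

```python
from sklearn.datasets import make_classification, make_regression
from sklearn.linear_model import LinearRegression, LogisticRegression

# Classification: the label is a class (here, 0 or 1).
Xc, yc = make_classification(n_samples=100, random_state=0)
print(LogisticRegression(max_iter=1000).fit(Xc, yc).predict(Xc[:3]))

# Regression: the label is a number.
Xr, yr = make_regression(n_samples=100, random_state=0)
print(LinearRegression().fit(Xr, yr).predict(Xr[:3]))
```
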
27 Feb
Yesterday, for the first time, I set up a project using a Development Container in Visual Studio Code, and it immediately hit me:

✨ This is the way going forward! 🤯

If you haven't used this yet, here are some thoughts.

👇
The basic idea: you can run your entire development environment inside a container.

Every time you open your project, @code prepares and runs your container.

[2 / 7]
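
For reference, a Development Container is typically defined by a devcontainer.json file; here's a minimal hypothetical sketch (the image tag and extension are my assumptions, not something from the thread):

```json
// .devcontainer/devcontainer.json
{
  // Base image everyone on the team will run (hypothetical choice).
  "image": "mcr.microsoft.com/devcontainers/python:3.11",

  // Editor extensions installed automatically inside the container.
  "customizations": {
    "vscode": {
      "extensions": ["ms-python.python"]
    }
  }
}
```
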
There are several advantages to this:

First of all, your entire team will run exactly the same environment, regardless of their preferred operating system, folder structure, existing libraries, etc.

Everyone will have a mirrored experience.

[3 / 7]
27 Feb
Imagine your favorite creator on Twitter starts offering the following:

1. A weekly newsletter
2. Deep dives into your favorite topics
3. A look behind the scenes
4. Live discussion invitations
5. Unfiltered exclusive content

$4.99/mo

Would you subscribe?
@AlejandroPiad and @yudivian I know what your vote would be, but let's watch these results and see what the broader community thinks.

In my experience 1,000 answers is usually enough to capture the overall sentiment of my audience.
Who hasn’t voted yet?
27 Feb
You want to build a face recognition system for your office, but getting many pictures of your coworkers is not an option.

Also, having to retrain the model for every new employee seems like a burden.

How do we solve this?

Grab your ☕️ and let's do the thing!👇
To solve a standard classification problem, you collect many images representing the different classes you want to classify.

You label the images and train a classification model.

This is all good, but sometimes getting a lot of images is not an option.

[2 / 13]
A face recognition system is one example: getting many images for every person we want to support is impractical.

Another example is a signature verification system: we want a model capable of verifying signatures it never saw during training.

[3 / 13]
26 Feb
Imagine you have a ton of data, but most of it isn't labeled. Even worse: labeling is very expensive. 😑

How can we get past this problem?

Let's talk about a different—and pretty cool—way to train a machine learning model.

☕️👇
Let's say we want to classify videos in terms of maturity level. We have millions of them, but only a few have labels.

Labeling a video takes a long time (you have to watch it in full!) We also don't know how many videos we need to build a good model.

[2 / 9]
In a traditional supervised approach, we don't have a choice: we need to spend the time and come up with a large dataset of labeled videos to train our model.

But this isn't always an option.

In some cases, this may be the end of the project. 😟

[3 / 9]
25 Feb
Today, let's talk about two key data transformations we constantly use in machine learning:

▫️ Label encoding
▫️ One-hot-encoding

But let's not just talk about them; let's also try to build some intuition about why they are important.

Grab a coffee, and let's start! ☕️🧵👇
Imagine we have a dataset with two features:

▫️ "temperature" — a numeric value.
▫️ "weather" — a string value.

You should feel uncomfortable with this dataset right off the bat: machine learning algorithms usually don't like to work with non-numerical data.

[2 / 15]
To set the record straight, some algorithms don't mind non-numerical data.

For example, certain Decision Tree implementations will be fine with the "weather" feature from our example.

But a lot of them can only work with numbers.

[3 / 15]
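
Here's a minimal pandas sketch of a dataset like the one in the example, with the non-numerical "weather" column one-hot encoded via get_dummies (the values are made up):

```python
import pandas as pd

df = pd.DataFrame({
    "temperature": [20.1, 15.3, 27.8],
    "weather": ["cloudy", "rainy", "sunny"],
})

# "temperature" stays as-is; "weather" becomes one binary column per category.
print(pd.get_dummies(df, columns=["weather"]))
```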