Santiago
31 Aug, 14 tweets, 3 min read
What can you do when your machine learning model stops improving?

There's always a point where you hit a ceiling and your model's performance stalls.

Thread: A couple of tricks to improve your model.
Here is something that's keeping you from making progress:

You are using all of your data.

It turns out that more data is not always a good thing.

What would it look like to only focus on some of the data? Would that be helpful?
Here is the plan:

1. Find whether there's a portion of the data that's holding you back. Get rid of it.

2. Find whether a portion of the data is better suited for a different model.

Let's break these two down to understand what to do.
The data might be noisy, and noise messes up your predictions.

The first round is about identifying these bad samples and getting rid of them.

Remember: f(🗑) → 🗑 (garbage in → garbage out)

The cleaner your training data is, the better your model performance will be.
There are many different ways to identify noise in your data.

Here is one using k-fold cross-validation (code sketch below):

1. Train and evaluate your model.
2. Pick the worst-performing fold.
3. Focus on the worst-performing samples.
4. Identify patterns.
5. Get rid of those samples.
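Here's a minimal sketch of that loop in Python, assuming X and y are NumPy arrays and using a random forest as a stand-in for whatever model you're actually training. One caveat: this variant tracks per-sample validation errors across every fold instead of only inspecting the single worst fold.

import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import KFold

def find_suspect_samples(X, y, n_splits=5, worst_fraction=0.05):
    kf = KFold(n_splits=n_splits, shuffle=True, random_state=42)
    fold_rmse = []
    sample_error = np.zeros(len(y))

    for train_idx, val_idx in kf.split(X):
        model = RandomForestRegressor(random_state=42)
        model.fit(X[train_idx], y[train_idx])
        preds = model.predict(X[val_idx])

        # Overall score for the fold...
        fold_rmse.append(mean_squared_error(y[val_idx], preds) ** 0.5)
        # ...and how far off each individual sample was.
        sample_error[val_idx] = np.abs(y[val_idx] - preds)

    # Surface the samples with the largest validation error so you
    # can look for patterns before deciding whether to drop them.
    n_worst = int(len(y) * worst_fraction)
    suspects = np.argsort(sample_error)[-n_worst:]
    return fold_rmse, suspects

Inspect the flagged samples by hand first; dropping them blindly can remove rare-but-legitimate cases along with the noise.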
Depending on your data and the model you are building, there are many other ways to identify bad samples.

Here is the important takeaway so far:

• More data is not always a good thing.

The cleaner your data, the better performance you should expect.
The second strategy is less common.

What would it look like to have different models working on the dataset? I know you have heard about "ensembles" before, but this one is a little bit different:

Different models working on different sections of the data.
Here is what you can do:

Slice your data into cohesive sections, and try your model on each one of those.

For example, slice the dataset by a categorical column, and train a model on each slice separately.

Does your model perform better on one particular slice?
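Here's a hedged sketch of that per-slice check, assuming a pandas DataFrame with a categorical column to slice on; the column names and the random forest are placeholders for your own setup:

import pandas as pd
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import cross_val_score

def rmse_per_slice(df, slice_col, target_col):
    results = {}
    for value, group in df.groupby(slice_col):
        X = group.drop(columns=[slice_col, target_col])
        y = group[target_col]
        # 5-fold cross-validated RMSE for this slice alone.
        scores = cross_val_score(
            RandomForestRegressor(random_state=42),
            X, y, cv=5, scoring="neg_root_mean_squared_error",
        )
        results[value] = -scores.mean()
    return results  # e.g. {"A": 0.55, "B": 0.90, "C": 0.99}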
Here is a common scenario:

• You have a model with an RMSE of 0.90.
• You slice the data into 3 sets.
• Train a model on each set.
• RMSE on each set is now 0.55, 0.90, and 0.99.

We can do something with this!
First, there's a set where a dedicated model beats our single model's 0.90 RMSE: the one at 0.55 (remember, lower RMSE is better). We definitely want to keep that.

But there's a set that underperforms it by a lot: the one at 0.99!

Can you build a different model that works better on that slice of the data?
Hopefully, the strategy is clear by now:

1. Start with a baseline model.
2. Slice the data.
3. Train a model on each subset.
4. Identify underperformers.
5. Train a different model on them.
6. Combine the results in an ensemble (sketched below).
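For step 6, here's an illustrative router that sends each slice to its specialist model, assuming the specialists live in a dict keyed by the slice value (all names here are hypothetical):

import pandas as pd

class SlicedEnsemble:
    def __init__(self, baseline_model, slice_models, slice_col):
        self.baseline = baseline_model    # fallback for unseen slices
        self.slice_models = slice_models  # e.g. {"A": model_a, "C": model_c}
        self.slice_col = slice_col

    def predict(self, df):
        preds = pd.Series(index=df.index, dtype=float)
        for value, group in df.groupby(self.slice_col):
            X = group.drop(columns=[self.slice_col])
            # Route the slice to its specialist; fall back to the
            # baseline when no dedicated model exists.
            model = self.slice_models.get(value, self.baseline)
            preds.loc[group.index] = model.predict(X)
        return preds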
One thing to remember: there's no free lunch.

Sometimes, one model with 0.90 RMSE is much better than 3 models with 0.89 RMSE.

Unless you are only optimizing for the model's performance, complexity always comes at a cost.
Let's recap the four main ideas of this thread:

1. More data is not always better.

2. Finding and removing noise pays off.

3. Slicing the data and building a different model to tackle each slice is a way to squeeze out better performance.

4. Complexity comes at a cost.
I post threads like this every Tuesday and Friday.

Follow me @svpino for practical tips and stories about my experience with machine learning.


