What can you do when your machine learning model stops improving?
There's always a point where you hit a ceiling and your model's performance stalls.
Thread: A couple of tricks to improve your model.
Here is something that's keeping you from making progress:
You are using all of your data.
It turns out that more data is not always a good thing.
What would it look like to only focus on some of the data? Would that be helpful?
Here is the plan:
1. Find whether there's a portion of the data that's holding you back. Get rid of it.
2. Find whether a portion of the data is better suited for a different model.
Let's break these two apart to understand what to do.
The data might be noisy, and noise messes up your predictions.
The first strategy is about identifying those bad samples and getting rid of them.
Remember: f(🗑) → 🗑 (garbage in → garbage out)
The cleaner your training data is, the better your model performance will be.
There are many different ways to identify noise in your data.
Here is one using k-fold cross-validation:
1. Train and evaluate your model.
2. Pick the worst-performing fold.
3. Focus on the worst-performing samples.
4. Identify patterns.
5. Get rid of those samples.
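Here is a minimal sketch of that loop, assuming a scikit-learn regressor and a tabular dataset as NumPy arrays. The model choice and the `n_splits` value are placeholders, not prescriptions:

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import KFold

def worst_samples_by_fold(X, y, n_splits=5):
    """Rank folds by validation error and surface their worst samples."""
    kfold = KFold(n_splits=n_splits, shuffle=True, random_state=42)
    results = []
    for train_idx, val_idx in kfold.split(X):
        model = RandomForestRegressor(random_state=42)
        model.fit(X[train_idx], y[train_idx])
        # Per-sample squared error on the validation fold.
        errors = (model.predict(X[val_idx]) - y[val_idx]) ** 2
        worst_first = val_idx[np.argsort(errors)[::-1]]
        results.append((errors.mean(), worst_first))
    # Worst fold first: inspect its top samples and look for patterns.
    return sorted(results, key=lambda r: r[0], reverse=True)
```

The samples that keep showing up at the top of that list are your candidates for removal.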
Depending on your data and the model you are building, there are many other ways to identify bad samples.
Here is the important takeaway so far:
• More data is not always a good thing.
The cleaner your data, the better performance you should expect.
The second strategy is less common.
What would it look like to have different models working on the dataset? I know you have heard about "ensembles" before, but this one is a little bit different:
Different models working on different sections of the data.
Here is what you can do:
Slice your data into cohesive sections and try your model on each one of them.
For example, slice the dataset by any categorical column, and train a model on each slice separately.
Does your model perform better on one particular slice?
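Here is one way to try that idea, sketched with hypothetical names: a pandas DataFrame `df`, a target column `target`, and a categorical column `segment` to slice by.

```python
import pandas as pd
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

def rmse_per_slice(df, target="target", by="segment"):
    """Train and score a separate model on each slice of a categorical column."""
    scores = {}
    for value, slice_df in df.groupby(by):
        X = slice_df.drop(columns=[target, by])
        y = slice_df[target]
        X_train, X_val, y_train, y_val = train_test_split(
            X, y, test_size=0.2, random_state=42
        )
        model = RandomForestRegressor(random_state=42)
        model.fit(X_train, y_train)
        scores[value] = mean_squared_error(y_val, model.predict(X_val)) ** 0.5
    return scores
```

Compare those per-slice numbers against your single-model baseline.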
Here is a common scenario:
• You have a model with RMSE of 0.9.
• You slice the data into 3 sets.
• Train a model on each set.
• Performance on each set is now 0.99, 0.9, 0.55.
We can do something with this!
First, there's a set where the model outperforms our single-model baseline, with an RMSE of 0.55. We definitely want to keep that.
But there's a set where it underperforms by a lot, at 0.99!
Can you build a different model that works better on that slice of the data?
Hopefully, the strategy is clear by now:
1. Start with a baseline model.
2. Slice the data.
3. Train a model on each subset.
4. Identify underperformers.
5. Train a different model on them.
6. Combine the results in an ensemble.
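A rough sketch of steps 2 through 6, reusing the hypothetical `segment` and `target` names from the snippet above: one model per slice, an optional override for the slices where the default model underperforms, and predictions routed by slice at inference time.

```python
import pandas as pd
from sklearn.ensemble import RandomForestRegressor

class SlicedEnsemble:
    """Train one model per slice and route each row to its slice's model."""

    def __init__(self, by="segment", target="target", overrides=None):
        self.by = by
        self.target = target
        # Optional mapping of slice value -> estimator for underperforming slices.
        self.overrides = overrides or {}
        self.models = {}

    def fit(self, df):
        for value, slice_df in df.groupby(self.by):
            X = slice_df.drop(columns=[self.target, self.by])
            y = slice_df[self.target]
            model = self.overrides.get(value, RandomForestRegressor(random_state=42))
            model.fit(X, y)
            self.models[value] = model
        return self

    def predict(self, df):
        parts = []
        for value, slice_df in df.groupby(self.by):
            X = slice_df.drop(columns=[self.by])
            preds = pd.Series(self.models[value].predict(X), index=slice_df.index)
            parts.append(preds)
        # Reassemble predictions in the original row order.
        return pd.concat(parts).sort_index()
```

Once you know which slice is dragging you down, you could pass something like `overrides={"some_slice": GradientBoostingRegressor()}` to give it a different model.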
One thing to remember: there's no free lunch.
Sometimes, one model with 0.90 RMSE is much better than 3 models with 0.89 RMSE.
Unless you are only optimizing for the model's performance, complexity always comes at a cost.
Let's recap the four main ideas of this thread:
1. More data is not always better.
2. Finding and removing noise pays off.
3. Slicing the data and building a different model for each slice is a way to squeeze out better performance.
4. Complexity comes at a cost.
I post threads like this every Tuesday and Friday.
Follow me @svpino for practical tips and stories about my experience with machine learning.