Santiago

@svpino

10 Jun, 22 tweets, 5 min read

Machine learning superpower: Be the one who makes better predictions.

I can teach you how to do this by putting 2+ models together.

Thread: beginner-friendly introduction to Ensembles.

• What are they?
• Why do they work?
• Real-life examples.
• Practical tips.

↓

A group of models working together is called an "ensemble."

Instead of using a single model, you could build 2 different models and have them vote to select the best answer.

You could also build 3, 4, or however many models you want.

This is powerful.

Here is the surprising part:

Ensembles usually perform better than any of their individual models.

Let's look at one example.

Ask 1,000 people a Yes/No question and imagine that each person answers correctly 51% of the time (barely better than just guessing).

From all the answers, pick the most common one.

How likely are you to end up with the correct answer?

If you ask 1,000 people, you'll get the correct answer ~73%* of the time.

If you ask 10,000 people: ~97%* of the time!

Isn't this surprising?

(* These values assume each person answers independently from the rest.)
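
You can check these figures yourself. Here is a minimal Python sketch, assuming every person answers independently and that we need a strict majority (more than half), so an exact tie counts as a miss. Counting ties as correct would raise the 1,000-person figure by about two points. The function name is mine, not part of the thread.

```python
import math


def majority_correct_probability(n_voters: int, p_correct: float) -> float:
    """Probability that more than half of n independent voters are correct.

    Sums the binomial probabilities for every outcome with a strict majority
    of correct votes. Works in log space so large n does not overflow.
    """
    log_p = math.log(p_correct)
    log_q = math.log(1.0 - p_correct)
    total = 0.0
    for k in range(n_voters // 2 + 1, n_voters + 1):
        # log of C(n, k) * p^k * (1 - p)^(n - k)
        log_pmf = (
            math.lgamma(n_voters + 1)
            - math.lgamma(k + 1)
            - math.lgamma(n_voters - k + 1)
            + k * log_p
            + (n_voters - k) * log_q
        )
        total += math.exp(log_pmf)
    return total


for n in (1, 1_000, 10_000):
    print(n, round(majority_correct_probability(n, 0.51), 3))
```

With one person you get 0.51 back, as expected. With 1,000 people the result lands near 0.73, and with 10,000 people near 0.98.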

But why does this happen?

Since each person answers correctly 51% of the time, the correct answer has a small edge on every vote, and that edge adds up as we ask more people.

1,000 people =

• ~510 correct answers
• ~490 wrong answers

The more people we ask, the closer the share of correct answers stays to 51%, so it becomes less and less likely to drop below half.

The same happens when using individual models:

If we put several of them together and have them vote, the likelihood of getting the correct answer increases.

This is really cool!

We can use this to improve our predictions when solving a problem.

A little bit of jargon:

• A model that performs barely better than random guessing is called a "weak learner."

• A model that achieves high accuracy is called a "strong learner."

In our people example, each person would be a weak learner (they perform at 51% accuracy).

Theoretically, we can take many weak learners, put them together in an ensemble, and create a strong learner.

It won't be easy, though: we need every model to be as independent as possible from all the other models.

This is hard to achieve in practice.

Something important to keep in mind:

As long as you train all of your models on the same data, they won't be truly independent of one another.
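
To see why independence matters so much, here is a small simulation sketch. It is a toy model of my own, not something from a real project: `copy_rate` is a hypothetical knob for how often a voter repeats one shared answer instead of thinking independently. Each voter is still right 60% of the time on their own.

```python
import random


def simulate_majority_accuracy(n_voters, p_correct, copy_rate, n_trials=2000, seed=0):
    """Estimate how often a majority vote is right when voters share mistakes.

    With probability copy_rate a voter repeats one shared answer (which is
    correct with probability p_correct); otherwise the voter answers
    independently with the same accuracy.
    """
    rng = random.Random(seed)
    wins = 0
    for _ in range(n_trials):
        shared_is_correct = rng.random() < p_correct
        correct_votes = 0
        for _ in range(n_voters):
            if rng.random() < copy_rate:
                correct_votes += shared_is_correct      # copies the shared answer
            else:
                correct_votes += rng.random() < p_correct  # answers independently
        wins += correct_votes > n_voters / 2
    return wins / n_trials


independent = simulate_majority_accuracy(101, 0.6, copy_rate=0.0)
correlated = simulate_majority_accuracy(101, 0.6, copy_rate=0.5)
print("independent voters:", independent)
print("half the votes copied:", correlated)
```

When voters are independent, the majority is right almost every time. When about half of them copy the same answer, the crowd is barely better than a single voter, because one shared mistake drags the whole vote with it.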

That being said, ensembles are still very powerful in practice.

Let's see one real-life example.

One of the latest problems I've worked on: classifying images into a few hundred classes.

We built Model A: A pre-trained ResNet50 architecture with a few layers added on top.

Performance: ~87% accuracy.

This is good. But not great.

We built a second model.

Model B: A one-shot classification model (a Siamese network trained with triplet loss).

Performance: ~88% accuracy.

This is also good. But not great either.

How about building an ensemble with both models?

Performance: ~94% accuracy.

Holy shit!

We went from 87% and 88% accuracy on the individual models to 94% when combining them.

That's the power of an ensemble.
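
The thread doesn't say how the two models were combined, so treat this as one possible approach rather than the one used here. With only two models, a plain majority vote can tie, so a common option is "soft voting": average the class probabilities and pick the highest. The sketch assumes both models can output one probability per class, and the numbers are made up for illustration.

```python
def soft_vote(probabilities_per_model):
    """Average per-class probabilities across models.

    Returns the index of the winning class and the averaged probabilities.
    """
    n_models = len(probabilities_per_model)
    n_classes = len(probabilities_per_model[0])
    averaged = [
        sum(model_probs[c] for model_probs in probabilities_per_model) / n_models
        for c in range(n_classes)
    ]
    best_class = max(range(n_classes), key=lambda c: averaged[c])
    return best_class, averaged


# Hypothetical outputs for one image and three classes (made-up numbers).
model_a_probs = [0.50, 0.40, 0.10]  # Model A leans toward class 0
model_b_probs = [0.20, 0.70, 0.10]  # Model B is confident in class 1

predicted_class, averaged = soft_vote([model_a_probs, model_b_probs])
print(predicted_class, averaged)
```

Here the two models disagree, and the more confident one wins: the ensemble picks class 1.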

Practical things you should keep in mind:

• Models that are too similar will make the same mistakes. This limits how much the ensemble can improve on them.

• Try to use different algorithms to build each model. This will increase their independence and produce better results.

• Bagging is a great choice: Use the same algorithm for every model but train each one on a different random sample of the data (drawn with replacement).

• Look into Boosting as well (AdaBoost is a great place to start).

• You can get creative on how your ensemble votes (especially when using an even number of models, where a plain majority vote can tie).
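
Here is what Bagging looks like in a minimal sketch, assuming a toy one-feature dataset and a very simple model (a "stump" that learns a single threshold). All names and data are made up for illustration. In a real project you would likely reach for a library implementation instead.

```python
import random
from collections import Counter


def fit_stump(points):
    """Pick the threshold that misclassifies the fewest points.

    The stump predicts 1 when x > threshold and 0 otherwise.
    """
    best_threshold, best_errors = None, None
    for candidate, _ in points:
        errors = sum((x > candidate) != bool(label) for x, label in points)
        if best_errors is None or errors < best_errors:
            best_threshold, best_errors = candidate, errors
    return best_threshold


def fit_bagged_stumps(points, n_models=25, seed=0):
    """Train the same algorithm on many bootstrap samples of the data."""
    rng = random.Random(seed)
    thresholds = []
    for _ in range(n_models):
        # Bootstrap sample: same size as the data, drawn with replacement.
        sample = [rng.choice(points) for _ in points]
        thresholds.append(fit_stump(sample))
    return thresholds


def predict(thresholds, x):
    """Each stump votes; the most common vote wins."""
    votes = Counter(int(x > t) for t in thresholds)
    return votes.most_common(1)[0][0]


# Toy data: the label is 1 when x is greater than 5.
toy_points = [(x, int(x > 5)) for x in range(11)]
thresholds = fit_bagged_stumps(toy_points)
print(predict(thresholds, 9), predict(thresholds, 1))
```

Using an odd number of models (25 here) means a Yes/No vote can never tie.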

TL;DR:

• To get better predictions, you can build an Ensemble.

Every week, I post 2-3 threads like this, breaking down machine learning concepts and giving you ideas on applying them in real-life situations.

Follow me @svpino, and don't miss the fun!

https://twitter.com/surdoiut/status/1402967214358204418

There’s absolutely no reason to only focus on Decision Trees when building an ensemble.

You can definitely mix and match different models and algorithms.


https://twitter.com/hammadxhammad/status/1402972193248534529?s=20

This is a great question.

Ensembles don't necessarily perform better.

For example, if your models are biased (they underfit), Bagging will not lead to good results because it mainly reduces variance. If your models show high variance instead (they overfit), Boosting will not work well because it mainly reduces bias.

Tip: Always try.


https://twitter.com/RasmusToivanen/status/1402972415123144705?s=20

Great question!

I think the only way to know would be to experiment and see whether adding TTA (test-time augmentation) to the mix leads to better predictions.

In principle, I don't see any problems combining TTA with an ensemble. I'd go for it.


https://twitter.com/arnaudai/status/1402995304123670547?s=20

Memory footprint might not be a constraint.

It's about tradeoffs. Maintaining 2 models is much more difficult than maintaining one (usually by more than a factor of 2).

Sometimes, an ensemble is worth every penny.

The example on the thread is a real one.


https://twitter.com/Thevelopher/status/1403014034862051329?s=20

Yes, regression tasks are no exception. You can also get better results using ensembles.

Keep in mind, however, that ensembles aren't guaranteed to work. It depends on a lot of factors related to your data, the model algorithms, etc.
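
For regression, the simplest ensemble is to average the predictions. Here is a minimal sketch with made-up numbers. With squared error, the averaged prediction is never worse than the models' average error, although it can still lose to the single best model.

```python
def mean_squared_error(predictions, targets):
    """Average of the squared differences between predictions and targets."""
    return sum((p - t) ** 2 for p, t in zip(predictions, targets)) / len(targets)


def average_predictions(predictions_per_model):
    """Average the models' predictions, one value per example."""
    n_models = len(predictions_per_model)
    return [sum(values) / n_models for values in zip(*predictions_per_model)]


# Hypothetical targets and predictions (made-up numbers).
targets = [3.0, 5.0, 7.0, 9.0]
model_a = [3.5, 4.0, 7.5, 10.0]
model_b = [2.0, 5.5, 6.0, 9.5]

ensemble = average_predictions([model_a, model_b])

mse_a = mean_squared_error(model_a, targets)
mse_b = mean_squared_error(model_b, targets)
mse_ensemble = mean_squared_error(ensemble, targets)
print(mse_a, mse_b, mse_ensemble)
```

In this example the two models miss in different directions on most points, so their errors partly cancel: each model alone has a mean squared error of 0.625, and the average gets 0.1875.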


https://twitter.com/benergetic/status/1403029005188558850

Here is one idea that is not guaranteed to work: model distillation.

Can you distill the knowledge of multiple models into a single one that you can run with a much lower budget?



