10 Jun, 22 tweets, 5 min read
Machine learning superpower: Be the one that makes better predictions.

I can teach you how to do this by putting 2+ models together.

• What are they?
• Why do they work?
• Real-life examples.
• Practical tips.

A group of models working together is called an "ensemble."

Instead of using a single model, you could build 2 different models and have them vote to select the best answer.

You could also build 3, 4, or however many models you want.

This is powerful.
Here is the surprising part:

Ensembles usually perform better than any of their individual models.

Let's look at one example.
Ask a Yes/No question to 1,000 people and imagine that each person answers correctly 51% of the time (barely better than just guessing.)

From all the answers, pick the most common one.

How likely are you to end up with the correct answer?
If you ask 1,000 people, you'll get the correct answer 75%* of the time.

If you ask 10,000 people: ~97%* of the time!

Isn't this surprising?

(* These values assume each person answers independently from the rest.)
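If you want to check those numbers yourself, here is a short Python sketch (my own, not from the thread) that computes the exact probability that a strict majority of n independent voters, each correct with probability p, gets the right answer. It works in log space so the 10,000-voter case doesn't underflow:

```python
from math import exp, lgamma, log

def majority_correct(n: int, p: float) -> float:
    """Probability that a strict majority of n independent voters,
    each correct with probability p, picks the right answer.
    Computed in log space to avoid underflow for large n."""
    log_p, log_q = log(p), log(1 - p)
    total = 0.0
    for k in range(n // 2 + 1, n + 1):
        # log of C(n, k) * p^k * (1-p)^(n-k)
        log_term = (lgamma(n + 1) - lgamma(k + 1) - lgamma(n - k + 1)
                    + k * log_p + (n - k) * log_q)
        total += exp(log_term)
    return total

print(f"{majority_correct(1_000, 0.51):.3f}")   # roughly 0.73
print(f"{majority_correct(10_000, 0.51):.3f}")  # roughly 0.98
```

A strict-majority count gives roughly 73% for 1,000 voters and roughly 98% for 10,000 — the same ballpark as the figures above; the exact number depends on how you treat ties.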
But why does this happen?

Since each person answers correctly 51% of the time, the more people we ask, the more likely the majority answer is to be correct.

[Image: the majority-vote calculation for 1,000 people]

More people make it more likely to get the right answer.
The same happens when using individual models:

If we put several of them together and have them vote, the likelihood of getting the correct answer increases.

This is really cool!

We can use this to improve our predictions when solving a problem.
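As a concrete sketch of that voting step, here is a minimal hard-voting ensemble in Python. The toy task and the three stub "models" are invented for illustration; each model is wrong on a different input, so the majority vote corrects every individual mistake:

```python
from collections import Counter

# Toy binary task: the true label is 1 when x >= 3.
X = [0, 1, 2, 3, 4, 5]
y = [0, 0, 0, 1, 1, 1]

# Three hypothetical "models", each wrong on exactly one (different) input.
model_a = lambda x: 1 if x == 0 else int(x >= 3)
model_b = lambda x: 1 if x == 1 else int(x >= 3)
model_c = lambda x: 0 if x == 5 else int(x >= 3)

def hard_vote(models, x):
    """Predict the most common answer among the models."""
    votes = [m(x) for m in models]
    return Counter(votes).most_common(1)[0][0]

def accuracy(predict):
    return sum(predict(x) == label for x, label in zip(X, y)) / len(X)

for m in (model_a, model_b, model_c):
    print(round(accuracy(m), 2))        # 0.83 for every individual model

ensemble = lambda x: hard_vote([model_a, model_b, model_c], x)
print(accuracy(ensemble))               # 1.0 -- the vote fixes each mistake
```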
A little bit of jargon:

• A model that performs barely better than random guessing is called a "weak learner."

• A model that achieves high accuracy is called a "strong learner."

In our people example, each person would be a weak learner (they perform at 51% accuracy.)
Theoretically, we can take many weak learners, put them together in an ensemble, and create a strong learner.

It won't be easy, though: we need every model to be as independent as possible from all the other models.

This is hard to achieve in practice.
Something important to keep in mind:

As long as you train all of your models on the same data, they won't be truly independent of one another.

That being said, ensembles are still very powerful in practice.
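A quick Monte Carlo sketch makes the independence point visible. The correlation model here is a deliberate simplification of mine: with some probability, every voter copies a single shared answer instead of answering on its own:

```python
import random

def majority_accuracy(n_models, p, shared_prob, trials=20_000, seed=0):
    """Estimate majority-vote accuracy when, with probability `shared_prob`,
    all models copy one shared answer (perfectly correlated) instead of
    answering independently."""
    rng = random.Random(seed)
    correct = 0
    for _ in range(trials):
        if rng.random() < shared_prob:
            votes = n_models * [rng.random() < p]   # everyone copies one coin
        else:
            votes = [rng.random() < p for _ in range(n_models)]
        correct += sum(votes) * 2 > n_models        # strict majority is right
    return correct / trials

independent = majority_accuracy(25, 0.6, shared_prob=0.0)
correlated = majority_accuracy(25, 0.6, shared_prob=0.8)
print(independent, correlated)   # independence helps: ~0.85 vs ~0.65
```

With 25 voters at 60% individual accuracy, the fully independent ensemble lands around 85%, while the heavily correlated one collapses toward the individual accuracy.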

Let's see one real-life example.
One of the latest problems I've worked on: Classifying an image into a few hundred classes.

We built Model A: A pre-trained ResNet50 architecture with a few layers added on top.

Performance: ~87% accuracy.

This is good. But not great.
We built a second model.

Model B: A one-shot classifier (a Siamese network trained with triplet loss).

Performance: ~88% accuracy.

This is also good. But not great either.
How about building an ensemble with both models?

Performance: ~94% accuracy.

Holy shit!

We went from 87% and 88% accuracy on the individual models to 94% when combining them.

That's the power of an ensemble.
Practical things you should keep in mind:

• Models that are too similar will make the same mistakes. This will hurt the performance of the ensemble.

• Try to use different algorithms to build each model. This will increase their independence and produce better results.
• Bagging is a great choice: Use the same algorithm for every model, but train each one on a different random sample of the data.

• Look into Boosting as well (AdaBoost is a great place to start.)

• You can get creative on how your ensemble votes (especially when using an even number of models.)
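Here is the bagging idea from the list above as a toy sketch in pure Python (the data, the one-split "stump" learner, and the numbers are all invented for illustration): every model uses the same algorithm, but each one trains on its own bootstrap sample.

```python
import random

def train_stump(sample):
    """Fit a one-split 'decision stump': predict 1 when x >= threshold.
    Returns the threshold that minimizes error on the training sample."""
    best_t, best_err = None, len(sample) + 1
    for t in sorted({x for x, _ in sample}):
        err = sum(int(x >= t) != label for x, label in sample)
        if err < best_err:
            best_t, best_err = t, err
    return best_t

def bagged_predict(thresholds, x):
    """Majority vote across all the stumps."""
    votes = sum(int(x >= t) for t in thresholds)
    return int(votes * 2 >= len(thresholds))

# Toy separable task: the true label is 1 when x >= 3.
data = [(x, int(x >= 3)) for x in range(10)]

rng = random.Random(42)
thresholds = []
for _ in range(25):
    # Bootstrap: resample the training set with replacement.
    sample = [rng.choice(data) for _ in data]
    thresholds.append(train_stump(sample))

acc = sum(bagged_predict(thresholds, x) == label for x, label in data) / len(data)
print(acc)   # usually 0.9 or 1.0 on this toy task
```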
TL;DR:

• To get better predictions, you can build an ensemble.

Every week, I post 2 - 3 threads like this, breaking down machine learning concepts and giving you ideas on applying them in real-life situations.

Follow me @svpino, and don't miss the fun!
There’s absolutely no reason to only focus on Decision Trees when building an ensemble.

You can definitely mix and match different models and algorithms.
This is a great question.

Ensembles don't necessarily perform better.

For example, if you have biased models, using Bagging will not lead to good results. If your models show high variance instead, Boosting will not work well.

Tip: Always try.

Great question!

I think the only way to know would be to experiment and see whether adding TTA to the mix leads to better predictions.

In principle, I don't see any problems combining TTA with an ensemble. I'd go for it.

Memory footprint might not be a constraint.

It's about tradeoffs. Maintaining 2 models is much more difficult than maintaining one (usually by more than a 2x factor.)

Sometimes, an ensemble is worth every penny.

The example in the thread is a real one.

Yes, regression tasks are no exception. You can also get better results using ensembles.

Keep in mind, however, that ensembles aren't guaranteed to work. It depends on a lot of factors related to your data, the model algorithms, etc.
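For regression, the majority vote typically becomes an average of predictions. A toy sketch (the three stand-in regressors are invented) shows how errors that point in different directions partially cancel:

```python
# Toy regressors whose errors point in different directions.
true_f = lambda x: 2.0 * x             # the target function
m1 = lambda x: 2.0 * x + 1.0           # overshoots by a constant
m2 = lambda x: 2.0 * x - 1.0           # undershoots by a constant
m3 = lambda x: 2.1 * x                 # slightly wrong slope

def average_ensemble(models, x):
    """For regression, combine models by averaging their predictions."""
    preds = [m(x) for m in models]
    return sum(preds) / len(preds)

xs = [0.0, 1.0, 2.0, 3.0]

def mae(predict):
    """Mean absolute error against the target function."""
    return sum(abs(predict(x) - true_f(x)) for x in xs) / len(xs)

for m in (m1, m2, m3):
    print(round(mae(m), 3))            # 1.0, 1.0, 0.15
print(round(mae(lambda x: average_ensemble([m1, m2, m3], x)), 3))  # 0.05
```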

Here is one idea that is not guaranteed to work: model distillation.

Can you distill the knowledge of multiple models into a single one that you can run with a much lower budget?

• • •


# More from @svpino

11 Jun
Software developers want to get into machine learning.

Many make the same mistakes. I've seen a few, and I have some ideas on how to avoid them.

This is what I've learned ↓
Lesson 1

Most people love the idea of starting something new. Only a few take the first step.

Preparing for something new is fun and exciting. It can also turn into glorified procrastination.

Stop collecting resources. Take what's right in front of you and run with it.
Lesson 2

Learning is a marathon, not a sprint. Strap in for a long, lifelong road.

If you are looking to make a quick buck, look elsewhere. If you are looking for shortcuts, this ain't it.

Make sure you come for the right reasons and are willing to go the distance.
9 Jun
There are thousands of machine learning algorithms out there, but that's mostly noise.

You'll rarely need more than a handful.

A good start:

• Linear/Logistic Regression
• Decision Trees
• Neural Networks
• XGBoost
• Naive Bayes
• PCA
• KNN
• SVM
• t-SNE

No specific reason. Your list will certainly include algorithms that I haven't even heard about.

But you must start somewhere, and these are certainly a good foundation.
In case you are curious, I also studied the following algorithms in the first couple of years of starting with machine learning:

• Random Forest
• K-Means
• Expectation Maximization
• Simulated Annealing
• Genetic Algorithms
8 Jun
Last week, an Italian artist sold an invisible sculpture for \$18,300.

That's a lot of money for a lot of nothing!

I have a better idea for your money:

For the next 24 hours, you can grab my course for a massive 60% discount!

\$6 only. That's the price of a cup of coffee where I live!

gum.co/kBjbC/60off

(Almost 2,000 customers with 208 reviews.)

If you still can't afford this, let me know, and I'll send you a free copy.
14 more hours, and the price goes back to \$15.
8 Jun
The 4 stages of a machine learning project lifecycle:

1. Project scoping
2. Data definition and preparation
3. Model training and error analysis
4. Deployment, monitoring, and maintenance

Here are 29 questions that you can use at each step of the process.

Project scoping

• What problem are we trying to solve?
• Why do we need to solve this problem?
• What are the constraints?
• What are the risks?
• What's the best approach to solving it?
• How do we measure progress?
• What does success look like?
Data definition and preparation

• What data do we need?
• How are we going to get it?
• How frequently does it change?
• Do we trust the source?
• How is this data biased?
• Can we improve it somehow?
• How are we going to clean it?
• How are we going to augment it?
7 Jun
Here is a photo from the newspaper of a communist island.

I'm the one standing. This was 20 years ago.

I've been developing software for 25+ years, and I've learned a few things.

I didn't have Internet back then, but now that I do, I can share 3 lessons with you:

Look at that photo again.

This was early 2000.

Those were the best computers Cuba had to offer to our Computer Science faculty. Outdated but good enough.

In a country where owning a personal computer was a crime, it was all we had.
One thing was missing: There was no Internet.

I know this might be hard to understand, so I'll rephrase:

We were going through our Computer Science bachelor's with no Internet access.

The entire wealth of information we had fit in a couple of books.
6 Jun
Good systems produce outstanding results.

↓ Some of my recommendations:

• Improve as a developer