Santiago
Mar 9, 2021
Here is an underrated machine learning technique that will give you important information about your data and model.

Let's talk about learning curves.

Grab your ☕️ and let's do this thing!

🧵👇
Start by creating a model. Something simple. You are still exploring what works and what doesn't, so don't get fancy yet.
We are now going to plot the loss (model error) vs. the training dataset size. This will help us answer the following questions:

▫️ Do we need more data?
▫️ Do we have a bias problem?
▫️ Do we have a variance problem?
▫️ What's the ideal picture?
▫️ Do we need more data?

As you increase the training set size, if both curves converge towards each other and stop improving, you don't need more data.

If there's room for them to continue closing the gap, then more data should help.
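Here's a minimal sketch of how you might produce these curves with scikit-learn. The dataset is synthetic and the model is a placeholder; swap in your own.

```python
import numpy as np
import matplotlib.pyplot as plt
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import learning_curve

# Synthetic data standing in for your dataset.
X, y = make_classification(n_samples=2000, random_state=42)

# Something simple, as suggested above.
model = LogisticRegression(max_iter=1000)

# Train on growing fractions of the data, cross-validating each time.
train_sizes, train_scores, val_scores = learning_curve(
    model, X, y,
    train_sizes=np.linspace(0.1, 1.0, 10),
    cv=5,
    scoring="neg_log_loss",  # negated so that higher is better
)

# Flip the sign to recover losses and average across the folds.
train_loss = -train_scores.mean(axis=1)
val_loss = -val_scores.mean(axis=1)

plt.plot(train_sizes, train_loss, label="Training loss")
plt.plot(train_sizes, val_loss, label="Validation loss")
plt.xlabel("Training set size")
plt.ylabel("Loss")
plt.legend()
plt.show()
```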
This one should be self-explanatory: if the errors stop improving as we add more data, it's unlikely that more of it will do any good.

But if we still see the loss improving, more data should help push it even lower.
▫️ Do we have a bias problem?

If the training error is too high, we have a high bias problem.

Also, if the validation error is too high, we have a bias problem (either low or high bias).
A high bias indicates that our model is not powerful enough to learn the data. This is why our training error is high.

If the training error is low, that's a good thing: our model can fit the data.
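If you do hit high bias, the usual fix is more model capacity. A quick illustration (the models here are arbitrary choices, not a prescription):

```python
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

# A plain linear model may be too weak for non-linear data (high bias).
simple_model = LinearRegression()

# Adding polynomial features gives the same model more capacity.
bigger_model = make_pipeline(PolynomialFeatures(degree=3), LinearRegression())
```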
High validation error indicates that our model is not performing well on the validation data. We probably have a bias problem.

To know in which direction, we look at the training error:

▫️ Low training error: low bias
▫️ High training error: high bias
▫️ Do we have a variance problem?

If there's a big gap between the training error and the validation error, we have high variance.

A low training error paired with a high validation error is another sign of high variance.
High variance indicates that the model fits the training data too well (probably memorizing it).

When testing with the validation set, we should see the big gap indicating that the model did great with the training set, but sucked with the validation set.
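One common way to attack high variance (illustrative, not the only option) is to regularize the model:

```python
from sklearn.linear_model import LinearRegression, Ridge

# No penalty: the model is free to memorize the training data.
unregularized = LinearRegression()

# An L2 penalty discourages extreme weights and tends to shrink the
# train/validation gap. The alpha value is arbitrary; tune it in practice.
regularized = Ridge(alpha=1.0)
```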
A couple more important points:

▫️ High bias + low variance: we are underfitting.
▫️ High variance + low bias: we are overfitting.
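To make these rules concrete, here's a toy helper. The thresholds are made up and entirely problem-dependent:

```python
def diagnose(train_loss: float, val_loss: float,
             high: float = 0.5, gap: float = 0.2) -> str:
    """Map final training/validation losses to a rough diagnosis."""
    if train_loss > high:
        return "High bias: underfitting. Try a larger model."
    if val_loss - train_loss > gap:
        return "High variance: overfitting. Try more data or regularization."
    return "Healthy: both errors are low and close together."

print(diagnose(train_loss=0.8, val_loss=0.9))   # high bias
print(diagnose(train_loss=0.05, val_loss=0.6))  # high variance
```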
▫️ What's the ideal picture?

These are the curves you want to see.

Training and validation errors both converge to a low value.
Here is another chart that does an excellent job at explaining bias and variance.

You want low bias + low variance, but keep in mind there's always a tradeoff between them: you need to find a good enough balance for your specific use case.
If these threads help, then make sure to follow me, and you won't be disappointed.

And for even more in-depth machine learning stories, make sure you head over to digest.underfitted.io. The first issue is coming this Friday!

🐍
Here is a quick guide that will help you deal with overfitting and underfitting:
Either error or score will work to create good learning curves.

They simply work as opposites: you always want to maximize the score and minimize the error.
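For example, an accuracy curve (a score) and a misclassification-error curve carry the same information:

```python
import numpy as np

# Accuracy is a score; error = 1 - accuracy is its mirror image.
val_accuracy = np.array([0.70, 0.78, 0.83, 0.85])  # made-up score curve
val_error = 1.0 - val_accuracy                      # equivalent error curve

print(val_error)  # [0.3  0.22 0.17 0.15]
```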

