Santiago
1 Apr, 8 tweets, 2 min read
One way to reduce overfitting is by automatically augmenting your data.

Think about this: if you had an infinite number of samples, you would never overfit because your model would see every possibility out there.

↓ 1/7
Data augmentation is a way to generate more data using an existing dataset.

For example, by applying small transformations to existing images, you can generate many useful variations.

2/7
Here are some examples of possible variations that you could generate for an image:

▫️ Zoomed-in
▫️ Randomly cropped
▫️ Horizontally shifted
▫️ Horizontally flipped
▫️ Slightly rotated
▫️ More illuminated

3/7
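To make a few of these variations concrete, here is a minimal sketch in plain Python. It treats an image as a list of rows of grayscale pixel values (0 to 255), which is an assumption made only to keep the example self-contained. The function names are hypothetical, and a real project would normally rely on an image library instead. It covers four of the variations: flipping, shifting, random cropping, and brightening.

```python
import random


def flip_horizontal(img):
    """Mirror every row left-to-right."""
    return [row[::-1] for row in img]


def shift_horizontal(img, dx, fill=0):
    """Shift pixels dx columns to the right (left if dx < 0), padding with `fill`."""
    width = len(img[0])
    if dx >= 0:
        return [[fill] * min(dx, width) + row[:max(width - dx, 0)] for row in img]
    return [row[-dx:] + [fill] * min(-dx, width) for row in img]


def random_crop(img, crop_h, crop_w, rng):
    """Cut a crop_h x crop_w window at a random position inside the image."""
    top = rng.randint(0, len(img) - crop_h)
    left = rng.randint(0, len(img[0]) - crop_w)
    return [row[left:left + crop_w] for row in img[top:top + crop_h]]


def brighten(img, factor):
    """Scale every pixel by `factor`, clamping the result to 255."""
    return [[min(255, round(p * factor)) for p in row] for row in img]


# A tiny 4x4 stand-in for a real image.
image = [
    [10, 20, 30, 40],
    [50, 60, 70, 80],
    [90, 100, 110, 120],
    [130, 140, 150, 160],
]

rng = random.Random(0)  # seeded so the random crop is repeatable
variations = {
    "flipped": flip_horizontal(image),
    "shifted": shift_horizontal(image, 1),
    "cropped": random_crop(image, 3, 3, rng),
    "brighter": brighten(image, 1.5),
}
```

Note that the crop is smaller than the original. In practice, you would usually resize it back to the input size your model expects.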
Assuming we have a dataset with 1,000 images, if we augment it with 6 variations for every image, we will end up with 6,000 new images (7,000 counting the originals).

This is quite an improvement!

4/7
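Here is a rough sketch of that arithmetic in Python. The "images" are just short lists of numbers, and the six transforms are placeholders I made up for illustration, so treat it as a way to check the counts and not as a real augmentation pipeline.

```python
def augment_dataset(images, transforms):
    """Apply every transform to every image.

    Returns (originals + generated, generated).
    """
    generated = [transform(img) for img in images for transform in transforms]
    return images + generated, generated


# Stand-in data: 1,000 "images", each one a short list of pixel values.
images = [[i % 200, i % 200 + 1, i % 200 + 2] for i in range(1000)]

# Six placeholder transforms (hypothetical, one per variation).
transforms = [
    lambda img: img[::-1],                        # flip
    lambda img: img[1:] + [0],                    # shift left
    lambda img: [0] + img[:-1],                   # shift right
    lambda img: [min(255, p + 20) for p in img],  # brighter
    lambda img: [max(0, p - 20) for p in img],    # darker
    lambda img: img[:2],                          # crop
]

augmented, generated = augment_dataset(images, transforms)
print(len(generated))  # 6000 generated images
print(len(augmented))  # 7000 when the originals are kept
```

Whether you train on the generated images alone or keep the originals alongside them is a design choice. Keeping both is the more common setup.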
Of course, size alone isn't what matters. Quality is.

We need to ensure that the data we generate is valid and exposes the model to a realistic distribution of real samples.

5/7
One thing to keep in mind is that data augmentation is useful for reducing overfitting, but it may not be enough.

It really doesn't matter how many images you auto-generate if all of them are just a remix of the existing data.

6/7
If we could generate completely different, real-looking images for any dataset out there, things would get really interesting.

We aren't there yet, but if you follow me, I'll let you know as soon as we make it 😎.

7/7
You can use GANs for some use cases, but they still aren't a good general solution.

For example, generating real-looking photos of any object in random locations is far from an easy problem to solve.
