Santiago

@svpino

1 Apr, 8 tweets, 2 min read

One way to reduce overfitting is by automatically augmenting your data.

Think about this: if you had an infinite number of samples, you would never overfit because your model would see every possibility out there.

↓ 1/7

Data augmentation is a way to generate more data using an existing dataset.

For example, by applying small transformations to existing images, you can generate many useful variations.

↓ 2/7

Here are some examples of possible variations that you could generate for an image:

▫️ Zoomed-in
▫️ Randomly cropped
▫️ Horizontally shifted
▫️ Horizontally flipped
▫️ Slightly rotated
▫️ More illuminated

↓ 3/7
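
Here's a minimal sketch of a few of these variations. It assumes a grayscale image stored as a list of rows of 0-255 integers, and it sticks to the standard library so it runs anywhere. The function names are made up for illustration, and zooming and rotation are left out because they need interpolation, which an image library handles better.

```python
import random


def horizontal_flip(image):
    """Mirror every row left to right."""
    return [row[::-1] for row in image]


def horizontal_shift(image, pixels, fill=0):
    """Shift right by `pixels` (left if negative), padding with `fill`."""
    width = len(image[0])
    if pixels >= 0:
        return [([fill] * pixels + row)[:width] for row in image]
    return [(row + [fill] * -pixels)[-width:] for row in image]


def random_crop(image, crop_height, crop_width, rng):
    """Cut a crop_height x crop_width window at a random position."""
    top = rng.randint(0, len(image) - crop_height)
    left = rng.randint(0, len(image[0]) - crop_width)
    return [row[left:left + crop_width] for row in image[top:top + crop_height]]


def brighten(image, delta):
    """Add `delta` to every pixel, clamped to the 0-255 range."""
    return [[min(255, max(0, px + delta)) for px in row] for row in image]


def augment(image, rng):
    """Return four variations of one image (a hypothetical recipe)."""
    return [
        horizontal_flip(image),
        horizontal_shift(image, rng.choice([-1, 1])),
        random_crop(image, len(image) - 1, len(image[0]) - 1, rng),
        brighten(image, 40),
    ]


image = [[10, 20, 30], [40, 50, 60], [70, 80, 250]]
variations = augment(image, random.Random(0))
print(len(variations))  # one original image, four new samples
```

Passing in a seeded `random.Random` keeps the random choices reproducible, which helps when you want to inspect exactly what the model was trained on.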

Assuming we have a dataset with 1,000 images, if we augment it with 6 variations for every image, we will generate 6,000 new images, for 7,000 in total.

This is quite an improvement!

↓ 4/7

Of course, size doesn't matter. Quality does.

We need to ensure that the data we generate is valid and exposes the model to a realistic distribution of real samples.

↓ 5/7

One thing to keep in mind is that data augmentation is useful to reduce overfitting, but it may not be enough.

It really doesn't matter how many images you auto-generate if all of them are just a remix of the existing data.

↓ 6/7

If we could generate completely different, real-looking images for any dataset out there, things would get really interesting.

We aren't there yet, but if you follow me, I'll let you know as soon as we make it 😎.

7/7

https://twitter.com/ricsinaruto/status/1377626603488960523?s=20

You can use GANs for some use cases, but they still aren't a good general solution.

For example, generating real-looking photos of any object in random locations is far from an easy problem to solve.


More from @svpino

Santiago

@svpino

2 Apr

When we start with machine learning, we learn to split our datasets into training and testing sets by taking a percentage of the data.

Unfortunately, this practice could lead to overestimating the performance of your model.

↓ 1/7

Imagine a dataset of pictures of people making signals with their hands.

As we were told, we take 70% of the images for training and the remaining 30% for testing. We are careful to maintain the original ratio between classes.

How could this be a problem?

↓ 2/7

There are a lot of pictures of Mary in the dataset. She is showing different signals with her hands.

Joe, too. He was another model who participated in the creation of the dataset.

↓ 3/7
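
The rest of this thread isn't shown here, so treat what follows as an assumption about the fix rather than the author's method. One common way to keep Mary's pictures from landing in both sets is to split by person instead of by image. Here's a minimal standard-library sketch: `split_by_group` and its parameters are hypothetical names, and `test_fraction` applies to the number of people, not the number of images.

```python
import random
from collections import defaultdict


def split_by_group(samples, group_of, test_fraction=0.3, seed=0):
    """Split samples so that no group appears in both train and test."""
    groups = defaultdict(list)
    for sample in samples:
        groups[group_of(sample)].append(sample)

    # Shuffle the group names (not the samples) with a fixed seed.
    names = sorted(groups)
    random.Random(seed).shuffle(names)

    # Hold out a fraction of the groups, at least one.
    n_test = max(1, round(len(names) * test_fraction))
    test = [s for name in names[:n_test] for s in groups[name]]
    train = [s for name in names[n_test:] for s in groups[name]]
    return train, test


# Each sample is (person, signal); the names are illustrative only.
samples = [
    (person, signal)
    for person in ["mary", "joe", "ana", "luis", "kim"]
    for signal in ["stop", "ok", "peace"]
]
train, test = split_by_group(samples, group_of=lambda s: s[0])

train_people = {person for person, _ in train}
test_people = {person for person, _ in test}
print(train_people & test_people)  # empty set: nobody is in both splits
```

This sketch doesn't preserve the class ratio the way a stratified split does. If you need both properties, you have to balance the classes across the groups you hold out.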

Santiago

@svpino

1 Apr

Pick one of these two.

They will both help you write better Python.

https://twitter.com/ricsinaruto/status/1377746166230700034?s=20

Both of these are great books to open from time to time and read an individual section.

They give you bite-sized tips and advice that you can incorporate immediately into your work.

Replace 30 minutes of Netflix every week with some reading.

https://twitter.com/HectorIP/status/1377747317416210434?s=20

There's a lot of overlap.

Santiago

@svpino

31 Mar

Coming soon, in Python 🐍 3.10: "Pattern Matching."

Looks sick!

No, this is not a switch statement. Pattern matching is very different.

With patterns, you get a small language to describe the structure of the values you want to match. Look at one of the examples to see how you can match an element of a tuple.

You can use patterns to match even more complex structures. You can nest them. You can have redundancy checking.

Pattern matching is a feature you can find in functional languages.

It's excellent that Python decided to add it! I'm really excited about it.

Santiago

@svpino

30 Mar

We always focus on Python 🐍, math, and machine learning theory when starting out, but that's not all of it.

Fundamentals of Computer Science help tremendously.

Here are 6 topics that will benefit you as a machine learning practitioner: 🧵👇

1. Algorithm analysis

You should be able to compare the efficiency of different algorithms without having to implement them.

↓ 2/8
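
As an illustration of my own (it isn't from the thread): analysis says a linear search over n items takes up to n steps, while a binary search over sorted items takes about log2(n). The sketch below only counts steps to check that prediction. The function names are hypothetical.

```python
def linear_search_steps(items, target):
    """Count the items inspected by a linear scan."""
    steps = 0
    for value in items:
        steps += 1
        if value == target:
            break
    return steps


def binary_search_steps(sorted_items, target):
    """Count the items inspected by a binary search over sorted input."""
    low, high, steps = 0, len(sorted_items) - 1, 0
    while low <= high:
        steps += 1
        mid = (low + high) // 2
        if sorted_items[mid] == target:
            break
        if sorted_items[mid] < target:
            low = mid + 1
        else:
            high = mid - 1
    return steps


data = list(range(1_000_000))
worst_case = data[-1]
print(linear_search_steps(data, worst_case))  # grows with n
print(binary_search_steps(data, worst_case))  # grows with log2(n)
```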

2. Basic data structures

Understanding the different tradeoffs and performance implications of basic data structures is fundamental.

↓ 3/8
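
One tradeoff as an example (my own pick, not from the thread): taking items from the front of a Python `list` costs O(n) each time because everything else shifts over, while `collections.deque` does it in O(1). Both functions below produce the same result. Only the cost differs.

```python
from collections import deque


def drain_list(items):
    """Consume a queue backed by a list: each pop(0) is O(n)."""
    queue = list(items)
    out = []
    while queue:
        out.append(queue.pop(0))
    return out


def drain_deque(items):
    """Consume a queue backed by a deque: each popleft() is O(1)."""
    queue = deque(items)
    out = []
    while queue:
        out.append(queue.popleft())
    return out


print(drain_list(range(5)) == drain_deque(range(5)))
```

For a handful of items you won't notice. The list version gets quadratic once the queue holds many elements.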

Santiago

@svpino

29 Mar

I've talked about Transfer Learning before.

In summary: you can reuse the knowledge from a different model to kick-start your new model.

Practically, this is how I make transfer learning happen: 🧵👇

First, I pick the model architecture I'll be transferring from.

There are hundreds of pre-trained models for TensorFlow (check TensorFlow Hub).

I spend most of my time working with images, and my go-to is usually ResNet with ImageNet weights.

↓ 2/10

I instantiate the model without its top layer and load the pre-trained weights into it.

To make sure I don't destroy those weights during training, I freeze them. Frozen weights won't change.

↓ 3/10
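
Here's a sketch of those two steps using the Keras API that ships with TensorFlow. The remaining tweets of this thread aren't shown, so the classification head, the `build_transfer_model` name, and the 224x224 input size are my assumptions, not necessarily what the author does next.

```python
def build_transfer_model(num_classes, weights="imagenet", input_shape=(224, 224, 3)):
    """ResNet50 base without its top layer, frozen, plus a new head."""
    # Imported lazily because TensorFlow is a heavy dependency.
    import tensorflow as tf

    # include_top=False drops the original ImageNet classifier.
    # weights="imagenet" downloads the pre-trained weights on first use.
    base = tf.keras.applications.ResNet50(
        include_top=False, weights=weights, input_shape=input_shape
    )

    # Freeze the base so training can't modify the pre-trained weights.
    base.trainable = False

    # Assumed head: pooling plus one dense layer for the new classes.
    # Inputs are expected to be preprocessed for ResNet50 already.
    inputs = tf.keras.Input(shape=input_shape)
    x = base(inputs, training=False)
    x = tf.keras.layers.GlobalAveragePooling2D()(x)
    outputs = tf.keras.layers.Dense(num_classes, activation="softmax")(x)
    return tf.keras.Model(inputs, outputs)
```

Calling `build_transfer_model(num_classes=5)` returns a model where only the new dense layer has trainable weights.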

Santiago

@svpino

27 Mar

$5 for the next 2 hours. Back to $15 after that.
$0 if you don't like it.

If you don't want it but still want to support my work, like/retweet this message. Thanks!

gumroad.com/l/kBjbC/rfgnxf4

Thanks for the support, everyone!

This worked.

1 more hour to go.

10 more copies and price goes back.
