Santiago · 3 Sep · 15 tweets · 5 min read
Machine learning models can be extremely powerful.

But there's a catch: they are notoriously hard to optimize for any given problem. There are just too many variables that we could change.

Thread: On keeping your sanity when training a model.
I'm sure you've heard about "hyperparameters."

Think of them as "configuration settings."

Depending on the settings you choose, your model will perform differently.

Sometimes better. Sometimes worse.
Here are some of the settings that we could change when building a model:

• learning rate
• batch size
• epochs
• optimizer
• regularization

The list goes on and on.
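To make that concrete, here's a rough sketch of where each of these knobs lives in a typical Keras training setup. The model, data, and values below are illustrative, not taken from the thread's notebooks.

```python
# Illustrative only: a tiny Keras setup showing where each hyperparameter lives.
import numpy as np
import tensorflow as tf

# Synthetic data, just so the snippet runs end to end.
x_train = np.random.rand(200, 8).astype("float32")
y_train = np.random.rand(200, 1).astype("float32")

model = tf.keras.Sequential([
    tf.keras.layers.Dense(
        64,
        activation="relu",
        kernel_regularizer=tf.keras.regularizers.l2(1e-4),  # regularization
    ),
    tf.keras.layers.Dense(1),
])

model.compile(
    optimizer=tf.keras.optimizers.Adam(learning_rate=1e-3),  # optimizer + learning rate
    loss="mse",
)

model.fit(
    x_train,
    y_train,
    batch_size=32,  # batch size
    epochs=10,      # epochs
)
```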
There's usually a combination of values for these hyperparameters that's ideal for your problem.

But finding those values is a headache. There are too many possibilities!

We need something better.
Unfortunately, I've seen many people trying different hyperparameter values by hand.

This is tedious, suboptimal, and will rarely lead to good results.

We should automate this process.
I put together a couple of examples for you.

The first example uses KerasTuner to train a neural network on the Penguins dataset.

Here is the notebook: deepnote.com/@svpino/Keras-…
The second example uses Optuna and shows how to optimize an XGBRegressor and a CatBoostRegressor model.

Here is the notebook: deepnote.com/@svpino/Tuning…

(This is part of the work I had to do during my first Kaggle competition.)
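I can't reproduce the notebook here, but a minimal Optuna objective for an XGBRegressor looks roughly like this. The synthetic data, search ranges, and trial count are my own choices, not the ones from the competition notebook.

```python
# A hedged sketch of tuning an XGBRegressor with Optuna (illustrative values).
import optuna
from sklearn.datasets import make_regression
from sklearn.model_selection import cross_val_score
from xgboost import XGBRegressor

# Synthetic regression data so the example is self-contained.
X, y = make_regression(n_samples=500, n_features=10, noise=0.1, random_state=42)

def objective(trial):
    # Each trial proposes one set of hyperparameters to evaluate.
    params = {
        "n_estimators": trial.suggest_int("n_estimators", 100, 1000),
        "max_depth": trial.suggest_int("max_depth", 2, 10),
        "learning_rate": trial.suggest_float("learning_rate", 1e-3, 0.3, log=True),
    }
    model = XGBRegressor(**params)
    # Score each candidate with cross-validation (R² by default for regressors).
    return cross_val_score(model, X, y, cv=3).mean()

study = optuna.create_study(direction="maximize")
study.optimize(objective, n_trials=50)
print(study.best_params)
```

The CatBoostRegressor version follows the same pattern: only the model and the parameter ranges inside the objective change.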
By the way, if you aren't using @DeepnoteHQ already, you should definitely check it out!

You can create a free account and give your code some VIP treatment.

You can put together a collection of published notebooks that look beautiful.
If you are using TensorFlow and Keras, KerasTuner is the way to go.

In this example, I'm optimizing 3 hyperparameters:

• learning rate
• first hidden layer's units
• second hidden layer's units

Here is the documentation of KerasTuner: keras.io/keras_tuner/
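Here's roughly what that looks like. I'm substituting synthetic data for the Penguins dataset to keep the snippet self-contained; the ranges and trial count are illustrative, not the notebook's exact values.

```python
# A hedged sketch: tune the learning rate and two hidden layers with KerasTuner.
import numpy as np
import tensorflow as tf
import keras_tuner as kt

# Synthetic stand-in for the Penguins data: 6 features, 3 classes.
x = np.random.rand(300, 6).astype("float32")
y = np.random.randint(0, 3, size=(300,))

def build_model(hp):
    # KerasTuner calls this once per trial with a fresh set of hyperparameters.
    model = tf.keras.Sequential([
        tf.keras.layers.Dense(hp.Int("units_1", 32, 256, step=32), activation="relu"),
        tf.keras.layers.Dense(hp.Int("units_2", 32, 256, step=32), activation="relu"),
        tf.keras.layers.Dense(3, activation="softmax"),
    ])
    model.compile(
        optimizer=tf.keras.optimizers.Adam(
            learning_rate=hp.Float("learning_rate", 1e-4, 1e-2, sampling="log")
        ),
        loss="sparse_categorical_crossentropy",
        metrics=["accuracy"],
    )
    return model

tuner = kt.RandomSearch(build_model, objective="val_accuracy", max_trials=10)
tuner.search(x, y, epochs=20, validation_split=0.2)

print(tuner.get_best_hyperparameters(1)[0].values)
```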
It was only a few weeks ago that I learned about Optuna.

Its API is really simple to understand, and the resulting code is clean and organized.

I incorporated Optuna into my toolset and I'm planning to keep using it.

Here is their documentation: optuna.org
Bottom line: Tuning these hyperparameters with these tools is dead easy.

Even better: Both tools search intelligently instead of trying every combination. KerasTuner supports random search, Hyperband, and Bayesian optimization, and Optuna uses TPE sampling by default.

This makes the process cleaner, faster, and better than trying by hand.
Let's recap:

• Stop tuning hyperparameters manually.
• Check out KerasTuner if you are using TensorFlow.
• Check out Optuna as well.

Two examples for you:

1. deepnote.com/@svpino/Keras-…

2. deepnote.com/@svpino/Tuning…
I post threads like this every Tuesday and Friday.

Follow me @svpino for practical tips and stories about my experience with machine learning.

And if you don’t want to miss any of these threads, subscribe to my newsletter (link in my profile), and I’ll email them to you.
Optuna is framework-agnostic, so you can use it with TensorFlow, PyTorch, or scikit-learn models.

If you are using Keras, however, I'd recommend you look into KerasTuner instead.

Anything that the training process updates, we call a "parameter." For example, the weights and biases of a neural network.

Anything that we set ourselves, and that doesn't change during training, we call a "hyperparameter." For example, the learning rate.
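A tiny illustration of the difference, with shapes and values of my own choosing:

```python
# Illustrative: parameters vs. hyperparameters in a tiny Keras model.
import tensorflow as tf

model = tf.keras.Sequential([
    tf.keras.Input(shape=(3,)),
    tf.keras.layers.Dense(4),
])

# Hyperparameter: we choose it, and training never changes it.
optimizer = tf.keras.optimizers.Adam(learning_rate=1e-3)

# Parameters: the weights and biases that training updates.
weights, biases = model.layers[0].get_weights()
print(weights.shape, biases.shape)  # (3, 4) and (4,)
```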
