Santiago
22 Mar, 6 tweets, 2 min read
Do you wanna know why we use ReLU when doing deep learning?

When starting out with neural networks, it's common to work with examples using the sigmoid activation function.

The sigmoid function squeezes any input value to a value between 0 and 1.

This is a 🧵👇
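
For reference, here's the sigmoid in code. This is a minimal sketch of my own (not part of the original thread), using NumPy:

```python
import numpy as np

def sigmoid(x):
    # Squeezes any input into the (0, 1) range.
    return 1.0 / (1.0 + np.exp(-x))

print(sigmoid(np.array([-10.0, -1.0, 0.0, 1.0, 10.0])))
# ~[0.000045, 0.269, 0.5, 0.731, 0.999955]: everything lands between 0 and 1.
```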
There's a problem with the sigmoid function: it saturates quickly.

This means that very small and very large input values get squashed to outputs near 0 and 1, respectively.

The function is only sensitive to inputs around its midpoint, near 0.

(2 of 5)
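
Here's a quick sketch (my addition) that makes the saturation visible. The sigmoid's derivative is s * (1 - s), which peaks at 0.25 at the midpoint and collapses toward 0 for large inputs:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def sigmoid_grad(x):
    # Derivative of the sigmoid: s * (1 - s). Its maximum is 0.25, at x = 0.
    s = sigmoid(x)
    return s * (1.0 - s)

for x in [0.0, 2.0, 5.0, 10.0]:
    print(x, float(sigmoid_grad(np.array(x))))
# 0.0  -> 0.25       most sensitive at the midpoint
# 2.0  -> ~0.105
# 5.0  -> ~0.0066
# 10.0 -> ~0.000045  saturated: the gradient is effectively gone
```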
Once saturated, the gradients become tiny, the weights effectively stop changing, and the network will not learn anything useful.

If your network is not too deep, this will not be an issue. But if you have a buttload of layers, you'll likely run into the problem.

This sucks.

(3 of 5)
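
A back-of-the-envelope sketch (my addition) of why depth makes this worse: backpropagation multiplies one sigmoid-derivative factor per layer, and each factor is at most 0.25, so even in the best case (ignoring the weights) the gradient shrinks exponentially with depth:

```python
# Each sigmoid layer contributes a derivative factor of at most 0.25.
# One factor per layer, multiplied together:
for depth in [2, 5, 10, 20]:
    print(depth, 0.25 ** depth)
# 2  -> 0.0625
# 5  -> ~0.00098
# 10 -> ~0.00000095
# 20 -> ~9.1e-13   essentially zero: the early layers stop learning
```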
ReLU doesn't have the same issue.

ReLU turns any negative value into 0 and leaves the rest untouched.

It has some cool properties:

▫️ Easy to implement (just a max)
▫️ Linear properties (for values greater than 0.0)
▫️ No vanishing gradient problem

(4 of 5)
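
And here's ReLU itself, a minimal sketch of my own:

```python
import numpy as np

def relu(x):
    # Just a max: negative values become 0, everything else passes through.
    return np.maximum(0.0, x)

def relu_grad(x):
    # The gradient is 1 for positive inputs and 0 for negative ones.
    # No matter how large x gets, the gradient never shrinks.
    return (x > 0).astype(float)

x = np.array([-5.0, -1.0, 0.0, 1.0, 100.0])
print(relu(x))       # [  0.   0.   0.   1. 100.]
print(relu_grad(x))  # [0. 0. 0. 1. 1.]
```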
With ReLU, we get the benefits of a linear function (for values above 0.0) plus the non-linearity we need (the cutoff below 0.0) to learn interesting stuff.

ReLU is one of those things that made deep learning possible.

(5 of 5)
In practice, you can control the dying ReLU problem by carefully selecting an appropriate learning rate.

Basically, set the learning rate too high, and half of your network may end up as "dead" units that output 0 for every input and stop learning. But with an appropriate value, dying ReLU will not be a problem.
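
If you want to check whether your network is suffering from dying ReLU, one rough heuristic (a sketch of my own, not from the thread) is to count units that output 0 for every example in a batch:

```python
import numpy as np

def dead_unit_fraction(activations):
    # activations: (batch_size, num_units) ReLU outputs for one batch.
    # A unit that outputs 0 for every example is a candidate "dead" unit;
    # track this across several batches to confirm.
    dead = np.all(activations == 0, axis=0)
    return dead.mean()

# Fake activations, shifted so most units are dead on purpose:
batch = np.maximum(0.0, np.random.randn(256, 128) - 3.0)
print(f"{dead_unit_fraction(batch):.0%} of units look dead on this batch")
```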


More from @svpino

24 Mar
🐍 Python 3 features that you might not be using yet:

▫️ Type hints
▫️ Data classes
▫️ Pathlib
▫️ Enumerations
▫️ F-strings
▫️ Iterable unpacking
▫️ Walrus operator
▫️ Async IO
▫️ Assignment expressions
▫️ Positional-only parameters

Pick one and see how it can help you.
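
To pick a couple from that list, here's a small sketch of my own (needs Python 3.8+) combining type hints, data classes, f-strings, and the walrus operator:

```python
from dataclasses import dataclass

@dataclass
class Point:
    # Type hints + @dataclass: __init__, __repr__, and __eq__ for free.
    x: float
    y: float

points = [Point(1.0, 2.0), Point(3.0, 4.0)]

# Walrus operator: assign and test in a single expression.
if (n := len(points)) > 1:
    print(f"Got {n} points; the first one is {points[0]}")  # f-string
```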
I like to spend some time every week looking into something new from Python 🐍.

Two out of three times, I can't use it right away because I can't find a good way to make it work for me.

I usually talk about what I learned here on Twitter and then put it on the back burner.
Sometimes, I find a good place right away for what I just learned, and there's no better feeling than that!

I think people need more Python 🐍 in their lives:

- Simple
- Popular
- Powerful
- Versatile

Follow me and I'll make sure we learn this thing together until it hurts.
19 Mar
Thoughts about starting with machine learning.

☕️🧵👇
Three non-negotiable prerequisites:

1. Software development
2. Algorithms and data structures
3. Communication

If you build a strong foundation on these, you'll be unstoppable.
Learn to build software.

It's hard to make progress with machine learning if you struggle with programming.
18 Mar
I can't shut up about neural networks.

What questions do you have?
They aren't necessarily opposite concepts.

Fully connected refers to networks composed of layers where every node is connected to every node of the next layer.

Deep networks refer to networks with many layers. They could be fully connected or not.

Especially with deep learning, where you have many layers full of nodes, it's hard to understand the "thinking" of a network because you'd have to reverse-engineer millions of float values and try to make sense of them.

Hard to do.
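
To make the distinction concrete, here's a network that is both fully connected and deep. This is a sketch of my own, assuming TensorFlow/Keras; the layer sizes are arbitrary:

```python
import tensorflow as tf

# Each Dense layer is fully connected: every node feeds every node
# of the next layer. Stacking several of them makes the network deep.
model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(784,)),
    tf.keras.layers.Dense(128, activation="relu"),
    tf.keras.layers.Dense(64, activation="relu"),
    tf.keras.layers.Dense(10, activation="softmax"),
])
model.summary()
```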

18 Mar
The ability to reuse the knowledge of one model and adapt it to solve a different problem is one of the most consequential breakthroughs in machine learning.

Grab your ☕️ and let's talk about this.

🧵👇
A deep learning model is like a Lego set, with many pieces connected, forming a larger structure.

These pieces are layers, and each layer has a responsibility.
Although we don't know the exact role of every layer, we know that the closer a layer is to the output, the more specific the features it learns.

The best way to understand what I mean is through an example: a model that will process car images.
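
As a concrete sketch of that reuse (my own example, assuming TensorFlow/Keras and an imagined 5-class car classifier), the usual pattern is to freeze the generic early layers of a pretrained model and train a new task-specific head on top:

```python
import tensorflow as tf

# Pretrained base: its layers already learned generic visual features.
base = tf.keras.applications.MobileNetV2(
    input_shape=(224, 224, 3), include_top=False, weights="imagenet"
)
base.trainable = False  # freeze the reusable knowledge

# New head: the task-specific pieces we train from scratch,
# e.g. to classify car images into 5 categories.
model = tf.keras.Sequential([
    base,
    tf.keras.layers.GlobalAveragePooling2D(),
    tf.keras.layers.Dense(5, activation="softmax"),
])
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy")
```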
17 Mar
Here are some of the features that make Python 🐍 a freaking cool language.

🧵👇
1. You can slice and dice arrays very easily.
2. What's even better: negative indexing is really cool, and you can use it to refer to items counting back from the end of the list.
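
A quick sketch of both (my own example):

```python
numbers = [10, 20, 30, 40, 50]

print(numbers[1:4])   # [20, 30, 40]  from index 1 up to (not including) 4
print(numbers[::2])   # [10, 30, 50]  every other element
print(numbers[-1])    # 50            negative indexing: the last element
print(numbers[-2:])   # [40, 50]      the last two elements
print(numbers[::-1])  # [50, 40, 30, 20, 10]  reversed
```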
16 Mar
5 Python 🐍 package managers that I'm not using anymore:

▫️ conda
▫️ virtualenv
▫️ venv
▫️ pipenv
▫️ poetry

🤷‍♂️

Instead, for several weeks now, I've been using development containers in Visual Studio Code.

Life-changing. Give 'em a try.
Here is a thread I wrote a few weeks back when I started using them:
An important note: here I'm referring to the "virtual environment" capabilities of these tools. I still need to pull modules down with pip.

But I’ve been isolating environments with the containers instead.
