One of the most popular activation functions used in deep learning models is ReLU.
I asked: "Is ReLU continuous and differentiable?"
Surprisingly, a lot of people were confused about this.
Let's break this down step by step: ↓
Let's start by defining ReLU:
f(x) = max(0, x)
In English: if x <= 0, the function will return 0. Otherwise, the function will return x.
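To make this concrete, here's a minimal Python sketch (the `relu` name and the NumPy dependency are my choices, not from the original thread):

```python
import numpy as np

def relu(x):
    # max(0, x): returns 0 for x <= 0 and x itself for x > 0
    return np.maximum(0, x)

print(relu(np.array([-2.0, -0.5, 0.0, 0.5, 2.0])))
# -> [0.  0.  0.  0.5 2. ]
```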
If you plot this function, you get a flat line at 0 for every x <= 0 and the line y = x for every x > 0: a hinge at the origin.
Notice there are no discontinuities in the function.
This should be enough to answer half of the original question: the ReLU function is continuous.
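If you want to draw it yourself, here's a rough matplotlib sketch (again, my code, not the original chart):

```python
import numpy as np
import matplotlib.pyplot as plt

x = np.linspace(-3, 3, 601)
plt.plot(x, np.maximum(0, x))          # ReLU: flat at 0, then the line y = x
plt.title("ReLU: f(x) = max(0, x)")
plt.xlabel("x")
plt.ylabel("f(x)")
plt.show()
```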
Let's now think about the differentiable part.
A necessary condition for a function to be differentiable: it must be continuous.
ReLU is continuous. That's good, but not enough.
Its derivative should also exist for every individual point.
Here is where things get interesting.
We can compute the derivative of a function using the limit definition of the derivative, shown below.
(I'm not going to explain where this comes from; you can trust me on this one.)
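For reference, this is the standard limit definition of the derivative:

```latex
f'(x) = \lim_{h \to 0} \frac{f(x + h) - f(x)}{h}
```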
We can use this formula to see whether ReLU is differentiable.
Looking at ReLU's chart again, the interesting point is when x = 0.
That's where the function changes abruptly.
If there's going to be an issue with the function's derivative, it's going to be there!
Here is the deal:
For ReLU to be differentiable, its derivative should exist at x = 0 (our problematic point).
To see whether the derivative exists, we need to check that the left-hand and right-hand limits exist and are equal at x = 0.
That shouldn't be hard to do.
Going back to our formula:
The first step is to replace f(x) with ReLU's actual function, max(0, x).
It should now look like this:
f'(x) = lim (h → 0) [max(0, x + h) - max(0, x)] / h
So let's find out the left-hand limit.
In English: we want to evaluate that quotient as h approaches zero from the left.
At x = 0 and h < 0, max(0, h) = 0, so the quotient is 0 / h = 0. The left-hand limit is 0.
We can now do the same to compute the right-hand limit.
In this case, we want h to approach 0 from the right.
At x = 0 and h > 0, max(0, h) = h, so the quotient is h / h = 1. The right-hand limit is 1.
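Written out in full, here is how the two one-sided limits at x = 0 work out:

```latex
\begin{align*}
\text{Left-hand limit: }  & \lim_{h \to 0^-} \frac{\max(0, 0 + h) - \max(0, 0)}{h}
                            = \lim_{h \to 0^-} \frac{0 - 0}{h} = 0 \\
\text{Right-hand limit: } & \lim_{h \to 0^+} \frac{\max(0, 0 + h) - \max(0, 0)}{h}
                            = \lim_{h \to 0^+} \frac{h - 0}{h} = 1
\end{align*}
```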
Awesome! This is what we have:
1. The left-hand limit is 0.
2. The right-hand limit is 1.
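If you'd rather check this numerically, here's a quick sketch (the `diff_quotient` helper is just my name for the quantity inside the limit):

```python
def relu(x):
    return max(0.0, x)

def diff_quotient(x, h):
    # (f(x + h) - f(x)) / h: the quantity whose limit we're taking
    return (relu(x + h) - relu(x)) / h

for h in (0.1, 0.01, 0.001):
    left = diff_quotient(0.0, -h)   # h approaching 0 from the left
    right = diff_quotient(0.0, h)   # h approaching 0 from the right
    print(f"h = {h}: left = {left}, right = {right}")
    # left stays at 0 (it may print as -0.0, which is just floating-point signed zero),
    # right stays at 1.0
```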
For the function's derivative to exist at x = 0, both the left-hand and right-hand limits should be the same.
This is not the case. The derivative of ReLU doesn't exist at x = 0.
We now have the complete answer:
• ReLU is continuous
• ReLU is not differentiable (its derivative doesn't exist at x = 0)
But here was the central confusion point:
How come ReLU is not differentiable, but we can use it as an activation function when using Gradient Descent?
This was the reason many people thought ReLU was differentiable.
Here's what happens: we don't care that the derivative of ReLU is not defined at x = 0. When x is exactly 0, we simply set the derivative to 0 (or some other arbitrary value) and move on with our lives.
A nice hack.
In deep learning, it's rare for x to be precisely zero. We can get away with our hack and not worry too much about it.
This is the reason we can still use ReLU together with Gradient Descent.
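Here's a minimal sketch of that convention (the `relu_grad` name is mine; most frameworks bake this rule into their autodiff and typically pick 0 at x = 0):

```python
import numpy as np

def relu_grad(x):
    # "Derivative" of ReLU: 1 for x > 0, 0 for x < 0.
    # At x == 0 the true derivative doesn't exist, so we just define it as 0.
    return (x > 0).astype(float)

print(relu_grad(np.array([-1.5, 0.0, 2.0])))  # -> [0. 0. 1.]
```

Any value between 0 and 1 would also work at x = 0 (it's a valid subgradient), but 0 is the usual convention.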
Isn't math beautiful?
1. Project scoping
2. Data definition and preparation
3. Model training and error analysis
4. Deployment, monitoring, and maintenance
Here are 33 questions that most people forget to ask.
"Project scoping":
• What problem are we trying to solve?
• Why does it need to be solved?
• Do we truly need machine learning for this?
• What constraints do we have?
• What are the risks?
• What's the best approach to solving this?
• How do we measure progress?
Still under "Project scoping":
• What does success look like?
• How is our solution going to impact people?
• What could go wrong with our solution?
• What's the simplest version we could build?