Santiago (@svpino) · 18 Sep
In theory, you can model any function using a neural network with a single hidden layer.

However, deep networks are much more efficient than shallow ones.

Can you explain why?
If my first claim gives you pause: I'm talking about the universal approximation theorem.

You can find more about it online, but the attached paragraph summarizes the relevant part very well.
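(The tweet's attached excerpt doesn't survive the unroll, so as a stand-in, here is one common formulation of the theorem, roughly the Cybenko/Hornik version. The exact assumptions vary from source to source.)

For every continuous function $f$ on a compact set $K \subset \mathbb{R}^n$ and every $\varepsilon > 0$, there exist a width $N$, weights $w_i \in \mathbb{R}^n$, biases $b_i \in \mathbb{R}$, and coefficients $\alpha_i \in \mathbb{R}$ such that

\[
\sup_{x \in K} \left| f(x) - \sum_{i=1}^{N} \alpha_i \, \sigma\!\left(w_i^\top x + b_i\right) \right| < \varepsilon,
\]

where $\sigma$ is a fixed non-constant, bounded, continuous activation function (a sigmoid, for instance).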
Informally, we usually say that we can model any function with a single hidden layer neural network.

But there are a couple of caveats with this statement.
First caveat:

By "model," we mean "approximate."

A single hidden layer neural network might not reproduce a function exactly, but it can approximate it closely enough that the differences stop mattering.
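To make the "approximate" caveat concrete, here is a minimal sketch (my own illustration, not part of the original thread) that fits a single hidden layer network to sin(x) using plain NumPy. The fit is never exact, but the worst-case error shrinks as you add hidden units; the exact numbers depend on the random seed and learning rate.

import numpy as np

rng = np.random.default_rng(0)

# Training data: approximate sin(x) on [-pi, pi].
x = np.linspace(-np.pi, np.pi, 200).reshape(-1, 1)
y = np.sin(x)

def train(hidden, steps=20_000, lr=0.01):
    # One hidden layer (tanh) plus a linear output, trained with
    # plain gradient descent on mean-squared error.
    W1 = rng.normal(0.0, 1.0, (1, hidden)); b1 = np.zeros(hidden)
    W2 = rng.normal(0.0, 0.1, (hidden, 1)); b2 = np.zeros(1)
    for _ in range(steps):
        h = np.tanh(x @ W1 + b1)            # hidden activations
        err = (h @ W2 + b2) - y             # residuals
        gW2 = h.T @ err / len(x); gb2 = err.mean(0)
        dh = (err @ W2.T) * (1.0 - h**2)    # tanh'(z) = 1 - tanh(z)^2
        gW1 = x.T @ dh / len(x); gb1 = dh.mean(0)
        W2 -= lr * gW2; b2 -= lr * gb2
        W1 -= lr * gW1; b1 -= lr * gb1
    pred = np.tanh(x @ W1 + b1) @ W2 + b2
    return np.abs(pred - y).max()           # worst-case error on the grid

for hidden in (2, 8, 32):
    print(f"{hidden:>2} hidden units -> max error: {train(hidden):.4f}")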
Second caveat:

Notice how the excerpt about the theorem talks about "well-behaved functions." This refers to continuous functions (on a closed and bounded region); arbitrary discontinuous functions aren't covered.
With these two caveats out of the way, we can focus on my second claim:

Deeper networks are much more efficient than shallower networks.

In other words, while a single hidden layer is all we need in theory, in practice we are usually better off with more layers.
The simplest explanation here is about the ability of deep networks to learn hierarchies of concepts.

With multiple layers, each one can focus on capturing knowledge at a specific level of abstraction and pass it on to the next layer.
Classic example: image classification.

Earlier layers focus on low-level features like edges, colors, and shadows.

Later layers combine those features into more abstract representations like specific shapes and complex objects.
A single hidden layer network would have to work with every pixel individually.

It would probably need a ridiculous number of neurons to extract the same knowledge a deep network captures with ease.

And even then, we don't really know how to make it happen.
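To get a feel for the cost, here is a back-of-the-envelope parameter count (my numbers, purely illustrative, not a rigorous capacity argument). With the same total budget of hidden units, a single hidden layer on an image-sized input already needs several times the parameters, because every unit connects directly to every pixel:

# Rough parameter counts for fully connected nets on a 224x224x3 image.
inputs = 224 * 224 * 3                      # ~150k input features

def dense_params(sizes):
    # Weights plus biases for consecutive fully connected layers.
    return sum(a * b + b for a, b in zip(sizes, sizes[1:]))

deep = [inputs, 512, 512, 512, 10]          # three hidden layers of 512
shallow = [inputs, 3 * 512, 10]             # one hidden layer, same unit count

print(f"deep:    {dense_params(deep):,} parameters")
print(f"shallow: {dense_params(shallow):,} parameters")

And that's with an equal number of units. Theoretical separation results suggest that matching a deep network's expressiveness with a single hidden layer can require exponentially more units, not merely the same count.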
In addition to this, we can also talk about how much more flexibility we gain when we aren't limited to a single hidden layer.

@AlejandroPiad talked about this in his answer:
As a recap, I'll quote @michael_nielsen:

"(...) universality tells us that neural networks can compute any function; and empirical evidence suggests that deep networks are the networks best adapted to learn the functions useful in solving many real-world problems."
For an excellent (visual) explanation about this, check out Chapter 4 of the "Neural Networks and Deep Learning" book.

It's free and online!

neuralnetworksanddeeplearning.com/chap4.html


