I've been a vocal opponent of the "neural networks are brain simulations" analogy, not because it's *wrong* but because I believe it's harmful for beginners.
I want to propose an alternative analogy for approaching deep learning from a dev background.
👇
Think about detecting a face in an image.
How would you even start to write a program for that?
You know it's gonna have something to do with finding a "nose" and two "eyes", but how can you go from an array of pixels to something that looks like an eye, in whatever position?
Now, suppose you have access to thousands of faces and non-faces.
How does that change the problem?
Instead of thinking in the problem domain (finding faces) you can now take a leap upwards in abstraction, and think in the meta-problem domain (finding face finders).
How on Earth do you do that?
Well, the same way you find anything in computer science.
You have a (potentially infinite) collection of objects, you iterate through them in some smart order, and you compare them to some reference object.
We need two things:
- A way to describe the collection of all potential "face finders", i.e., all possible algorithms that go from pixels to a boolean.
- A way to efficiently search in this collection.
Here's where neural networks enter the picture.
We don't know which is the exact program that detects faces, but *if that program exists*, it's gonna have a bunch of IF/ELSEs related to a bunch of pixels.
So we are gonna assume there is some magic program that takes pixels, does a bunch of math with them, and outputs a bool.
Now, we make a meta-program, a kind of template, as complicated as we can, that can be instantiated into many, many different specific programs, one of which is hopefully our face detector.
This program is a huge method full of statements like:
"if pixel x_i * w_j > 0 then..."
Now you can see that, if this meta-program is general enough, there should be a way to select suitable values for all w_j that result in this program becoming a face detector.
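Something like this minimal Python sketch, which is my own toy rendering of the idea (not a real detector):

```python
# A "meta-program": one function that becomes a different specific
# program for every concrete choice of the weights w.
def face_detector(pixels, w):
    # pixels: a flat list of pixel intensities
    # w: one weight per pixel -- the template's "blanks" to fill in
    score = sum(x_i * w_i for x_i, w_i in zip(pixels, w))
    # all those "if pixel x_i * w_j > 0 then ..." statements,
    # collapsed here into a single threshold
    return score > 0  # True means "I think this is a face"
```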
We just have to search among all ways of assigning values to w_j for the best program.
And the best program is of course the one that produces the smallest error on our example set of faces and non-faces.
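Here's that search as a deliberately naive sketch, reusing the toy `face_detector` above: random search over weight vectors, keeping whichever one makes the fewest mistakes on our labeled examples.

```python
import random

def error(w, examples):
    # examples: list of (pixels, is_face) pairs from our dataset
    return sum(face_detector(pixels, w) != is_face
               for pixels, is_face in examples)

def naive_search(examples, n_pixels, tries=10_000):
    best_w, best_err = None, float("inf")
    for _ in range(tries):
        # each random weight vector is one candidate "program"
        w = [random.uniform(-1, 1) for _ in range(n_pixels)]
        err = error(w, examples)
        if err < best_err:
            best_w, best_err = w, err  # keep the best program found so far
    return best_w
```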
Now, instead of using random or exhaustive search, we can play a little bit with the math and search much more efficiently (but that's for another day).
Now, instead of actual code, we have a neural network, which is a way to represent these kinds of programs in a computational structure that makes them much easier to manipulate, store, and analyze.
So, to summarize: think of a neural network as a template program that, depending on the specific values of its weights, becomes (almost) equivalent to some specific program.
And SGD is just a super optimized search procedure for that specific type of object.
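If you want to see the "template + search" picture in actual library terms, here's a minimal sketch in PyTorch (my choice of framework for illustration, not something the analogy prescribes): the model is the template, its weights are the w_j, and SGD is the search procedure.

```python
import torch
import torch.nn as nn

# The "template program": a single linear threshold over 64x64 pixels,
# squashed to a probability that the input is a face.
model = nn.Sequential(nn.Linear(64 * 64, 1), nn.Sigmoid())

# The "search procedure" over all ways of filling in the weights.
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
loss_fn = nn.BCELoss()  # error on our faces / non-faces examples

def search_step(pixels, is_face):
    # pixels: (batch, 64*64) float tensor; is_face: (batch, 1) of 0./1.
    optimizer.zero_grad()
    err = loss_fn(model(pixels), is_face)
    err.backward()    # the "play a little bit with the math" part
    optimizer.step()  # nudge the weights toward a smaller error
    return err.item()
```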
There are a lot of details I left out, like non-linear activation functions, bias weights, different topologies for connecting these so-called neurons, but none of that matters yet...
This analogy of ANNs as program templates is, I believe, much more helpful for beginners.
• • •
A day to share with you amazing things from every corner of Computer Science.
Today I want to talk about Generative Adversarial Networks 👇
🍬 But let's begin with some eye candy.
Take a look at this mind-blowing 2-minute video and, if you like it, read on; I'll tell you a couple of things about it...
Generative Adversarial Networks (GANs) have taken the machine learning world by surprise with their uncanny ability to generate hyper-realistic examples of human faces, cars, landscapes, and a lot of other stuff, as you just saw.
A silly excuse I just invented to share with you random bits of theory from some dark corner of Computer Science and make it as beginner-friendly as possible 👇
Today I want to talk about *Algorithmic Complexity*.
To get started, take a look at the following code. How long do you think it will take to run?
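The code itself was shared as an image in the original tweet, so it isn't reproduced here; judging from the discussion that follows, it was presumably something along the lines of a plain linear search:

```python
def find(array, x):
    # Presumed reconstruction: walk the array from left to right until we hit x.
    for i, item in enumerate(array):
        if item == x:
            return i
    return -1  # x is not in the array at all
```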
Let's make that question more precise. How long do you think it will take to run in the worst-case scenario?
We can see that the code will run slower if:
👉 your computer is older;
👉 the array is longer; or
👉 x happens to be further toward the back of the array, or not present at all.
Can we turn these insights into an actual formula? We will have to get rid of ambiguous stuff like "old computers".
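One hedged sketch of where that could go (my phrasing, not necessarily the exact formula the thread builds up to): in the worst case the loop above visits every one of the n elements exactly once, so the running time looks roughly like T(n) = c · n, where n is the length of the array and c is a machine-dependent constant (how long one comparison takes on *your* computer). The trick is to keep the n and throw away the c.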