Thread by @goodfellow_ian

10 tweets
A quick thread on two of my favorite theory hacks for machine learning research
A lot of the time, we want to analyze the optimal behavior of a neural net using algebra / calculus. Neural net models are usually too complicated to solve algebraically for the parameters that optimize most objective functions (unless the objective is something trivial, like weight decay).
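For a sense of why (my own toy contrast, not from the thread): a pure weight-decay penalty has an obvious closed-form minimizer, but as soon as the objective goes through the network, the stationarity conditions are nonlinear in the parameters and there is generally nothing to solve in closed form.

```latex
% Trivial objective: weight decay alone has a closed-form minimizer.
J(\theta) = \tfrac{\lambda}{2}\lVert\theta\rVert_2^2
\;\Rightarrow\;
\nabla_\theta J = \lambda\theta = 0
\;\Rightarrow\;
\theta^{*} = 0

% Typical objective: once the loss involves the network f_\theta, e.g.
% J(\theta) = \mathbb{E}_{x,y}\,[\,\ell(f_\theta(x), y)\,],
% the condition \nabla_\theta J = 0 is nonlinear in \theta and has no closed form.
```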
To get a less complicated model, a common instinct is to use a linear model. This is nice because it makes a lot of optimization problems convex. But it has a downside: a linear model can't do a lot of things a neural net can do. The solution becomes very oversimplified.
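For contrast (my own example of the linear-model instinct, not from the thread): with a linear model and squared error, the problem is convex in the weights and even has a closed-form solution, but the model can only represent linear functions of the input.

```latex
% Linear model f(x) = w^{\top}x with squared error: convex in w, closed form.
J(w) = \tfrac{1}{2}\lVert Xw - y\rVert_2^2
\;\Rightarrow\;
w^{*} = (X^{\top}X)^{-1}X^{\top}y
\quad (\text{assuming } X^{\top}X \text{ is invertible})
```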
Theory Hack #1: Model the neural net as an arbitrary function (so you optimize over the space of all functions f, rather than parameters theta for a particular neural net architecture). This is very clean compared to working with parameters and specific architectures.
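Concretely, the change is only in what you optimize over (a sketch in generic notation; \ell is whatever per-example loss you care about):

```latex
% Parameter space (hard: generally non-convex in \theta for a neural net):
\min_{\theta}\;\mathbb{E}_{x,y}\big[\ell\big(f_{\theta}(x),\,y\big)\big]

% Function space (Theory Hack #1): optimize over all functions f directly,
% ignoring which parameters/architectures could realize them:
\min_{f}\;\mathbb{E}_{x,y}\big[\ell\big(f(x),\,y\big)\big]
```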
The neural-net-as-function metaphor retains the main advantage of the linear model: many interesting problems are convex! For example, cross-entropy loss for a classifier is convex in function space.
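To see this in one concrete case (my own worked check, for binary classification with f(x) as the logit): the per-example loss is convex in the value f(x), so the expected loss is convex in the function f.

```latex
% Binary cross-entropy as a function of the logit s = f(x):
\ell(s, y) = -\,y\log\sigma(s) - (1-y)\log\big(1-\sigma(s)\big),
\qquad \sigma(s) = \frac{1}{1+e^{-s}}

% Convexity check in s:
\frac{\partial^{2}\ell}{\partial s^{2}} = \sigma(s)\big(1-\sigma(s)\big) \;\ge\; 0

% So \ell is convex in each value f(x), and \mathbb{E}_{x,y}[\ell(f(x),y)]
% is a nonnegative mixture of convex functions of f's values, hence convex in f.
```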
This assumption is also not too inaccurate, especially compared to the linear model assumption. The universal approximation theorem says that neural nets can approximate arbitrary functions arbitrarily well.
Theory Hack #2: If you're having trouble thinking about optimizing in the space of all functions, imagine that a function is just a vector with very many entries. Instead of a function evaluation f(x) with x in R^n, imagine a vector lookup f_x where x is an integer index.
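Under that picture (a sketch of the hack with discrete x; the continuous case is where the caveats in the next tweet come in), the expected loss is an ordinary multivariable function of the entries f_x, and each entry can be optimized on its own:

```latex
% Treat f as a long vector (f_x)_x and the objective as an ordinary sum:
J(f) = \sum_{x} p(x)\,\ell(f_x)
\;\Rightarrow\;
\frac{\partial J}{\partial f_x} = p(x)\,\ell'(f_x) = 0
\quad\text{for each } x

% i.e. whenever the loss touches f only through its value at x,
% each entry f_x can be optimized independently by ordinary calculus.
```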
With Theory Hack #2, optimizing over functions becomes a regular calculus problem. Hack #2 is intuitive but not 100% accurate. For a more formal version, and for the restrictions on when you can use it, see deeplearningbook.org/contents/infer… sec 19.4.2
My co-authors and I used both Theory Hack #1 and Theory Hack #2 to derive Eq. 2 of the GAN paper: papers.nips.cc/paper/5423-gen…
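Here is a compressed version of that derivation, using both hacks (Hack #1: the discriminator D is an arbitrary function; Hack #2: optimize its value at each x independently):

```latex
% GAN value function written as a single integral over x:
V(G, D) = \int_{x}\Big[p_{\text{data}}(x)\,\log D(x) + p_{g}(x)\,\log\big(1 - D(x)\big)\Big]\,dx

% Pointwise maximization: for fixed a, b > 0, a\log y + b\log(1-y) is maximized
% at y = a / (a + b), so the optimal discriminator (Eq. 2 of the paper) is
D^{*}_{G}(x) = \frac{p_{\text{data}}(x)}{p_{\text{data}}(x) + p_{g}(x)}
```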
Bonus: A great source of related theory hacks is Sec 3.2 of web.stanford.edu/~boyd/cvxbook/…. You can use its composition rules to prove that functions of functions are convex by construction, with no need to prove that second functional derivatives are positive, etc.
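For instance (my own illustration of the composition-rule style of argument, applied to the discriminator objective above): you can read off convexity of the cross-entropy objective in D directly from the rules, without touching second functional derivatives.

```latex
% Rules from Boyd & Vandenberghe, Sec. 3.2:
%  (i)   -\log u is convex in u;
%  (ii)  composition with an affine map preserves convexity;
%  (iii) nonnegative weighted sums (and integrals) of convex functions are convex.

% The discriminator's cross-entropy objective, viewed as a function of D:
-V(G, D) = \int_{x}\Big[p_{\text{data}}(x)\big(-\log D(x)\big) + p_{g}(x)\big(-\log(1 - D(x))\big)\Big]\,dx

% D(x) and 1 - D(x) are affine in the variable D, so each -\log term is convex in D
% by (i)+(ii); integrating against nonnegative densities keeps convexity by (iii).
```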