Ian Goodfellow @goodfellow_ian
A quick thread on two of my favorite theory hacks for machine learning research
A lot of the time, we want to analyze the optimal behavior of a neural net using algebra / calculus. Neural net models are usually too complicated to solve algebraically for the parameters that optimize most objectives (unless it's some trivial objective like weight decay).
To get a less complicated model, a common instinct is to use a linear model. This is nice because it makes a lot of optimization problems convex. But it has a downside: a linear model can't do a lot of things a neural net can do. The solution becomes very oversimplified.
Theory Hack #1: Model the neural net as an arbitrary function (so you optimize over the space of all functions f, rather than parameters theta for a particular neural net architecture). This is very clean compared to working with parameters and specific architectures.
The neural-net-as-function metaphor retains the main advantage of the linear model: many interesting problems are convex! For example, cross-entropy loss for a classifier is convex in function space.
This assumption is also not too inaccurate, especially compared to the linear model assumption. The universal approximation theorem says that neural nets can approximate continuous functions arbitrarily well.
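A quick numerical sanity check of that convexity claim (the script and its variable names are my own sketch, not part of the thread): for binary cross-entropy, the loss at each input x depends only on the single scalar f(x), so it suffices to check convexity of the scalar map along random chords.

```python
import numpy as np

# Binary cross-entropy as a function of the classifier output f(x) in (0, 1),
# for a fixed true probability p = P(y=1 | x). In function space the loss
# decomposes pointwise, so convexity of this scalar map is what matters.
def bce(f, p=0.3):
    return -(p * np.log(f) + (1 - p) * np.log(1 - f))

# Jensen check: for a convex function,
# bce(t*a + (1-t)*b) <= t*bce(a) + (1-t)*bce(b) on every chord.
rng = np.random.default_rng(0)
for _ in range(1000):
    a, b = rng.uniform(0.01, 0.99, size=2)
    t = rng.uniform()
    assert bce(t * a + (1 - t) * b) <= t * bce(a) + (1 - t) * bce(b) + 1e-12
print("cross-entropy passed the convexity check on every sampled chord")
```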
Theory Hack #2: If you're having trouble thinking about optimizing in the space of all functions, imagine that a function is just a vector with very many entries. Instead of a function evaluation f(x) with x in R^n, imagine a vector lookup f_x where x is an integer index.
With Theory Hack #2, now optimizing over functions is just a regular calculus problem. Hack #2 is intuitive but not 100% accurate. For a more formal version and some information on restrictions about when you can use it, see deeplearningbook.org/contents/infer… sec 19.4.2
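Hack #2 made literal (a toy sketch of mine, with made-up names): restrict x to a finite set {0, ..., K-1}, so the "function" f is just a vector with K entries and f(x) is a lookup f[x]. Minimizing cross-entropy over that vector is then an ordinary finite-dimensional calculus problem, and gradient descent recovers the known function-space optimum f(x) = P(y=1 | x).

```python
import numpy as np

K = 5
rng = np.random.default_rng(1)
p_true = rng.uniform(0.1, 0.9, size=K)   # true P(y=1 | x) for each discrete x

# Sample a dataset of (x, y) pairs and compute empirical frequencies per bin.
xs = rng.integers(0, K, size=20000)
ys = (rng.uniform(size=xs.size) < p_true[xs]).astype(float)
p_emp = np.array([ys[xs == x].mean() for x in range(K)])

# Minimize empirical cross-entropy over the vector f by plain gradient descent.
# grad[x] is d/df[x] of -(p_hat log f[x] + (1 - p_hat) log(1 - f[x])).
f = np.full(K, 0.5)
for _ in range(2000):
    grad = -p_emp / f + (1 - p_emp) / (1 - f)
    f = np.clip(f - 0.01 * grad, 1e-3, 1 - 1e-3)

# The learned vector should match the empirical conditional frequencies.
print(np.round(f, 3))
print(np.round(p_emp, 3))
```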
My co-authors and I used both theory hack #1 and #2 to derive eq 2 of the GAN paper: papers.nips.cc/paper/5423-gen…
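For context, here is my own reconstruction of that derivation, following the paper's notation: hold G fixed, write the value function as an integral over x, and maximize the integrand at each x separately (Hack #1 plus the pointwise view of Hack #2).

```latex
V(G, D) = \int_x \Big[\, p_{\text{data}}(x)\log D(x) + p_g(x)\log\big(1 - D(x)\big) \,\Big]\, dx
```

For each fixed $x$, the integrand has the form $a \log y + b \log(1-y)$ with $a = p_{\text{data}}(x)$ and $b = p_g(x)$; setting the derivative $a/y - b/(1-y)$ to zero gives $y = a/(a+b)$, i.e.

```latex
D^{*}_{G}(x) = \frac{p_{\text{data}}(x)}{p_{\text{data}}(x) + p_g(x)}
```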
Bonus: A great source of related theory hacks is Sec 3.2 of web.stanford.edu/~boyd/cvxbook/… . You can use these properties to prove that functions of functions are convex by construction using composition rules. No need to prove that second functional derivatives are positive, etc.
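A concrete instance of those composition rules (my gloss, not from the thread): cross-entropy is convex in $f$ by construction. For fixed $p \in [0,1]$,

```latex
\ell_p(f) = -\big(\, p \log f + (1-p)\log(1-f) \,\big)
```

$-\log$ is convex; $f \mapsto 1-f$ is affine, and composing a convex function with an affine map preserves convexity; a nonnegative weighted sum of convex functions is convex. Taking an expectation over $x$ (another nonnegative weighted sum) then makes the full function-space loss convex in $f$, with no second functional derivatives computed anywhere.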