GuilleFix — k/acc ⏩
explorer. ⏩ AI+BCI+XR+cognitive liberty! | currently solving embodied AI (dm me) | friendly transhumanist 💚 Art+Science+Ambition+Community
Dec 11, 2020 8 tweets 1 min read
today in my dream there was a cool cyberpunk/steampunk/furry world.
I wished I could have as much imagination when awake xD. Hmm, probably training a GAN on everything I see/do on the internet and then sampling it would be the closest thing to simulating my dreams
Dec 9, 2020 12 tweets 5 min read
I’m super excited to release this!

What do we want from a generalization theory of deep learning?

We propose 7 desiderata (Ds),

review how existing bounds fare on them,

and show that a marginal-likelihood PAC-Bayes bound does better at most Ds

arxiv.org/abs/2012.04115

The desiderata are:

The predictions should scale correctly when changing
1 data complexity
2 training set size
3 architecture
4 optimizer

The theory should also be
5 non-vacuous
6 efficiently computable
7 rigorous
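For reference, the flavour of bound meant above (a rough sketch from memory of the realizable marginal-likelihood PAC-Bayes bound; see arxiv.org/abs/2012.04115 for the exact statement and constants): write P(S) for the marginal likelihood, i.e. the prior probability of a function fitting the m training points of S exactly, and δ for the confidence parameter. Then with probability at least 1−δ over the training sample, the expected error ε(Q) of the Bayesian posterior Q obeys something like

```latex
% Sketch only: constants and log factors approximate, from memory.
% P(S)  : marginal likelihood of the training set S (|S| = m) under the prior
% eps(Q): expected test error of the Bayesian posterior Q
\[
  -\ln\!\bigl(1-\epsilon(Q)\bigr)
  \;\lesssim\;
  \frac{\ln\frac{1}{P(S)} \;+\; O\!\left(\ln\frac{m}{\delta}\right)}{m}
\]
```

so the dominant term is just the negative log marginal likelihood divided by the training set size, which is why the bound is computable whenever the marginal likelihood can be estimated.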
Sep 5, 2019 11 tweets 3 min read
it's well known that a symmetric matrix A can be decomposed as a sum of outer products of its (orthonormal) evecs: A = ∑_i λ_i u_i u_i^T.
I think that for any inner prod, you can write it like that using a basis {v} orthonormal in that inner prod, with different evals λ'. 1/7

In general: if you have a basis {v}, and a set {u} s.t. u_i^T v_j = δ_ij, and A v_i = w_i, you can write A = ∑_i w_i u_i^T;
you are just making it do the right thing on the {v}, and then on any other vector by linearity.
2/7
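A quick NumPy sanity check of the 2/7 claim (a toy example added here, not from the thread): take a random basis {v_i}, let {u_i} be the dual basis so that u_i^T v_j = δ_ij, set w_i = A v_i, and confirm that ∑_i w_i u_i^T rebuilds A.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 5
A = rng.standard_normal((n, n))            # any matrix (need not be symmetric for this identity)
V = rng.standard_normal((n, n))            # columns v_i form a (generic) basis
U = np.linalg.inv(V).T                     # dual basis: columns u_i satisfy u_i^T v_j = delta_ij
W = A @ V                                  # columns w_i = A v_i

A_rebuilt = W @ U.T                        # sum_i w_i u_i^T, assembled as W U^T
print(np.allclose(U.T @ V, np.eye(n)))     # biorthogonality check: True
print(np.allclose(A_rebuilt, A))           # reconstruction check: True
```

The check works for any A and any basis; symmetry and orthonormality are only needed for the eigendecomposition special case in the first tweet.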
Sep 3, 2019 8 tweets 3 min read
These experiments suggest NTK generalizes better than NNGP except in trivial cases.
but hmm
*NNGP is also what you get if you train *all layers* in an exact Bayesian way using Gaussian weight prior.
*in arxiv.org/pdf/1810.05148… vanilla SGD-trained NN does no better than NNGP
1/7

It's interesting to see that when learning a function of a given complexity (monomial degree), there is an optimal depth. Too-deep networks can generalize worse*, on top of being harder to train.
* but
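To make the NNGP/NTK comparison concrete, here is a self-contained NumPy sketch (my own toy setup, not the experiments referenced above): it computes the infinite-width NNGP and NTK kernels of a deep ReLU MLP via the standard arc-cosine recursion and does kernel regression on a degree-2 monomial target. The variance choices (σ_w² = 2, σ_b² = 0), depths, and toy task are assumptions for illustration.

```python
import numpy as np

def relu_kernels(X1, X2, depth, sw2=2.0, sb2=0.0):
    """NNGP and NTK kernels of an infinitely wide deep ReLU MLP,
    via the standard arc-cosine recursion (depth = number of hidden layers)."""
    d = X1.shape[1]
    K12 = sb2 + sw2 * (X1 @ X2.T) / d                 # NNGP kernel at layer 0
    K11 = sb2 + sw2 * np.sum(X1 ** 2, axis=1) / d     # diagonal K(x,x) for X1
    K22 = sb2 + sw2 * np.sum(X2 ** 2, axis=1) / d     # diagonal K(x,x) for X2
    Theta = K12.copy()                                # NTK at layer 0
    for _ in range(depth):
        norms = np.sqrt(np.outer(K11, K22))
        theta = np.arccos(np.clip(K12 / norms, -1.0, 1.0))
        # ReLU NNGP recursion
        K12 = sb2 + sw2 / (2 * np.pi) * norms * (np.sin(theta) + (np.pi - theta) * np.cos(theta))
        # derivative kernel and NTK recursion
        Kdot = sw2 * (np.pi - theta) / (2 * np.pi)
        Theta = K12 + Kdot * Theta
        # diagonal entries (theta = 0 there)
        K11 = sb2 + sw2 / 2 * K11
        K22 = sb2 + sw2 / 2 * K22
    return K12, Theta

def predict(X_tr, y_tr, X_te, depth, which="ntk", jitter=1e-6):
    """Kernel regression with the NNGP kernel (exact Bayesian posterior mean)
    or the NTK (mean of the fully trained linearized network)."""
    idx = 0 if which == "nngp" else 1
    K_tr = relu_kernels(X_tr, X_tr, depth)[idx]
    K_te = relu_kernels(X_te, X_tr, depth)[idx]
    alpha = np.linalg.solve(K_tr + jitter * np.eye(len(X_tr)), y_tr)
    return K_te @ alpha

# toy task: learn a degree-2 monomial y = x_0 * x_1 on the sphere
rng = np.random.default_rng(0)
d, n_tr, n_te = 10, 200, 500
X_tr = rng.standard_normal((n_tr, d)); X_tr /= np.linalg.norm(X_tr, axis=1, keepdims=True)
X_te = rng.standard_normal((n_te, d)); X_te /= np.linalg.norm(X_te, axis=1, keepdims=True)
y_tr, y_te = X_tr[:, 0] * X_tr[:, 1], X_te[:, 0] * X_te[:, 1]

for depth in [1, 3, 10, 30]:
    mses = {w: np.mean((predict(X_tr, y_tr, X_te, depth, w) - y_te) ** 2) for w in ["nngp", "ntk"]}
    print(f"depth {depth:2d}  NNGP mse {mses['nngp']:.4f}  NTK mse {mses['ntk']:.4f}")
```

Sweeping depth and the monomial degree in this kind of setup is what would show the "optimal depth" effect mentioned above.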
Aug 31, 2019 7 tweets 2 min read
Interesting paper looking at linearized dynamics of SGD, and the infinite depth limit of NTK:

arxiv.org/abs/1905.13654

hat tip to @aib1515 for sharing. 1/7

Previous work on information propagation through the hidden layers showed that the conjugate kernel converges to a constant one exponentially fast, unless the parameters are initialized at the edge of chaos (EOC). Weight/bias variances leading to EOC depend on architecture. 2/7
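As a concrete illustration of the EOC condition (my own sketch, using the standard mean-field recursion for a plain tanh network rather than the architectures in the paper): the fixed-point pre-activation variance q* solves q = σ_w² E_z[φ(√q z)²] + σ_b², and the edge of chaos is where χ₁ = σ_w² E_z[φ′(√q z)²] = 1. Below, Gauss–Hermite quadrature is used to locate the critical σ_w² for a given σ_b²; grid ranges and the choice σ_b² = 0.05 are arbitrary.

```python
import numpy as np
from numpy.polynomial.hermite_e import hermegauss

# quadrature nodes/weights for E_{z~N(0,1)}[f(z)] = sum_i w_i f(z_i) / sqrt(2*pi)
z, w = hermegauss(80)
w = w / np.sqrt(2 * np.pi)

def gauss_expect(f):
    return np.sum(w * f(z))

def q_star(sw2, sb2, iters=500):
    """Fixed point of the variance map q <- sw2 * E[tanh(sqrt(q) z)^2] + sb2."""
    q = 1.0
    for _ in range(iters):
        q = sw2 * gauss_expect(lambda u: np.tanh(np.sqrt(q) * u) ** 2) + sb2
    return q

def chi1(sw2, sb2):
    """Slope of the correlation map at c = 1; chi1 = 1 marks the edge of chaos."""
    q = q_star(sw2, sb2)
    dphi2 = gauss_expect(lambda u: (1.0 - np.tanh(np.sqrt(q) * u) ** 2) ** 2)
    return sw2 * dphi2

# for a given bias variance, scan sw2 to locate where chi1 crosses 1
sb2 = 0.05
grid = np.linspace(1.0, 3.0, 201)
chis = np.array([chi1(s, sb2) for s in grid])
sw2_eoc = grid[np.argmin(np.abs(chis - 1.0))]
print(f"sb2={sb2}: edge of chaos at sw2 ≈ {sw2_eoc:.3f} (chi1 ≈ {chi1(sw2_eoc, sb2):.3f})")
```

As a sanity check, for σ_b² = 0 the tanh EOC sits at σ_w² = 1, since χ₁ → σ_w² as q* → 0.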