My notes on "Weight Agnostic Neural Networks" by Adam Gaier and David Ha
arxiv.org/abs/1906.04358
This paper was such a breeze to read! As expected from @hardmaru.

We know a network’s architecture plays a significant role in its ability to solve a problem. But how much? 1/9
Some animals possess crucial skills right from birth; the connectome is thought to play a significant role in this. There is also evidence that networks with a specific architecture can solve problems without weight training, e.g. CNNs and LSTMs. I’d also point to Reservoir Computing. 2/9
To focus fully on the architecture, in this work (1) the network’s weights are reduced to a single shared value; and (2) the network’s performance is measured by averaging over multiple rollouts, each with a different randomly sampled shared weight. RL and classification tasks are considered. 3/9
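A minimal sketch of that evaluation protocol, assuming a hypothetical run_episode(topology, w) rollout helper (not the authors’ API); the weight series below is illustrative of the small fixed range the paper samples from:

```python
import numpy as np

# Illustrative set of shared weight values; every connection takes this value.
WEIGHT_SAMPLES = (-2.0, -1.0, -0.5, 0.5, 1.0, 2.0)

def evaluate_topology(topology, run_episode, weights=WEIGHT_SAMPLES):
    """run_episode(topology, w) -> episodic return with *all* weights set to w."""
    returns = [run_episode(topology, w) for w in weights]
    return float(np.mean(returns)), float(np.max(returns))  # avg and best case
```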
An evolutionary algorithm (EA) is employed to search for an architecture that encodes a solution to the problem. The EA loop goes as usual: mutate, evaluate, and rank to form the new generation. The mutations randomly add a connection, add a node, or change an activation. 4/9
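Roughly, the loop looks like this. A hedged sketch: Genome methods and the activation list are my own stand-ins (the paper builds on a NEAT-style topology encoding), and `rank` is whatever ordering the next tweets describe.

```python
import copy
import random

# Illustrative subset of the activation functions the search can assign.
ACTIVATIONS = ["linear", "step", "sin", "gauss", "tanh", "sigmoid", "abs", "relu"]

def mutate(genome):
    """Apply one of the three mutation operators, chosen at random."""
    child = copy.deepcopy(genome)
    op = random.choice(["add_connection", "add_node", "change_activation"])
    if op == "add_connection":
        child.add_random_connection()    # hypothetical helper: new edge
    elif op == "add_node":
        child.split_random_connection()  # hypothetical helper: node on an edge
    else:
        random.choice(child.hidden_nodes).activation = random.choice(ACTIVATIONS)
    return child

def evolve(population, generations, evaluate, rank):
    """rank orders (genome, score) pairs best-first and returns the genomes."""
    for _ in range(generations):
        children = [mutate(random.choice(population)) for _ in population]
        scored = [(g, evaluate(g)) for g in population + children]
        population = rank(scored)[: len(population)]  # keep the best performers
    return population
```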
A network is evaluated based on its performance (average and max over the sampled shared weight values) *and* its simplicity. The latter criterion is motivated by Kolmogorov complexity and minimum description length: networks with fewer connections (shorter programs) are preferred. 5/9
Here’s something cool: when ranking the networks, the authors don’t rely on a function that mixes all the fitness criteria into a single number, but instead rank based on dominance relations. This way there is no need to shape a new fitness function for each new task. 6/9
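The core of that ranking is plain Pareto dominance over objective tuples. A sketch, assuming every objective is oriented so higher is better (e.g. simplicity as the negated connection count); the paper’s full NSGA-II-style ranking peels off successive fronts, while this shows only the first:

```python
def dominates(a, b):
    """a dominates b: no worse on every objective, strictly better on at least one."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))

def pareto_front(candidates):
    """candidates: list of (genome, objectives) pairs; keep the non-dominated ones."""
    return [
        (g, obj) for g, obj in candidates
        if not any(dominates(other, obj) for _, other in candidates)
    ]
```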
Compared against SOTA architectures on several RL tasks, the proposed method finds topologies that perform well even with a random shared weight. With a tuned shared weight they get close to the best results. And tuning a single weight is as easy as sweeping through a range of values. 7/9
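“Training” then collapses to a one-dimensional search, something like this (reusing the hypothetical run_episode helper from above):

```python
import numpy as np

def tune_shared_weight(topology, run_episode, lo=-2.0, hi=2.0, steps=41):
    """Sweep the single shared weight and keep the best-performing value."""
    candidates = np.linspace(lo, hi, steps)
    scores = [run_episode(topology, w) for w in candidates]
    return float(candidates[int(np.argmax(scores))])
```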
An important observation: when the found architectures are assigned individual random weights (rather than a shared one), the networks fail at the task. Since the architecture captures relationships between the inputs, the signs of the weights, in particular, need to stay consistent. 8/9
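A toy illustration of the sign point (my example, not the paper’s): imagine the topology routes a signal along two parallel paths that are meant to reinforce each other.

```python
import numpy as np

x = 1.0
for w in (-2.0, 0.5, 2.0):           # any shared weight: the paths agree in sign
    print(w * x + w * x)             # 2*w*x, structure preserved up to scale/sign

rng = np.random.default_rng(1)
w1, w2 = rng.uniform(-2, 2, size=2)  # independent random weight per connection
print(w1 * x + w2 * x)               # can nearly cancel when w1 is close to -w2
```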
I’m curious whether this architecture search prefers certain activations over others. And although this isn’t something we’ll jump to using in practice, I’d be curious to see a comparison of its sample efficiency against other architecture search approaches. cc @hardmaru 9/9