1/ I'm very happy to give a little thread today on our paper accepted at ICLR 2021!
🎉🎉🎉
In this paper, we show how to build ANNs that respect Dale's law and which can still be trained well with gradient descent. I will expand in this thread...
2/ Practically speaking, this means that each neuron is either excitatory or inhibitory: all of its outgoing connections share the same sign. It's not 100% true, nothing is in biology, but it's roughly true.
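To make the constraint concrete, here is a minimal PyTorch sketch of a sign-constrained linear layer (my illustration, not code from the paper): each presynaptic unit is assigned a fixed sign, and all of its outgoing weights are forced to share it.

```python
import torch
import torch.nn as nn


class SignConstrainedLinear(nn.Module):
    """Toy linear layer obeying Dale's law: each presynaptic (input)
    unit has a fixed sign, and all of its outgoing weights share it."""

    def __init__(self, n_in, n_out, frac_excitatory=0.8):
        super().__init__()
        self.raw_weight = nn.Parameter(0.1 * torch.randn(n_out, n_in))
        # Fixed +1/-1 sign per presynaptic unit (one sign per column).
        n_exc = int(frac_excitatory * n_in)
        signs = torch.cat([torch.ones(n_exc), -torch.ones(n_in - n_exc)])
        self.register_buffer("signs", signs)

    def forward(self, x):
        # Weight magnitudes are learned freely; the signs stay fixed.
        w = torch.abs(self.raw_weight) * self.signs
        return x @ w.t()
```

This is roughly what the "ColumnEI" baseline later in the thread refers to: each column of the weight matrix is forced to be all positive or all negative.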
3/ You may have wondered, "Why don't more people use ANNs that respect Dale's law?"
The rarely discussed reason is this:
When you try to train an ANN that respects Dale's law with gradient descent, it usually doesn't work as well -- worse than an ANN that ignores Dale's law.
4/ In this paper, we try to rectify this problem by developing corrections to standard gradient descent to make it work with Dale's law.
Our hope is that this will allow people in the Neuro-AI field to use ANNs that respect Dale's law more often, ideally as a default.
5/ To begin with, we assume we are dealing with feed-forward inhibitory interneurons, so that inhibitory units receive inputs from the layer below and project within the layer. Note, though, that all of our results can generalise to recurrent inhibition using an unrolled RNN.
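For intuition, here is a simplified sketch of a layer with feed-forward inhibition under Dale's law (again my illustration, not the exact DANN layer from the paper): excitatory units and inhibitory interneurons both read from the layer below, and the interneurons then subtract from the excitatory units' drive.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class FeedForwardEILayer(nn.Module):
    """Toy layer with feed-forward inhibition: excitatory units and
    inhibitory interneurons both receive the layer below's activity;
    the interneurons then inhibit the excitatory units subtractively."""

    def __init__(self, n_in, n_exc, n_inh):
        super().__init__()
        # Weights are stored unconstrained and rectified in forward(),
        # so each connection's sign is fixed by cell type while its
        # magnitude is learned.
        self.W_ex = nn.Parameter(0.1 * torch.randn(n_exc, n_in))   # input -> excitatory units
        self.W_ix = nn.Parameter(0.1 * torch.randn(n_inh, n_in))   # input -> interneurons
        self.W_ei = nn.Parameter(0.1 * torch.randn(n_exc, n_inh))  # interneurons -> excitatory (subtractive)

    def forward(self, x):
        exc_drive = F.linear(x, torch.abs(self.W_ex))
        inh_activity = F.linear(x, torch.abs(self.W_ix))
        inh_drive = F.linear(inh_activity, torch.abs(self.W_ei))
        return F.relu(exc_drive - inh_drive)
```

Only the excitatory units' activity is passed on to the next layer; the interneurons act purely within the layer.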
6/ To make gradient descent work well in these networks (which we call "Dale's ANNs", or "DANNs") we identify two important strategies.
7/ First, we develop methods for initialising the network so that it starts in a regime that is roughly equivalent to layer norm.
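As a hedged illustration of what that could look like (my guess at a simple version, not necessarily the paper's initialisation scheme), the feed-forward E/I sketch above can be initialised so that the interneuron subtracts the mean excitatory drive, i.e. the layer starts out mean-centring its input like the centring step of layer norm:

```python
# Illustrative only: initialise the FeedForwardEILayer sketch from above
# so that a single interneuron computes the average excitatory drive and
# subtracts it equally from every excitatory unit. At initialisation the
# excitatory drive is then mean-centred across units, which is the
# centring (though not the variance-normalising) step of layer norm.
import torch

n_in, n_exc, n_inh = 100, 50, 1
layer = FeedForwardEILayer(n_in, n_exc, n_inh)

with torch.no_grad():
    layer.W_ix.copy_(torch.abs(layer.W_ex).mean(dim=0, keepdim=True))
    layer.W_ei.fill_(1.0)
```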
8/ Second, inspired by natural gradients, we use the Fisher Information matrix to show that the interneurons have a disproportionate effect on the output distribution, which requires a correction term to their weight-updates to prevent instability.
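In very rough terms (and heavily simplified relative to the paper's actual correction), the idea is that a unit-sized change to the interneuron parameters moves the network's output distribution much more than the same-sized change elsewhere, so their raw gradient updates get rescaled. A sketch, with a made-up scaling hyperparameter:

```python
import torch


def scaled_sgd_step(layer, lr=0.1, inhib_scale=0.1):
    """One manual SGD step on the FeedForwardEILayer sketch above,
    updating the interneuron-related weights (W_ix, W_ei) more
    conservatively than the excitatory weights. Assumes .backward()
    has already been called so .grad is populated."""
    with torch.no_grad():
        layer.W_ex.sub_(lr * layer.W_ex.grad)
        layer.W_ix.sub_(lr * inhib_scale * layer.W_ix.grad)
        layer.W_ei.sub_(lr * inhib_scale * layer.W_ei.grad)
        for p in (layer.W_ex, layer.W_ix, layer.W_ei):
            p.grad = None
```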
9/ When we implement these changes, we can match a standard ANN that doesn't obey Dale's law!
Not so for ANNs that obey Dale's law but don't use these tactics!
(Here are MNIST results; ColumnEI is a simple ANN with each column of the weight matrices forced to be all + or all -.)
10/ This can work in RNNs (as noted above) and also in convnets, as shown here:
11/ There are many next steps! But, an immediate one will be developing PyTorch and TensorFlow modules with these corrections built-in to make it easy for other researchers to build and train DANNs.
12/ This work also raises interesting Qs about inhibitory plasticity. It can sometimes be tough to induce synaptic plasticity in interneurons. Is that because the brain has its own built-in mechanisms that correct for an outsized impact of interneurons on output behaviour?
13/ Another Q: why does the brain use Dale's law? We were only able to *match* standard ANNs. No one has shown *better* results with Dale's law. So, is Dale's law just an evolutionary constraint, a local minimum in the phylogenetic landscape? Or does it help in some other way?
14/ Recognition where due: this work was led by Jonathan Cornford (left), with important support from Damjan Kalajdzievski (right) and several others.
From now on, I will direct anyone who emails me to this form, which eliminates the "hidden curriculum" of how to write to a PI. Hopefully, this limits potential implicit bias on my part.
3/ This application form also allows people to identify as being from an under-represented group, if they wish, which will help the lab to consider diversity more concretely when we're making recruitment decisions.
1/n) A small thread on argument structure... I've been thinking about this bc @TheFrontalLobe_ recently posted about Tim Van Gelder's famous paper, and I turned into a complete jerk in response (sorry again André). I was asking myself, why does that paper make me so irritable?
2/n) I realized why: it's the structure of the argument in the paper. I also realized that many other papers that get under my skin share the same structure. I need to learn to be less of a jerk on Twitter, yes, but I also want to highlight why this structure bothers me so.
3/n) Here's the structure I find so irritating:
1. Note that concept A is central to current theories of the brain/mind.
2. Define A as implying X, Y, and Z; claim concept B does not.
3. Argue that X, Y, and Z are surely not how brains/minds work.
4. Conclude that we should abandon A in favour of B.
1/ For #ShutDownSTEM today, our lab put aside research and crafted some concrete ideas for what we can do to help reduce anti-Black/Indigenous racism and, more broadly, increase diversity in STEM. Our focus was on local, specific actions within the lab. I wanted to share what we came up with.
2/ (Item 1) We decided to alter the way in which members of the lab are selected, in order to reduce the influence of unconscious biases and barriers that could keep BIPOC from getting into the lab.
3/ Currently, the process is informal: ppl email me, and if I am impressed, I have a Zoom meeting and/or they give a talk and meet the lab. I then ask the lab's opinions, and make a final decision.
We think our results are quite exciting, so let's go!
2/ Here, we are concerned with the credit assignment problem. How can feedback from higher-order areas inform plasticity in lower-order areas in order to ensure efficient and effective learning?
3/ Based on the LTP/LTD literature (e.g. jneurosci.org/content/26/41/…), we propose a "burst-dependent synaptic plasticity" rule (BDSP). It says: if there is a presynaptic eligibility trace, then a postsynaptic burst potentiates the synapse, while a postsynaptic single spike depresses it.
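A minimal sketch of a rule of that flavour (my simplification for illustration; the paper's exact formulation differs):

```python
def bdsp_update(w, pre_eligibility, post_event, post_burst, lr=0.01):
    """Toy burst-dependent update for a single synaptic weight w.

    pre_eligibility: presynaptic eligibility trace (>= 0)
    post_event:      1.0 if the postsynaptic cell fired at all, else 0.0
    post_burst:      1.0 if that firing was a burst, else 0.0
    Bursts potentiate, single (non-burst) events depress, and both are
    gated by the presynaptic eligibility trace.
    """
    single = post_event * (1.0 - post_burst)
    dw = lr * pre_eligibility * (post_burst - single)
    return w + dw
```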
2/ I'm tempted to ignore it, but I think that would actually be a shame, because in many ways, it's a good article. Yet, it is also a confused article, and I worry about it confusing both scientists and the public more broadly. So, I'll just quickly address the confusion.
3/ The mistake is a classic mistake. @matthewcobb is not the first to make it, and I know he will not be the last. It's this: to think that Von Neumann machines (like our laptops) are the only type of computer, and that their properties define computation. That is false.
@GaryMarcus @r_chavarriaga @KordingLab @DeepMindAI You don't actually keep up with the neuroscience literature, do you? That has been evident in these conversations... Here, lemme give you a few examples:
@GaryMarcus @r_chavarriaga @KordingLab @DeepMindAI 1) ANNs optimised on relevant tasks match the representations in human (and primate) cortical areas better than other models developed to date: