"Language Models Can Teach Themselves to Program Better"
This paper changed my thinking about what future language models will be good at, mostly in a really concerning way. Let's start with some context: [1/11]
To teach models to program, you used to give them a natural language prompt. But recent work has shown that you can instead just show them a unit test and tell them to… [2/11]
…generate a program that satisfies it (a “programming puzzle”). This is way nicer because it’s simpler and you can just run the code to see if it works. [3/11]
What this paper proposes to do is use a language model to generate such puzzles, and then use the puzzles to further train the model. You can check whether the puzzles are valid programmatically, so this gives you a scalable way to generate training data. [4/11]
This approach works really well. They find that using the generated (puzzle, solution) pairs helps a lot, as does only using pairs where the solution is correct. [5/11]
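To make the puzzle idea concrete, here is a minimal sketch of a (puzzle, solution) pair and the filtering step. This is not the paper's code: the names `puzzle`, `solution`, and `is_verified` are hypothetical, and a real pipeline would run model-generated code in a sandbox with a timeout rather than calling it directly.

```python
def puzzle(x: int) -> bool:
    """A puzzle is just a test: it returns True when x is a valid answer."""
    return x < 0 and x * x == 1369


def solution() -> int:
    """A candidate answer. Hand-written here; in the paper's setting a model would generate it."""
    return -37


def is_verified(puzzle_fn, solution_fn) -> bool:
    """Run the candidate and accept the pair only if the puzzle returns True."""
    try:
        return puzzle_fn(solution_fn()) is True
    except Exception:
        # A crashing candidate counts as a wrong answer.
        return False


# Hypothetical pool of candidates: one correct, one wrong, one that crashes the puzzle.
candidates = [solution, lambda: 37, lambda: "not a number"]

# Keep only the candidates that the puzzle itself accepts.
verified = [c for c in candidates if is_verified(puzzle, c)]
```

Only the first candidate survives the filter. The point is that correctness is checked by executing code, with no human labels involved.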
This is strong research with a clever method and cool results, but the implications worry me. The Chinchilla paper (arxiv.org/abs/2203.15556) shows that training data is currently the limiting factor in… [6/11]
…scaling up language models. But if we can generate unlimited training data for coding in particular, this suggests that *future language models will be far better at coding than anything else.* [7/11]
This might be bad because coding ability is the main thing that could make deployed models hard to control—like, no amount of English proficiency lets you modify… [8/11]
…your own source code or break out of a Docker container. But if you can code at a superhuman level? Things could get interesting... [9/11]
Also, I know some people think teaching AI to code is great, while others think that misaligned AI is the #1 threat to humanity (e.g., @ESYudkowsky and company), so hopefully this paper spurs some good discussion. [10/11]
"An Impartial Take to the CNN vs Transformer Robustness Contest"
Are vision transformers really better than CNNs? This paper strongly suggests an answer, based on a robustness throwdown between {ViT, Swin} and {BiT, ConvNeXt}. [1/10]
First, they measure the learning of spurious features using datasets designed to assess simplicity bias, background bias, and texture bias. The transformers and the CNNs behave similarly. [2/10]
For OOD detection, transformers and CNNs again work equally well. [3/10]
An intuitive method for making models robust to distribution shift. They replace vectors in the latent space with their nearest centroids, with the clustering… [1/8]
…and quantization applied separately to different slices of the feature space. The centroids are learned using a moving average process similar to minibatch k-means. [2/8]
The intuition here is that, when adapting to different input distributions, only certain combinations of codes will come up, so the codes corresponding to other input distributions will be unaffected. [3/8]
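A rough sketch of the mechanics described above, in plain Python. Everything here is an assumption for illustration: the function names, the equal-width slicing, the `decay` value, and the simple moving-average update are mine, not the paper's implementation.

```python
def nearest(vec, centroids):
    """Index of the centroid closest to vec (squared Euclidean distance)."""
    return min(
        range(len(centroids)),
        key=lambda k: sum((a - b) ** 2 for a, b in zip(vec, centroids[k])),
    )


def quantize(z, codebooks):
    """Split z into equal slices, one per codebook, and snap each slice to its nearest centroid."""
    width = len(z) // len(codebooks)
    out, codes = [], []
    for i, centroids in enumerate(codebooks):
        piece = z[i * width:(i + 1) * width]
        k = nearest(piece, centroids)
        codes.append(k)
        out.extend(centroids[k])
    return out, codes


def ema_update(z, codebooks, codes, decay=0.9):
    """Move each selected centroid slightly toward the slice assigned to it."""
    width = len(z) // len(codebooks)
    for i, k in enumerate(codes):
        piece = z[i * width:(i + 1) * width]
        codebooks[i][k] = [
            decay * c + (1 - decay) * p for c, p in zip(codebooks[i][k], piece)
        ]


# Two slices of the feature space, each with its own two-centroid codebook.
codebooks = [
    [[0.0, 0.0], [1.0, 1.0]],
    [[0.0, 0.0], [1.0, 1.0]],
]
z = [0.9, 1.2, 0.1, -0.2]

quantized, codes = quantize(z, codebooks)
ema_update(z, codebooks, codes)
```

After the update, only the centroids that were actually selected have moved. The other centroid in each codebook is untouched, which is the intuition from the thread: codes tied to other input distributions are left alone.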
"Toward Transparent AI: A Survey on Interpreting the Inner Structures of Deep Neural Networks"
A good survey on methods for probing what’s going on in models. I like… [1/7]
…both the quality of the figures and the fact that they create an informative taxonomy, rather than just dumping a semi-organized list of papers. [2/7]
Here’s how they break down methods for understanding individual neurons: [3/7]
"Plex: Towards Reliability using Pretrained Large Model Extensions"
A super thorough paper that addresses an important problem, introduces tasks and datasets to facilitate future work, and gets great results. [1/10]
The problem they’re going after is model “reliability”, roughly formulated as a mix of robustness to different distributions and uncertainty estimation. [2/10]
To improve reliability, they introduce Pretrained Large model EXtensions (PLEX). These consist of a few interventions on top of a vanilla model. [3/10]
"Language models show human-like content effects on reasoning"
So the specific results are interesting and actionable, but my main takeaway from this paper is that cognitive science is becoming adjacent to AI in a way it hasn't been since the 50s or 60s. [1/6]
What they show is that big language models show biases in logical reasoning similar to those of humans. In particular, these models are better at reasoning about concrete situations where the conclusions match reality. [2/6]
They also tend to believe statements that contain nonsense or match reality, even if the logic behind the statement is invalid. [3/6]
"Understanding Dataset Difficulty with V-Usable Information"
This is one of those rare papers that gave me clearer thinking about the fundamentals of machine learning by giving crisp definitions for concepts I’d only thought about vaguely. [1/11]
As background, you can define the amount of V-information a dataset has as the output entropy given a fixed, useless input minus the output entropy given the true inputs. [2/11]
They propose to extend V-information to apply to individual samples. Their “pointwise V-information” is the increase in output likelihood given the actual input vs given a fixed, useless input. [3/11]
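Here is a small sketch of how the two quantities relate, assuming you already have the probability each model assigns to the correct label: one model trained on the real inputs and one trained on a fixed, useless input. The function names and the probabilities below are made up for illustration.

```python
import math


def pvi(p_given_input: float, p_given_null: float) -> float:
    """Pointwise V-information in bits: the gain in log-likelihood of the
    true label from seeing the actual input instead of a useless one."""
    return math.log2(p_given_input) - math.log2(p_given_null)


def v_information(pairs) -> float:
    """Dataset-level estimate: the mean PVI, which equals the entropy given
    the useless input minus the entropy given the true inputs."""
    return sum(pvi(p, q) for p, q in pairs) / len(pairs)


# Hypothetical (p_given_input, p_given_null) values for three samples.
examples = [(0.5, 0.25), (1.0, 0.25), (0.125, 0.25)]

scores = [pvi(p, q) for p, q in examples]
dataset_score = v_information(examples)
```

With these numbers the per-sample scores are 1, 2, and -1 bits. A negative score means the model did worse on that sample with the real input than with no information at all, which is what makes the pointwise version handy for spotting hard or mislabeled examples.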