1/
Nowadays, pre-training is ubiquitous in language, vision, audio, speech, RL, etc.
But we have little understanding of why it works so well.
One promising route is to pre-train on synthetic data, which makes it easier to understand and control.
2/
May 26, 2022 • 13 tweets • 5 min read
Given just a few examples, large language models can translate natural-language mathematical statements into formal specifications.
We autoformalize 4K theorems as new data to train our neural theorem prover, achieving SOTA on miniF2F!
1/
Paper: arxiv.org/abs/2205.12615
We show two randomly chosen few-shot examples in the prompt, going from LaTeX to formal math (Isabelle). Note that these two examples are merely examples of syntactic translation, without much sophistication in reasoning or natural language understanding.
2/
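To make the setup concrete, here is a minimal sketch of how such a few-shot autoformalization prompt could be assembled. The example pairs, the prompt wording, and the `complete` helper (a stand-in for any LLM completion call) are illustrative assumptions, not the exact prompt, examples, or API from the paper.

```python
# Sketch: few-shot prompting an LLM to translate LaTeX statements into Isabelle.
# The example pairs, prompt wording, and `complete` helper are illustrative
# placeholders, not the paper's actual prompt, examples, or API.

FEW_SHOT_EXAMPLES = [
    (r"Show that for any natural number $n$, $n + 0 = n$.",
     r'lemma "n + 0 = (n::nat)"'),
    (r"Show that for any real number $x$, if $x > 0$ then $2x > 0$.",
     r'lemma "x > 0 \<Longrightarrow> 2 * x > (0::real)"'),
]

def build_prompt(statement: str) -> str:
    """Concatenate (LaTeX, Isabelle) example pairs, then append the new statement."""
    parts = []
    for latex, isabelle in FEW_SHOT_EXAMPLES:
        parts.append(f"Natural language version: {latex}\nIsabelle version: {isabelle}\n")
    parts.append(f"Natural language version: {statement}\nIsabelle version:")
    return "\n".join(parts)

def autoformalize(statement: str, complete) -> str:
    """`complete` is any text-completion function backed by a large language model."""
    return complete(build_prompt(statement)).strip()
```

The sketch covers only the translation step; per the thread, the resulting formal statements are then used as additional training data for the neural theorem prover.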
Mar 29, 2022 • 11 tweets • 5 min read
Language models can dramatically improve their reasoning by learning from chains of thought that they generate.
With STaR, just a few worked examples can boost accuracy to that of a 30X larger model (GPT-J to GPT-3).
1/ @ericzelikman Human reasoning is often the result of extended chains of thought.
We want to train a model that can generate explicit rationales before answering a question.
The main challenge: most datasets contain only question-answer pairs, not the intermediate rationales.
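For intuition, here is a minimal sketch of a STaR-style bootstrapping loop, assuming hypothetical `generate_rationale`, `is_correct`, and `finetune` helpers for LLM sampling, answer checking, and fine-tuning; it is a schematic of the idea, not the authors' implementation.

```python
# Schematic STaR-style bootstrapping loop (simplified; not the authors' code).
# `generate_rationale`, `is_correct`, and `finetune` are hypothetical helpers
# standing in for LLM sampling, answer checking, and fine-tuning.

def star_loop(model, dataset, few_shot_prompt, n_iterations=5):
    for _ in range(n_iterations):
        kept = []
        for question, answer in dataset:
            # Ask the current model for a rationale followed by an answer.
            rationale, predicted = generate_rationale(model, few_shot_prompt, question)
            if is_correct(predicted, answer):
                # Keep rationales whose final answer is correct.
                kept.append((question, rationale, answer))
            else:
                # Retry with the correct answer given as a hint, so hard
                # questions can still contribute training rationales.
                rationale, predicted = generate_rationale(
                    model, few_shot_prompt, question, hint=answer
                )
                if is_correct(predicted, answer):
                    kept.append((question, rationale, answer))
        # Fine-tune on the kept (question, rationale, answer) triples and repeat.
        model = finetune(model, kept)
    return model
```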
Jul 9, 2020 • 5 tweets • 2 min read
Can neural networks solve IQ tests? We propose the Scattering Compositional Learner (SCL) for the RPM (Raven's Progressive Matrices) task. SCL improves SOTA from 63.9% to 95.0%. It is even capable of zero-shot generalization and learns disentangled representations!
(1/n)
SCL is designed to discover the compositional structures of the data. In RAVEN, it learns to discover the compositions of objects, attributes, and relationships. The figure shows an example where SCL learns the concept of “size”.
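As a rough illustration of the kind of compositional parameter sharing involved, the sketch below assumes a scattering-style block that splits a feature vector into groups and applies one shared sub-network to each group; the dimensions and module layout are made-up examples, not the paper's full architecture.

```python
import torch
import torch.nn as nn

class ScatteringBlock(nn.Module):
    """Illustrative scattering-style block (not the paper's exact code):
    split the input into equal groups and apply one shared sub-network
    to every group, reusing the same transformation across components."""

    def __init__(self, num_groups: int, group_dim: int, out_dim: int):
        super().__init__()
        self.num_groups = num_groups
        self.shared_net = nn.Sequential(
            nn.Linear(group_dim, out_dim),
            nn.ReLU(),
            nn.Linear(out_dim, out_dim),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, num_groups * group_dim)
        batch = x.shape[0]
        groups = x.view(batch, self.num_groups, -1)   # (batch, groups, group_dim)
        out = self.shared_net(groups)                 # shared net applied per group
        return out.reshape(batch, -1)                 # (batch, groups * out_dim)

# Made-up example: 8 attribute groups of 16 dims each, mapped to 32 dims per group.
block = ScatteringBlock(num_groups=8, group_dim=16, out_dim=32)
features = torch.randn(4, 8 * 16)
print(block(features).shape)  # torch.Size([4, 256])
```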