1/
Nowadays, pre-training is ubiquitous in language, vision, audio, speech, RL, etc.
But we have little understanding of why it works so well.
One promising route is to pre-train on synthetic data, which makes it easier to understand and control.
2/
May 26, 2022 • 13 tweets • 5 min read
Given just a few examples, large language models can translate natural-language mathematical statements into formal specifications.
We autoformalize 4K theorems as new data to train our neural theorem prover, achieving SOTA on miniF2F!
1/
Paper: arxiv.org/abs/2205.12615
We show two randomly chosen few-shot examples in the prompt, going from LaTeX to formal math (Isabelle). Note that these two examples are merely examples of syntactic translation, without much sophistication in reasoning or natural language understanding.
2/
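To make the setup concrete, here is a minimal sketch of how such a few-shot autoformalization prompt could be assembled. The example pairs, the prompt wording, and the `complete` helper (a stand-in for any LLM completion call) are illustrative assumptions, not the exact prompt, examples, or API from the paper.

```python
# Sketch: few-shot prompting an LLM to translate LaTeX statements into Isabelle.
# The example pairs, prompt wording, and `complete` helper are illustrative
# placeholders, not the paper's actual prompt, examples, or API.

FEW_SHOT_EXAMPLES = [
    (r"Show that for any natural number $n$, $n + 0 = n$.",
     r'lemma "n + 0 = (n::nat)"'),
    (r"Show that for any real number $x$, if $x > 0$ then $2x > 0$.",
     r'lemma "x > 0 \<Longrightarrow> 2 * x > (0::real)"'),
]

def build_prompt(statement: str) -> str:
    """Concatenate (LaTeX, Isabelle) example pairs, then append the new statement."""
    parts = []
    for latex, isabelle in FEW_SHOT_EXAMPLES:
        parts.append(f"Natural language version: {latex}\nIsabelle version: {isabelle}\n")
    parts.append(f"Natural language version: {statement}\nIsabelle version:")
    return "\n".join(parts)

def autoformalize(statement: str, complete) -> str:
    """`complete` is any text-completion function backed by a large language model."""
    return complete(build_prompt(statement)).strip()
```

The sketch covers only the translation step; per the thread, the resulting formal statements are then used as additional training data for the neural theorem prover.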
Mar 29, 2022 • 11 tweets • 5 min read
Language models can dramatically improve their reasoning by learning from chains of thought that they generate.
With STaR, just a few worked examples can boost accuracy to that of a 30X larger model (GPT-J to GPT-3).
1/ @ericzelikman Human reasoning is often the result of extended chains of thought.
We want to train a model that can generate explicit rationales before answering a question.
The main challenge: most datasets contain only question-answer pairs, not the intermediate rationales.
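For intuition, here is a minimal sketch of a STaR-style bootstrapping loop, assuming hypothetical `generate_rationale`, `is_correct`, and `finetune` helpers for LLM sampling, answer checking, and fine-tuning; it is a schematic of the idea, not the authors' implementation.

```python
# Schematic STaR-style bootstrapping loop (simplified; not the authors' code).
# `generate_rationale`, `is_correct`, and `finetune` are hypothetical helpers
# standing in for LLM sampling, answer checking, and fine-tuning.

def star_loop(model, dataset, few_shot_prompt, n_iterations=5):
    for _ in range(n_iterations):
        kept = []
        for question, answer in dataset:
            # Ask the current model for a rationale followed by an answer.
            rationale, predicted = generate_rationale(model, few_shot_prompt, question)
            if is_correct(predicted, answer):
                # Keep rationales whose final answer is correct.
                kept.append((question, rationale, answer))
            else:
                # Retry with the correct answer given as a hint, so hard
                # questions can still contribute training rationales.
                rationale, predicted = generate_rationale(
                    model, few_shot_prompt, question, hint=answer
                )
                if is_correct(predicted, answer):
                    kept.append((question, rationale, answer))
        # Fine-tune on the kept (question, rationale, answer) triples and repeat.
        model = finetune(model, kept)
    return model
```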
Jul 9, 2020 • 5 tweets • 2 min read
Can neural networks solve IQ tests? We propose the Scattering Compositional Learner (SCL) for the RPM (Raven's Progressive Matrices) task. SCL improves SOTA from 63.9% to 95.0%. It is even capable of zero-shot generalization and learns disentangled representations!
(1/n)
SCL is designed to discover the compositional structures of the data. In RAVEN, it learns to discover the compositions of objects, attributes, and relationships. The figure shows an example where SCL learns the concept of “size”.
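As a rough illustration of the kind of compositional parameter sharing involved, the sketch below assumes a scattering-style block that splits a feature vector into groups and applies one shared sub-network to each group; the dimensions and module layout are made-up examples, not the paper's full architecture.

```python
import torch
import torch.nn as nn

class ScatteringBlock(nn.Module):
    """Illustrative scattering-style block (not the paper's exact code):
    split the input into equal groups and apply one shared sub-network
    to every group, reusing the same transformation across components."""

    def __init__(self, num_groups: int, group_dim: int, out_dim: int):
        super().__init__()
        self.num_groups = num_groups
        self.shared_net = nn.Sequential(
            nn.Linear(group_dim, out_dim),
            nn.ReLU(),
            nn.Linear(out_dim, out_dim),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, num_groups * group_dim)
        batch = x.shape[0]
        groups = x.view(batch, self.num_groups, -1)   # (batch, groups, group_dim)
        out = self.shared_net(groups)                 # shared net applied per group
        return out.reshape(batch, -1)                 # (batch, groups * out_dim)

# Made-up example: 8 attribute groups of 16 dims each, mapped to 32 dims per group.
block = ScatteringBlock(num_groups=8, group_dim=16, out_dim=32)
features = torch.randn(4, 8 * 16)
print(block(features).shape)  # torch.Size([4, 256])
```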