We show two randomly chosen few-shot examples in the prompt, translating LaTeX into formal math (Isabelle). Note that these two examples are merely syntactic translations, without much sophistication in reasoning or natural language understanding.
2/
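Roughly, a prompt pair looks like the sketch below. This is our own toy pair, made up purely to illustrate the format (a natural-language statement followed by an Isabelle statement, no proof); it is not one of the actual prompt examples.

```isabelle
(* Toy illustration of the prompt format only, not an actual prompt example. *)
(* Natural language: "Show that for any two even integers a and b, a + b is even." *)
theorem
  fixes a b :: int
  assumes "even a" and "even b"
  shows "even (a + b)"
```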
Now, let the examples do the talking!
See the attached figure: Codex perfectly formalizes an IMO problem! It handles the negation “there is no function” by setting the statement up as a proof by contradiction. It understands the phrase “into itself” and correctly formalizes the codomain of f.
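To give a flavor of that framing, here is a toy sketch (the equation is a stand-in we made up, not the IMO problem from the figure): the nonexistence claim becomes "assume such an f exists and derive False", and "into itself" becomes a codomain constraint on f.

```isabelle
(* Toy sketch of the framing only, not the problem from the figure. *)
theorem
  fixes f :: "nat ⇒ nat"
  assumes "∀ n. n > 0 ⟶ f n > 0"        (* "into itself": f maps positives to positives *)
    and "∀ n. n > 0 ⟶ f (f n) = n + 1"  (* stand-in for the problem's functional equation *)
  shows False                            (* "there is no such function", by contradiction *)
```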
The next figure shows a perfect translation of a grade-school math problem by PaLM. This is remarkable because such a statement is completely out of distribution: no formal mathematicians are interested in formalizing grade-school math problems ;)
4/
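For the flavor (this is our own toy problem, not the one in the figure), a formalized grade-school statement might come out looking like:

```isabelle
(* Toy grade-school example, not the problem from the figure:
   "Alice has 3 apples and buys 5 more. Show that she now has 8 apples." *)
theorem
  fixes apples_start apples_bought :: nat
  assumes "apples_start = 3" and "apples_bought = 5"
  shows "apples_start + apples_bought = 8"
```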
Let’s also look at a failure case: here Codex fails to formalize the concept of “linear function” and makes up a name instead: linear f.
5/
Can the model learn to formalize such problems if the prompt contains an example that explains the concept? We find that if we add a tangentially related problem, the model can formalize the “linear function” perfectly!
6/
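For illustration, one way to spell the concept out explicitly in Isabelle is the sketch below (in the school sense f(x) = a*x + b); the name is_linear is ours, and the figure's actual formalization may phrase it differently.

```isabelle
(* Hedged sketch: spelling out "f is a linear function" instead of the
   invented predicate "linear f". The name is_linear is ours. *)
definition is_linear :: "(real ⇒ real) ⇒ bool" where
  "is_linear f ⟷ (∃ a b. ∀ x. f x = a * x + b)"
```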
We use Codex to formalize 3908 MATH problems. We then run expert iteration on these autoformalized statements. This allows us to achieve a new state of the art on the miniF2F theorem proving benchmark.
This is the first proof-of-concept of practical autoformalization!
7/
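For intuition, here is a minimal sketch of what an expert-iteration loop over autoformalized statements could look like. All helper names (sample_proof, check_with_isabelle, finetune) are placeholders we made up for illustration, not the actual pipeline.

```python
# Minimal sketch of expert iteration over autoformalized statements.
# Every helper below is a hypothetical placeholder, not the paper's code.

def sample_proof(model, statement):
    """Sample one candidate proof for an Isabelle statement (placeholder)."""
    ...

def check_with_isabelle(statement, proof):
    """Return True iff the proof checker accepts the proof (placeholder)."""
    ...

def finetune(model, proved):
    """Fine-tune the prover on verified (statement, proof) pairs (placeholder)."""
    ...

def expert_iteration(model, statements, rounds=4, attempts=8):
    for _ in range(rounds):
        proved = []
        for stmt in statements:
            for _ in range(attempts):
                proof = sample_proof(model, stmt)
                if check_with_isabelle(stmt, proof):
                    proved.append((stmt, proof))   # keep only machine-verified proofs
                    break
        model = finetune(model, proved)            # train on them, then search again
    return model
```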
We further explore whether the model can handle more advanced mathematics beyond competition problems. We find that these models are surprisingly good at the reverse direction too: turning formal statements into natural language!
8/
We see the model make a jump in reasoning: it goes from the definition "for all x, if x in A -> x in B" to the more concise and abstract phrase "A is a subset of B". The same happens for "finite intersections" and "arbitrary unions". See the examples in the figures!
9/
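As a hedged illustration of the kind of statement being read here (not the exact snippet from the figures): the formal text literally says the left-hand side below, and the model's informal description jumps to the abstract phrase on the right.

```isabelle
(* Illustration only, not the exact example from the figures:
   the verbose reading "for all x, if x ∈ A then x ∈ B"
   versus the abstract phrase "A is a subset of B". *)
lemma "(∀ x. x ∈ A ⟶ x ∈ B) ⟷ A ⊆ B"
```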
Why is this surprising? People know that large language models can turn natural language descriptions into code. However, the known successes so far are limited to commonly used programming languages (e.g., Python). Formalizing mathematics is different for at least two reasons.
10/
1. Formal math data is very scarce: all Isabelle proof scripts combined amount to only about 180 MB. 2. There is almost no aligned data between natural language and formal mathematics, whereas docstrings for languages like Python are broadly available.
Our findings hence reveal a very surprising capability of these models: they have learned general, transferable knowledge that lets them work with a low-resource formal language.
12/
@ericzelikman Human reasoning is often the result of extended chains of thought.
We want to train a model that can generate explicit rationales before answering a question.
The main challenge: most datasets contain only question-answer pairs, not the intermediate rationales.
One solution is to use human labels [Rajani et al. arxiv.org/abs/1906.02361]. But this is costly and hence not scalable. In addition, the model cannot improve beyond human labels.
Can neural networks solve IQ tests? We propose the Scattering Compositional Learner (SCL) for the Raven's Progressive Matrices (RPM) task. SCL improves SOTA from 63.9% to 95.0%. It is even capable of zero-shot generalization and learns disentangled representations!
SCL is designed to discover the compositional structure of the data. On RAVEN, it learns to discover compositions of objects, attributes, and relationships. The figure shows an example where SCL learns the concept of “size”.
(2/n)
By learning compositional structures, it can even generalize to unseen analogies. For example, after learning (“color”, “constant”) and (“size”, “progression”), the model can generalize to (“color”, “progression”).