Yuhuai (Tony) Wu
May 26 · 13 tweets · 5 min read
Shown just a few examples, large language models can translate natural language mathematical statements into formal specifications.

We autoformalize 4K theorems as new data to train our neural theorem prover, achieving SOTA on miniF2F!

1/

Paper: arxiv.org/abs/2205.12615
We show two randomly chosen few-shot examples in the prompt, translating from LaTeX to formal math (Isabelle). Note that these two examples are merely syntactic translations, without much sophistication in reasoning or natural language understanding.

2/
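
As a concrete sketch of that setup (the prompt layout, the example pairs, and the helper below are my own illustrative assumptions, not the paper's exact prompt):

```python
# Minimal sketch of few-shot autoformalization prompting. The two
# (LaTeX, Isabelle) pairs and the prompt layout are illustrative
# assumptions, not the exact prompt used in the paper.

FEW_SHOT_EXAMPLES = [
    # (natural-language statement in LaTeX, Isabelle formalization)
    (r"Show that for any natural number $n$, we have $n \le n^2$.",
     'lemma "(n::nat) <= n^2"'),
    (r"Prove that the sum of two even integers is even.",
     'lemma "even (a::int) ==> even b ==> even (a + b)"'),
]

def build_prompt(statement: str) -> str:
    """Two worked (LaTeX -> Isabelle) pairs, then the statement to translate."""
    parts = []
    for nl, formal in FEW_SHOT_EXAMPLES:
        parts.append(f"Natural language version: {nl}\nIsabelle version: {formal}\n")
    parts.append(f"Natural language version: {statement}\nIsabelle version:")
    return "\n".join(parts)

# A code model (e.g., Codex) completes the prompt; whatever follows
# "Isabelle version:" is taken as the formalization.
```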
Now, let the examples do the talking!
See the attached figure – Codex perfectly formalizes an IMO problem! It handles the negation “there is no function” by stating the goal as a contradiction. It understands the phrase “into itself” and correctly formalizes the co-domain of f.

3/
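
The figure itself is not reproduced here; as a hedged stand-in, the pair below shows the pattern described – the statement is from the same IMO problem family, and the Isabelle text is my own rendering, not necessarily Codex's verbatim output:

```python
# Illustrative (LaTeX, Isabelle) pair in the spirit of the figure;
# not necessarily the exact problem or the model's verbatim output.
nl_statement = (
    r"Prove that there is no function $f$ from the set of non-negative "
    r"integers into itself such that $f(f(n)) = n + 1987$ for every $n$."
)
formal_statement = '''
theorem
  fixes f :: "nat => nat"              (* "into itself": co-domain nat *)
  assumes "ALL n. f (f n) = n + 1987"
  shows False                          (* "there is no ..." as contradiction *)
'''
```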
The next figure shows a perfect translation of a grade-school math problem by PaLM. This is remarkable because such a statement is completely out-of-distribution – no formal mathematicians are interested in formalizing grade-school math problems ;)

4/
Let’s look at a failure case as well – here Codex fails to formalize the concept of a “linear function”. It simply makes up an undefined name: “linear f”.

5/
Can the model learn to formalize such problems if the prompt contains an example that explains the concept? We find that if we add a tangentially related problem, the model formalizes “linear function” perfectly!

6/
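
Concretely, the fix amounts to one extra worked pair in the prompt. The pair below is an assumed illustration of what such a tangentially related example could look like, not the one used in the paper:

```python
# One tangentially related worked pair that makes "linear function"
# concrete in Isabelle terms (assumed illustration). Appending it to
# the few-shot prompt from the earlier sketch repairs the failure:
# the model can now map "linear function" to an explicit hypothesis
# instead of inventing an undefined constant like "linear f".
linear_function_example = (
    r"Let $f$ be the linear function $f(x) = 5x + 3$. Find $f(2)$.",
    'lemma fixes f :: "real => real" '
    'assumes "ALL x. f x = 5 * x + 3" shows "f 2 = 13"',
)
```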
We use Codex to autoformalize 3,908 MATH problems. We then run expert iteration on these autoformalized statements, which allows us to achieve a new state of the art on the miniF2F theorem-proving benchmark.

This is the first proof-of-concept of practical autoformalization!

7/
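
The expert-iteration step can be sketched as follows; `prover.search` and `prover.train` are assumed helper names standing in for Isabelle proof search and prover finetuning, not an API from the paper:

```python
# Hedged sketch of expert iteration on autoformalized statements:
# search for proofs, keep the successes, retrain, and repeat.
def expert_iteration(prover, statements, rounds=3):
    proved = {}  # statement -> proof, accumulated across rounds
    for _ in range(rounds):
        for stmt in statements:
            if stmt in proved:
                continue
            proof = prover.search(stmt)      # proof search in Isabelle
            if proof is not None:
                proved[stmt] = proof
        prover.train(list(proved.items()))   # finetune on found proofs
    return prover, proved
```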
We further explore whether the model can handle more advanced mathematics, beyond competition problems. We find these models are surprisingly good at the reverse direction as well: turning formal statements into natural language!

8/
We see the model make a jump in abstraction, going from the definition “for all x, x in A -> x in B” to the more concise and abstract phrase “A is a subset of B”. The same holds for “finite intersections” and “arbitrary unions”. See the examples in the figures!

9/
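
As a hedged illustration of this formal-to-informal direction (the strings below are my rendering of the idea, not the exact figure content):

```python
# Informalization: the model compresses a pointwise definition into
# the abstract concept it denotes. Illustrative strings only.
formal = 'lemma "ALL x. x : A --> x : B"'   # for all x, x in A -> x in B
informal = "A is a subset of B."
```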
Why is this surprising? We know large language models can turn natural language descriptions into code. However, the known successes so far are limited to commonly used programming languages (e.g., Python). Formalizing mathematics is different for at least two reasons.

10/
1. Formal math data is very scarce: the entire corpus of Isabelle proof scripts is only about 180MB.

2. There is almost zero aligned data between natural language and formal mathematics, whereas docstrings for languages like Python are abundant.

11/
Our findings hence reveal a very surprising capability of these models: they have learned general, transferable knowledge that lets them work with low-resource formal languages.

12/
Lastly, big thanks to my collaborators: @AlbertQJiang, @WendaLi8, @MarkusNRabe, Charles Staats, @Mateja_Jamnik, @ChrSzegedy!!!

• • •

More from @Yuhu_ai_

Mar 29
Language models can dramatically improve their reasoning by learning from chains of thought that they generate.

With STaR, just a few worked examples can boost accuracy to that of a 30X larger model (GPT-J to GPT-3).

arxiv.org/abs/2203.14465

With @ericzelikman and Noah Goodman.

1/
Human reasoning is often the result of extended chains of thought.

We want to train a model that can generate explicit rationales before answering a question.

The main challenge: most datasets contain only question–answer pairs, not the intermediate rationales.

2/
One solution is to use human labels [Rajani et al. arxiv.org/abs/1906.02361]. But this is costly and hence not scalable. In addition, the model cannot improve beyond human labels.

3/
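
A minimal sketch of one STaR round, under assumed helper names (`model.generate`, `model.finetune`), not the paper's actual code: sample a rationale and answer for each question, keep the examples whose final answer is correct, and finetune on those self-generated rationales.

```python
# Hedged sketch of one STaR round: generate rationale + answer,
# filter by answer correctness, finetune on the kept rationales.
def star_round(model, dataset, few_shot_prompt):
    kept = []
    for question, gold_answer in dataset:
        rationale, answer = model.generate(few_shot_prompt + question)
        if answer == gold_answer:            # correctness filter
            kept.append((question, rationale, gold_answer))
    model.finetune(kept)                     # train on own rationales
    return model
# (Full STaR also "rationalizes" unsolved problems by hinting the
# answer and regenerating the rationale; omitted here for brevity.)
```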
Jul 9, 2020
Can neural networks solve IQ tests? We propose the Scattering Compositional Learner (SCL) for the Raven's Progressive Matrices (RPM) task. SCL improves SOTA from 63.9% to 95.0%. It is even capable of zero-shot generalization and learns disentangled representations!

paper: arxiv.org/abs/2007.04212

(1/n)
SCL is designed to discover the compositional structure of the data. On RAVEN, it learns to discover compositions of objects, attributes, and relationships. The figure shows an example where SCL learns the concept of “size”.

(2/n)
By learning compositional structure, it can even generalize to unseen analogies. E.g., after learning (“color”, “constant”) and (“size”, “progression”), the model can generalize to (“color”, “progression”).

(3/n)
