Finetuning language models on instructions increasingly looks like a compute-efficient way to improve performance.

Recent work from @hwchung27, @_jasonwei, @JeffDean, @quocleix & others scales this up to new regimes.

TLDR: Even for big models (540B params), gains are substantial.

1/12 Image
For those who prefer a narrated version:



2/12
Flan-PaLM 540B (PaLM 540B finetuned on instructions) makes major progress on MMLU.

Note: my previous graph () lacked some of the available SotA forecasts - that's updated below.

Even with the update, the numbers remain impressive.

3/12 Image
The forecasts themselves are interesting.

Credit to @JacobSteinhardt for leading these.

I recommend his analysis one year into the forecast: bounded-regret.ghost.io/ai-forecasting…

TLDR: forecasters didn't do well. But to some degree, this is because progress was "surprising".

4/12 Image
Hypermind SotA forecasters revised their estimates upwards earlier this summer, but not massively.

@JacobSteinhardt flagged in July '22 that Hypermind forecasts seemed low.

Metaculus forecasts & Steinhardt's are higher.

ML researchers: contribute to future forecasts!

5/12 Image
@hwchung27 et al. explore a regime with a large number of instruction-finetuning tasks (~1,800 in total).

Models ranging from small (80M params) to very large (540B params) are studied.

Interestingly, finetuning is relatively cheap: at most 1.6% of pretraining compute (see the rough sketch below).

6/12 Image
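A hedged back-of-the-envelope sketch of why finetuning is so cheap relative to pretraining: under the common "compute ≈ 6 × params × tokens" approximation, the parameter count cancels when comparing the two stages, so the fraction is roughly the ratio of finetuning tokens to pretraining tokens. The finetuning token budget below is an illustrative assumption, not a number from the paper.

```python
# Back-of-the-envelope: training FLOPs ~= 6 * params * tokens (a common approximation).
# The finetuning token count is an illustrative assumption, not a figure from the paper.
def train_flops(params: float, tokens: float) -> float:
    return 6.0 * params * tokens

params = 540e9            # PaLM 540B
pretrain_tokens = 780e9   # PaLM's pretraining corpus was roughly 780B tokens
finetune_tokens = 5e9     # hypothetical instruction-finetuning token budget

fraction = train_flops(params, finetune_tokens) / train_flops(params, pretrain_tokens)
print(f"finetuning compute / pretraining compute ~ {fraction:.2%}")
# params cancels out, so the fraction is just finetune_tokens / pretrain_tokens (~0.6% here).
```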
Increasing model size continues to yield major gains.

Increasing the number of tasks helps, but brings diminishing returns.

7/12 Image
To perform well under both chain-of-thought and non-chain-of-thought prompting, both kinds of data should be included in the finetuning mixture (illustrative sketch below).

8/12 Image
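As a rough illustration of the point above (the templates and field names here are hypothetical, not the paper's actual formats), a finetuning mixture covering both paradigms might contain records like these:

```python
# Hypothetical records for an instruction-finetuning mixture; formats are illustrative only.
non_cot_example = {
    "input": "Q: A robe takes 2 bolts of blue fiber and half that much white fiber. "
             "How many bolts in total does it take?\nA:",
    "target": "3",
}

cot_example = {
    "input": "Q: A robe takes 2 bolts of blue fiber and half that much white fiber. "
             "How many bolts in total does it take?\nA: Let's think step by step.",
    "target": "It takes 2 bolts of blue fiber and 2 / 2 = 1 bolt of white fiber, "
              "so 2 + 1 = 3 bolts in total. The answer is 3.",
}

# Mixing both formats during finetuning aims to keep the model strong under both
# direct-answer and chain-of-thought prompting at inference time.
finetuning_mixture = [non_cot_example, cot_example]
```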
Flan finetuning (which includes chain-of-thought data) enables Flan-PaLM to benefit from chain-of-thought prompting in a zero-shot setting; an example prompt is sketched below.

9/12 Image
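To make the zero-shot setting concrete, here is a hedged sketch of the two prompting styles. The "Let's think step by step." trigger follows the standard zero-shot chain-of-thought recipe (Kojima et al.), and the question is just an illustrative example:

```python
# Illustrative zero-shot prompts (no in-context examples); the model call itself is left abstract.
question = (
    "A juggler can juggle 16 balls. Half of the balls are golf balls, "
    "and half of the golf balls are blue. How many blue golf balls are there?"
)

direct_prompt = f"Q: {question}\nA:"
zero_shot_cot_prompt = f"Q: {question}\nA: Let's think step by step."

# With chain-of-thought data included during Flan finetuning, the second prompt is
# intended to elicit intermediate reasoning steps before the final answer.
print(zero_shot_cot_prompt)
```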
It's useful to note that human preferences about open-ended model outputs may not correlate with NLP benchmark scores.

Still, human annotators prefer Flan-PaLM 540B to PaLM 540B by a healthy margin.

10/12 Image
If cats driving cars are your thing, Flan-PaLM can write funny poems.

11/12 Image
Overall takeaway: instruction finetuning seems likely to be broadly applicable to pretrained language models.

Paper: arxiv.org/abs/2210.11416

12/12 Image
