TLDR: forecasters didn't do well. But to some degree, this is because progress was "surprising".
4/12
Hypermind SotA forecasters revised their estimates upwards earlier this summer. But not massively.
@JacobSteinhardt flagged in July '22 that Hypermind forecasts seemed low.
Metaculus forecasts & Steinhardt's are higher.
ML researchers: contribute to future forecasts!
5/12
@Hwchung et al. explore a regime that uses a large number of instructions: ~1,800 tasks in total.
Models from small (80 million params) to v. big (540 billion params) are studied.
Interestingly, finetuning is relatively cheap (at most 1.6% of pretraining compute).
6/12
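The compute claim above checks out on a napkin. A minimal sketch, assuming the usual ~6 · params · tokens FLOPs rule of thumb and an illustrative finetuning token count (not a figure from the paper):

```python
# Back-of-envelope: why instruction finetuning is cheap vs. pretraining.
# Uses the common ~6 * params * tokens FLOPs rule of thumb for training.
# The finetuning token count is an illustrative assumption.
def train_flops(n_params: float, n_tokens: float) -> float:
    return 6.0 * n_params * n_tokens

pretrain = train_flops(540e9, 780e9)  # PaLM 540B pretrained on ~780B tokens
finetune = train_flops(540e9, 1.4e9)  # assume ~1.4B finetuning tokens
print(f"finetuning is {finetune / pretrain:.2%} of pretraining compute")
```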
Increasing model size continues to yield major gains.
Increasing the number of tasks helps, but brings diminishing returns.
7/12
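To get a feel for "diminishing returns", here's a toy curve; the functional form and all constants are invented for illustration, not fit to the paper's results:

```python
# Toy illustration of diminishing returns from adding finetuning tasks:
# score modeled as a saturating (logarithmic) curve. Constants are made up.
import math

def toy_score(n_tasks: int, base: float = 40.0, gain: float = 4.0) -> float:
    return base + gain * math.log1p(n_tasks)

for n in (10, 100, 500, 1000, 1800):  # illustrative task counts
    print(f"{n:5d} tasks -> score {toy_score(n):.1f}")
# Each step adds far fewer points than the last, despite adding more tasks.
```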
To perform well in both chain-of-thought and non-chain-of-thought prompting paradigms, both kinds of data should be included in the finetuning mixture.
8/12
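A minimal sketch of what such a mixture looks like; the records and field names are assumptions, not the paper's actual data format:

```python
# Building a finetuning mixture with both chain-of-thought (CoT) and
# direct-answer examples, so the model sees both answer formats.
import random

cot_data = [
    {"input": "Q: 3 cars each carry 4 people. How many people? "
              "Let's think step by step.",
     "target": "3 cars * 4 people = 12 people. The answer is 12."},
]
direct_data = [
    {"input": "Q: 3 cars each carry 4 people. How many people?",
     "target": "12"},
]

mixture = cot_data + direct_data
random.shuffle(mixture)  # interleave so batches contain both formats
```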
Flan finetuning (which includes chain-of-thought data) enables Flan-PaLM to benefit from chain-of-thought prompting in a zero-shot setting.
9/12
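For reference, zero-shot chain-of-thought prompting amounts to appending a reasoning trigger to the question, with no worked examples; `generate` below is a placeholder, not a real API:

```python
# Zero-shot chain-of-thought prompting: append a reasoning trigger
# instead of providing few-shot exemplars.
def zero_shot_cot(question: str) -> str:
    return f"Q: {question}\nA: Let's think step by step."

prompt = zero_shot_cot(
    "A juggler has 16 balls. Half are golf balls, and half of the golf "
    "balls are blue. How many blue golf balls are there?"
)
# completion = generate(prompt)  # placeholder for any text-generation call
print(prompt)
```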
It's useful to note that human preferences about open-ended model outputs may not correlate with NLP benchmark scores.
Still, human annotators prefer Flan-PaLM 540B to PaLM 540B by a healthy margin.
10/12
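One hedged way to quantify that (mis)match is a rank correlation between benchmark scores and preference win rates across model variants; all numbers below are invented, only the method is the point:

```python
# Checking how well benchmark scores track human preference win rates
# across model variants via rank correlation. Data is fabricated.
from scipy.stats import spearmanr

benchmark_scores = [41.3, 49.1, 58.4, 75.2]  # fake benchmark averages
human_win_rates  = [0.30, 0.55, 0.52, 0.79]  # fake pairwise win rates

rho, p_value = spearmanr(benchmark_scores, human_win_rates)
print(f"Spearman rho = {rho:.2f} (p = {p_value:.2f})")
```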
If cats driving cars are your thing, Flan-PaLM can write funny poems.
11/12
Overall takeaway: instruction finetuning seems likely to be broadly applicable for pretrained language models.
For this study, datasets spanning 46 languages were gathered (collectively referred to as "xP3").
xP3 aims to mimic the distribution of languages found in ROOTS (the dataset used to pretrain BLOOM).
2/17
Three dataset variants were studied:
- English prompts on English datasets (P3)
- English prompts on multilingual datasets (xP3)
- Machine-translated prompts on multilingual datasets (xP3mt)
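A sketch of pulling one variant from the Hugging Face Hub; the `bigscience/xP3` repo ID is real, but the `"en"` config and the `inputs`/`targets` field names are assumptions, so check the dataset card before relying on them:

```python
# Streaming a few prompted examples from the xP3 mixture on the HF Hub.
from itertools import islice

from datasets import load_dataset

xp3_en = load_dataset("bigscience/xP3", "en", split="train", streaming=True)
for example in islice(xp3_en, 3):  # peek at a few examples
    print(example["inputs"][:80], "->", example["targets"][:40])
```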