Samuel Albanie
Researcher @GoogleDeepMind
May 15, 2023 · 25 tweets
Another week, another full bucket of AI news.

Some highlights...

🧵1/25

Language models can explain neurons in language models

- Aims to scale up interpretability to large language models

- Exploits the ability of GPT-4 to simulate neurons (a rough sketch of the explain / simulate / score loop follows below)

by S. Bills, @nickcammarata, @mildseasoning, @HenkTillman, @nabla_theta, @WuTheFWasThat, @janleike

2/25
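Rough shape of the approach: GPT-4 writes an explanation of a neuron from (token, activation) examples, then simulates activations from that explanation alone, and the explanation is scored by how well the simulation matches the real activations. A minimal sketch under those assumptions; the prompts and the `call_gpt4` helper are placeholders, not the authors' code:

```python
import numpy as np

def call_gpt4(prompt: str) -> str:
    """Placeholder for a GPT-4 call; swap in a real API client here (assumed helper)."""
    raise NotImplementedError

def explain_neuron(excerpts, activations):
    # Step 1: show GPT-4 (token, activation) pairs and ask for a one-sentence explanation.
    examples = "\n".join(
        f"{tok}\t{act:.2f}"
        for toks, acts in zip(excerpts, activations)
        for tok, act in zip(toks, acts)
    )
    return call_gpt4(
        "Here are tokens and a neuron's activations on them:\n"
        f"{examples}\n"
        "In one sentence, what pattern does this neuron respond to?"
    )

def simulate_neuron(explanation, tokens):
    # Step 2: given only the explanation, ask GPT-4 to guess the activation on each token.
    reply = call_gpt4(
        f"A neuron is described as: {explanation}\n"
        f"For each token in: {' '.join(tokens)}\n"
        "predict its activation on a 0-10 scale, one number per token, space-separated."
    )
    return np.array([float(x) for x in reply.split()])

def score_explanation(explanation, held_out_excerpts, true_activations):
    # Step 3: score the explanation by correlating simulated and real activations.
    simulated = np.concatenate([simulate_neuron(explanation, t) for t in held_out_excerpts])
    real = np.concatenate([np.asarray(a, dtype=float) for a in true_activations])
    return float(np.corrcoef(simulated, real)[0, 1])
```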
Mar 31, 2023 · 5 tweets
1/ 🚀🔬 Introducing our groundbreaking research paper: "Large Language Models are Few-shot Publication Scoopers"

We've discovered the secret to achieving personal glory and a lifetime supply of Cheerios
Joint work with
@LiliMomeni and J. F. Henriques

Appears @sigbovik today

2/ 🏃💨 Tired of racing to publish your next high-impact research?

Our revolutionary pip-to-the-post algo. ensures adulatory Wikipedia pages without risking your career on conventional research strategies

Scoop with the insouciance of a seasoned researcher at a dessert buffet🍨
Jan 24, 2023 · 21 tweets
BLOOM.

A large language model trained by researchers from around the world by @BigscienceW.

How did they do it?

Why did they do it?

Let's dive in.

1/21 🧵

Large Language Models (LLMs) now play a key role in NLP.

But few orgs can afford to train them.

Also:
- most LLMs focus on English
- many are not public

Goals for BLOOM:
- release a strong multilingual LLM
- document the development process

2/21
Nov 7, 2022 · 17 tweets
Multitask prompted finetuning (aka instruction finetuning) can boost language model performance.

But how can we make progress beyond English (esp. on languages with limited finetuning data)?

Work by @Muennighoff & others in @BigscienceW studies this in detail.

1/17 🧵

For this study, datasets spanning 46 languages were gathered (collectively referred to as "xP3").

xP3 aims to mimic the distribution of languages found in ROOTS (the dataset used to pretrain BLOOM).

2/17
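One concrete way to read "mimic the distribution": allocate prompted finetuning examples per language roughly in proportion to that language's share of the pretraining corpus. A toy sketch under that reading; the shares and helper below are invented for illustration, not the actual ROOTS numbers:

```python
# Hypothetical language shares of the pretraining corpus (illustrative, not the real ROOTS figures).
pretraining_share = {"en": 0.30, "zh": 0.16, "fr": 0.13, "es": 0.11, "ar": 0.05, "code": 0.11}

def allocate_examples(n_total: int, shares: dict) -> dict:
    """Allocate finetuning examples per language in proportion to pretraining shares."""
    total = sum(shares.values())
    return {lang: round(n_total * share / total) for lang, share in shares.items()}

print(allocate_examples(100_000, pretraining_share))
```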
Oct 28, 2022 · 12 tweets
Finetuning language models on instructions increasingly looks like a compute-efficient way to improve performance.

Recent work from @hwchung27, @_jasonwei, @JeffDean, @quocleix & others scales this up to new regimes.

TLDR: Even for big models (540B params), gains are substantial.

1/12

For those who prefer a narrated version:

2/12
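For context, instruction finetuning boils down to rewriting existing supervised tasks as natural-language instructions and finetuning on the resulting (input, target) pairs. A toy illustration of that reformatting; the templates are made up, not the ones used in the paper:

```python
# Toy instruction templates (illustrative only, not the paper's templates).
TEMPLATES = {
    "nli": (
        "Premise: {premise}\nHypothesis: {hypothesis}\n"
        "Does the premise entail the hypothesis? Answer yes, no, or maybe."
    ),
    "summarization": "Summarize the following article in one sentence:\n{article}",
}

def to_instruction_example(task: str, fields: dict, target: str) -> dict:
    """Turn a raw supervised example into an (input, target) pair for finetuning."""
    return {"input": TEMPLATES[task].format(**fields), "target": target}

example = to_instruction_example(
    "nli",
    {"premise": "A man is playing a guitar.", "hypothesis": "Someone is making music."},
    "yes",
)
print(example["input"])
print(example["target"])
```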
Oct 28, 2022 · 6 tweets
How can we reduce the computational cost of training neural networks?

Bo Zhao, Hakan Bilen and collaborators have produced a creative body of work developing a technique known as "dataset condensation".

1/7

Key idea: compress a large dataset into a small set of synthetic images that can train networks to the same accuracy as the original dataset.

Was a pleasure to examine Bo's thesis on this topic with @driainmurray.

2/7
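One popular instantiation of this idea is gradient matching: learn the synthetic images so that the gradients a network computes on them track the gradients it computes on real data. A minimal PyTorch sketch under that framing; the network, shapes and the plain squared-error matching loss are illustrative simplifications, not necessarily the exact objective used in the papers:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def grad_match_step(net, real_x, real_y, syn_x, syn_y, syn_opt):
    """Update the synthetic images so gradients on them match gradients on real data."""
    params = [p for p in net.parameters() if p.requires_grad]

    # Gradients of the task loss on a real batch (treated as the target).
    g_real = torch.autograd.grad(F.cross_entropy(net(real_x), real_y), params)
    g_real = [g.detach() for g in g_real]

    # Gradients on the synthetic batch; keep the graph so we can backprop into syn_x.
    g_syn = torch.autograd.grad(
        F.cross_entropy(net(syn_x), syn_y), params, create_graph=True
    )

    # Matching loss: squared distance between the two gradient sets, layer by layer.
    loss = sum(((gs - gr) ** 2).sum() for gs, gr in zip(g_syn, g_real))

    syn_opt.zero_grad()
    loss.backward()  # gradients flow into the learnable synthetic images
    syn_opt.step()
    return loss.item()

# Toy usage: condense a 10-class image dataset into 10 synthetic images per class (shapes assumed).
net = nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, 128), nn.ReLU(), nn.Linear(128, 10))
syn_x = torch.randn(100, 3, 32, 32, requires_grad=True)  # learnable synthetic images
syn_y = torch.arange(10).repeat_interleave(10)           # fixed, balanced labels
syn_opt = torch.optim.SGD([syn_x], lr=0.1)

real_x, real_y = torch.randn(256, 3, 32, 32), torch.randint(0, 10, (256,))
print(grad_match_step(net, real_x, real_y, syn_x, syn_y, syn_opt))
```

In the full method this step is repeated over many network initialisations and training stages, so the condensed set works for freshly initialised networks rather than a single fixed one.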