Colin Raffel
nonbayesian parameterics, sweet lessons, and random birds. Friend of @srush_nlp
May 12, 2022 9 tweets 5 min read
New preprint! We introduce 𝚃-𝙵𝚎𝚠 and (𝙸𝙰)³, a few-shot learning recipe that outperforms in-context learning at dramatically lower costs and gets super-human results on the RAFT benchmark for the first time.

📄 arxiv.org/abs/2205.05638
💾 github.com/r-three/t-few
🧵⬇️
(1/9)

Few-shot in-context learning induces a language model to perform a task by feeding in a few labeled examples along with a task description and then generating a prediction for an unlabeled example. Processing these “in-context” examples can incur a huge computational cost. (2/9)
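For concreteness, here is a minimal PyTorch sketch of the kind of parameter-efficient module (IA)³ adds: small learned vectors that rescale a frozen layer's keys and values (the paper also rescales feed-forward activations). Names and structure here are illustrative, not the released T-Few code.

```python
import torch
import torch.nn as nn


class IA3KV(nn.Module):
    """Illustrative (IA)^3-style module: frozen key/value projections whose
    outputs are rescaled element-wise by small learned vectors."""

    def __init__(self, k_proj: nn.Linear, v_proj: nn.Linear):
        super().__init__()
        self.k_proj, self.v_proj = k_proj, v_proj
        for p in self.parameters():          # freeze the pre-trained projections;
            p.requires_grad = False          # l_k / l_v below are added afterwards and stay trainable
        d = k_proj.out_features
        self.l_k = nn.Parameter(torch.ones(d))  # init to ones, so the model starts unchanged
        self.l_v = nn.Parameter(torch.ones(d))

    def forward(self, hidden_states: torch.Tensor):
        keys = self.k_proj(hidden_states) * self.l_k
        values = self.v_proj(hidden_states) * self.l_v
        return keys, values
```

Only the rescaling vectors are trained on the few labeled examples, and at inference time no in-context examples need to be processed, which is where the cost savings over in-context learning come from.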
Feb 17, 2022 8 tweets 3 min read
When and why is it possible to extract training data from large language models?

In a new preprint, we show that the number of times a sequence is duplicated in the training data heavily impacts whether it can be successfully extracted.

arxiv.org/abs/2202.06539

Thread ⬇️ (1/8)

We study the attack from Carlini et al. (arxiv.org/abs/2012.07805) that
1) generates samples from a trained model
2) identifies samples copied from the training data through various metrics

Both steps are highly sensitive to training data duplication! (2/8)
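As a rough illustration of step 2, one of the metrics scores each generated sample by comparing the model's likelihood to how compressible the text is. This is a loose, hedged sketch assuming a Hugging Face causal LM and tokenizer, not the exact thresholds or normalization from the paper.

```python
import zlib

import torch


def extraction_score(model, tokenizer, text: str) -> float:
    """Score a generated sample by the ratio of the model's per-token loss
    (log-perplexity) to the zlib-compressed length of the text. Low scores
    flag text the model likes far more than its compressibility would
    suggest, hinting at memorization."""
    enc = tokenizer(text, return_tensors="pt")
    with torch.no_grad():
        out = model(input_ids=enc["input_ids"], labels=enc["input_ids"])
    log_ppl = out.loss.item()                         # mean negative log-likelihood per token
    zlib_bytes = len(zlib.compress(text.encode("utf-8")))
    return log_ppl / zlib_bytes
```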
Dec 8, 2021 11 tweets 4 min read
Announcing a new research focus in my lab: Developing tools to enable collaboratively-built and continually-improved models.

Blog post: colinraffel.com/blog/a-call-to…
Paper on model "patches": arxiv.org/abs/2111.09839
Paper on "merging" models: arxiv.org/abs/2111.09832
Thread ⬇️ (1/11)

Pre-trained models are a vital part of the ML ecosystem. But large-scale pre-training is expensive, so most popular pre-trained models were developed by small, isolated teams within large, resource-rich companies. (2/11)
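To make the "merging" idea concrete, here is a minimal sketch of its simplest case: plain parameter averaging of checkpoints that share an architecture. The paper itself uses Fisher-weighted averaging, which reduces to this when every parameter is weighted equally.

```python
import torch


def merge_checkpoints(state_dicts):
    """Average a list of PyTorch state dicts parameter-by-parameter.
    Assumes identical architectures (same keys and shapes); this is the
    unweighted special case of Fisher-weighted merging."""
    merged = {}
    for name in state_dicts[0]:
        merged[name] = torch.stack(
            [sd[name].float() for sd in state_dicts]
        ).mean(dim=0)
    return merged
```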
Sep 20, 2021 8 tweets 2 min read
Now that "Do Transformer Modifications Transfer Across Implementations and Applications?" has been accepted to #EMNLP2021, we can finally tweet about it!

Paper 📝: arxiv.org/abs/2102.11972
Code 💾: github.com/google-researc…
Thread summary: ⬇️ (1/8)

After we published the T5 paper, where we empirically surveyed many transfer learning methods to find out what works best, we decided to do something similar for Transformer architecture modifications. (2/8)
Jun 1, 2021 9 tweets 4 min read
Can your NLP model handle noooisy mEsSy #realworldtext?

ByT5 works on raw UTF-8 bytes (no tokenization!), beats SoTA models on many popular tasks, and is more robust to noise.

📜 Preprint: arxiv.org/abs/2105.13626
💾 Code/Models: github.com/google-researc…

Summary thread ⬇️ (1/9)

Tokenizers have many drawbacks:
- Finite, fixed vocabulary - often can't process new/unseen languages
- Lack of robustness to missspeling and n o i s e
- Not learned "end-to-end"
- Giant vocabulary matrices in the multilingual setting
- Lots of technical debt in practice

(2/9)
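By contrast, byte-level processing needs no vocabulary at all. A tiny sketch of the idea (the offset reserving IDs for special tokens is illustrative, not ByT5's exact configuration):

```python
def byte_encode(text: str, offset: int = 3) -> list[int]:
    """Map a string to its raw UTF-8 bytes: no vocabulary to build,
    nothing can be out-of-vocabulary. A small offset leaves room for
    special token IDs such as padding and EOS."""
    return [b + offset for b in text.encode("utf-8")]


def byte_decode(ids: list[int], offset: int = 3) -> str:
    return bytes(i - offset for i in ids).decode("utf-8", errors="ignore")


print(byte_encode("noooisy mEsSy"))               # works for any string, any language, any typos
print(byte_decode(byte_encode("noooisy mEsSy")))  # round-trips losslessly
```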
May 6, 2021 14 tweets 7 min read
The #ICLR2021 Workshop on Enormous Language Models (WELM) is tomorrow, May 7th!

Full info: welmworkshop.github.io
Livestream: welmworkshop.github.io/livestream/
gathertown info for ICLR registrants: iclr.cc/virtual/2021/w…

Thread summarizing the talks & panels ⬇️ (1/14)

Our first talk will be by Thomas Margoni, who will provide some legal perspective on the use of web data for training large language models. He'll touch on topics like copyright law, rights, and licenses, as they pertain to training data for LMs. (2/14)
Dec 17, 2020 5 tweets 1 min read
I've recently had a number of aspiring ML researchers ask me how to stay on top of the paper onslaught. Here are three concrete tips:
1) Pick a tiny subfield to focus on
2) Skim
3) Rely on your community
Thread to explain ⬇️ (1/5)

1) Pick a tiny subfield to focus on
It's impossible to stay on top of "all of ML". It's a gigantic and diverse field. Being an effective researcher requires laser-focusing on a subfield. Pick a problem that is important, excites you, and you feel you could make progress on. (2/5)
Dec 12, 2019 11 tweets 4 min read
In case you missed our #neurips poster on MixMatch (arxiv.org/abs/1905.02249) today because you aren't in Vancouver or didn't survive the poster session stampede, here's the PDF: github.com/google-researc… and here's a transcript of what I said to everyone who came by: ⬇️ (1/11)

The goal in semi-supervised learning (SSL) is to use unlabeled data to improve a model's performance. Many approaches do this by using the model to produce "label guesses" for unlabeled data, and then training the model to predict those guesses. (2/11)
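A hedged sketch of just that label-guessing step, including the temperature sharpening MixMatch applies; MixUp and the consistency loss are omitted, and `augment` is a stand-in for whatever stochastic data augmentation you use.

```python
import torch
import torch.nn.functional as F


def guess_labels(model, unlabeled_x, augment, K: int = 2, T: float = 0.5):
    """Average the model's predictions over K random augmentations of each
    unlabeled example, then sharpen the average with temperature T to get
    a "label guess" the model is later trained to predict."""
    with torch.no_grad():
        avg_probs = torch.stack(
            [F.softmax(model(augment(unlabeled_x)), dim=-1) for _ in range(K)]
        ).mean(dim=0)
    sharpened = avg_probs ** (1.0 / T)
    return sharpened / sharpened.sum(dim=-1, keepdim=True)
```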
Oct 24, 2019 14 tweets 5 min read
New paper! We perform a systematic study of transfer learning for NLP using a unified text-to-text model, then push the limits to achieve SoTA on GLUE, SuperGLUE, CNN/DM, and SQuAD.
Paper: arxiv.org/abs/1910.10683
Code/models/data/etc: git.io/Je0cZ
Summary ⬇️ (1/14)

Our approach casts *every* language problem as a text-to-text task. For example, English-to-German translation -- input: "translate English to German: That is good.", target: "Das ist gut." -- or sentiment ID -- input: "sentiment: This movie is terrible!", target: "negative". (2/14)
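The text-to-text interface is easy to try with a public T5 checkpoint via Hugging Face Transformers. A small usage sketch, with the caveat that the released models expect the exact task prefixes from the paper (e.g. "sst2 sentence:" for SST-2 sentiment rather than the shorthand "sentiment:" above):

```python
from transformers import T5ForConditionalGeneration, T5Tokenizer

tokenizer = T5Tokenizer.from_pretrained("t5-small")
model = T5ForConditionalGeneration.from_pretrained("t5-small")

for prompt in [
    "translate English to German: That is good.",  # expected output: "Das ist gut."
    "sst2 sentence: This movie is terrible!",       # expected output: "negative"
]:
    input_ids = tokenizer(prompt, return_tensors="pt").input_ids
    output_ids = model.generate(input_ids, max_new_tokens=20)
    print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```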
Sep 19, 2019 6 tweets 2 min read
If you are reeling from a NeurIPS rejection or stressing about an ICLR submission, remember that some of the best papers were never published anywhere except arXiv. Thread of a few favorites (1/5):

"Generating Sequences with RNNs" by Graves (arxiv.org/abs/1308.0850). This paper blew my mind when it came out, showing that it was possible to generate plausible text and handwriting with RNNs. It includes the predecessors of attention, Adam, etc. (2/5)