Fine-tuning all the parameters of large pre-trained models works well and is behind many SotA NLP results right now, but it has some sharp edges. The sheer size makes these models difficult to work with and serve, and every fine-tuning run produces a full, task-specific fork of the model. (2/7)
Prompt Tuning, learning a small set of parameters that are prepended to the embedded input, can eliminate these problems. Freezing pre-trained models enables mixed-task batching and efficient ensembling, without the need for multiple copies. (3/7)
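In code, the core idea looks roughly like this. This is a minimal PyTorch-style sketch, not our actual implementation; the class name, prompt length, and wiring into the frozen model are illustrative assumptions.

```python
import torch
import torch.nn as nn


class SoftPrompt(nn.Module):
    """Learnable prompt vectors prepended to the embedded input (illustrative sketch)."""

    def __init__(self, prompt_length: int, embed_dim: int):
        super().__init__()
        # The only trainable parameters: prompt_length x embed_dim.
        self.prompt = nn.Parameter(torch.randn(prompt_length, embed_dim) * 0.5)

    def forward(self, input_embeds: torch.Tensor) -> torch.Tensor:
        # input_embeds: (batch, seq_len, embed_dim) from the frozen model's embedding layer.
        batch_size = input_embeds.size(0)
        prompt = self.prompt.unsqueeze(0).expand(batch_size, -1, -1)
        # Prepend the learned prompt to the embedded input along the sequence axis.
        return torch.cat([prompt, input_embeds], dim=1)


# Hypothetical usage with a frozen pre-trained model (names are placeholders):
#   frozen_model.requires_grad_(False)
#   soft_prompt = SoftPrompt(prompt_length=100, embed_dim=frozen_model.config.d_model)
#   embeds = frozen_model.embed_tokens(input_ids)            # embed tokens as usual
#   outputs = frozen_model(inputs_embeds=soft_prompt(embeds))
# Only soft_prompt.parameters() go to the optimizer; the pre-trained weights stay shared
# across tasks, which is what makes mixed-task batching and cheap ensembling possible.
```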
The size of the pre-trained model is critical to Prompt Tuning performance. As we scale T5 from Small to XXL, we see Prompt Tuning close the gap with full fine-tuning. (4/7)
By keeping the pre-trained model frozen, Prompt Tuning avoids overfitting to a specific task, improving performance on domain-shift problems. See our paper for details, as well as a comparison with other recent “P*-Tuning” approaches. (5/7)
An interesting quirk of Prompt Tuning is that the hyperparameters look unusual. For example, our learning rate is 0.3, roughly 300× the default T5 fine-tuning learning rate of 0.001. (6/7)
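Concretely, the training setup only ever optimizes the prompt parameters, and at that much larger learning rate. A tiny illustrative sketch (the prompt shape and the choice of SGD here are assumptions, not our actual training configuration):

```python
import torch

# Only the soft prompt is trainable: 100 prompt tokens x d_model (sizes illustrative).
prompt = torch.nn.Parameter(torch.randn(100, 1024) * 0.5)

# The optimizer sees just the prompt, with lr=0.3 rather than the usual 0.001.
optimizer = torch.optim.SGD([prompt], lr=0.3)
```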
A huge shout out to my amazing mentors, @noahconst and @aboSamoor, who were a big part of making this project possible. (7/7)