How to RLHF #LLAMA if you don't have hundreds of GPUs? Do it in a parameter-efficient way.
I'm happy to finally share our parameter-efficient fine-tuning (#PEFT) survey! It took quite a bit longer to put together than I expected, but I feel good about the result: arxiv.org/abs/2303.15647
PEFT methods can target several goals: storage efficiency, multi-task inference efficiency, and memory efficiency, among others. We are interested in fine-tuning large models, so memory efficiency is a must.
We distill over 40 PEFT papers, provide a taxonomy and comparison of 30 methods, and describe 20 methods in detail (with pseudocode!).
I feel like everyone knows about Adapters, BitFit, and LoRA, but there are even better methods out there! In the last two years, low-rank methods have taken off.
Compacter and KronA use a more rank-efficient way to construct large matrices: the Kronecker product is the new matmul for PEFT.
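To make the idea concrete, here is a rough sketch (my own illustrative shapes, not the exact Compacter/KronA parameterization): two small trainable factors produce a full-size update matrix via a Kronecker product, with orders of magnitude fewer parameters than a dense update.

```python
import torch

# Sketch: build a large (1024 x 1024) weight update from two small Kronecker
# factors. Shapes are illustrative, not the Compacter/KronA defaults.
d = 1024
A = torch.randn(32, 32, requires_grad=True)   # 1,024 trainable params
B = torch.randn(32, 32, requires_grad=True)   # 1,024 trainable params

delta_W = torch.kron(A, B)                    # Kronecker product -> (1024, 1024)
assert delta_W.shape == (d, d)
# A dense update of the same size would need 1,048,576 parameters;
# this factorization trains only 2,048.
```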
We dive into the details of 20 different PEFT methods in the paper. Still, because we understand that not everyone has the time to read the full 15 pages, we include a one-sentence description of each method and provide pseudocode!
Finally, parameter efficiency is... ugh, complicated. Different people measure it differently: the number of trainable parameters, the number of updated parameters, or the rank of the update. Also, it seems that the larger a model gets, the fewer parameters you need to fine-tune it.
I hope this paper helps people learn more about PEFT and highlights some of the amazing methods that I think have been overlooked.
Parameter-efficient fine-tuning has revolutionized the accessibility of LLM fine-tuning, but can it also revolutionize pre-training? We present ReLoRA: the first PEFT method that can be used for training from scratch! 🔥🔥
Why can't we use regular LoRA for pre-training? Because it only optimizes within a small, low-rank subspace of the model parameters. That is enough for fine-tuning, but you don't want rank restrictions during pre-training.
What can we do? Apply LoRA multiple times in a row. It works because LoRA parameters can be merged into the main network (W += W_A @ W_B) and because the sum of low-rank matrices can have a rank larger than their individual ranks. That's the first step towards ReLoRA.
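Here is a minimal sketch of that merge-and-restart step (my own illustrative code with hypothetical names; the full ReLoRA recipe in the paper also handles things like optimizer-state resets, which I omit here):

```python
import torch
import torch.nn as nn

class ReLoRALinear(nn.Module):
    """Frozen weight W plus a trainable low-rank update W_A @ W_B (sketch)."""
    def __init__(self, d_in, d_out, rank=8):
        super().__init__()
        self.W = nn.Parameter(torch.randn(d_out, d_in) * 0.02, requires_grad=False)
        self.W_A = nn.Parameter(torch.zeros(d_out, rank))        # zero init: update starts at 0
        self.W_B = nn.Parameter(torch.randn(rank, d_in) * 0.02)

    def forward(self, x):
        return x @ (self.W + self.W_A @ self.W_B).T

    @torch.no_grad()
    def merge_and_reinit(self):
        # Fold the low-rank update into the main weights, then restart W_A, W_B.
        # Each restart adds a fresh rank-r term, so the accumulated update can
        # reach a rank much higher than r.
        self.W += self.W_A @ self.W_B
        self.W_A.zero_()
        self.W_B.normal_(std=0.02)
```

Between restarts only W_A and W_B receive gradients; calling merge_and_reinit() periodically lets the total update escape the single low-rank subspace.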
We found that (IA)3 by @liu_haokun, Derek Tam, @Muqeeth10, and @colinraffel is one of the hidden gems of PEFT. It is simple, trains very few parameters, and outperforms strong methods like LoRA and Compacter. Let's quickly go over how it works.
(IA)3 is an additive PEFT method that adds three trainable vectors that rescale the attention keys and values and the hidden layer of the FCN in the transformer. It is as simple as an element-wise product. The pseudocode is just 3 extra lines of code added to the transformer block.
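Roughly, those three lines look like this inside a generic attention + FFN block (this is my own sketch of the idea, not the authors' implementation; single-head attention for brevity):

```python
import torch
import torch.nn as nn

class IA3Block(nn.Module):
    """Single-head attention + FFN block with (IA)^3 rescaling vectors (sketch)."""
    def __init__(self, d_model, d_ff):
        super().__init__()
        self.q = nn.Linear(d_model, d_model)
        self.k = nn.Linear(d_model, d_model)
        self.v = nn.Linear(d_model, d_model)
        self.o = nn.Linear(d_model, d_model)
        self.ff_in = nn.Linear(d_model, d_ff)
        self.ff_out = nn.Linear(d_ff, d_model)
        # Freeze the base weights; only the three (IA)^3 vectors below are trained.
        for p in self.parameters():
            p.requires_grad_(False)
        self.l_k = nn.Parameter(torch.ones(d_model))   # rescales keys
        self.l_v = nn.Parameter(torch.ones(d_model))   # rescales values
        self.l_ff = nn.Parameter(torch.ones(d_ff))     # rescales FFN hidden activations

    def forward(self, x):
        q = self.q(x)
        k = self.k(x) * self.l_k                                   # extra line 1
        v = self.v(x) * self.l_v                                   # extra line 2
        attn = torch.softmax(q @ k.transpose(-2, -1) / q.shape[-1] ** 0.5, dim=-1)
        x = x + self.o(attn @ v)
        h = torch.relu(self.ff_in(x)) * self.l_ff                  # extra line 3
        return x + self.ff_out(h)
```

Initializing the vectors to ones makes the rescaling an identity at the start, so fine-tuning begins from the pre-trained model's behavior.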