Artidoro Pagnoni
PhD student in NLP at UW with Luke Zettlemoyer
May 24, 2023
4-bit QLoRA is here to level the playing field for LLM exploration. You can now finetune a state-of-the-art 65B-parameter chatbot on a single GPU in 24 hours.
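A back-of-envelope sketch of where the savings come from (illustrative arithmetic, not figures from the paper): full 16-bit finetuning needs memory for weights, gradients, and Adam optimizer state, while QLoRA stores the frozen base model at 4 bits and only trains small adapters.

```python
params = 65e9  # 65B-parameter model

# 16-bit full finetuning: fp16 weights + fp16 gradients + two fp32 Adam moments
fp16_bytes = params * (2 + 2 + 8)

# QLoRA: frozen base weights at 4 bits (0.5 bytes) each;
# gradients and optimizer state exist only for the tiny adapters
nf4_bytes = params * 0.5

print(f"16-bit full finetune: ~{fp16_bytes / 1e9:.0f} GB")  # ~780 GB
print(f"4-bit frozen base:    ~{nf4_bytes / 1e9:.1f} GB")   # ~32.5 GB
```

The second number is why a 65B model fits on a single large GPU once activations, adapters, and quantization constants are added on top.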

Paper: arxiv.org/abs/2305.14314
Code and Demo: github.com/artidoro/qlora

QLoRA uses a frozen 4-bit base model with adapters: we backpropagate through the 4-bit weights into the adapters. QLoRA incorporates the NF4 data type, double quantization, and paged optimizers. We show it is on par with 16-bit finetuning at a fraction of the memory footprint.
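A minimal pure-Python sketch of the idea behind blockwise NF4 quantization (illustrative only; the actual implementation packs 4-bit codes and runs as fused CUDA kernels in bitsandbytes). Each block of weights is scaled by its absmax, then each value is rounded to the nearest of 16 fixed levels, which are quantiles of a standard normal rescaled to [-1, 1]:

```python
# The 16 NF4 levels (normal-distribution quantiles rescaled to [-1, 1]).
NF4_LEVELS = [
    -1.0, -0.6961928009986877, -0.5250730514526367, -0.39491748809814453,
    -0.28444138169288635, -0.18477343022823334, -0.09105003625154495, 0.0,
    0.07958029955625534, 0.16093020141124725, 0.24611230194568634,
    0.33791524171829224, 0.44070982933044434, 0.5626170039176941,
    0.7229568362236023, 1.0,
]

def quantize_block(weights):
    """Map one block of floats to 4-bit codes plus one absmax scale."""
    scale = max(abs(w) for w in weights) or 1.0
    codes = [
        min(range(16), key=lambda i: abs(w / scale - NF4_LEVELS[i]))
        for w in weights
    ]
    return codes, scale

def dequantize_block(codes, scale):
    """Reconstruct approximate weights from codes and the block scale."""
    return [NF4_LEVELS[c] * scale for c in codes]

block = [0.8, -0.2, 0.05, -0.45]
codes, scale = quantize_block(block)
approx = dequantize_block(codes, scale)
```

Double quantization goes one step further and quantizes the per-block scales themselves, shaving off most of the overhead the quantization constants add per parameter.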