Quick Thread 🧵 (1/7)
Fine-tuning all the parameters of a large pre-trained model works well and underpins many SotA NLP results right now, but it has some sharp edges. The sheer size makes these models difficult to work with and serve, and every fine-tuning run produces a full-size copy of the weights, i.e. a new fork per task. (2/7)
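(Illustrative sketch, not from the thread: assuming PyTorch and Hugging Face Transformers, with "bert-base-uncased" purely as a stand-in checkpoint, this shows why full fine-tuning forks the whole model: every parameter is trainable, so each task checkpoint is a complete copy of the weights.)

```python
# Minimal sketch: full fine-tuning makes every parameter trainable,
# so the per-task checkpoint is as large as the base model itself.
from transformers import AutoModelForSequenceClassification

model = AutoModelForSequenceClassification.from_pretrained("bert-base-uncased")

trainable = sum(p.numel() for p in model.parameters() if p.requires_grad)
total = sum(p.numel() for p in model.parameters())

# For BERT-base this prints roughly 110M / 110M: the "fork" per task
# is a full copy of all parameters, which is what makes serving many
# fine-tuned variants painful.
print(f"trainable params: {trainable:,} / {total:,}")
```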