GitHub:
[2/7] We first study the common technique of post-training quantization of model weights, finding that the longer a model is pretrained (i.e., the more data it sees), the more sensitive it becomes to quantization at inference time. This may explain why Llama-3 is harder to quantize.
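For readers unfamiliar with the setup, the simplest form of post-training weight quantization is round-to-nearest: snap trained float weights onto a coarse integer grid and measure the resulting error. The sketch below is a minimal illustration of that idea, not the thread's exact experimental method; `quantize_rtn` and its per-tensor symmetric scaling are assumptions for illustration.

```python
import numpy as np

def quantize_rtn(w: np.ndarray, bits: int = 4) -> np.ndarray:
    """Round-to-nearest post-training quantization of a weight tensor.

    Symmetric per-tensor scheme: scale weights onto a signed integer
    grid, round, then dequantize back to floats.
    """
    qmax = 2 ** (bits - 1) - 1           # e.g. 7 for 4-bit signed
    scale = np.max(np.abs(w)) / qmax     # single per-tensor scale
    q = np.clip(np.round(w / scale), -qmax - 1, qmax)
    return q * scale                     # dequantized weights

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    w = rng.normal(size=(256, 256)).astype(np.float32)
    for bits in (8, 4, 3):
        mse = float(np.mean((w - quantize_rtn(w, bits)) ** 2))
        print(f"{bits}-bit MSE: {mse:.2e}")
```

Quantization error grows as the bit width shrinks; the thread's finding is that, at a fixed bit width, models trained on more data degrade more from this same perturbation.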