(1/N) Introducing the first open-source optimized post-training losses in Liger Kernel with ~80% memory reduction, featuring DPO (@rm_rafailov), CPO (@fe1ixxu), ORPO (@jiwoohong98), SimPO (@yumeng0818), JSD, and more, achieving up to 70% end-to-end speedup through larger batch sizes. Use them like any PyTorch module. Available today in Liger v0.5.0!
github.com/linkedin/Liger…
(2/N) Installation and usage are simple. Just pip install liger-kernel and import the loss you need; each one drops in like a standard PyTorch module and gives you the memory reduction out of the box.
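A minimal sketch of what this looks like, using ORPO as the example. The LigerFusedLinearORPOLoss class and the (weight, hidden_states, target) call convention follow the repo README; the tensor shapes and the chosen/rejected batch packing here are illustrative assumptions.

```python
import torch
from liger_kernel.chunked_loss import LigerFusedLinearORPOLoss

B, T, H, V = 4, 128, 2048, 32000  # batch (chosen + rejected), seq, hidden, vocab
lm_head = torch.nn.Linear(H, V, bias=False, device="cuda", dtype=torch.bfloat16)
orpo_loss = LigerFusedLinearORPOLoss()

x = torch.randn(B, T, H, device="cuda", dtype=torch.bfloat16, requires_grad=True)
target = torch.randint(0, V, (B, T), device="cuda")

# The fused loss consumes the lm_head weight and hidden states directly,
# so the full (B, T, V) logits tensor is never materialized; chunking the
# projection and the loss together is where the memory reduction comes from.
loss = orpo_loss(lm_head.weight, x, target)
loss.backward()
```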
(1/n) Training LLMs is often bottlenecked by out-of-memory errors when scaling batch size or sequence length. Add one line to boost multi-GPU training throughput by 20% and reduce memory usage by 60%. Introducing Liger-Kernel: Efficient Triton Kernels for LLM Training.
github.com/linkedin/Liger…
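A minimal sketch of the one-line patch, assuming the apply_liger_kernel_to_llama helper shown in the repo README and a standard Hugging Face training setup.

```python
import torch
from liger_kernel.transformers import apply_liger_kernel_to_llama
from transformers import AutoModelForCausalLM

# The one line: swap Llama's RMSNorm, RoPE, SwiGLU, and cross-entropy
# implementations for the fused Triton kernels before loading the model.
apply_liger_kernel_to_llama()

model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Meta-Llama-3-8B", torch_dtype=torch.bfloat16
)
# Train as usual (Trainer, FSDP, DeepSpeed, etc.); the kernels reduce
# activation memory, leaving headroom for larger batches or longer sequences.
```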
(2/n) Our kernels integrate smoothly with Flash Attention, PyTorch FSDP, and DeepSpeed. Patch your Hugging Face model with one line, or compose your own model using the provided kernels. These kernels have minimal dependencies: just PyTorch and Triton.
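A sketch of the compose-your-own path, assuming the LigerRMSNorm and LigerCrossEntropyLoss module names exported by liger_kernel.transformers; the TinyLMHead wrapper itself is a hypothetical example.

```python
import torch
import torch.nn as nn
from liger_kernel.transformers import LigerCrossEntropyLoss, LigerRMSNorm

class TinyLMHead(nn.Module):
    """Hypothetical head that mixes Liger kernels with plain nn modules."""

    def __init__(self, hidden_size: int, vocab_size: int):
        super().__init__()
        self.norm = LigerRMSNorm(hidden_size)   # fused Triton RMSNorm
        self.lm_head = nn.Linear(hidden_size, vocab_size, bias=False)
        self.loss_fn = LigerCrossEntropyLoss()  # fused cross entropy

    def forward(self, hidden_states: torch.Tensor, labels: torch.Tensor):
        logits = self.lm_head(self.norm(hidden_states))
        return self.loss_fn(logits.view(-1, logits.size(-1)), labels.view(-1))
```

Each kernel is a drop-in replacement for its torch.nn counterpart, so mixing Liger and vanilla modules in one model works without any special glue.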