How to get URL link on X (Twitter) App
Here's ★how to make small batch LLM training fast, ★how to pretrain LLMs efficiently via vanilla SGD without momentum, and ★why you should consider getting rid of LoRA and gradient accumulation. 2/n
https://twitter.com/ylecun/status/1591463668612730880While it is true that a model which can only express few functions needs few samples to learn, the converse is not true! This underscores the failure of ideas like VC dimension and Rademacher complexity to explain neural network generalization. 2/8