How to get URL link on X (Twitter) App
https://twitter.com/ID_AA_Carmack/status/1587863190695813121
https://twitter.com/_clashluke/status/1594284161841479687
https://twitter.com/_arohan_/status/1538291264226926597
https://twitter.com/OpenAI/status/1540032456559955968
Let's start with their architectural description.
https://twitter.com/_clashluke/status/1463061191169822720
https://twitter.com/Buntworthy/status/1463905680004374535
https://twitter.com/ak92501/status/1439751096969334785
This speedup is almost as significant as Switch Transformer's (arxiv.org/abs/2101.03961). It got up to 7x speedups using 64x as many (sparse) parameters.
https://twitter.com/ak92501/status/1419824931181846528
With fewer parameters, layers, and lower training time, they achieve a 3.2% (relative) lower top-1 error.
https://twitter.com/ak92501/status/1414020174357934086
To create these samples in a reasonable time, I attempted to optimize the model by jitting it with TorchScript. After countless failed attempts, it's finally 5x as fast as the baseline. (If you're using PyTorch, try JIT. You might want to follow my notebook for further optimizations.)
https://twitter.com/Hanxiao_6/status/1394742841033641985
I'll implement it immediately in our GPT codebase and share its performance on 2B-equivalent models.