https://twitter.com/DeepMind/status/1360217173797568514

AGC is based on the idea that the magnitude of a gradient shouldn't be too large relative to the magnitude of a parameter. It's strongly related to normalized optimizers like LARS or LAMB, but operates per-output-unit and is relaxed in that it doesn't ignore gradient magnitudes.
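
To make the per-output-unit idea concrete, here is a minimal sketch in plain Python of one common way to write this kind of clipping. It is an illustration under stated assumptions, not the authors' implementation: the names `adaptive_grad_clip` and `unit_norm`, the row-per-output-unit layout, and the `clip` and `eps` values are all hypothetical choices made for this example.

```python
import math


def unit_norm(row):
    """Euclidean norm of one output unit's vector."""
    return math.sqrt(sum(x * x for x in row))


def adaptive_grad_clip(weights, grads, clip=0.01, eps=1e-3):
    """Clip each output unit's gradient relative to that unit's weight norm.

    weights, grads: lists of rows, one row per output unit (assumed layout).
    clip, eps: illustrative defaults, not values taken from the text above.
    """
    clipped = []
    for w_row, g_row in zip(weights, grads):
        # eps floors the weight norm so a unit with all-zero weights
        # can still receive a small, non-zero gradient.
        w_norm = max(unit_norm(w_row), eps)
        g_norm = unit_norm(g_row)
        max_norm = clip * w_norm
        if g_norm > max_norm:
            # Gradient is too large relative to the parameter: rescale it
            # so its norm equals clip * w_norm.
            scale = max_norm / g_norm
            clipped.append([g * scale for g in g_row])
        else:
            # Small gradients pass through unchanged, so their magnitude
            # is not ignored.
            clipped.append(list(g_row))
    return clipped


weights = [[3.0, 4.0], [0.3, 0.4]]  # unit norms 5.0 and 0.5
grads = [[0.03, 0.04], [0.3, 0.4]]  # unit norms 0.05 and 0.5
clipped = adaptive_grad_clip(weights, grads, clip=0.1)
```

In this example the first unit's gradient is already small relative to its weights and is left alone, while the second unit's gradient is scaled down to one tenth of its weight norm. A fully normalized update would instead rescale both rows regardless of how large the gradients were.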