How to get URL link on X (Twitter) App
https://twitter.com/arankomatsuzaki/status/1776057023697731913
Learns a (R,W,b) per layer and _per position_ in the (prompt) sequence.
https://x.com/giffmana/status/1669840989853196292?s=20
https://twitter.com/arankomatsuzaki/status/1631469683055403008
1. As minibatch size grows/shrinks, the effect should vanish/increase.
https://twitter.com/__kolesnikov__/status/1626546150579879936
- In pix2seq, you don't _really_ care about perplexity of the detection string
https://twitter.com/y0b1byte/status/1481283351281553417
2/3 I've tried fancy multi-task methods almost every year, but they never outperformed my well-tuned "just add the losses". I never thought much of it, but this paper actually explores both theoretically and empirically why that is!