a really exciting new account
"advanced pytorch user" - @cHHillee
alt: @typedalt
Jan 6, 2023 • 4 tweets • 1 min read
does causal attention annoy anyone else? you compute a whole matrix multiplication only to throw half of it away!
isn't it also misleading when people make claims about how well transformers utilize GPUs?
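a rough sketch of the arithmetic, assuming a plain dense implementation that computes the full seq_len x seq_len score matrix and masks it afterwards. the function names are made up for illustration, and the FLOP count only covers the QK^T matmul for a single head:

```python
def causal_waste_fraction(seq_len: int) -> float:
    """Fraction of the QK^T score matrix that a causal mask discards."""
    total = seq_len * seq_len
    # A causal mask keeps the lower triangle, diagonal included.
    kept = seq_len * (seq_len + 1) // 2
    return (total - kept) / total


def qk_matmul_flops(seq_len: int, head_dim: int) -> tuple[int, int]:
    """Return (dense_flops, useful_flops) for the QK^T matmul of one head.

    Assumes 2 FLOPs (one multiply, one add) per multiply-accumulate.
    """
    dense = 2 * seq_len * seq_len * head_dim
    useful = 2 * (seq_len * (seq_len + 1) // 2) * head_dim
    return dense, useful


# For seq_len=4 the mask drops 6 of 16 scores, i.e. 0.375.
small = causal_waste_fraction(4)
# For long sequences the wasted fraction approaches one half.
large = causal_waste_fraction(2048)
dense_flops, useful_flops = qk_matmul_flops(2048, 64)
```

the wasted fraction is (seq_len - 1) / (2 * seq_len), so it tends to 1/2 as the sequence gets longer. if a utilization number counts the dense FLOPs, the attention matmuls look roughly twice as productive as the useful work would suggest.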