Jeongsoo Park
PhD student @UMichCSE
Jun 20, 2023
Do we need RGB to train neural networks? We skip decoding JPEG to RGB, directly feed the encoded JPEG to ViT, and speed up train/eval by up to 39.2%/17.9% without accuracy loss!

Check out our poster on Thu-PM-165 in #CVPR2023! (work w/ @jcjohnss)

bit.ly/3qRwToV

JPEG slices images into patches. ViT works on patches. This makes it a perfect match for training from JPEG.
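
To make the patch match concrete, here is a rough NumPy sketch, not the pipeline from the paper. JPEG transforms each channel in 8x8 blocks with a DCT, so a 2x2 group of neighbouring blocks covers a 16x16 pixel area, which is a common ViT patch size. The function names (`dct_matrix`, `blockwise_dct`, `blocks_to_tokens`) and the 2x2 grouping are assumptions made for illustration. The sketch also computes the DCT from pixels, whereas a real JPEG file already stores quantized, entropy-coded coefficients (usually with subsampled chroma) that would be read from the file rather than recomputed.

```python
import numpy as np

BLOCK = 8  # JPEG applies its DCT to 8x8 blocks of each channel


def dct_matrix(n=BLOCK):
    """Orthonormal DCT-II basis matrix of shape (n, n)."""
    k = np.arange(n)[:, None]  # frequency index (rows)
    x = np.arange(n)[None, :]  # sample index (columns)
    m = np.sqrt(2.0 / n) * np.cos(np.pi * (2 * x + 1) * k / (2 * n))
    m[0, :] = np.sqrt(1.0 / n)  # DC row uses a different scale
    return m


def blockwise_dct(channel):
    """Split an (H, W) channel into 8x8 blocks and apply a 2D DCT to each.

    Returns an array of shape (H/8, W/8, 8, 8).
    """
    h, w = channel.shape
    assert h % BLOCK == 0 and w % BLOCK == 0, "pad the image to a multiple of 8"
    blocks = channel.reshape(h // BLOCK, BLOCK, w // BLOCK, BLOCK)
    blocks = blocks.transpose(0, 2, 1, 3)  # (H/8, W/8, 8, 8)
    d = dct_matrix()
    return d @ blocks @ d.T  # matmul broadcasts over the block grid


def blocks_to_tokens(coeffs, blocks_per_patch=2):
    """Flatten each group of p x p neighbouring blocks into one token vector.

    Hypothetical grouping: p=2 gives one token per 16x16 pixel area.
    """
    bh, bw = coeffs.shape[:2]
    p = blocks_per_patch
    assert bh % p == 0 and bw % p == 0
    t = coeffs.reshape(bh // p, p, bw // p, p, BLOCK, BLOCK)
    t = t.transpose(0, 2, 1, 3, 4, 5)  # (bh/p, bw/p, p, p, 8, 8)
    return t.reshape((bh // p) * (bw // p), p * p * BLOCK * BLOCK)


# Usage: a fake 32x32 luma channel, level-shifted by 128 as JPEG does.
luma = np.random.default_rng(0).integers(0, 256, size=(32, 32)).astype(np.float64)
coeffs = blockwise_dct(luma - 128.0)
tokens = blocks_to_tokens(coeffs)
print(coeffs.shape, tokens.shape)
```

Each row of `tokens` could then go through a linear embedding layer in place of the usual flattened RGB patch. How the luma and chroma planes are combined is left out here.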