Jeongsoo Park

@jespark0

Jun 20 • 4 tweets • 2 min read

Do we need RGB to train neural networks? We skip decoding JPEG to RGB, directly feed the encoded JPEG to ViT, and speed up train/eval by up to 39.2%/17.9% without accuracy loss!

Check out our poster on Thu-PM-165 in #CVPR2023! (work w/ @jcjohnss)

bit.ly/3qRwToV

JPEG slices images into patches. ViT works on patches. This makes ViT a perfect match for training from JPEG.
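To make the patch correspondence concrete, here is a minimal sketch, not our actual pipeline. It assumes a single luma channel, the standard 8x8 JPEG block size, and a 16x16 ViT patch (so 2x2 blocks per token), and it skips quantization, chroma subsampling, and entropy decoding. The helper names `blockwise_dct` and `blocks_to_tokens` are made up for illustration.

```python
import numpy as np

def dct_matrix(n: int = 8) -> np.ndarray:
    """Orthonormal DCT-II basis matrix of shape (n, n)."""
    k = np.arange(n)[:, None]
    x = np.arange(n)[None, :]
    m = np.cos(np.pi * (2 * x + 1) * k / (2 * n)) * np.sqrt(2.0 / n)
    m[0, :] = np.sqrt(1.0 / n)
    return m

def blockwise_dct(channel: np.ndarray, block: int = 8) -> np.ndarray:
    """Split a (H, W) channel into blocks and return their DCT coefficients.

    Output shape: (H/block, W/block, block, block).
    """
    h, w = channel.shape
    d = dct_matrix(block)
    blocks = channel.reshape(h // block, block, w // block, block).transpose(0, 2, 1, 3)
    return d @ blocks @ d.T  # 2-D DCT applied to every block via broadcasting

def blocks_to_tokens(coeffs: np.ndarray, blocks_per_patch: int = 2) -> np.ndarray:
    """Group neighbouring DCT blocks into one flat vector per ViT patch."""
    bh, bw, b, _ = coeffs.shape
    p = blocks_per_patch
    grouped = coeffs.reshape(bh // p, p, bw // p, p, b, b).transpose(0, 2, 1, 3, 4, 5)
    return grouped.reshape((bh // p) * (bw // p), p * p * b * b)

# Toy 32x32 "image": 4x4 JPEG blocks, grouped into 2x2 ViT patches.
luma = np.random.default_rng(0).uniform(0, 255, size=(32, 32))
tokens = blocks_to_tokens(blockwise_dct(luma - 128.0))
print(tokens.shape)  # (4, 256)
```

Each row of `tokens` is what a patch-embedding layer would consume in place of raw RGB pixels in this toy setup.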

Data augmentation is vital for training a good-performing model. We augment the JPEG data directly to speed up training, instead of converting to RGB, augmenting, and converting back.
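As one illustration of augmenting without going back to pixels, here is a sketch of a horizontal flip done purely on 8x8 DCT blocks. This is a textbook DCT identity used as an example, not necessarily one of the augmentations in our pipeline: mirroring a block negates its odd horizontal-frequency coefficients, and mirroring the image also reverses the order of the block columns. The function names are hypothetical, and the sketch assumes an unquantized single channel.

```python
import numpy as np

def dct_matrix(n: int = 8) -> np.ndarray:
    """Orthonormal DCT-II basis matrix of shape (n, n)."""
    k = np.arange(n)[:, None]
    x = np.arange(n)[None, :]
    m = np.cos(np.pi * (2 * x + 1) * k / (2 * n)) * np.sqrt(2.0 / n)
    m[0, :] = np.sqrt(1.0 / n)
    return m

def to_blocks_dct(img: np.ndarray, block: int = 8) -> np.ndarray:
    """(H, W) pixels -> (H/block, W/block, block, block) DCT coefficients."""
    h, w = img.shape
    d = dct_matrix(block)
    blocks = img.reshape(h // block, block, w // block, block).transpose(0, 2, 1, 3)
    return d @ blocks @ d.T

def from_blocks_dct(coeffs: np.ndarray) -> np.ndarray:
    """Inverse of to_blocks_dct; used here only to check the result."""
    bh, bw, b, _ = coeffs.shape
    d = dct_matrix(b)
    blocks = d.T @ coeffs @ d
    return blocks.transpose(0, 2, 1, 3).reshape(bh * b, bw * b)

def hflip_dct(coeffs: np.ndarray) -> np.ndarray:
    """Horizontally flip an image represented as DCT blocks."""
    b = coeffs.shape[-1]
    sign = (-1.0) ** np.arange(b)  # odd horizontal frequencies change sign
    # Reverse the block-column axis, then mirror each block via the sign pattern.
    return coeffs[:, ::-1] * sign

img = np.random.default_rng(0).uniform(0, 255, size=(16, 24))
flipped = from_blocks_dct(hflip_dct(to_blocks_dct(img)))
print(np.allclose(flipped, img[:, ::-1]))
```

The check at the end decodes only to verify the identity; in a DCT-domain training loop the flipped coefficients would be fed to the model as they are.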

Our ViT-Ti shows up to 39.2%/17.9% faster training/evaluation without accuracy loss compared to RGB. Also, our data augmentation pipeline is up to 93.2% faster than previous work. For more details, please check out our website!
