https://twitter.com/ak92501/status/1530007802013417486

To generate more descriptive and distinctive captions, we propose using CLIP to compute multimodal (image-text) similarity and using that similarity as a reward function. This avoids merely imitating the reference captions and instead transfers fine-grained details from similar training images.
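As a rough illustration of "similarity as a reward," here is a minimal sketch. It assumes the image and each sampled caption have already been encoded into vectors by CLIP's image and text encoders. The short toy vectors below are hypothetical stand-ins for those embeddings. It also assumes a REINFORCE-style update with a mean-reward baseline, which is a common choice in RL-based captioning but is not confirmed as the authors' exact method.

```python
import math
from typing import List, Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two embedding vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def clip_style_rewards(image_emb: Sequence[float],
                       caption_embs: Sequence[Sequence[float]]) -> List[float]:
    """Reward each sampled caption by its image-text similarity.

    Assumption: image_emb and caption_embs are precomputed CLIP embeddings;
    no actual CLIP model is loaded here.
    """
    return [cosine_similarity(image_emb, c) for c in caption_embs]


def advantages(rewards: Sequence[float]) -> List[float]:
    """Subtract the mean reward as a baseline (an assumed policy-gradient setup).

    Captions scoring above the batch average get a positive advantage, so
    the captioner is pushed toward them, independent of any reference caption.
    """
    baseline = sum(rewards) / len(rewards)
    return [r - baseline for r in rewards]


# Hypothetical toy embeddings standing in for CLIP outputs.
image_emb = [1.0, 0.0, 0.0]
caption_embs = [[1.0, 0.0, 0.0],   # caption that matches the image well
                [0.0, 1.0, 0.0]]   # caption unrelated to the image
rewards = clip_style_rewards(image_emb, caption_embs)
adv = advantages(rewards)
print(rewards, adv)
```

Because the reward compares each caption directly with the image, a caption can score well even if it mentions details that are absent from the reference caption. This is the behavior the approach aims for.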