Royi Rassin
NLP & ML MSc student @biunlp
Nov 7 · 12 tweets · 4 min read
How diverse are the outputs of text-to-image models and how can we measure that? In our new work, we propose a measure based on LLMs and Visual-QA (VQA), and show NONE of the 12 models we experiment with are diverse. 🧵 1/11
Project page (with interactive results to explore): royira.github.io/GRADE/
Code: github.com/RoyiRa/GRADE-Q…
HF: huggingface.co/papers/2410.22…
arXiv: arxiv.org/abs/2410.22592
Oct 20, 2022 · 8 tweets · 4 min read
New work! :D

We show evidence that DALL-E 2, in stark contrast to humans, does not respect the constraint that each word has a single role in its visual interpretation.

Work with @ravfogel and @yoav.
BlackboxNLP @ #emnlp2022

Below, "a person is hearing a bat":
[Image]
We detail three types of behaviors that are inconsistent with the single role per word constraint