Google released PaliGemma models! They are open vision-language model inspired by PaLI-3 and built with open components such as the SigLIP vision model and the Gemma language model.
This space includes models fine-tuned on a mix of downstream tasks, inferred via @huggingface 🤗