We just released an open-source framework that makes it easy to build visual reasoning agents (with GRPO).
github.com/groundlight/r1…
I’ve used it to train a tiny model to solve cryptograms (a task that large pre-trained vision-language models can’t solve), but you can use it to train on whatever task you want! Personally, I’d love it if someone trained a model that could solve mazes.