I’ve used it to train a tiny model to solve cryptograms (a task large pre-trained vision-language models can’t solve), but you can use it to train on whatever task you want! Personally, I’d love it if someone trained a model that could solve mazes.
My collaborators and I published a blog post with our initial learnings from this project, including critical insights for reward design: groundlight.ai/blog/visual-re…
Thank you to my collaborators, @leopd and @BowenROIM, for their contributions to this project, and a shoutout to @willccbb for his work on verifiers, which really improved the ergonomics of the code.
Happy to answer any questions here (just reply!) or on GitHub.
PS: we’re working on multi-turn conversations and tool use. Stay tuned!