Shubham Sharma
likes to reason with humans | lives at @babayagalabs | IIT Bombay '23
May 19 6 tweets 2 min read
open sourcing Marlin-2B 🐟
a tiny VLM to extract structured information from videos

Marlin is finetuned for the two questions devs want to ask of their videos: what is happening, and when?

Best open model in its weight class, competitive with Gemini-2.5-flash at only 2B params 🧵

Marlin was trained on two modes:

1. marlin.caption() returns a structured Scene + Events JSON with second-precise timestamps.

You can use it to caption ig reels, index a video library or give your agent context of what happened and when in a video feed
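
Here's a rough sketch of what consuming that Scene + Events JSON could look like downstream. The exact schema isn't shown in this thread, so the field names (`scene`, `events`, `start`, `end`, `description`) and the sample payload are assumptions for illustration, not Marlin's actual output format. The helpers `events_at` and `to_context` are hypothetical too; the code only parses a hardcoded string and never calls the model.

```python
import json

# Hypothetical payload: stands in for what marlin.caption() might return.
# Field names and values are assumed for illustration, timestamps in seconds.
SAMPLE_OUTPUT = """
{
  "scene": "A person cooks pasta in a home kitchen",
  "events": [
    {"start": 0, "end": 4, "description": "fills a pot with water"},
    {"start": 4, "end": 9, "description": "drops pasta into the pot"},
    {"start": 9, "end": 15, "description": "stirs the pot"}
  ]
}
"""


def fmt(seconds: int) -> str:
    """Format whole seconds as m:ss."""
    return f"{seconds // 60}:{seconds % 60:02d}"


def events_at(caption: dict, t: float) -> list[str]:
    """Answer 'what is happening at second t?' using half-open [start, end) spans."""
    return [e["description"] for e in caption["events"] if e["start"] <= t < e["end"]]


def to_context(caption: dict) -> str:
    """Flatten the caption into a timestamped text block an agent can read."""
    lines = [f"Scene: {caption['scene']}"]
    for e in sorted(caption["events"], key=lambda e: e["start"]):
        lines.append(f"[{fmt(e['start'])}-{fmt(e['end'])}] {e['description']}")
    return "\n".join(lines)


caption = json.loads(SAMPLE_OUTPUT)
print(events_at(caption, 5))
print(to_context(caption))
```

Same idea works for indexing a library: store each event row with its video id and you can search by description and jump straight to the timestamp.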