William Berrios Profile picture
Technical staff @ContextualAI. Past: collaborator @huggingface, @Artificio_Org @MIT_CBMM, @evl_uic. 🦙 Engineer @UNIoficial 🇵🇪
Jun 23 11 tweets 4 min read
Excited to share 🤯 that our LMUnit models with @ContextualAI just claimed the top spots on RewardBench2 🥇

How did we manage to rank +5% higher than models like Gemini, Claude 4, and GPT4.1? More in the details below:

🧵 1/11 Image As a quick recap, in LMUnit, we utilize "natural language unit tests," which decompose response quality into explicit, testable criteria.

Instead of relying on opaque metrics like "pick the better response," each quality aspect becomes a specific question that humans can consistently answer

2/11Image