AI PhD @ UC Berkeley | Past: AI4Code Research Fellow @MSFTResearch | Summer @EPFL | Maintainer of https://t.co/LQamOaRksn | Hobbyist Saxophonist
Jul 28 • 5 tweets • 3 min read
How does prompt optimization compare to RL algos like GRPO?
GRPO needs 1000s of rollouts, but humans can learn from a few trials—by reflecting on what worked & what didn't.
Meet GEPA: a reflective prompt optimizer that can outperform GRPO by up to 20% with 35x fewer rollouts!🧵
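To make the idea concrete, here is a minimal sketch of the reflect-and-propose loop being described. This is not the actual GEPA algorithm: the function names (run_task, propose_from_reflection, metric) and the greedy accept-if-better rule are illustrative assumptions.

```python
import random

def reflective_prompt_search(run_task, propose_from_reflection, metric,
                             seed_prompt, trainset, iterations=10, minibatch_size=4):
    """Greedy reflective prompt search (illustrative sketch, not GEPA itself).

    run_task(prompt, example) -> (output, trace_text)
    propose_from_reflection(prompt, feedback_text) -> new prompt string
    metric(example, output) -> score in [0, 1]
    """
    def score(prompt):
        return sum(metric(ex, run_task(prompt, ex)[0]) for ex in trainset) / len(trainset)

    best_prompt, best_score = seed_prompt, score(seed_prompt)
    for _ in range(iterations):
        # 1) A few rollouts on a small minibatch, keeping the execution traces.
        batch = random.sample(trainset, min(minibatch_size, len(trainset)))
        feedback = "\n\n".join(
            f"Example: {ex}\nTrace: {run_task(best_prompt, ex)[1]}" for ex in batch
        )
        # 2) A reflection LM reads the traces in natural language and proposes a better prompt.
        candidate = propose_from_reflection(best_prompt, feedback)
        # 3) Keep the candidate only if it actually scores higher.
        candidate_score = score(candidate)
        if candidate_score > best_score:
            best_prompt, best_score = candidate, candidate_score
    return best_prompt
```

The point is the same as the thread's: each new candidate is justified by reflection over a handful of traces, rather than by thousands of reward-only rollouts.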
We implemented GEPA as a new @DSPyOSS optimizer (release soon!). That means it works even for the sophisticated agents or compound systems you've already built.
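For a rough sense of what using it could look like, here is a sketch that follows the standard DSPy optimizer interface (optimizer.compile(program, trainset=...)). Since GEPA hadn't shipped when this thread was posted, the dspy.GEPA class name and its constructor arguments are assumptions; the model name, trainset, and metric are placeholders.

```python
import dspy

# Configure the task LM (model name is just an example).
dspy.configure(lm=dspy.LM("openai/gpt-4.1-mini"))

# Any existing DSPy program can be optimized, from one module to a multi-step agent.
program = dspy.ChainOfThought("question -> answer")

# A tiny trainset and a simple exact-match metric, using DSPy's
# (example, prediction, trace) metric convention.
trainset = [
    dspy.Example(question="What is 2 + 2?", answer="4").with_inputs("question"),
]

def metric(example, prediction, trace=None):
    return prediction.answer.strip() == example.answer

# Assumption: the GEPA optimizer class and its arguments, mirroring how existing
# DSPy optimizers such as MIPROv2 are constructed and compiled.
optimizer = dspy.GEPA(metric=metric)
optimized_program = optimizer.compile(program, trainset=trainset)
```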
GEPA outperforms the MIPROv2 optimizer by as much as 11% across 4 tasks for Qwen3 and GPT-4.1-mini.
Of course, weight updates remain necessary to teach models completely new tasks and to excel at general-purpose (massively multi-task!) post-training!
However, we show that for specializing to downstream systems, reflective prompt optimization can go really far with tiny datasets and rollout budgets!