Prashanth Rao
AI engineer prev @kuzudb. Blogging @ https://t.co/gLektr01zQ

Oct 13, 9 tweets

Just published my next @DSPyOSS blog post! I explored the performance of two optimizers: Bootstrap fewshot and GEPA 🔥 on an information extraction task. A small model like gemini-2.5-flash-lite does *really* well after optimization (esp. with GEPA)



1/9 thedataquarry.com/blog/learning-…
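For context, the kind of DSPy program being optimized looks roughly like the sketch below. The signature, field names, and model string are illustrative assumptions rather than the blog's actual code; gemini-2.5-flash-lite is called through DSPy's LiteLLM-backed `dspy.LM` client.

```python
import dspy

# Task LM: a small, cheap model (model string assumed; adjust to your provider setup).
lm = dspy.LM("gemini/gemini-2.5-flash-lite")
dspy.configure(lm=lm)

# A hypothetical extraction signature: raw text in, structured fields out.
class ExtractEntities(dspy.Signature):
    """Extract the entities mentioned in the text."""
    text: str = dspy.InputField()
    entities: list[str] = dspy.OutputField(desc="entities explicitly named in the text")

# The module ("program") that the optimizers will later improve.
extractor = dspy.ChainOfThought(ExtractEntities)

pred = extractor(text="Kuzu is an embedded graph database written in C++.")
print(pred.entities)
```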

It's clear that GEPA has huge potential in improving prompts (and maybe even full programs) for nearly every kind of task. Bootstrap fewshot optimizers are a decent start to improve results, but you can only go so far with adding fewshot examples. GEPA goes *much* deeper.

2/9

First, optimization in @DSPyOSS is more akin to the process of *compilation* in programming languages. DSPy's goal is to translate high-level instructions (signatures + modules) to a lower level that the LM can work with (i.e., weights).

Optimization = compilation

3/9
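Concretely, "compiling" in DSPy means handing the program to an optimizer along with a metric and training examples. A minimal sketch, assuming the `extractor` module above and a hypothetical `trainset` of `dspy.Example` objects:

```python
import dspy
from dspy.teleprompt import BootstrapFewShot

# Hypothetical metric: exact match on the extracted entities.
def exact_match(gold, pred, trace=None):
    return gold.entities == pred.entities

# "Compile" = run the optimizer over the program + trainset to produce an
# improved version of the same program.
optimizer = BootstrapFewShot(metric=exact_match, max_bootstrapped_demos=4)
compiled_extractor = optimizer.compile(extractor, trainset=trainset)
```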

In choosing an optimizer, it's useful to understand the "surface area" of what's being optimized in the prompt. Automatic fewshot optimizers target *only* the user/assistant messages, and do NOT modify the instructions (unlike GEPA).

4/9
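One way to see this "surface area" is to inspect the compiled program. A sketch, assuming the `compiled_extractor` from the previous step: fewshot optimizers fill in each predictor's demos, while the signature instructions stay as originally written.

```python
# After BootstrapFewShot: demos are populated, instructions are untouched.
for name, predictor in compiled_extractor.named_predictors():
    print(f"{name}: {len(predictor.demos)} bootstrapped demos")
    print(predictor.signature.instructions)  # same text as before compilation
```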

GEPA is designed to operate on *any* textual traces, starting with the user instructions in the prompt. The example here shows how the baseline instructions (terse, little detail) are improved by GEPA, which discovers good ways of phrasing the details (extractor module).

5/9

The GEPA paper is worth reading many times. In a nutshell, GEPA works so well because it exploits the biggest strengths of LMs: their ability to generate and reason over natural language (instructions, error msgs, reasoning traces, etc.) without relying on gradients.

6/9

The key part of creating a GEPA optimization workflow in @DSPyOSS is incorporating feedback (alongside the score) as part of the metric function. The feedback is a natural language trace that's used by the reflection LM in GEPA to propose new instructions.

7/9
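A sketch of what such a metric can look like, assuming the extraction program above. GEPA accepts a metric that returns a `dspy.Prediction` carrying both a score and a feedback string; the extra `pred_name`/`pred_trace` arguments and the constructor options shown here follow the `dspy.GEPA` docs as I understand them, so treat the details as approximate rather than the blog's exact setup.

```python
import dspy

def metric_with_feedback(gold, pred, trace=None, pred_name=None, pred_trace=None):
    missing = set(gold.entities) - set(pred.entities)
    spurious = set(pred.entities) - set(gold.entities)
    score = 1.0 if not (missing or spurious) else 0.0
    # Natural-language feedback: this is what GEPA's reflection LM reads
    # when proposing new instructions.
    feedback = (
        f"Missed entities: {sorted(missing)}. Spurious entities: {sorted(spurious)}. "
        "Extract every entity explicitly named in the text, and nothing else."
    )
    return dspy.Prediction(score=score, feedback=feedback)

gepa = dspy.GEPA(
    metric=metric_with_feedback,
    auto="light",                                    # budget preset
    reflection_lm=dspy.LM("gemini/gemini-2.5-pro"),  # a stronger LM for reflection (assumed choice)
)
gepa_extractor = gepa.compile(extractor, trainset=trainset, valset=valset)
```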

Key lesson learned: It's always worth spending time on curating *high-quality* training, validation and test examples for GEPA (or any other optimizer). The *distribution* of examples matters, as do the train/val/test splits. Similar to working in traditional ML.

8/9
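For example, a simple way to carve a curated list of `dspy.Example` objects into splits (a sketch; `examples` is hypothetical, and in practice you'd also want the label/domain distribution balanced across the splits):

```python
import random

random.Random(42).shuffle(examples)  # fixed seed so the splits are reproducible
n = len(examples)
trainset = examples[: int(0.6 * n)]               # used by the optimizer
valset = examples[int(0.6 * n): int(0.8 * n)]     # used during optimization (e.g., GEPA candidate selection)
testset = examples[int(0.8 * n):]                 # held out for the final comparison
```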

I had a LOT of fun describing the details in the blog post. I know I'm only scratching the surface of what's possible with optimization in @DSPyOSS, and the future looks bright! Looking forward to writing about it in many other use cases. 😁

9/9

thedataquarry.com/blog/learning-…
