Raja Patnaik
Head of AI (Research) @ Asset Manager. Quant PhD. Financial economist. Engineer. Tech and science enthusiast. Angel investor. All opinions are my own.
Oct 27
Has anyone looked at how @DSPyOSS + GEPA could optimize inter-agent communication protocols in multi-agent systems?

Instead of optimizing individual prompts for task performance, you’d optimize the language that agents use to communicate with each other. 1/🧵

2/ Each DSPy signature becomes a communication interface, and GEPA optimizes:
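To make the idea concrete, here's a minimal sketch of a DSPy signature acting as an agent-to-agent interface, assuming a hypothetical planner/executor pair; the class and field names are illustrative, not from the thread:

```python
import dspy

# Hypothetical inter-agent interface: the signature defines the message
# format one agent uses to address another.
class PlannerToExecutor(dspy.Signature):
    """Translate a high-level plan step into a message for the executor agent."""
    plan_step = dspy.InputField(desc="the planner's next step")
    task_context = dspy.InputField(desc="shared task state both agents can see")
    message = dspy.OutputField(desc="the instruction sent to the executor")

# The channel is just a module, so GEPA can mutate the signature's
# instructions against an end-to-end task metric, i.e. it evolves the
# protocol itself rather than a single task prompt.
planner_channel = dspy.Predict(PlannerToExecutor)
```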
Oct 24
I built an AI research agent that writes comprehensive reports with proper citations and optimizes its own prompts automatically - @LangChainAI + @ExaAILabs + @DSPyOSS + GEPA.

Link to blog post and full repo at the end. Here's how it works 🧵1/

2/ Most AI research systems have 3 problems:

- Prompts are static strings (can't be improved)
- Sequential execution (slow)
- Citation chaos (broken links, inconsistent numbering)

This system solves all three.
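A rough sketch of the first fix (prompts as programs): the report writer is a DSPy module, so its prompt is something an optimizer can rewrite. Signature and module names here are assumptions, not the repo's actual code:

```python
import dspy

class WriteSection(dspy.Signature):
    """Write a report section grounded in the numbered sources, citing them as [n]."""
    topic = dspy.InputField()
    sources = dspy.InputField(desc="numbered source snippets, e.g. '[1] ...'")
    section = dspy.OutputField(desc="prose with inline [n] citations")

class ResearchWriter(dspy.Module):
    def __init__(self):
        super().__init__()
        # A module, not a static string: GEPA can rewrite the
        # instructions behind this predictor during optimization.
        self.write = dspy.ChainOfThought(WriteSection)

    def forward(self, topic, sources):
        return self.write(topic=topic, sources=sources)
```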
Oct 21
Let SQL Be the Judge: Evolving an NL→SQL Generator with @DSPyOSS + GEPA (no labels required).

NL→SQL that self‑validates by executing its own output. No labels. Works on older GEPA via a scalar metric wrapper. Repo + blog below. 🧵1/13

2/13
Why: “vibes‑based evals” don’t ship. I want system‑level signals.

SQLite is the judge: if your query is safe, runs, and returns the right rows/shape, you win. GEPA evolves the program toward higher scores.
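A sketch of what that execution-based judge can look like as a scalar metric; it follows DSPy's usual (example, pred, trace) metric convention, and the database file, field names, and scoring weights are assumptions:

```python
import sqlite3

def sql_metric(example, pred, trace=None):
    """Score a generated query by actually running it against SQLite."""
    query = pred.sql.strip()
    # Crude safety gate: only read-only SELECT statements may run.
    if not query.lower().startswith("select"):
        return 0.0
    conn = sqlite3.connect("demo.db")  # hypothetical evaluation database
    try:
        rows = conn.execute(query).fetchall()
    except sqlite3.Error:
        return 0.0  # the query didn't even execute: hard fail
    finally:
        conn.close()
    # Partial credit for executing, full credit for the expected shape.
    score = 0.5
    if getattr(example, "expected_rows", None) == len(rows):
        score += 0.5
    return score
```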
Oct 17
Practical @DSPyOSS example series: Build an LLM pipeline that self‑corrects instead of “RAG and pray.”

Pipeline: Retrieve → Generate → Verify → Refine.

If the verifier flags unsupported claims, we retry with feedback until it passes.

Blog post and GitHub link at the end. 1/13🧵

2/13
Why this matters:
- Hallucinations still slip through plain RAG
- Users deserve verifiable answers
- Programmatic verification ⇒ reliability you can ship
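One way the Retrieve → Generate → Verify → Refine loop could be written as a DSPy module; the signatures, field names, and retry budget are illustrative assumptions (and it presumes a retriever configured via dspy.configure):

```python
import dspy

class Verify(dspy.Signature):
    """Check whether every claim in the answer is supported by the context."""
    context = dspy.InputField()
    answer = dspy.InputField()
    supported = dspy.OutputField(desc="'yes' or 'no'")
    feedback = dspy.OutputField(desc="which claims lack support")

class SelfCorrectingRAG(dspy.Module):
    def __init__(self, max_retries=2):
        super().__init__()
        self.retrieve = dspy.Retrieve(k=5)
        self.generate = dspy.ChainOfThought("context, question, feedback -> answer")
        self.verify = dspy.Predict(Verify)
        self.max_retries = max_retries

    def forward(self, question):
        context = self.retrieve(question).passages
        feedback = "none yet"
        for _ in range(self.max_retries + 1):
            answer = self.generate(context=context, question=question,
                                   feedback=feedback).answer
            check = self.verify(context=context, answer=answer)
            if check.supported.lower().startswith("yes"):
                break  # verifier is satisfied; stop refining
            feedback = check.feedback  # retry with the verifier's objections
        return dspy.Prediction(answer=answer)
```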
Oct 15
First in a series of practical GEPA + @DSPyOSS examples: Verifiable de‑identification (PII‑safe incident reports)

Most “privacy filters” are vibes. Let’s prove we removed PII while keeping the important bits intact. Link to blog post and repo ↓ 1/3🧵

Using dspy.GEPA, we evolve a prompt until:

- No PII leaks (emails, phones, names → placeholders), and
- Structure is preserved (must keep Root cause + Action items sections).

2/3🧵
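A toy version of the metric those two checks imply; the regexes are deliberately crude assumptions (real PII detection needs far broader coverage), and the output field name is hypothetical:

```python
import re

EMAIL = re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.-]+\b")
PHONE = re.compile(r"\+?\d[\d\s().-]{7,}\d")

def deid_metric(example, pred, trace=None):
    text = pred.redacted_report  # hypothetical output field
    # Check 1: no obvious PII may survive; any leak is a hard fail.
    if EMAIL.search(text) or PHONE.search(text):
        return 0.0
    # Check 2: the report's required sections must be preserved.
    keeps_structure = "Root cause" in text and "Action items" in text
    return 1.0 if keeps_structure else 0.5
```

dspy.GEPA then evolves the de-identification prompt against this score.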
Oct 13
Prompt engineering is brittle. Change your model? Rewrite all your prompts. Add a new feature? Pray that your carefully crafted examples still work.

@DSPyOSS solves all of this: program your models instead of prompting them.

Unsurprisingly, 28k+ GitHub stars: 🧵1/12↓

DSPy separates interface from implementation.

You define WHAT you want (signatures), HOW to structure it (modules), and let optimizers figure out the best prompts automatically.

Think: type hints + composable functions + auto-optimization. 🧵2/12
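A toy example of that separation; the task and model name are illustrative:

```python
import dspy

class Summarize(dspy.Signature):          # WHAT: typed inputs and outputs
    """Summarize the document in two sentences."""
    document = dspy.InputField()
    summary = dspy.OutputField()

summarizer = dspy.ChainOfThought(Summarize)  # HOW: module adds a reasoning step

# Swap the model without rewriting prompts; the program stays the same.
dspy.configure(lm=dspy.LM("openai/gpt-4o-mini"))
result = summarizer(document="DSPy separates interface from implementation...")
print(result.summary)
```

An optimizer (e.g. dspy.MIPROv2 or dspy.GEPA) then searches for the prompts and demonstrations that maximize your metric.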
Sep 9
Hot take - Evolve prompts, not gradients: GEPA + DSPy > RL (for many pipelines). On 4 tasks, GEPA beat GRPO by ~10% on average (up to 20%) while using up to 35× fewer rollouts. That’s tailor‑made for small budgets.

More details ↓

Why it clicks in DSPy: your “student” is a declarative program. GEPA reads structured traces, proposes targeted instruction edits per module, keeps a Pareto frontier of complementary candidates, and can even merge the best modules across lineages.
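Wiring it up looks roughly like this; the arguments follow dspy.GEPA's documented interface but may differ across versions, and the task, data, and model names are placeholders:

```python
import dspy

dspy.configure(lm=dspy.LM("openai/gpt-4o-mini"))  # illustrative model

# GEPA's metric can return a plain scalar; richer textual feedback
# (when provided) becomes the signal its reflective mutations read.
def metric(gold, pred, trace=None, pred_name=None, pred_trace=None):
    return float(gold.label.lower() in pred.label.lower())

program = dspy.Predict("text -> label")  # the declarative "student"
trainset = [dspy.Example(text="great product!", label="positive")
            .with_inputs("text")]

optimizer = dspy.GEPA(metric=metric, auto="light",
                      reflection_lm=dspy.LM("openai/gpt-4o"))
optimized = optimizer.compile(program, trainset=trainset, valset=trainset)
```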