Has anyone looked at how @DSPyOSS + GEPA could optimize inter-agent communication protocols in multi-agent systems?
Instead of optimizing individual prompts for task performance, you’d optimize the language that agents use to communicate with each other. 1/🧵
2/ Each DSPy signature becomes a communication interface, and GEPA optimizes:
3/ Information compression protocols - What’s the minimal information Agent A needs to convey to Agent B for effective coordination? GEPA could discover that certain verbose explanations are unnecessary, or that certain compact representations are more effective.
4/ Negotiation strategies - When agents disagree or have conflicting objectives, what communication patterns lead to better outcomes? This is different from prompt optimization - you’re optimizing the dialogue structure itself.
5/ Query routing efficiency - In a multi-agent system with specialists, GEPA could optimize how agents formulate requests to route to the right specialist, learning a shared vocabulary that maximizes routing accuracy.
6/ The metric would be end-to-end multi-agent task success, not individual prompt accuracy. This could discover emergent communication patterns that humans wouldn’t design.
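A toy version of that end-to-end metric (a sketch in plain Python; the function name, signature, and character budget are my assumptions, not from any repo): reward task success while softly penalizing verbose inter-agent messages, so an optimizer like GEPA has pressure toward compact protocols.

```python
def comm_metric(task_solved: bool, messages: list[str], budget: int = 200) -> float:
    """End-to-end score for a multi-agent trace: reward task success,
    softly penalize inter-agent chatter beyond a character budget."""
    if not task_solved:
        return 0.0
    total_chars = sum(len(m) for m in messages)
    # Soft penalty once combined message length exceeds the budget
    penalty = max(0.0, (total_chars - budget) / (10 * budget))
    return max(0.0, 1.0 - penalty)
```

The key design choice: only the final outcome and the total communication cost are scored, never individual messages, so any emergent shorthand the agents converge on is fine as long as the task gets done.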
Wouldn’t be surprised if @LakshyAAAgrawal has worked on this already.
I built an AI research agent that writes comprehensive reports with proper citations and optimizes its own prompts automatically - @LangChainAI + @ExaAILabs + @DSPyOSS + GEPA.
Link to blog post and full repo at the end. Here's how it works 🧵1/
2/ Most AI research systems have 3 problems:
- Prompts are static strings (can't be improved)
- Sequential execution (slow)
- Citation chaos (broken links, inconsistent numbering)
This system solves all three.
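For the citation-chaos piece, a deterministic renumbering pass goes a long way (a sketch; the function name and `[n]` marker format are my assumptions, not necessarily what the repo does): markers are renumbered in order of first appearance and unused sources are dropped.

```python
import re

def renumber_citations(report: str, sources: list[str]) -> tuple[str, list[str]]:
    """Renumber [n] markers by order of first appearance and drop unused
    sources. Assumes markers are 1-based indices into `sources`."""
    order: dict[int, int] = {}

    def repl(m: re.Match) -> str:
        old = int(m.group(1))
        if old not in order:
            order[old] = len(order) + 1  # next fresh number
        return f"[{order[old]}]"

    fixed = re.sub(r"\[(\d+)\]", repl, report)
    kept = [sources[old - 1] for old in order]  # dict preserves insertion order
    return fixed, kept
```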
3/ The stack:
- LangGraph → workflow orchestration & parallelism
- DSPy → structured prompts as first-class objects
- GEPA → automatic prompt optimization
- Exa API → semantic search + full content retrieval
- Gemini → fast Flash/Pro models
Let SQL Be the Judge: Evolving an NL→SQL Generator with @DSPyOSS + GEPA (no labels required).
NL→SQL that self‑validates by executing its own output. No labels. Works on older GEPA via a scalar metric wrapper. Repo + blog below. 🧵1/13
2/13
Why: “vibes‑based evals” don’t ship. I want system‑level signals.
SQLite is the judge: if your query is safe, runs, and returns the right rows/shape, you win. GEPA evolves the program toward higher scores.
3/13
Setup: in‑memory SQLite with authors, books, sales. I pass an LM‑friendly schema string (TABLE …, EXAMPLE_ROWS …) to anchor column names and reduce hallucinations.
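The judge itself can be sketched in a few lines (assumptions: the graded score tiers, the abbreviated schema, and the `sql_judge` name are mine; the real metric is in the repo): unsafe gets 0, a query that errors gets 0.1, one that runs gets partial credit, and matching rows win.

```python
import sqlite3

SCHEMA = """
CREATE TABLE authors(id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE books(id INTEGER PRIMARY KEY, title TEXT, author_id INTEGER);
"""

def sql_judge(candidate_sql: str, expected_rows: list) -> float:
    """Let SQLite be the judge: 0.0 unsafe, 0.1 errors, 0.5 runs, 1.0 match."""
    if not candidate_sql.strip().lower().startswith("select"):
        return 0.0  # safety gate: read-only queries only
    conn = sqlite3.connect(":memory:")
    try:
        conn.executescript(SCHEMA)
        conn.execute("INSERT INTO authors VALUES (1, 'Ada'), (2, 'Grace')")
        try:
            rows = conn.execute(candidate_sql).fetchall()
        except sqlite3.Error:
            return 0.1  # looks like a SELECT but fails to execute
    finally:
        conn.close()
    return 1.0 if rows == expected_rows else 0.5
```

Because the score is a single scalar, it drops straight into the scalar-metric wrapper mentioned above for older GEPA versions.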
Practical @DSPyOSS example series: Build an LLM that self‑corrects instead of “RAG and pray.”
Pipeline: Retrieve → Generate → Verify → Refine.
If the verifier flags unsupported claims, we retry with feedback until it passes.
Blog post and GitHub link at the end. 1/13🧵
2/13
Why this matters:
- Hallucinations still slip through plain RAG
- Users deserve verifiable answers
- Programmatic verification ⇒ reliability you can ship
3/13
We’ll use @DSPyOSS + @OpenAIDevs + @Wikipedia:
- Retriever: Wikipedia summaries
- Generator: answers only from context
- Verifier: lists unsupported claims
- Refiner: retries until verifier says “None”
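The control flow of Retrieve → Generate → Verify → Refine can be sketched without any LM in the loop (a toy, not the repo code: `naive_verify` is a stand-in word-overlap check, and the real generator and verifier are DSPy modules):

```python
def naive_verify(context: str, answer: str) -> list[str]:
    """Toy verifier: flag sentences whose words don't all appear in context."""
    ctx_words = set(context.lower().split())
    sentences = [s.strip() for s in answer.split(".") if s.strip()]
    return [s for s in sentences if not set(s.lower().split()) <= ctx_words]

def refine_until_supported(generate, verify, context: str, question: str,
                           max_tries: int = 3) -> str:
    """Retry generation, feeding the verifier's complaints back as feedback,
    until the verifier reports no unsupported claims (or tries run out)."""
    feedback = ""
    answer = ""
    for _ in range(max_tries):
        answer = generate(context, question, feedback)
        unsupported = verify(context, answer)
        if not unsupported:
            return answer  # verifier says "None": ship it
        feedback = "Unsupported claims: " + "; ".join(unsupported)
    return answer  # best effort after max_tries
```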
First in a series of practical GEPA + @DSPyOSS examples: Verifiable de‑identification (PII‑safe incident reports)
Most “privacy filters” are vibes. Let’s prove we removed PII while keeping the important bits intact. Link to blog post and repo ↓ 1/3🧵
Using dspy.GEPA, we evolve a prompt until:
- No PII leaks (emails, phones, names → placeholders), and
- Structure is preserved (must keep Root cause + Action items sections).
2/3🧵
GEPA takes textual feedback from a metric (not just a score) and rewrites the instructions for the DSPy module until constraints pass. It’s optimization‑as‑reasoning - no RL loops.
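What a score-plus-feedback metric can look like for this task (a minimal sketch; the regexes, section names, and 0.25-per-problem scoring are illustrative assumptions, and in DSPy you would wrap the result for `dspy.GEPA` rather than return a tuple):

```python
import re

PII_PATTERNS = {
    "EMAIL": re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.]+\b"),
    "PHONE": re.compile(r"\b\d{3}[-.\s]\d{3}[-.\s]\d{4}\b"),
}
REQUIRED_SECTIONS = ("Root cause", "Action items")

def deid_metric(redacted: str) -> tuple[float, str]:
    """Return (score, textual feedback) naming each leak or missing section,
    so the optimizer knows *what* to fix, not just that something failed."""
    problems = []
    for label, pat in PII_PATTERNS.items():
        if pat.search(redacted):
            problems.append(f"{label} leaked; replace with [{label}] placeholder")
    for sec in REQUIRED_SECTIONS:
        if sec not in redacted:
            problems.append(f"missing required section: {sec}")
    score = max(0.0, 1.0 - 0.25 * len(problems))
    return score, "; ".join(problems) or "all constraints satisfied"
```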
Hot take - Evolve prompts, not gradients: GEPA + DSPy > RL (for many pipelines). On 4 tasks, GEPA beat GRPO by ~10% on average (up to 20%) while using up to 35× fewer rollouts. That’s tailor‑made for small budgets.
More details ↓
Why it clicks in DSPy: your “student” is a declarative program. GEPA reads structured traces, proposes targeted instruction edits per module, keeps a Pareto frontier of complementary candidates, and can even merge the best modules across lineages.
Define a minimal DSPy module + metric with textual feedback, then compile with dspy.GEPA. GEPA consumes your feedback string (not just a scalar) to evolve prompts fast.
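The Pareto-frontier idea from above can be sketched concretely (toy code, my own naming: each candidate prompt has a per-example score vector, and a candidate survives unless some other candidate beats it on every example):

```python
def pareto_frontier(candidates: dict[str, list[float]]) -> set[str]:
    """Keep candidates not dominated by any other: `b` dominates `a` when
    b scores >= a on every example and strictly > on at least one."""
    frontier = set()
    for a, a_scores in candidates.items():
        dominated = any(
            all(b_scores[i] >= a_scores[i] for i in range(len(a_scores)))
            and any(b_scores[i] > a_scores[i] for i in range(len(a_scores)))
            for b, b_scores in candidates.items() if b != a
        )
        if not dominated:
            frontier.add(a)
    return frontier
```

This is why "complementary candidates" matter: a prompt that only wins on a niche slice of examples stays alive on the frontier and can later be merged with a generalist.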