Introducing Latent Briefing: a way for agents to share relevant memory with each other directly. Result: 31% fewer tokens used, same accuracy.
Multi-agent systems are powerful but can be wildly inefficient: they pass context as tokens, so costs explode and signal gets lost. We built an algorithm that lets agents communicate KV cache to KV cache.
Agents need to share context, but doing it in token space has real tradeoffs:
• LLM summaries: slow (20–60s), lossy, and often miss what the next agent actually needs
• RAG: splits context into chunks, so relationships across documents get lost
• Passing full context: expensive, noisy, and often hurts accuracy
Our method skips tokens entirely. We operate on the KV cache, using the worker's own attention patterns to extract what's relevant from the orchestrator's memory and discard the rest.
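To make the idea concrete, here is a minimal NumPy sketch of attention-guided KV selection — not the actual Latent Briefing algorithm, whose details aren't public here. The function name `latent_brief`, the `keep_ratio` knob, and the single-query simplification are all hypothetical; it just illustrates scoring the orchestrator's cached keys with a worker-side query and keeping only the highest-attention entries.

```python
import numpy as np

def latent_brief(orch_keys, orch_values, worker_query, keep_ratio=0.3):
    """Hypothetical sketch: keep the orchestrator KV entries the worker attends to most.

    orch_keys, orch_values: (seq_len, d) arrays -- the orchestrator's KV cache.
    worker_query: (d,) array -- a query vector from the worker's own attention.
    keep_ratio: fraction of cache entries to transfer (illustrative knob).
    """
    d = orch_keys.shape[1]
    # Score each cached position with a scaled dot-product attention pattern.
    scores = orch_keys @ worker_query / np.sqrt(d)
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()
    # Keep only the highest-attention positions, preserving original order.
    k = max(1, round(len(weights) * keep_ratio))
    keep = np.sort(np.argsort(weights)[-k:])
    return orch_keys[keep], orch_values[keep]

# Toy example: a 10-entry cache, keeping ~30% of it.
rng = np.random.default_rng(0)
K = rng.normal(size=(10, 8))
V = rng.normal(size=(10, 8))
q = rng.normal(size=8)
k_sel, v_sel = latent_brief(K, V, q)
print(k_sel.shape)  # (3, 8)
```

In a real system the selection would run per layer and per head, and the retained entries would be spliced into the worker's cache rather than returned as arrays — this sketch only shows the selection step.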