Agent2Agent (A2A) is a new open protocol that lets AI agents securely collaborate across ecosystems regardless of framework or vendor.
Here is all you need to know:
Universal agent interoperability
A2A allows agents to communicate, discover each other’s capabilities, negotiate tasks, and collaborate even if built on different platforms. This enables complex enterprise workflows to be handled by a team of specialized agents.
Built for enterprise needs
The protocol supports long-running tasks (e.g., supply chain planning), multimodal collaboration (text, audio, video), and secure identity/auth flows (on par with OpenAPI's authentication schemes). Agents advertise JSON-based “Agent Cards” for capability discovery, negotiate UI formats, and sync task state with real-time updates.
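To make this concrete, here's a minimal sketch of what an Agent Card might look like for a hypothetical supply-chain planning agent. The field names follow the published A2A draft schema, but the agent, its URL, and its skill are invented for illustration.

```python
import json

# Illustrative Agent Card for a hypothetical supply-chain planning agent.
# Field names follow the A2A draft schema; the name, URL, and skill are made up.
agent_card = {
    "name": "SupplyChainPlanner",
    "description": "Plans and re-plans multi-step supply chain workflows.",
    "url": "https://agents.example.com/supply-chain",  # hypothetical endpoint
    "version": "1.0.0",
    "capabilities": {
        "streaming": True,           # can stream real-time task updates
        "pushNotifications": True,   # can notify clients about long-running tasks
    },
    "authentication": {"schemes": ["bearer"]},  # OpenAPI-style auth scheme
    "defaultInputModes": ["text"],
    "defaultOutputModes": ["text"],
    "skills": [
        {
            "id": "plan-replenishment",
            "name": "Plan replenishment",
            "description": "Builds a replenishment plan from demand forecasts.",
        }
    ],
}

print(json.dumps(agent_card, indent=2))
```

A client agent fetches this card from the remote agent's well-known discovery path, inspects its capabilities and skills, and decides whether and how to delegate a task.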
- Llama 4 Scout & Maverick are available for download
- Llama 4 Behemoth (preview)
- Advanced problem solving & multilingual support
- Long context support up to 10M tokens (Scout)
- Great for multimodal apps & agents
- Image grounding
- Top performance at the lowest cost
- Can be served within $0.19-$0.49/M tokens
[Chart: LMArena Elo score vs. cost]
"To deliver a user experience with a decode latency of 30ms for each token after a one-time 350ms prefill latency, we estimate that the model can be served within a range of $0.19-$0.49 per million tokens (3:1 blend)"
Here, a "3:1 blend" is a weighted average price assuming three input tokens for every output token; with hypothetical prices of $0.11/M for input and $0.85/M for output, the blended price works out to 0.75 × $0.11 + 0.25 × $0.85 ≈ $0.30 per million tokens.
It's great to see native multimodal support for Llama 4.
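If you want to kick the tires, here's a minimal sketch of a multimodal request to Llama 4 through an OpenAI-compatible endpoint. The base URL and model id are assumptions; substitute whatever your provider (or local vLLM server) exposes.

```python
from openai import OpenAI

# Minimal sketch: image + text request to Llama 4 via an OpenAI-compatible API.
# The base_url and model id below are assumptions; swap in your provider's values.
client = OpenAI(
    base_url="https://api.your-provider.com/v1",  # hypothetical endpoint
    api_key="YOUR_API_KEY",
)

response = client.chat.completions.create(
    model="meta-llama/Llama-4-Scout-17B-16E-Instruct",  # assumed model id
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "What objects are in this image, and where are they?"},
                {"type": "image_url", "image_url": {"url": "https://example.com/shelf.jpg"}},
            ],
        }
    ],
)

print(response.choices[0].message.content)
```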
If you develop seriously with LLMs and are building complex agentic flows, you don't need convincing that prompting matters.
I've built the most comprehensive, up-to-date course on prompting LLMs, including reasoning LLMs.
4 hours of content! All Python!
Check it out if you're building AI Agents or RAG systems: prompting tips, emerging use cases, advanced prompting techniques, enhancing LLM reliability, and much more.
All code examples use pure Python and the OpenAI SDKs. That's it!
This course is for devs and AI engineers looking for a proper overview of LLM design patterns and prompting best practices.
We offer support, a forum, and live office hours too.
DM me for discount options. Students & teams also get special discounts.
Can you cut the fine-tuning costs of an LLM by 75% and keep strong reasoning performance?
A new paper from Tencent AI Lab claims it might just be possible.
Let's find out how:
The First Few Tokens
The paper shows that a tiny prefix is all you need to improve a model's reasoning: no labels or massive datasets required.
It uses an unsupervised prefix fine-tuning method (UPFT), which requires only prefix substrings (as few as 8 tokens) of the model's own generated solutions.
Task template for prefix tuning
They use a simple task template for prefix tuning. By training on just a few leading tokens of the solution, the model learns a consistent starting approach without needing complete, correct final answers; other approaches require entire reasoning traces.
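Here's a rough sketch of that data-construction step, based on my reading of the paper: sample a solution from the base model, keep only its first few tokens, and fine-tune on the prompt plus that prefix with an ordinary next-token loss. The model id, template, and function names below are illustrative, not the paper's.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Rough sketch of UPFT-style data construction. No gold answers or full
# reasoning traces needed: the training target is just a short sampled prefix.
PREFIX_TOKENS = 8  # the paper reports gains with prefixes as short as 8 tokens
MODEL_ID = "Qwen/Qwen2.5-7B-Instruct"  # assumed base model, not the paper's

tok = AutoTokenizer.from_pretrained(MODEL_ID)
model = AutoModelForCausalLM.from_pretrained(MODEL_ID)

def make_prefix_example(question: str) -> str:
    """Sample one solution and keep only its first PREFIX_TOKENS tokens."""
    prompt = f"Question: {question}\nAnswer:"  # simple task template (illustrative)
    inputs = tok(prompt, return_tensors="pt")
    out = model.generate(**inputs, max_new_tokens=64, do_sample=True, temperature=0.7)
    solution_ids = out[0][inputs["input_ids"].shape[1]:]  # strip the prompt tokens
    prefix = tok.decode(solution_ids[:PREFIX_TOKENS], skip_special_tokens=True)
    return prompt + " " + prefix  # fine-tune on this string with next-token loss

print(make_prefix_example("What is 17 * 24?"))
```

The resulting strings can then go into any standard SFT pipeline; the point is that the supervision signal is just the short prefix, not a verified final answer.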