Governing Multi-Agent Systems: A2A Traffic at the Gateway


Single-agent systems call models and tools. Multi-agent systems add something new: agents calling agents. That east-west traffic (an orchestrator delegating to sub-agents, agents handing work to each other over the still-young Agent2Agent protocol) is where cost runs away, blast radius widens, and "which agent did what" becomes unanswerable. The protocols standardize how agents discover one another and exchange work, and they provide security hooks, but they don't prescribe your enterprise identity model, policy graph, budget model, or observability and control plane. This post describes that missing governance layer and explains why it belongs at the gateway.

Key Takeaways

Multi-agent systems introduce a new traffic pattern — east-west, agents calling agents and agents calling tools — distinct from the north-south app-to-model traffic gateways were first built to manage.
The Agent2Agent (A2A) protocol standardizes agent-to-agent communication — capability discovery via agent cards, task and message exchange, and security hooks like declared auth schemes — but, like MCP, it defines the mechanism and leaves your enterprise identity model, policy graph, budgets, and control plane to you.
A shared service key is the wrong identity model. When one credential fronts many agents, you can't authorize per agent, attribute cost per agent, or reconstruct which agent took which action. Agents need their own identities.
The dominant failure mode is blast radius, not a single bad call. Loops and runaway fan-out, where agents call one another in a cycle or one delegation multiplies into many, burn budget fast and silently. Depth limits, per-agent rate limits, and timeouts contain the damage.
Observability must span the whole run — an end-to-end trace across agent steps, model calls, and tool invocations — because per-request metrics lose the shape of a multi-agent workflow and hide where it went wrong.
Prompt injection propagates across agents: a poisoned input or tool result read by one agent can steer the agents it delegates to, so injection defense is an agent-to-agent concern, not only an input-boundary one.
The gateway is the agent control plane. TrueFoundry's Agent Gateway gives agents identity, per-agent RBAC and budgets, retries, timeouts and loop safeguards, and end-to-end tracing — unifying model, MCP-tool, and agent-to-agent governance in one place.

Tomás, a platform engineer, walked in to a cost alert and a mystery. Overnight, the company's new multi-agent research workflow had spent more than it had in the entire previous month. An orchestrator agent delegated subtasks to a set of sub-agents; one sub-agent, hitting a transient error, retried by re-invoking the orchestrator, which delegated again, forming a loop that ran for hours. By morning the agents had called each other tens of thousands of times. Tomás wanted to know which agent started it and where the cycle formed, and found he couldn't: every agent authenticated with the same shared service key, the calls between agents weren't recorded as a graph, and there was no per-agent rate limit that would have tripped. The system had governance for calls to the model provider. It had almost none for calls between its own agents.

This is the gap multi-agent systems open. The moment agents start delegating to one another, you have a new internal network, and by default it has no identity, no policy, and no trace. The agent frameworks help you build the workflow; they don't govern it. This post shows how to give that internal network the identity, limits, and observability you would insist on for any microservice mesh.
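To make the missing safeguards concrete, here is a minimal sketch of the kind of check that could have stopped a loop like the one Tomás found. It assumes each delegation carries the path of agents that led to it; the class name, the limits, and the agent names are all hypothetical, and a real gateway would enforce this outside the agents' own code.

```python
from dataclasses import dataclass, field


class DelegationBlocked(Exception):
    """Raised when a delegation would violate a loop safeguard."""


@dataclass
class DelegationGuard:
    # Hypothetical limits, chosen only for illustration.
    max_depth: int = 4               # maximum number of agents in one call chain
    max_calls_per_agent: int = 100   # delegations each agent may make per run
    calls: dict = field(default_factory=dict)

    def check(self, call_path: tuple, callee: str) -> tuple:
        """Validate a delegation from the last agent in call_path to callee.

        Returns the extended call path, or raises DelegationBlocked.
        """
        caller = call_path[-1]
        # Cycle detection: the callee is already somewhere up the chain.
        if callee in call_path:
            chain = " -> ".join(call_path + (callee,))
            raise DelegationBlocked(f"cycle: {chain}")
        # Depth limit: the chain is already as long as allowed.
        if len(call_path) >= self.max_depth:
            raise DelegationBlocked(f"depth limit {self.max_depth} reached at {caller}")
        # Per-agent budget: count delegations by caller identity.
        used = self.calls.get(caller, 0)
        if used >= self.max_calls_per_agent:
            raise DelegationBlocked(f"call budget exhausted for {caller}")
        self.calls[caller] = used + 1
        return call_path + (callee,)
```

Under these assumptions, `guard.check(("orchestrator",), "researcher")` returns the extended path, while a later attempt by the researcher to call the orchestrator raises `DelegationBlocked` instead of starting a second lap. Note that the per-agent counter only works if every agent has its own identity, which a shared service key cannot provide.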

1. The New Traffic Pattern: Agents Calling Agents

For most of the gateway story so far, traffic has been north-south: an application calls a model, maybe through a tool. Multi-agent systems add east-west traffic — agents invoking other agents. An orchestrator delegates to specialists; a specialist consults another; results flow back up. The still-young Agent2Agent (A2A) protocol gives this a standard shape, with agents publishing capability descriptions (agent cards) that others discover, and exchanging tasks and messages over a common interface, much as MCP standardized how agents reach tools.
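As a rough illustration of discovery via agent cards, the sketch below models a card as a plain dictionary and picks agents by advertised skill. The card is a simplified, illustrative subset: the field names follow the general shape of A2A agent cards as commonly described, real cards carry more (capabilities and declared auth schemes, for example), and the URL, agent names, and skill ids are invented. The A2A specification remains the authoritative schema.

```python
# Simplified, illustrative agent card. The URL and skill ids are made up,
# and the real A2A schema contains more fields than shown here.
RESEARCH_AGENT_CARD = {
    "name": "research-agent",
    "description": "Finds and summarizes sources for a research question.",
    "url": "https://agents.example.internal/research",
    "version": "1.0.0",
    "skills": [
        {"id": "web-research", "name": "Web research", "tags": ["search", "summarize"]},
    ],
}


def find_agents_with_skill(cards, skill_id):
    """Return the names of agents whose card advertises the given skill id."""
    return [
        card["name"]
        for card in cards
        if any(skill["id"] == skill_id for skill in card.get("skills", []))
    ]
```

An orchestrator holding a list of such cards could call `find_agents_with_skill(cards, "web-research")` to decide where to delegate. Discovery only tells it who can do the work, not whether it is allowed to ask or how much it may spend doing so.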

The analogy worth holding onto is the move from a monolith to microservices. The instant your agents talk to each other, you have a distributed system with the failure modes of one: cascading retries, cycles, fan-out amplification, and the loss of a single clear call stack. And like microservices, the answer isn't to wish the calls away but to put them behind a layer that gives every caller an identity, every call a policy, and every flow a trace. That layer, for agents, is the agent gateway.
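The three properties named above (an identity for every caller, a policy for every call, a trace for every flow) can be sketched in a few lines. This is a toy in-process stand-in, not TrueFoundry's API: `ToyAgentGateway`, the policy format, and the agent names are assumptions made for illustration, and a real gateway would sit on the network path and authenticate callers rather than trust a string.

```python
import time
import uuid
from dataclasses import dataclass, field


@dataclass
class Span:
    trace_id: str
    caller: str
    callee: str
    allowed: bool
    started_at: float


@dataclass
class ToyAgentGateway:
    # Hypothetical policy: caller identity -> set of callees it may invoke.
    policy: dict
    spans: list = field(default_factory=list)

    def call(self, trace_id, caller, callee, handler, payload):
        """Check policy, record a span for the attempt, then invoke the callee."""
        allowed = callee in self.policy.get(caller, set())
        # Record every attempt, including denied ones, so the run can be rebuilt.
        self.spans.append(Span(trace_id, caller, callee, allowed, time.time()))
        if not allowed:
            raise PermissionError(f"{caller} may not call {callee}")
        return handler(payload)

    def call_graph(self, trace_id):
        """Return (caller, callee, allowed) edges recorded for one run."""
        return [(s.caller, s.callee, s.allowed) for s in self.spans if s.trace_id == trace_id]


def new_trace_id():
    """Generate an id shared by every call in one workflow run."""
    return uuid.uuid4().hex
```

With a policy such as `{"orchestrator": {"researcher"}, "researcher": set()}`, the orchestrator can delegate to the researcher, the reverse call is refused, and `call_graph` returns both edges for the run. That edge list is the "which agent did what" record that a shared key and per-request metrics cannot reconstruct.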


