Governing Multi-Agent Systems: Agent Identity, A2A, and the Agent Gateway

Single-agent systems call models and tools. Multi-agent systems add something new: agents calling agents. That east-west traffic — an orchestrator delegating to sub-agents, agents handing work to each other over the still-young Agent2Agent protocol — is where cost runs away, blast radius widens, and "which agent did what" becomes unanswerable. The protocols standardize how agents discover one another and exchange work, and they provide security hooks — but they don’t prescribe your enterprise identity model, policy graph, budget model, or observability and control plane. This post is that governance layer, and why it belongs at the gateway.
Tomás, a platform engineer, walked in to a cost alert and a mystery. Overnight, the company's new multi-agent research workflow had spent more than its entire previous month. An orchestrator agent delegated subtasks to a set of sub-agents; one sub-agent, hitting a transient error, retried by re-invoking the orchestrator, which delegated again — a loop that ran for hours. By morning the agents had called each other tens of thousands of times. Tomás wanted to know which agent started it and where the cycle formed, and found he couldn't: every agent authenticated with the same shared service key, the calls between agents weren't recorded as a graph, and there was no per-agent rate limit that would have tripped. The system had governance for calls to the model provider. It had almost none for calls between its own agents.
This is the gap multi-agent systems open. The moment agents start delegating to one another, you have a new internal network — one with no identity, no policy, and no trace by default. The agent frameworks help you build the workflow; they don't govern it. This post is how to give that internal network the same identity, limits, and observability you'd never run a microservice mesh without.
1. The New Traffic Pattern: Agents Calling Agents
For most of the gateway story so far, traffic has been north-south: an application calls a model, maybe through a tool. Multi-agent systems add east-west traffic — agents invoking other agents. An orchestrator delegates to specialists; a specialist consults another; results flow back up. The still-young Agent2Agent (A2A) protocol gives this a standard shape, with agents publishing capability descriptions (agent cards) that others discover, and exchanging tasks and messages over a common interface, much as MCP standardized how agents reach tools.
The analogy worth holding onto is the move from a monolith to microservices. The instant your agents talk to each other, you have a distributed system with the failure modes of one: cascading retries, cycles, fan-out amplification, and the loss of a single clear call stack. And like microservices, the answer isn't to wish the calls away but to put them behind a layer that gives every caller an identity, every call a policy, and every flow a trace. That layer, for agents, is the agent gateway.

2. Agent Identity: Why a Shared Service Key Isn't Enough
Tomás's root problem was identity. When every agent authenticates with one shared service key, the system literally cannot tell its agents apart — which means it can't authorize them differently, can't attribute cost to them separately, and can't reconstruct which one acted.
The fix is to give each agent its own identity, issued and verified at the gateway, and to propagate it on every call the agent makes — to a model, to a tool, and to another agent. That identity is what every later control hangs off: authorization decisions, rate limits, cost attribution, and trace attribution all key on "which agent."
Each agent carries its own identity on every hop (illustrative)
# The gateway issues and verifies a per-agent identity, not a shared key.
ctx = AgentContext(
    agent_id="agent:research",   # this agent's own identity
    on_behalf_of="user:tomas",   # the human principal, preserved end-to-end
    run_id="run_4f9c",           # correlates every hop of one workflow
    depth=2,                     # how deep in the delegation chain we are
)
# Propagated when this agent delegates to another agent or calls a tool:
gateway.invoke(target="agent:writer", context=ctx, payload=task)

Centralizing identity at TrueFoundry's Agent Gateway, which manages authentication, identity, and service-account management for agents at the gateway layer, means the identity is established once and trusted everywhere downstream, rather than each agent framework inventing its own scheme. Preserving the human principal (on whose behalf the workflow runs) alongside the agent identity is what keeps end-user authorization and audit intact even three delegations deep.
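To make the propagation rule concrete, here is a minimal sketch of how a context like this could be derived for each hop. The `AgentContext` class and its `delegate_to` method are hypothetical and written for illustration, not taken from any gateway SDK; the field names simply mirror the illustrative snippet in this section.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class AgentContext:
    """Hypothetical context object; not a real gateway SDK type."""
    agent_id: str       # identity of the agent making the call
    on_behalf_of: str   # human principal, unchanged across hops
    run_id: str         # correlates every hop of one workflow
    depth: int = 0      # position in the delegation chain

    def delegate_to(self, target_agent_id: str) -> "AgentContext":
        """Return the context the target agent carries on its own calls."""
        # Only the agent identity and depth change; principal and run are preserved.
        return replace(self, agent_id=target_agent_id, depth=self.depth + 1)


root = AgentContext("agent:orchestrator", "user:tomas", "run_4f9c")
research = root.delegate_to("agent:research")
writer = research.delegate_to("agent:writer")
print(writer)
```

Making the context immutable is a deliberate choice in this sketch: a sub-agent cannot quietly rewrite the principal or reset its depth, it can only derive a child context.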
3. A2A Authorization and Policy: Which Agent May Invoke Which
Identity enables authorization, and the questions are concrete in a multi-agent system. May the research agent invoke the writer agent, or only the orchestrator? May a sub-agent call external tools directly, or only through its parent? Which agents may spend against which budget? Expressing these as policy-as-code — the same Cedar or OPA approach from the governance and routing posts — turns the agent graph's allowed edges into something explicit and reviewable rather than implicit in code.
Per-agent authorization for east-west calls (illustrative policy)
# Default-deny: an agent may only invoke agents it is explicitly allowed to.
allow if principal.agent_id == "agent:orchestrator"
    and action == "invoke"
    and resource.agent_id in ["agent:research", "agent:writer", "agent:critic"]

# Sub-agents may NOT invoke the orchestrator: this edge is what created the loop.
deny if principal.agent_id in ["agent:research", "agent:writer"]
    and resource.agent_id == "agent:orchestrator"

# Only the research agent may reach external search tools.
allow if principal.agent_id == "agent:research"
    and resource.kind == "mcp_tool"
    and resource.name == "web_search"

Notice the second rule: a policy that forbids sub-agents from re-invoking the orchestrator would have cut Tomás's loop at the first hop, independent of any rate limit. Authorization isn't only a security control here; constraining the shape of the agent graph is also how you prevent whole classes of runaway behavior. The gateway becomes the enforcement point when it's the one place every east-west call is routed through.
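The default-deny idea can be sketched in a few lines of plain Python. This is a stand-in for a real policy engine such as Cedar or OPA, not an example of either one's syntax; the `ALLOWED_EDGES` set, the `is_allowed` function, and the `call_tool` action name are assumptions made for illustration.

```python
# Hypothetical allowlist of (caller, action, resource) edges.
# Anything not listed is denied, so no explicit deny rule is needed
# to block a sub-agent from re-invoking the orchestrator.
ALLOWED_EDGES = {
    ("agent:orchestrator", "invoke", "agent:research"),
    ("agent:orchestrator", "invoke", "agent:writer"),
    ("agent:orchestrator", "invoke", "agent:critic"),
    ("agent:research", "call_tool", "mcp_tool:web_search"),
}


def is_allowed(principal: str, action: str, resource: str) -> bool:
    """Default-deny check: only explicitly listed edges pass."""
    return (principal, action, resource) in ALLOWED_EDGES


# The edge that formed the loop is rejected at the first hop.
print(is_allowed("agent:research", "invoke", "agent:orchestrator"))
```

Keeping the allowed edges as data rather than scattering checks through agent code is what makes the agent graph reviewable: the whole set of permitted calls fits on one screen.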
It helps to be precise about what the protocols decide and what they leave to you: discovery and transport are standardized; the identity model, policy, budgets, and enforcement point are not.
4. Containing the Blast Radius: Loops, Runaway Fan-Out, and Rate Limits
Even with good authorization, multi-agent systems fail in ways single calls don't, because the unit of damage is the cascade. A retry that re-delegates can form a cycle; an agent that fans out to many children can amplify one request into thousands; a slow sub-agent can stall a whole workflow. These are the agent-scale version of the thundering-herd and silent-escalation problems familiar from routing and failover at the model layer.
Containment is layered. A delegation-depth limit caps how deep the chain can recurse, breaking cycles structurally. Per-agent rate limits cap how often any one agent can invoke others, so a loop trips a ceiling instead of running all night. Timeouts and stall detection stop an agent waiting forever on a child. And a global per-run budget caps the total spend of one workflow regardless of its shape. TrueFoundry's Agent Gateway documents the relevant primitives — retry policies, fallback paths, timeouts and safeguards against infinite loops or stalled agents, plus token- and cost-based quotas per agent, workflow, or environment. The exact configuration shape below is illustrative; the primitives are what the product page describes.
Blast-radius controls for a multi-agent run (illustrative gateway config)
run_limits:
  max_delegation_depth: 5        # breaks cycles structurally
  max_total_tokens: 500000       # whole-run budget, force-stop past this
  max_wall_clock_seconds: 600
per_agent:
  invoke_rate_limit: 60/min      # one agent can't call others without bound
  timeout_seconds: 45            # stall detection on a child call
on_breach: halt_and_alert        # stop the run, page a human

The shift in mindset is to treat a multi-agent run as a bounded transaction with a budget and a depth, not an open-ended conversation. With those bounds enforced at the gateway, Tomás's overnight loop becomes a tripped limit and an alert at 2am instead of a five-figure invoice at 9am.
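As a rough sketch of how three of these bounds could be enforced in one place, the following hypothetical `RunGuard` checks delegation depth, a per-agent invoke rate over a sliding one-minute window, and a whole-run token budget. The class, its method names, and the `LimitBreached` exception are invented for illustration and do not describe any product's API; wall-clock limits and child-call timeouts are left out to keep it short.

```python
import time


class LimitBreached(Exception):
    """Raised when a run exceeds one of its configured bounds."""


class RunGuard:
    """Hypothetical per-run guard; defaults follow the illustrative config."""

    def __init__(self, max_depth=5, max_total_tokens=500_000,
                 invokes_per_minute=60, clock=time.monotonic):
        self.max_depth = max_depth
        self.max_total_tokens = max_total_tokens
        self.invokes_per_minute = invokes_per_minute
        self.clock = clock           # injectable so the limiter can be tested
        self.tokens_used = 0
        self.recent_invokes = {}     # agent_id -> timestamps in the last minute

    def check_invoke(self, agent_id, depth):
        """Call before an agent-to-agent hop; raises if a bound is exceeded."""
        if depth > self.max_depth:
            raise LimitBreached(f"{agent_id}: depth {depth} exceeds {self.max_depth}")
        now = self.clock()
        window = [t for t in self.recent_invokes.get(agent_id, []) if now - t < 60]
        if len(window) >= self.invokes_per_minute:
            raise LimitBreached(f"{agent_id}: invoke rate limit reached")
        window.append(now)
        self.recent_invokes[agent_id] = window

    def record_tokens(self, tokens):
        """Call after each model response; raises once the budget is spent."""
        self.tokens_used += tokens
        if self.tokens_used > self.max_total_tokens:
            raise LimitBreached("run token budget exhausted")


# Simulated retry loop: the guard stops it instead of letting it run all night.
guard = RunGuard(invokes_per_minute=3, clock=lambda: 0.0)
calls = 0
stopped_by = None
try:
    while True:
        guard.check_invoke("agent:research", depth=1)
        calls += 1
except LimitBreached as exc:
    stopped_by = str(exc)
print(calls, stopped_by)
```

In a real deployment the `except` branch is where the `halt_and_alert` behavior would live: stop the run and page a human rather than retrying.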
5. Observability: Tracing a Multi-Agent Run End-to-End
Per-request metrics — latency, tokens, errors on each individual call — are necessary but not sufficient for multi-agent systems, because they lose the thing you most need: the shape of the run. When something goes wrong three delegations deep, you need the whole tree — which agent called which, in what order, with what inputs and outputs, and where the cost accrued. That's an end-to-end trace spanning agent steps, model calls, and tool invocations, stitched together by the run identifier that every hop carries.

This builds directly on the tracing from our OpenTelemetry post: the same span model, with the agent as a first-class dimension and the run as the trace that ties spans together. TrueFoundry's Agent Gateway captures these end-to-end execution traces and lets you inspect the per-step logs to diagnose failures — turning "the agents spent too much last night" into "this edge formed a cycle at depth four," which is the difference between a mystery and a fix.
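To show what "this edge formed a cycle" can look like mechanically, here is a small sketch that walks parent links in a set of span records and reports any call chain in which an agent reappears. The record shape (`span_id`, `parent_id`, `agent`) is an assumption for illustration using plain dictionaries, not the OpenTelemetry SDK's data model, and the spans are assumed to form a well-formed tree.

```python
# Hypothetical span records for one run, one per agent step.
spans = [
    {"span_id": "s1", "parent_id": None, "agent": "agent:orchestrator"},
    {"span_id": "s2", "parent_id": "s1", "agent": "agent:research"},
    {"span_id": "s3", "parent_id": "s2", "agent": "agent:orchestrator"},
    {"span_id": "s4", "parent_id": "s1", "agent": "agent:writer"},
]


def find_cycles(spans):
    """Return root-to-span agent chains where the last agent already appeared."""
    by_id = {s["span_id"]: s for s in spans}
    cycles = []
    for span in spans:
        chain = []
        cursor = span
        # Walk up the parent links to rebuild this span's call chain.
        while cursor is not None:
            chain.append(cursor["agent"])
            parent_id = cursor["parent_id"]
            cursor = by_id.get(parent_id) if parent_id else None
        chain.reverse()  # root first
        if chain[-1] in chain[:-1]:
            cycles.append(chain)
    return cycles


for chain in find_cycles(spans):
    print(" -> ".join(chain))
```

This only works because every hop carries the run identifier and its parent link; with a shared key and no recorded graph, there is nothing to walk.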
6. Cost Attribution by Agent Identity
Cost in a multi-agent system is meaningless without identity. "The workflow cost X" doesn't tell you whether the spend is the orchestrator's planning calls, one sub-agent's expensive model choice, or a loop. Attributing tokens and cost to the specific agent, workflow, and run — keyed on the identity from section 2 — is what makes the spend legible and the runaway diagnosable.
This is the cost-attribution post's per-team accounting extended to the agent as the unit. The Agent Gateway attributes token usage and cost to specific agents, workflows, teams, and environments, which does double duty: it answers the finance question (which agent drives spend) and it surfaces the operational anomaly (a single agent's cost spiking is often the first visible sign of a loop, well before the monthly bill). Pair it with the per-run budget from section 4 and cost becomes both observable and bounded.
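A minimal sketch of the aggregation, assuming the gateway emits one usage record per model call tagged with the calling agent's identity. The record fields and the token and cost figures below are made up for illustration.

```python
from collections import defaultdict

# Hypothetical usage records; values are invented for the example.
usage_records = [
    {"run_id": "run_4f9c", "agent_id": "agent:orchestrator", "tokens": 1200, "cost_usd": 0.012},
    {"run_id": "run_4f9c", "agent_id": "agent:research", "tokens": 9000, "cost_usd": 0.09},
    {"run_id": "run_4f9c", "agent_id": "agent:research", "tokens": 8500, "cost_usd": 0.085},
    {"run_id": "run_4f9c", "agent_id": "agent:writer", "tokens": 3000, "cost_usd": 0.03},
]


def cost_by_agent(records):
    """Sum tokens and cost per agent identity."""
    totals = defaultdict(lambda: {"tokens": 0, "cost_usd": 0.0})
    for record in records:
        entry = totals[record["agent_id"]]
        entry["tokens"] += record["tokens"]
        entry["cost_usd"] += record["cost_usd"]
    return dict(totals)


totals = cost_by_agent(usage_records)
top_spender = max(totals, key=lambda agent: totals[agent]["cost_usd"])
print(top_spender, totals[top_spender]["tokens"])
```

The same grouping keyed on `run_id` instead of `agent_id` gives per-run spend, which is the number a per-run budget is checked against.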
7. Security: How Prompt Injection Propagates Across Agents
Multi-agent systems give prompt injection a new way to travel. As covered in our prompt-injection post, an agent that reads untrusted content — a retrieved document, a tool result — can be steered by instructions hidden in it. In a multi-agent system, that compromised agent then talks to other agents, and its output becomes their input. An injection that lands on the research agent can propagate to the writer and critic agents downstream, because to them the research agent is a trusted peer, not an untrusted source.