Enterprise AI Governance: Virtual Keys, RBAC & Audit

Built for Speed: ~10ms Latency, Even Under Load

Blazingly fast way to build, track and deploy your models!

Handles 350+ RPS on just 1 vCPU — no tuning needed
Production-ready with full enterprise support

Get Started with Truefoundry Now Talk to the Expert

When LLMs move from a pilot to production across many teams, governance stops being optional. Someone will ask who can call which model, with what budget, on whose data — and whether every call can be reconstructed for an audit. This post is the control plane that answers those questions: virtual keys that decouple access from provider credentials, RBAC and policy-as-code, budgets and quotas as governance, compliance-grade audit logs, data residency, and how these gateway controls map to obligations like the EU AI Act — framed as what they help satisfy, not as a compliance guarantee.

Key Takeaways

Governance becomes non-negotiable when LLMs go multi-team and multi-model. Without a control plane you get shared API keys with no attribution, no spend limits, no audit trail, and shadow AI — exactly what an auditor or a budget owner will eventually ask about.
Virtual keys decouple access from provider credentials: per-team or per-app keys that map to real provider keys at the gateway, so you can attribute usage, revoke access without rotating provider keys, and scope what each key can reach.
RBAC and policy-as-code (Cedar/OPA) answer "who can call what" — which teams may use the frontier model, which routes, which tools, and which providers a given data class may touch.
Budgets, quotas, and rate limits are governance and fairness controls, not just abuse protection: hard and soft limits per team, alerting, and enforcement when a limit is exceeded.
A compliance-grade audit log is immutable, complete (who, what, when, which model, which data category), tamper-evident, and exportable to a SIEM — and it logs metadata and redaction events, never raw PII.
Data residency and sovereignty are routing decisions: region-aware routing, blocking certain providers for certain data classes, and self-hosted models for regulated data.
The gateway is the single control plane for keys, RBAC, budgets, policy, audit, and residency. TrueFoundry's AI Gateway provides these as the layer every request already passes through — it helps satisfy logging and oversight obligations, but it is a control, not a compliance certification.

Quarter-end at Northwind. Mei, the platform lead, got a question from the security and compliance team she couldn't answer: which teams had sent customer data to which model providers over the last quarter, and could she produce the records. She couldn't. Every service called the model providers through one shared API key, checked into a config years ago. There was no per-team attribution, no record of which requests carried customer data, no way to revoke one team's access without rotating the key for everyone, and no audit trail beyond the providers' own opaque billing. The LLM usage had grown from one prototype to a dozen production services, and the governance had not grown with it.

Nothing had gone wrong, exactly — no breach, no overspend anyone had caught. But "we can't answer the question" is its own finding, and it's the one that turns a routine audit into a project. This post is the control plane that makes the question answerable before someone asks it.

What TrueFoundry's AI Gateway Provides Here

Everything in this post — virtual keys, RBAC, budgets, rate limits, audit logs, residency rules, and guardrails as enforced policy — is something TrueFoundry's AI Gateway expresses as configuration in one control plane. Access control defines who (users, teams, virtual accounts) may call which provider accounts and models; Personal Access Tokens and Virtual Account Tokens are how applications authenticate to the gateway instead of holding raw provider keys; rate-limit and budget configs apply per user, team, virtual account, model, or any custom metadata key; and guardrails — including Cedar and OPA as policy-as-code at the MCP-tool boundary — run as enforced rules at four lifecycle hooks.

Every request crosses the same path: authenticate, resolve the calling identity, evaluate access policy and per-key budgets, evaluate rate-limit rules in order (first match wins), run input guardrails, route to a provider, emit an audit-grade trace, then run output guardrails. The same view becomes the record an auditor needs: who called what, when, against which policy, with which guardrail outcomes. Request Traces and OpenTelemetry export let the trail land in your SIEM rather than a vendor dashboard you cannot query.

‍

TrueFoundry AI Gateway request flow — Fig 1: *How a request flows through the gateway in production: validation → identity → rate/budget checks → load balancing → provider adapter → async logging. Source:* *TrueFoundry — Gateway Plane Architecture*.

‍

Fig 2: *How identity, policy, and audit compose on a single request. Each stage is gateway configuration, and each decision is recorded against the same trace ID.*

The application code is unchanged from any OpenAI-style call — the governance is in the bearer token and the metadata header, not in client logic. A Personal Access Token resolves to a user; a Virtual Account Token resolves to a non-human identity for production services. The X-TFY-METADATA header carries the structured fields (team, project, cost_center, environment) that policies, budgets, and audit logs match against:

Calling the gateway with an identity and audit metadata (Python, OpenAI-compatible)

from openai import OpenAI

client = OpenAI(
    base_url="https://<your-truefoundry-gateway-url>",   # your gateway endpoint
    api_key="<your-virtual-account-token>",              # VAT for production; PAT in dev
)

resp = client.chat.completions.create(
    model="openai-main/gpt-5.5",
    messages=[{"role": "user", "content": "Summarize this document."}],
    extra_headers={
        # Structured identity for audit, attribution, and policy matching.
        "X-TFY-METADATA": '{"team":"support-ai","project":"helpdesk","cost_center":"cc-203","environment":"production"}',
    },
)
print(resp.choices[0].message.content)

1. Why Governance Becomes Non-Negotiable in Production

A single prototype calling one model on one key needs no governance. A dozen services across several teams, calling several providers, on data of varying sensitivity, needs a control plane — because the failure modes are no longer hypothetical. A shared key means no usage attribution, so you can't tell finance which team is driving spend or tell security which team touched customer data. No spend limits means one runaway agent (recall the routing post's silent escalation) can burn the budget before anyone notices. No audit trail means you can't reconstruct what happened for an incident or an auditor. And no access control means shadow AI: teams wiring up models without anyone tracking it.

On top of the operational pressure sits regulatory pressure. The EU AI Act is phasing in obligations around record-keeping, transparency, and human oversight (section 7), and sector regimes — SOC 2, HIPAA, financial rules — have long expected access control and audit. The common thread is that they all assume you can answer Mei's question. Governance is the work of being able to.

Fig 3: Teams hold virtual keys, not provider credentials. Every request crosses one control plane that resolves identity, applies RBAC and policy, enforces budget, runs guardrails, routes by residency, and writes an immutable audit record — exported to the SIEM — before reaching a provider.

2. Virtual Keys: Decoupling Access from Provider Credentials

The root cause of Mei's problem is the shared provider key. A virtual key fixes it: instead of handing teams the real provider credential, you issue each team or app its own gateway-managed key that maps to the underlying provider key at the gateway. The application authenticates with its virtual key; the gateway holds the real one.

That one indirection buys most of governance. Usage attributes to the virtual key, so spend and data access can be reported per team (the cost-attribution post builds on exactly this). Revocation is local — disable one team's virtual key without rotating the provider key for everyone else. And access is scopable — a virtual key can be limited to certain models, routes, or data classes. The provider credentials live in one place, the gateway, rather than scattered across a dozen service configs where they can't be tracked or rotated cleanly. In TrueFoundry's AI Gateway, virtual keys are the unit that ties usage, budgets, and access policy to a team or application rather than to an opaque shared credential.

3. RBAC and Policy-as-Code: Who Can Call What

Virtual keys establish identity; RBAC and policy decide what that identity may do. The questions are concrete: which teams may use the expensive frontier model, which may reach a given provider, which may call which tools (in an MCP setting), and which data classes may be sent where. Encoding these as policy-as-code — with an engine like Cedar or OPA — makes the rules explicit, reviewable, and versioned, rather than living as tribal knowledge or scattered conditionals.

Illustrative access policy (conceptual — exact schema is gateway-specific)# Only the research team may call the frontier model.allow if principal.team == "research" and resource.model == "gpt-5.5"# Customer-data requests must stay on an EU-resident model.deny if request.data_class == "customer_pii" and resource.region != "eu"# Finance team may not call external providers at all — self-hosted only.allow if principal.team == "finance" and resource.kind == "self_hosted"

The value of policy-as-code is that it turns "who can call what" into something you can review in a pull request, test, and prove to an auditor — the same governance discipline applied to MCP tool access in TrueFoundry's MCP security work, where Cedar and OPA gate which tools an agent may invoke. The gateway is the enforcement point because it's the one place every request crosses before reaching a provider or a tool.

4. Budgets, Quotas, and Rate Limits as Governance

Budgets and rate limits are usually framed as protection against abuse or runaway cost. In a governance context they're also fairness and accountability controls: each team gets a defined share, overruns are visible, and no single team can quietly consume the org's entire model budget. The mechanics are hard and soft limits per team or virtual key, alerting as a limit approaches, and enforcement — typically a 429 — when it's exceeded, the same enforcement path described in the cost-attribution post.

The governance framing changes how you set them. A soft limit with alerting is an accountability tool — it tells a budget owner their team is trending over, without breaking production. A hard limit is a guardrail against the runaway case, like the silently-escalating cascade from the routing post. Rate limits per key also double as a fairness mechanism in a shared-capacity setting, keeping one team's batch job from starving another team's interactive traffic. Setting these at TrueFoundry's AI Gateway means they apply per virtual key, consistently, rather than being reimplemented per service.

The schema is intentionally rule-based so a policy reads the way you'd describe it to an auditor: who (subjects: users, teams, virtual accounts, or any custom metadata key), what (models), how much (limit and unit — requests or tokens per minute, hour, or day), and scoped how (one shared limit, or a separate limit per user / per model / per metadata value via rate_limit_applies_per). Rules evaluate in order and the first match wins, so specific rules sit above broader fallbacks:

gateway-rate-limiting-config (real schema from TrueFoundry docs)

name: ratelimiting-config
type: gateway-rate-limiting-config
rules:
  # 1. Specific override: bob's runaway script gets a hard 1k requests/day on gpt4
  - id: "bob-gpt4-day-cap"
    when:
      subjects: ["user:bob@example.com"]
      models: ["openai-main/gpt4"]
    limit_to: 1000
    unit: requests_per_day

  # 2. Team-level token cap on a costly model
  - id: "backend-team-gpt4-tpm"
    when:
      subjects: ["team:backend"]
      models: ["openai-main/gpt4"]
    limit_to: 20000
    unit: tokens_per_minute

  # 3. Fairness floor: every user gets their own 1M-token/day budget on any model
  - id: "user-daily-token-cap"
    when: {}
    limit_to: 1000000
    unit: tokens_per_day
    rate_limit_applies_per: ["user"]

  # 4. Project-level cap based on metadata in the request header
  - id: "project-hourly-cap"
    when: {}
    limit_to: 50000
    unit: tokens_per_hour
    rate_limit_applies_per: ["metadata.project_id"]

TrueFoundry AI Gateway rate-limiting config UI

‍

TrueFoundry AI Gateway delivers ~3–4 ms latency, handles 350+ RPS on 1 vCPU, scales horizontally with ease, and is production-ready, while LiteLLM suffers from high latency, struggles beyond moderate RPS, lacks built-in scaling, and is best for light or prototype workloads.

Built for Speed: ~10ms Latency, Even Under Load

Schedule your Demo Now

The fastest way to build, govern and scale your AI

How Can You Prevent GenAI Costs From Spiraling at Scale?

Gartner report on best practices for optimizing generative and agentic AI costs and projected statistics.

Access Full 2026 Report

Gartner Hype Cycle for Platform Engineering 2026

Access Full 2026 Report

One Layer of Control for All AI

Route and govern model and tool traffic with a centralized AI Gateway

Book Demo

Table of Contents

Text Link

One Gateway for Every LLM, Agent and MCP Server

Book a 30-min with our AI expert

Book a Demo

Summarize with

Blurry red snowflake on white background, symmetrical frosty design with soft edges and abstract shape.

AI Governance and Audit for Enterprise LLMs: Virtual Keys, RBAC, and Compliance-Grade Logs

Built for Speed: ~10ms Latency, Even Under Load

What TrueFoundry's AI Gateway Provides Here

1. Why Governance Becomes Non-Negotiable in Production

2. Virtual Keys: Decoupling Access from Provider Credentials

3. RBAC and Policy-as-Code: Who Can Call What

4. Budgets, Quotas, and Rate Limits as Governance

The fastest way to build, govern and scale your AI

One Layer of Control for All AI

One Gateway for Every LLM, Agent and MCP Server

The fastest way to build, govern and scale your AI

Introducing Ask TFY: A New Way to Understand and Control Your AI in Production

TrueFoundry vs MintMCP: MCP Gateway Comparison

TrueFoundry + Seldon: Unified Control Plane for Enterprise AI

Agent Economics, No. 2: Mapping Firm-Scale AI Controls to Agent-Economy Institutions

Recent Blogs

Agent Economics, No. 2: Mapping Firm-Scale AI Controls to Agent-Economy Institutions

Agent Economics, No. 1: What Is the Agent Economy — and Who Gets to Design It?

Introducing Ask TFY: A New Way to Understand and Control Your AI in Production

Best MCP Gateway for Production AI Systems in 2026

Best AI Gateways for LLM Inference Optimization in 2026

TrueFoundry vs MintMCP: MCP Gateway Comparison

Graph Engineering for Multi-Agent Systems: Architecture, Governance, and Observability

Designing for Model Deprecations with Virtual Models and Staged Cutovers

Unified AI Gateway as Enterprise's New Foundational Primitive

The Path to the Championship: Enterprise AI's Knockout Rounds Run Through the Gateway

AI Safety vs AI Security: What the Difference Means for Enterprise Teams

What Is Responsible AI? Principles, Practice, and What It Means for Enterprise Teams

AI Audit Checklist 2026: What to Review, When, and Why It Matters

BCG Says Strategy Matters More Than Tools — Part 2: From Agent Adoption to Governed Tools and Runtimes

BCG Says Strategy Matters More Than Tools — Part 1: From Strategic Clarity to Gateway Controls

Recursos

Por que TrueFoundry?

AI Governance and Audit for Enterprise LLMs: Virtual Keys, RBAC, and Compliance-Grade Logs

Built for Speed: ~10ms Latency, Even Under Load

What TrueFoundry's AI Gateway Provides Here

1. Why Governance Becomes Non-Negotiable in Production

2. Virtual Keys: Decoupling Access from Provider Credentials

3. RBAC and Policy-as-Code: Who Can Call What

4. Budgets, Quotas, and Rate Limits as Governance

The fastest way to build, govern and scale your AI

One Layer of Control for All AI

One Gateway for Every LLM, Agent and MCP Server

The fastest way to build, govern and scale your AI

Discover More

Introducing Ask TFY: A New Way to Understand and Control Your AI in Production

TrueFoundry vs MintMCP: MCP Gateway Comparison

TrueFoundry + Seldon: Unified Control Plane for Enterprise AI

Agent Economics, No. 2: Mapping Firm-Scale AI Controls to Agent-Economy Institutions

Recent Blogs

Agent Economics, No. 2: Mapping Firm-Scale AI Controls to Agent-Economy Institutions

Agent Economics, No. 1: What Is the Agent Economy — and Who Gets to Design It?

Introducing Ask TFY: A New Way to Understand and Control Your AI in Production

Best MCP Gateway for Production AI Systems in 2026

Best AI Gateways for LLM Inference Optimization in 2026

TrueFoundry vs MintMCP: MCP Gateway Comparison

Graph Engineering for Multi-Agent Systems: Architecture, Governance, and Observability

Designing for Model Deprecations with Virtual Models and Staged Cutovers

Unified AI Gateway as Enterprise's New Foundational Primitive

The Path to the Championship: Enterprise AI's Knockout Rounds Run Through the Gateway

AI Safety vs AI Security: What the Difference Means for Enterprise Teams

What Is Responsible AI? Principles, Practice, and What It Means for Enterprise Teams

AI Audit Checklist 2026: What to Review, When, and Why It Matters

BCG Says Strategy Matters More Than Tools — Part 2: From Agent Adoption to Governed Tools and Runtimes

BCG Says Strategy Matters More Than Tools — Part 1: From Strategic Clarity to Gateway Controls

Recursos

Por que TrueFoundry?

Assine nossa newsletter