Govern All AI Traffic Through the Gateway

AI usage inside a company comes from two different places: the AI your teams build (applications and agents using the OpenAI SDK, LangGraph, Google ADK, and similar) and the AI your employees use (third-party agents like Cursor, Claude Code, and ChatGPT on devices you don’t fully control). Both emit tokens, both can leak data, and both can call models with no audit trail. The goal of this guide is to give you a single place every model call passes through, make that path the easiest one to take, and wire it into the security tooling you already run so you can answer a basic question: which models did we call, who called them, what did they send, and what did it cost?

What governance means

The AI Gateway is the control point. When a model call passes through it, four capabilities apply:

FinOps — per-user, per-team, and per-model spend and budgets
Guardrails — PII, prompt-injection, moderation, and DLP on prompts and responses
Access control — who can call which models, plus rate limits
Audit — request logs and failover

Not every category of AI traffic can use all four. The table below shows what you get in each case.

Four categories of AI traffic

Category	FinOps	Guardrails	Access	Audit
1 · Apps you build	Yes	Yes	Yes	Yes
2a · Gateway-pluggable, MDM enforceable	Yes	Yes	Yes	Yes
2b · Gateway-pluggable, manual only	Yes*	Yes*	Yes*	Yes*
3 · No gateway endpoint	No	Yes	Partial	Yes
4 · SaaS-embedded AI	No	Input only	Limited	Limited
MCP (any category)	Yes	Yes	Yes	Yes

*Only for users who configure the gateway. You cannot enforce it company-wide. 1. Apps your teams build. These are applications, services, and agents you write yourself — using prompts, the OpenAI SDK, LangGraph, Google ADK, and similar frameworks. You control the code, so you route every model call through the gateway. Governance: all four — FinOps, guardrails, access control, and audit on every token. 2. Third-party agents that let you plug in the gateway. These are AI tools built by other companies — Cursor, Claude Code, Codex CLI, and similar — that expose a setting for the model endpoint. Point that at the gateway and the call goes through it. Governance: all four for calls that actually route through the gateway. This splits further by how you get traffic there:

2a · Enforceable via MDM — Claude Code, Codex CLI, Gemini CLI, and similar. Push the gateway URL fleet-wide over MDM and every device uses the gateway automatically. Governance: all four, enforced company-wide.
2b · Not enforceable via MDM (manual per user) — Claude Desktop / Cowork, and tools like Cursor where only some features honor a custom endpoint. Users who configure the gateway get all four capabilities; users who don’t are ungoverned. Governance: all four per configured user only.

3. Third-party agents that do not let you plug in the gateway. Some popular AI tools handle model calls on the vendor’s backend and give you no endpoint setting to change — the ChatGPT app, claude.ai, and similar apps employees run on their laptops. Governance: guardrails and audit on what you can see (prompts and responses in transit), plus partial access control (allow/block apps). No FinOps — the token-metered model call runs in the vendor’s cloud. 4. AI built into SaaS products. When AI is embedded inside another product — Salesforce Einstein, Microsoft Copilot inside M365, Notion AI — the model call never leaves that vendor’s cloud. Governance: limited — input-side DLP via CASB or SaaS admin controls at best. No FinOps, no full audit of model calls, no gateway access control.

MCP is a separate, simpler path. Nearly every AI tool that supports agents also lets users add MCP servers. The MCP Gateway fits there directly — point any tool’s MCP configuration at the gateway and you get all four capabilities over every MCP call, regardless of which category the tool’s model traffic falls into.

This page walks through how traffic from each category actually reaches the gateway, and what to do when it can’t.

How traffic reaches the gateway

Each category takes a different path — or never reaches the gateway at all. Category 1 — Apps your teams build. Your code calls the gateway API directly. Replace raw provider keys with a virtual account token so every request goes to the gateway URL instead of api.openai.com or api.anthropic.com. No interception, no device tooling — the application is configured to use the gateway from the start. Category 2 — Third-party agents with a gateway setting. The app makes its normal API call, but to the gateway URL instead of the provider.

2a (MDM enforceable): Push the gateway URL fleet-wide over MDM — e.g. set ANTHROPIC_BASE_URL for Claude Code. Every device calls the gateway automatically.
2b (manual only): Each user sets the gateway URL in the app’s settings themselves. Calls from configured users hit the gateway; everyone else goes direct to the provider.

Category 3 — No gateway endpoint. The app talks to the vendor’s backend, not a model API you can redirect. Traffic must be intercepted on the device and rerouted to the gateway using either the open-source on-machine agent (deployed via MDM) or your existing Secure Web Gateway (Zscaler, Netskope, Prisma Access). The gateway sees the request content for guardrails and audit, but not the underlying token-metered model call. Category 4 — SaaS-embedded AI. The model call runs entirely in the vendor’s cloud. It does not pass through your gateway. Your options are the SaaS vendor’s own admin controls or a CASB (Cloud Access Security Broker) — a security layer between your employees and cloud apps that watches data going in and out and applies policies. MCP traffic — any category. Any tool that supports MCP servers can add the MCP Gateway URL in its MCP configuration. MCP calls go through the gateway with full governance, independent of how the tool’s model calls are routed. The sections below cover each path in implementation detail — starting with direct integration for categories 1 and 2, then proxy-based capture for category 3.

Direct integration: apps you build and agents that accept a gateway URL

This covers category 1 (apps your teams build) and category 2 (third-party agents that let you plug in the gateway). Point the application straight at the gateway and stop handing out raw provider keys. A raw OpenAI or Anthropic key in an app’s environment, a developer’s .env, or a CI secret talks straight to the model with no policy in between. Instead, put the gateway in front of every provider and issue gateway credentials only:

For apps your teams build (OpenAI SDK, LangGraph, Google ADK, and similar), give each one a virtual account token instead of a provider key, so the only way to reach a model is through the gateway.
For third-party agents with a custom endpoint setting, set their model base URL to the gateway. Where MDM enforcement is supported (category 2a), push that setting fleet-wide so every device uses the gateway automatically.

Either way these are real model calls the gateway sees end to end, so you get all four governance capabilities — FinOps, guardrails, access control, and audit. Common third-party agents that support direct integration:

Claude Code

Claude Code Max

Claude Desktop

Codex CLI

Cursor

Cline

Gemini CLI

GitHub Copilot

OpenCode

See the full list of supported tools in the Ecosystem & Integrations catalog.

Proxy-based capture: category 3

This covers category 3 — third-party agents that do not let you plug in the gateway. These are mostly apps employees run on their devices: the ChatGPT app, claude.ai, and similar tools where the vendor handles model calls on their backend and gives you no endpoint to change. Their traffic has to be intercepted on the device and rerouted to the gateway, and TrueFoundry gives you two ways to do that:

An open-source on-machine agent you deploy over MDM.
Your existing Secure Web Gateway — Zscaler, Netskope, or Prisma Access — forwarding AI traffic to the gateway.

The rest of this page covers each path in detail, starting with the device. Category 4 (AI baked into SaaS) is covered separately at the end — device-side tools cannot reach those model calls.

Getting device traffic to the gateway

Category 1 apps call the gateway directly from your infrastructure — no device tooling needed. On employee laptops, categories 2 and 3 use one of three mechanisms:

Method A: Point the app at the gateway

This is the cleanest case for category 2 — third-party agents that expose a model-endpoint setting. Push a managed configuration over MDM that points them at the gateway. There’s no interception and no certificates: the app makes its normal call to a URL you chose. Because these are real model calls, you get all four governance capabilities. How far this reaches depends on the product. The key split is whether you can enforce the setting fleet-wide:

Tool	Route to gateway	MDM enforceable	Notes
Claude Code / Claude Code Max	Yes	Yes	Set `ANTHROPIC_BASE_URL` and lock it with MDM
Codex CLI	Yes	Yes	Custom OpenAI base URL, behaves like Claude Code
Gemini CLI	Yes	Yes	Custom endpoint, enforceable via MDM
Claude Desktop / Cowork	Yes	No	Governs models and MCP servers, but the setting is manual per user and can’t be enforced via MDM today
Cursor	Partial	No	Several agentic features still route through Cursor’s own backend
Claude on the web (claude.ai)	No	—	No endpoint setting; falls into category 3. Govern with SSO, domain capture, and Admin Console policy, or intercept via agent/SWG. See enterprise security for Claude

Any tool with a “set base URL” or custom-endpoint option is a Method A candidate. For tools where MDM enforcement isn’t available, governance depends on each user configuring the gateway themselves. See the full list in the Ecosystem & Integrations catalog.

Method B: The open-source on-machine agent

For category 3 — third-party agents that won’t point at a gateway — TrueFoundry is open-sourcing an agent you deploy over MDM. These are mostly apps employees use on their devices where the vendor handles model calls on their backend: the ChatGPT app, claude.ai, and similar. The agent does application control (allow the AI apps you sanction, block the ones you don’t) and, for allowed apps, intercepts the LLM and MCP requests (not telemetry) and reroutes them through the gateway. It inspects TLS only for an allowlist of AI hosts; everything else passes through untouched. For a request worth governing, the agent keeps the original request intact, including the app’s own provider credentials, and changes only where the bytes go. It adds two headers:

x-tfy-api-key:       <the user's TrueFoundry token>
x-tfy-original-url:  https://api.anthropic.com/v1/messages

The gateway then authenticates the user, logs the call, strips the x-tfy-* headers, and forwards to the original URL. The response streams straight back, and the app notices nothing. This buys guardrails, DLP, audit, and allow/block for apps that would otherwise be invisible. The honest limit is structural: these apps don’t call a model directly. The ChatGPT app, claude.ai, and similar consumer surfaces call their own backend, which calls the model server-side. The agent sees what the app sends that backend and what comes back (enough for content guardrails and audit), but the token-metered model call happens in the vendor’s cloud.

For backend-proxied apps you get content governance, but not FinOps — you can’t meter tokens you never see.

Method C: Reuse the SWG you already deployed

If your fleet already runs a Secure Web Gateway, the interception layer is already on every device and already decrypting traffic. You point its forwarding at the AI Gateway and install nothing new. This is another way to handle category 3 traffic. The per-vendor mechanics are covered in the next section.

Integrating with what’s already on the device

Enterprises already run two kinds of agent on every device, and they do different jobs. Getting this distinction right is the difference between a working rollout and a stalled one:

Traffic-redirection tools — Secure Web Gateways like Zscaler, Netskope, and Prisma Access sit in the network path and can forward AI traffic to your gateway.
Endpoint-control tools — EDR and MDM like CrowdStrike, Jamf, and Intune sit on the device and decide which apps run and what’s deployed. They are not web proxies and cannot redirect a model call.

You use the first kind to route, and the second kind to deploy and to block.

Use a Secure Web Gateway to route a model call to the gateway. Use CrowdStrike and MDM to deploy the routing and to block what you don’t sanction. Asking CrowdStrike to redirect traffic, or asking Zscaler to manage endpoint posture, is using the wrong tool for the job.

Traffic redirection: the Secure Web Gateways

All three majors implement the same pattern: forward matched AI domains to a third-party upstream proxy (your gateway) and inject the user’s identity in the X-Authenticated-User header. The gateway runs an SWG-mode listener that reads that header, performs its own inspection, governs, and forwards. One contract covers all of them. Two requirements are common to every vendor:

SSL trust. Because the SWG hands the gateway a re-encrypted tunnel, the gateway does its own SSL inspection on the chained traffic, and the SWG must be configured to trust the gateway’s CA. Each product has a field for exactly this.
Identity must be trusted. X-Authenticated-User is an unauthenticated assertion of who the user is, so the gateway must accept it only from your SWG — locked down with mTLS or a strict source-IP allowlist plus a shared secret. Otherwise anyone could spoof the header and impersonate any user.

Zscaler (ZIA)
Netskope
Prisma Access
Cisco Umbrella

Zscaler uses Third-Party Proxy Chaining, acting as the child proxy that forwards to your gateway. Up to eight proxy objects are supported.

Enable SSL Inspection

Enable SSL Inspection for the AI-provider domains so Zscaler can decrypt and forward the traffic.

Add the gateway as a proxy

Go to Forwarding Control → Proxies & Gateways → Proxies → Add Proxy. Set the name, the gateway’s IP/FQDN and port, optionally the gateway’s root certificate, and enable Insert X-Authenticated-User (base64 optional).

Create a Gateway object

Create a Gateway object that references the proxy you added (primary/secondary).

Add a Forwarding Control rule

Add a Forwarding Control rule matching the AI domains / URL categories, with the forward method set to that gateway.

Endpoint control and deployment: EDR and MDM

EDR and MDM can’t reroute a model call, but they’re already on every endpoint, which makes them the right tools for deploying the agent and for blocking what you won’t allow. No model traffic flows through them. MDM (Jamf, Intune, Kandji, Workspace ONE) is how you roll out the device-side pieces:

Install the TrueFoundry agent fleet-wide when you use Method B for category 3 traffic, and enforce that it keeps running.
Distribute and trust the agent’s CA (or the SWG-chain CA) so TLS inspection works.
Apply Method A configuration patches that point category 2 agents at the gateway — especially category 2a tools where MDM enforcement is supported.

If you only use an existing SWG (Method C), the interception layer is already deployed, so MDM isn’t required for routing. CrowdStrike Falcon (endpoint control) blocks and allows apps at the device. It does not route or inspect model calls:

Falcon Firewall Management provides centralized, application- and location-aware host firewall policy across Windows, macOS, and Linux. Use it to block unsanctioned AI apps and destinations outright and to allowlist the ones you’ve routed through the gateway.
Falcon Secure Access applies Zero Trust controls inside the browser session across any browser. Use it for category 3 web surfaces (like claude.ai) and category 4 SaaS AI that can’t be proxied as model traffic.

Don’t confuse Falcon’s endpoint controls with CrowdStrike Falcon AIDR, which is a guardrail the gateway calls inline once traffic arrives — not a device tool. It’s covered under Guardrails, alongside the other guardrail providers the gateway integrates with.

Device tooling summary

These are the tools that get device traffic to the gateway or control what runs on the device. Guardrails that the gateway applies once traffic arrives are a separate concern, covered in the next section.

Provider	Role	Mechanism	Identity	What you get
Zscaler ZIA	Redirect	Third-Party Proxy Chaining	`X-Authenticated-User`	Route AI traffic to gateway
Netskope	Redirect	Forward to Proxy (RTP)	XAU / XFF	Route AI traffic to gateway
Prisma Access	Redirect	Downstream Proxy Chaining	XAU / XFF	Route AI traffic to gateway
Cisco Umbrella	Redirect (partial)	Proxy chaining (upstream-oriented)	Varies	Validate; else use agent/PAC
MDM (Jamf / Intune)	Deploy	Install TrueFoundry agent · config patch · CA	—	Roll out agent, patch apps, trust CA
CrowdStrike Falcon	Control	Firewall Mgmt · Secure Access	Device / identity	Block/allow apps; in-browser AI controls

Guardrails: securing traffic on the gateway

Routing gets traffic to the gateway. What protects it there is guardrails — and these run inside the gateway, independent of how the traffic arrived. Once a call lands, the gateway can inspect prompts before they reach a model and block, redact, or rewrite risky responses in real time, for threats like prompt injection, jailbreaks, PII leakage, and data exposure. The gateway applies guardrails by calling guardrail providers inline. You can use TrueFoundry’s built-in guardrails or integrate external providers — including CrowdStrike Falcon AIDR, Palo Alto AIRS, Azure Content Safety, AWS Bedrock Guardrails, and many others. For the CrowdStrike AIDR integration specifically, see the CrowdStrike AIDR guardrail. Because every governed call passes through the gateway, it’s also the natural export point for AI telemetry: stream the gateway’s request logs (who called which model, with what prompt and response) into your SIEM — such as CrowdStrike Falcon Next-Gen SIEM — so AI activity sits alongside endpoint, identity, and cloud signals.

The governability spectrum

The governance matrix in Four categories of AI traffic is the quick reference. This table adds the mechanism for each path:

Category	Example	How traffic reaches the gateway
1 · Apps you build	OpenAI SDK, LangGraph, Google ADK	Virtual account token in your code
2a · MDM enforceable	Claude Code, Codex CLI	Gateway URL pushed fleet-wide via MDM
2b · Manual only	Claude Desktop, partial Cursor	Per-user gateway URL in app settings
3 · No gateway endpoint	ChatGPT app, claude.ai	On-device agent or SWG intercepts and reroutes
4 · SaaS-embedded AI	Salesforce Einstein	Does not reach gateway — SaaS controls or CASB
MCP	Any MCP-capable tool	MCP Gateway URL in tool’s MCP config

Categories 1, 2, and MCP are the wins: genuine calls pass through the gateway with all four capabilities. Category 3 is the honest middle — guardrails and audit on visible content, but no FinOps. Category 4 is the wall — device-side tools cannot touch those model calls.

The honest gaps

No on-device approach is complete. Plan around these limits:

Backend-proxied FinOps (category 3). If the app calls its own backend, the token count lives in the vendor’s cloud. Content governance survives; cost accounting does not.
Certificate-pinned apps. Some clients pin their TLS certificate and refuse inspection by your agent or the SWG. A pinned app with no endpoint setting can’t be governed transparently; the realistic control is allow-or-block via CrowdStrike’s firewall and browser controls.
stdio MCP. Many MCP servers run as local subprocesses over stdio, which is not network traffic. Interceptors see remote (HTTP/SSE) MCP servers but are blind to stdio ones. For remote MCP, point tools at the MCP Gateway instead — that path is always governable.
QUIC and real-time transports. Browsers may use HTTP/3 over QUIC (UDP), which a TCP proxy won’t see — block QUIC so clients fall back to TCP. Voice/realtime features that use WebRTC ride DTLS over UDP, a different problem entirely; treat them as out of scope.

Putting it together

Governing every token isn’t one mechanism — it’s a chokepoint plus a set of integrations matched to how much each app and tool will cooperate:

Stand up the gateway and route everything you build (category 1) through it with virtual account tokens.
Patch every third-party agent that accepts a gateway URL to point at it — enforce fleet-wide over MDM where supported (category 2a), or document the per-user setup where it isn’t (category 2b).
For agents that don’t accept a gateway URL (category 3), redirect their traffic with the SWG you already run — Zscaler, Netskope, or Prisma — or drop the open-source agent where there’s no SWG (Methods B and C).
For AI embedded in SaaS (category 4), use the vendor’s native controls or a CASB; device-side tools won’t reach those model calls.
Point MCP server configurations at the MCP Gateway in any tool that supports MCP — this path works regardless of category.
Use MDM to install the TrueFoundry agent and push config patches, and CrowdStrike to block what you won’t sanction.
Apply guardrails on the gateway — TrueFoundry’s built-in ones or external providers like CrowdStrike AIDR — and export the gateway’s logs to your SIEM so AI activity sits alongside the rest of your security signal.

That yields complete governance over every model call you can route, content-level governance over the apps you can only observe, endpoint-level control over the apps you won’t allow, and inline guardrails on everything that reaches the gateway — with a clear, defensible line where the limits begin.

Set up the control plane first with Setup AI Gateway in Your Organization, then return here to route device and SaaS traffic into it.

​What governance means

​Four categories of AI traffic

​How traffic reaches the gateway

​Direct integration: apps you build and agents that accept a gateway URL

​Proxy-based capture: category 3

​Getting device traffic to the gateway

​Method A: Point the app at the gateway

​Method B: The open-source on-machine agent

​Method C: Reuse the SWG you already deployed

​Integrating with what’s already on the device

​Traffic redirection: the Secure Web Gateways

​Endpoint control and deployment: EDR and MDM

​Device tooling summary

​Guardrails: securing traffic on the gateway

​The governability spectrum

​The honest gaps

​Putting it together

What governance means

Four categories of AI traffic

How traffic reaches the gateway

Direct integration: apps you build and agents that accept a gateway URL

Proxy-based capture: category 3

Getting device traffic to the gateway

Method A: Point the app at the gateway

Method B: The open-source on-machine agent

Method C: Reuse the SWG you already deployed

Integrating with what’s already on the device

Traffic redirection: the Secure Web Gateways

Endpoint control and deployment: EDR and MDM

Device tooling summary

Guardrails: securing traffic on the gateway

The governability spectrum

The honest gaps

Putting it together