What governance means
The AI Gateway is the control point. When a model call passes through it, four capabilities apply:- FinOps — per-user, per-team, and per-model spend and budgets
- Guardrails — PII, prompt-injection, moderation, and DLP on prompts and responses
- Access control — who can call which models, plus rate limits
- Audit — request logs and failover
Four categories of AI traffic
| Category | FinOps | Guardrails | Access | Audit |
|---|---|---|---|---|
| 1 · Apps you build | Yes | Yes | Yes | Yes |
| 2a · Gateway-pluggable, MDM enforceable | Yes | Yes | Yes | Yes |
| 2b · Gateway-pluggable, manual only | Yes* | Yes* | Yes* | Yes* |
| 3 · No gateway endpoint | No | Yes | Partial | Yes |
| 4 · SaaS-embedded AI | No | Input only | Limited | Limited |
| MCP (any category) | Yes | Yes | Yes | Yes |
- 2a · Enforceable via MDM — Claude Code, Codex CLI, Gemini CLI, and similar. Push the gateway URL fleet-wide over MDM and every device uses the gateway automatically. Governance: all four, enforced company-wide.
- 2b · Not enforceable via MDM (manual per user) — Claude Desktop / Cowork, and tools like Cursor where only some features honor a custom endpoint. Users who configure the gateway get all four capabilities; users who don’t are ungoverned. Governance: all four per configured user only.
MCP is a separate, simpler path. Nearly every AI tool that supports agents also lets users add MCP servers. The MCP Gateway fits there directly — point any tool’s MCP configuration at the gateway and you get all four capabilities over every MCP call, regardless of which category the tool’s model traffic falls into.
How traffic reaches the gateway
Each category takes a different path — or never reaches the gateway at all. Category 1 — Apps your teams build. Your code calls the gateway API directly. Replace raw provider keys with a virtual account token so every request goes to the gateway URL instead ofapi.openai.com or api.anthropic.com. No interception, no device tooling — the application is configured to use the gateway from the start.
Category 2 — Third-party agents with a gateway setting. The app makes its normal API call, but to the gateway URL instead of the provider.
- 2a (MDM enforceable): Push the gateway URL fleet-wide over MDM — e.g. set
ANTHROPIC_BASE_URLfor Claude Code. Every device calls the gateway automatically. - 2b (manual only): Each user sets the gateway URL in the app’s settings themselves. Calls from configured users hit the gateway; everyone else goes direct to the provider.
Direct integration: apps you build and agents that accept a gateway URL
This covers category 1 (apps your teams build) and category 2 (third-party agents that let you plug in the gateway). Point the application straight at the gateway and stop handing out raw provider keys. A raw OpenAI or Anthropic key in an app’s environment, a developer’s.env, or a CI secret talks straight to the model with no policy in between.
Instead, put the gateway in front of every provider and issue gateway credentials only:
- For apps your teams build (OpenAI SDK, LangGraph, Google ADK, and similar), give each one a virtual account token instead of a provider key, so the only way to reach a model is through the gateway.
- For third-party agents with a custom endpoint setting, set their model base URL to the gateway. Where MDM enforcement is supported (category 2a), push that setting fleet-wide so every device uses the gateway automatically.








Proxy-based capture: category 3
This covers category 3 — third-party agents that do not let you plug in the gateway. These are mostly apps employees run on their devices: the ChatGPT app, claude.ai, and similar tools where the vendor handles model calls on their backend and gives you no endpoint to change. Their traffic has to be intercepted on the device and rerouted to the gateway, and TrueFoundry gives you two ways to do that:- An open-source on-machine agent you deploy over MDM.
- Your existing Secure Web Gateway — Zscaler, Netskope, or Prisma Access — forwarding AI traffic to the gateway.
Getting device traffic to the gateway
Category 1 apps call the gateway directly from your infrastructure — no device tooling needed. On employee laptops, categories 2 and 3 use one of three mechanisms:Method A: Point the app at the gateway
This is the cleanest case for category 2 — third-party agents that expose a model-endpoint setting. Push a managed configuration over MDM that points them at the gateway. There’s no interception and no certificates: the app makes its normal call to a URL you chose. Because these are real model calls, you get all four governance capabilities. How far this reaches depends on the product. The key split is whether you can enforce the setting fleet-wide:| Tool | Route to gateway | MDM enforceable | Notes |
|---|---|---|---|
| Claude Code / Claude Code Max | Yes | Yes | Set ANTHROPIC_BASE_URL and lock it with MDM |
| Codex CLI | Yes | Yes | Custom OpenAI base URL, behaves like Claude Code |
| Gemini CLI | Yes | Yes | Custom endpoint, enforceable via MDM |
| Claude Desktop / Cowork | Yes | No | Governs models and MCP servers, but the setting is manual per user and can’t be enforced via MDM today |
| Cursor | Partial | No | Several agentic features still route through Cursor’s own backend |
| Claude on the web (claude.ai) | No | — | No endpoint setting; falls into category 3. Govern with SSO, domain capture, and Admin Console policy, or intercept via agent/SWG. See enterprise security for Claude |
Method B: The open-source on-machine agent
For category 3 — third-party agents that won’t point at a gateway — TrueFoundry is open-sourcing an agent you deploy over MDM. These are mostly apps employees use on their devices where the vendor handles model calls on their backend: the ChatGPT app, claude.ai, and similar. The agent does application control (allow the AI apps you sanction, block the ones you don’t) and, for allowed apps, intercepts the LLM and MCP requests (not telemetry) and reroutes them through the gateway. It inspects TLS only for an allowlist of AI hosts; everything else passes through untouched. For a request worth governing, the agent keeps the original request intact, including the app’s own provider credentials, and changes only where the bytes go. It adds two headers:x-tfy-* headers, and forwards to the original URL. The response streams straight back, and the app notices nothing.
This buys guardrails, DLP, audit, and allow/block for apps that would otherwise be invisible. The honest limit is structural: these apps don’t call a model directly. The ChatGPT app, claude.ai, and similar consumer surfaces call their own backend, which calls the model server-side. The agent sees what the app sends that backend and what comes back (enough for content guardrails and audit), but the token-metered model call happens in the vendor’s cloud.
Method C: Reuse the SWG you already deployed
If your fleet already runs a Secure Web Gateway, the interception layer is already on every device and already decrypting traffic. You point its forwarding at the AI Gateway and install nothing new. This is another way to handle category 3 traffic. The per-vendor mechanics are covered in the next section.Integrating with what’s already on the device
Enterprises already run two kinds of agent on every device, and they do different jobs. Getting this distinction right is the difference between a working rollout and a stalled one:- Traffic-redirection tools — Secure Web Gateways like Zscaler, Netskope, and Prisma Access sit in the network path and can forward AI traffic to your gateway.
- Endpoint-control tools — EDR and MDM like CrowdStrike, Jamf, and Intune sit on the device and decide which apps run and what’s deployed. They are not web proxies and cannot redirect a model call.
Use a Secure Web Gateway to route a model call to the gateway. Use CrowdStrike and MDM to deploy the routing and to block what you don’t sanction. Asking CrowdStrike to redirect traffic, or asking Zscaler to manage endpoint posture, is using the wrong tool for the job.
Traffic redirection: the Secure Web Gateways
All three majors implement the same pattern: forward matched AI domains to a third-party upstream proxy (your gateway) and inject the user’s identity in theX-Authenticated-User header. The gateway runs an SWG-mode listener that reads that header, performs its own inspection, governs, and forwards. One contract covers all of them.
Two requirements are common to every vendor:
- SSL trust. Because the SWG hands the gateway a re-encrypted tunnel, the gateway does its own SSL inspection on the chained traffic, and the SWG must be configured to trust the gateway’s CA. Each product has a field for exactly this.
- Identity must be trusted.
X-Authenticated-Useris an unauthenticated assertion of who the user is, so the gateway must accept it only from your SWG — locked down with mTLS or a strict source-IP allowlist plus a shared secret. Otherwise anyone could spoof the header and impersonate any user.
- Zscaler (ZIA)
- Netskope
- Prisma Access
- Cisco Umbrella
Zscaler uses Third-Party Proxy Chaining, acting as the child proxy that forwards to your gateway. Up to eight proxy objects are supported.
Enable SSL Inspection
Enable SSL Inspection for the AI-provider domains so Zscaler can decrypt and forward the traffic.
Add the gateway as a proxy
Go to Forwarding Control → Proxies & Gateways → Proxies → Add Proxy. Set the name, the gateway’s IP/FQDN and port, optionally the gateway’s root certificate, and enable Insert X-Authenticated-User (base64 optional).
Create a Gateway object
Create a Gateway object that references the proxy you added (primary/secondary).
Endpoint control and deployment: EDR and MDM
EDR and MDM can’t reroute a model call, but they’re already on every endpoint, which makes them the right tools for deploying the agent and for blocking what you won’t allow. No model traffic flows through them. MDM (Jamf, Intune, Kandji, Workspace ONE) is how you roll out the device-side pieces:- Install the TrueFoundry agent fleet-wide when you use Method B for category 3 traffic, and enforce that it keeps running.
- Distribute and trust the agent’s CA (or the SWG-chain CA) so TLS inspection works.
- Apply Method A configuration patches that point category 2 agents at the gateway — especially category 2a tools where MDM enforcement is supported.
- Falcon Firewall Management provides centralized, application- and location-aware host firewall policy across Windows, macOS, and Linux. Use it to block unsanctioned AI apps and destinations outright and to allowlist the ones you’ve routed through the gateway.
- Falcon Secure Access applies Zero Trust controls inside the browser session across any browser. Use it for category 3 web surfaces (like claude.ai) and category 4 SaaS AI that can’t be proxied as model traffic.
Don’t confuse Falcon’s endpoint controls with CrowdStrike Falcon AIDR, which is a guardrail the gateway calls inline once traffic arrives — not a device tool. It’s covered under Guardrails, alongside the other guardrail providers the gateway integrates with.
Device tooling summary
These are the tools that get device traffic to the gateway or control what runs on the device. Guardrails that the gateway applies once traffic arrives are a separate concern, covered in the next section.| Provider | Role | Mechanism | Identity | What you get |
|---|---|---|---|---|
| Zscaler ZIA | Redirect | Third-Party Proxy Chaining | X-Authenticated-User | Route AI traffic to gateway |
| Netskope | Redirect | Forward to Proxy (RTP) | XAU / XFF | Route AI traffic to gateway |
| Prisma Access | Redirect | Downstream Proxy Chaining | XAU / XFF | Route AI traffic to gateway |
| Cisco Umbrella | Redirect (partial) | Proxy chaining (upstream-oriented) | Varies | Validate; else use agent/PAC |
| MDM (Jamf / Intune) | Deploy | Install TrueFoundry agent · config patch · CA | — | Roll out agent, patch apps, trust CA |
| CrowdStrike Falcon | Control | Firewall Mgmt · Secure Access | Device / identity | Block/allow apps; in-browser AI controls |
Guardrails: securing traffic on the gateway
Routing gets traffic to the gateway. What protects it there is guardrails — and these run inside the gateway, independent of how the traffic arrived. Once a call lands, the gateway can inspect prompts before they reach a model and block, redact, or rewrite risky responses in real time, for threats like prompt injection, jailbreaks, PII leakage, and data exposure. The gateway applies guardrails by calling guardrail providers inline. You can use TrueFoundry’s built-in guardrails or integrate external providers — including CrowdStrike Falcon AIDR, Palo Alto AIRS, Azure Content Safety, AWS Bedrock Guardrails, and many others. For the CrowdStrike AIDR integration specifically, see the CrowdStrike AIDR guardrail. Because every governed call passes through the gateway, it’s also the natural export point for AI telemetry: stream the gateway’s request logs (who called which model, with what prompt and response) into your SIEM — such as CrowdStrike Falcon Next-Gen SIEM — so AI activity sits alongside endpoint, identity, and cloud signals.The governability spectrum
The governance matrix in Four categories of AI traffic is the quick reference. This table adds the mechanism for each path:| Category | Example | How traffic reaches the gateway |
|---|---|---|
| 1 · Apps you build | OpenAI SDK, LangGraph, Google ADK | Virtual account token in your code |
| 2a · MDM enforceable | Claude Code, Codex CLI | Gateway URL pushed fleet-wide via MDM |
| 2b · Manual only | Claude Desktop, partial Cursor | Per-user gateway URL in app settings |
| 3 · No gateway endpoint | ChatGPT app, claude.ai | On-device agent or SWG intercepts and reroutes |
| 4 · SaaS-embedded AI | Salesforce Einstein | Does not reach gateway — SaaS controls or CASB |
| MCP | Any MCP-capable tool | MCP Gateway URL in tool’s MCP config |
The honest gaps
No on-device approach is complete. Plan around these limits:- Backend-proxied FinOps (category 3). If the app calls its own backend, the token count lives in the vendor’s cloud. Content governance survives; cost accounting does not.
- Certificate-pinned apps. Some clients pin their TLS certificate and refuse inspection by your agent or the SWG. A pinned app with no endpoint setting can’t be governed transparently; the realistic control is allow-or-block via CrowdStrike’s firewall and browser controls.
- stdio MCP. Many MCP servers run as local subprocesses over stdio, which is not network traffic. Interceptors see remote (HTTP/SSE) MCP servers but are blind to stdio ones. For remote MCP, point tools at the MCP Gateway instead — that path is always governable.
- QUIC and real-time transports. Browsers may use HTTP/3 over QUIC (UDP), which a TCP proxy won’t see — block QUIC so clients fall back to TCP. Voice/realtime features that use WebRTC ride DTLS over UDP, a different problem entirely; treat them as out of scope.
Putting it together
Governing every token isn’t one mechanism — it’s a chokepoint plus a set of integrations matched to how much each app and tool will cooperate:- Stand up the gateway and route everything you build (category 1) through it with virtual account tokens.
- Patch every third-party agent that accepts a gateway URL to point at it — enforce fleet-wide over MDM where supported (category 2a), or document the per-user setup where it isn’t (category 2b).
- For agents that don’t accept a gateway URL (category 3), redirect their traffic with the SWG you already run — Zscaler, Netskope, or Prisma — or drop the open-source agent where there’s no SWG (Methods B and C).
- For AI embedded in SaaS (category 4), use the vendor’s native controls or a CASB; device-side tools won’t reach those model calls.
- Point MCP server configurations at the MCP Gateway in any tool that supports MCP — this path works regardless of category.
- Use MDM to install the TrueFoundry agent and push config patches, and CrowdStrike to block what you won’t sanction.
- Apply guardrails on the gateway — TrueFoundry’s built-in ones or external providers like CrowdStrike AIDR — and export the gateway’s logs to your SIEM so AI activity sits alongside the rest of your security signal.
Set up the control plane first with Setup AI Gateway in Your Organization, then return here to route device and SaaS traffic into it.