Claude Code Proxy: Route Claude, GPT-5 & Gemini Through TrueFoundry AI Gateway

Built for Speed: ~10ms Latency, Even Under Load

Blazingly fast way to build, track and deploy your models!

Handles 350+ RPS on just 1 vCPU — no tuning needed
Production-ready with full enterprise support

Get Started with Truefoundry Now Talk to the Expert

Introduction

Claude Code is the most powerful AI coding assistant available today. Engineers who adopt it rarely go back. But when dozens or hundreds of engineers start using it at the same time, a new problem appears: Claude Code, by default, talks directly to Anthropic's API. Every developer authenticates with their own key, uses Anthropic models exclusively, and generates API spend that is completely invisible to the platform team until the monthly invoice arrives.

A Claude Code proxy is the answer. By pointing Claude Code at a proxy endpoint instead of directly at Anthropic, you gain a centralized control point for every model call across your entire engineering organization: visibility into who is spending what, the ability to enforce budget caps before they're exceeded, access to models from any provider - GPT-5, Gemini 2.5 Pro, Llama via Bedrock - through the same interface Claude Code already knows, and the ability to deploy gateway configuration once and have it apply to all developers without touching individual machines.

TrueFoundry AI Gateway is the enterprise-grade Claude Code proxy. It is a drop-in Anthropic-compatible endpoint that Claude Code connects to with a single environment variable change. Once connected, every Claude Code request flows through the gateway giving you observability, cost controls, multi-model routing, and enterprise security policies that apply to the whole organization, not just the developers who remember to configure them.

This guide explains exactly what a Claude Code proxy does, why TrueFoundry AI Gateway is the right one for enterprise engineering teams, and how to configure it, including the complete settings.json for both standard API key and Claude Max subscription flows.

What Is a Claude Code Proxy?

Claude Code ships with a single configuration knob for changing its backend: the ANTHROPIC_BASE_URL environment variable. When set, Claude Code sends all its API requests - messages, model calls, streaming responses to that URL instead of to https://api.anthropic.com.

That one variable is the foundation of every Claude Code proxy. A proxy is any server that:

Accepts Anthropic-format API requests from Claude Code
Adds controls, routing, or observability at the proxy layer
Forwards requests to the actual model provider (Anthropic, OpenAI, Google, Bedrock, on-prem)
Returns responses back to Claude Code in the format it expects

The simplest possible proxy is a reverse proxy with logging. The most sophisticated is an enterprise AI gateway that handles authentication, budget enforcement, model routing across providers, semantic caching, guardrails, and full audit trails - all transparently, with no changes to how Claude Code behaves for the developer.

Why do teams build or adopt a Claude Code proxy?

Cost control: Multiple developers using Claude Code with individual Anthropic keys generate spend that is invisible until month-end. A proxy intercepts every request and enforces per-developer daily limits before costs exceed budget.
Multi-model access: Claude Code's interface is powerful, but Claude models are not always the best or most cost-effective choice for every task. A proxy lets you route haiku-tier tasks to GPT-4o-mini or Gemini Flash, and opus-tier tasks to the best available model without any client-side changes.
Enterprise security: Direct API keys on developer laptops are a security liability. A proxy centralizes credentials: developers authenticate to the proxy, and the proxy holds provider keys. No Anthropic key ever needs to live on a developer machine.
Team-wide governance: Individual developers can configure their own ANTHROPIC_BASE_URL. But enforcing it across an entire team requires a centralized deployment mechanism - MDM, server-managed settings, or a shared project .claude/settings.json checked into version control.

Why TrueFoundry AI Gateway Is the Right Claude Code Proxy

There are three ways to proxy Claude Code: build your own, use a simple reverse proxy, or use a purpose-built AI gateway. Building your own means owning the maintenance, security, and reliability of a production API gateway. A simple reverse proxy adds logging but none of the controls. TrueFoundry AI Gateway gives you everything an enterprise engineering team actually needs without building or maintaining it.

TrueFoundry AI Gateway is a unified proxy layer between Claude Code and your model providers. It accepts the same Anthropic API format that Claude Code already speaks, so Claude Code never needs to know it's talking to a gateway rather than directly to Anthropic. Behind the gateway, you can connect any provider: Anthropic direct, AWS Bedrock, Google Vertex AI, Azure OpenAI, OpenAI, or your own on-prem models.

Here is what Claude Code actually sees:

Claude Code  →  ANTHROPIC_BASE_URL (TrueFoundry Gateway)  →  Anthropic / OpenAI / Gemini / Bedrock / On-prem

Every Claude Code request that flows through TrueFoundry gains, automatically:

Capability	What It Does for Claude Code Users	TrueFoundry Feature
Multi-provider model access	Use GPT-5, Gemini 2.5 Pro, Llama, or on-prem models through the same Claude Code interface	Virtual Models
Per-developer budget limits	Blocks requests when daily or monthly spend caps are hit — before cost overruns, not after	Budget Limiting
Rate limiting	Throttle per-developer, per-team, or per-environment request rates	Rate Limiting
Cost attribution	Dashboard showing exactly which developer, team, and model drove every dollar of spend	Analytics
RBAC and virtual keys	No Anthropic API keys on developer machines — team members authenticate with TrueFoundry keys scoped to their access level	Access Control
Automatic failover	If Anthropic hits a rate limit or outage, the gateway silently retries on the next configured provider	Load Balancing & Fallbacks
Guardrails	PII detection, prompt injection protection, and custom content policies applied before requests reach the model	Guardrails
Full audit trail	Every request logged with user, model, token count, cost, and latency — exportable via OpenTelemetry	OpenTelemetry Export

~3–4ms p95 gateway overhead, 350+ RPS on a single vCPU. At Claude Code response times (seconds, not milliseconds), the gateway adds no perceptible latency.

Step 1: Point Claude Code at TrueFoundry AI Gateway

The core configuration is a single environment variable:

export ANTHROPIC_BASE_URL="https://<your-truefoundry-gateway-url>"

For persistent configuration - which is what you want for production use - edit Claude Code's settings.json. Two paths are supported:

Global (applies to all projects): ~/.claude/settings.json
Project-specific (checked into version control): .claude/settings.json in your project directory

Standard API Key Configuration

Use this when developers authenticate with a TrueFoundry API key (the recommended enterprise pattern — no Anthropic keys on developer machines):‍

{
  "env": {
    "ANTHROPIC_BASE_URL": "{GATEWAY_BASE_URL}",
    "ANTHROPIC_AUTH_TOKEN": "your-truefoundry-api-key",
    "ANTHROPIC_MODEL": "anthropic/claude-4-sonnet-20250514",
    "ANTHROPIC_DEFAULT_OPUS_MODEL": "anthropic/claude-4-opus-20250514",
    "ANTHROPIC_DEFAULT_SONNET_MODEL": "anthropic/claude-4-sonnet-20250514",
    "ANTHROPIC_DEFAULT_HAIKU_MODEL": "anthropic/claude-3-5-haiku-20241022",
    "CLAUDE_CODE_DISABLE_EXPERIMENTAL_BETAS": "1",
    "ANTHROPIC_CUSTOM_HEADERS": "x-tfy-anthropic-beta: context-management-2025-06-27"
  }
}

What each field does:

ANTHROPIC_BASE_URL — redirects all Claude Code requests to TrueFoundry
ANTHROPIC_AUTH_TOKEN — TrueFoundry API key; authenticates the developer to the gateway (replaces Anthropic API key)
ANTHROPIC_MODEL — the default model for Claude Code sessions
ANTHROPIC_DEFAULT_OPUS_MODEL, ANTHROPIC_DEFAULT_SONNET_MODEL, ANTHROPIC_DEFAULT_HAIKU_MODEL — map Claude Code's built-in model aliases (/model opus, /model sonnet, /model haiku) to your TrueFoundry-configured models
CLAUDE_CODE_DISABLE_EXPERIMENTAL_BETAS — disables experimental Claude Code features for stable gateway behavior
ANTHROPIC_CUSTOM_HEADERS — forwards the x-tfy-anthropic-beta header to Anthropic for beta features like context management

Important: Claude Code detects model capabilities (extended thinking, ToolSearch, beta tool blocks) by string-matching the model ID. Make sure ANTHROPIC_DEFAULT_OPUS_MODEL, ANTHROPIC_DEFAULT_SONNET_MODEL, and ANTHROPIC_DEFAULT_HAIKU_MODEL contain a recognizable Anthropic model ID like claude-opus-4-7, claude-sonnet-4-6, or claude-haiku-4-5. If you're using a TrueFoundry Virtual Model, ensure its display name contains the underlying model ID — e.g. your-account/claude-haiku-4-5 — so string-matching succeeds.

Claude Code Max Subscription Configuration

If your team uses Claude Code Max subscriptions, Claude Code reserves the Authorization header for Anthropic account authentication. Use x-tfy-api-key in ANTHROPIC_CUSTOM_HEADERS instead:‍

{
  "env": {
    "ANTHROPIC_BASE_URL": "{GATEWAY_BASE_URL}",
    "ANTHROPIC_CUSTOM_HEADERS": "x-tfy-api-key: your-truefoundry-api-key\nX-TFY-LOGGING-CONFIG: {\"enabled\": true}",
    "ANTHROPIC_DEFAULT_OPUS_MODEL": "anthropic/claude-4-opus-20250514",
    "ANTHROPIC_DEFAULT_SONNET_MODEL": "anthropic/claude-4-sonnet-20250514",
    "ANTHROPIC_DEFAULT_HAIKU_MODEL": "anthropic/claude-3-5-haiku-20241022"
  }
}

Why this pattern is good for Max users:

You keep your Anthropic Max subscription for Claude Code's session auth - the Authorization header flows through to Anthropic as-is
TrueFoundry authenticates separately via x-tfy-api-key - the gateway governs the request while Anthropic handles billing via your subscription
You get centralized governance (visibility, quotas, RBAC, logs, guardrails) without changing your day-to-day Claude Code workflow

See TrueFoundry Claude Code documentation for the full integration guide, and Claude Code Max integration for the Max subscription variant.

Step 2: Use GPT-5, Gemini, and Any Model Through Claude Code

This is where a Claude Code proxy goes from convenient to transformative. Once Claude Code routes through TrueFoundry, it can reach any model from any provider not just Anthropic. You add provider accounts in the TrueFoundry gateway dashboard (OpenAI, Google Vertex AI, AWS Bedrock, Azure OpenAI, xAI, or your own on-prem deployment), and those models become available at the same gateway endpoint.

Pointing Claude Code Aliases at Non-Anthropic Models

To use GPT-5 for Claude Code's "opus" slot (your most capable model tier), simply update the model alias:

{
  "env": {
    "ANTHROPIC_BASE_URL": "{GATEWAY_BASE_URL}",
    "ANTHROPIC_AUTH_TOKEN": "your-truefoundry-api-key",
    "ANTHROPIC_DEFAULT_OPUS_MODEL": "openai-main/gpt-5",
    "ANTHROPIC_DEFAULT_SONNET_MODEL": "anthropic/claude-4-sonnet-20250514",
    "ANTHROPIC_DEFAULT_HAIKU_MODEL": "google-vertex/gemini-2.5-flash",
    "CLAUDE_CODE_DISABLE_EXPERIMENTAL_BETAS": "1"
  }
}

In this configuration:

/model opus → GPT-5 (for complex architecture and planning tasks)
/model sonnet → Claude Sonnet 4 (for standard coding tasks)
/model haiku → Gemini 2.5 Flash (for fast, lightweight tasks like email validation, quick lookups)

The developer experience is identical. Developers still use /model opus or --model haiku. They don't need to know which provider is behind each alias, or manage credentials for OpenAI or Google.

Using Virtual Models for Advanced Routing

TrueFoundry's Virtual Models let you create a single model identifier that routes requests across multiple providers with weight-based, priority-based, or latency-based routing. Point a Claude Code model alias at a virtual model, and the gateway handles the routing logic transparently.

Example: Priority-based fallback across providers

If your primary Anthropic account hits rate limits, automatically fall back to Bedrock Claude, then to GPT-4 - without any developer noticing:

routing_config:
  type: priority-based-routing
  load_balance_targets:
    - target: anthropic-main/claude-sonnet-4-20250514
      priority: 0
      fallback_status_codes: ["429", "500", "502", "503"]
    - target: bedrock-main/claude-sonnet-4-20250514
      priority: 1
      fallback_status_codes: ["429", "500"]
    - target: openai-main/gpt-4o
      priority: 2

Example: Weight-based A/B evaluation

Canary a new model for 10% of Claude Code traffic before committing the whole team:‍

routing_config:
  type: weight-based-routing
  load_balance_targets:
    - target: anthropic-main/claude-4-sonnet-20250514
      weight: 90
    - target: openai-main/gpt-5
      weight: 10

Then point Claude Code's sonnet alias at this virtual model. 10% of Claude Code sonnet requests go to GPT-5 with full cost and quality metrics in the gateway dashboard to compare the results.

Step 3: Enterprise Controls That Apply to Every Claude Code Request

Once Claude Code routes through TrueFoundry, every request inherits enterprise-grade governance not because developers configure it, but because it's enforced at the gateway layer.

Budget Limits: Stop Cost Overruns Before They Happen

TrueFoundry's hierarchical budget limiting fires before the token is consumed, not after the monthly bill arrives. Rules stack and combine:

Order	Rule ID	Filter	Budget	Per
1	`senior-eng-budget`	Subjects: `team:senior-engineers`	$50/day	User
2	`default-dev-budget`	(matches all)	$10/day	User
3	`opus-monthly-cap`	Models: `anthropic-main/claude-4-opus`	$1000/month	Shared

Senior engineers get $50/day. All others default to $10/day. And total Opus spending across the entire organization is capped at $1000/month — so even if every developer is within their personal limit, the org-level model budget cannot be blown through.

Rate Limiting: Protect On-Prem and Control Environments

TrueFoundry AI Gateway interface showing how to configure rate limitingrules through the Configtab

Rate limiting at the gateway handles three Claude Code-specific scenarios:

CI pipelines: Claude Code runs in CI should be rate-limited independently of interactive developer sessions. A test suite that calls Claude Code for code review shouldn't burn through the same quota as a developer's active coding session.
Development vs. production models: Metadata-scoped rate limits let you route environment: dev requests to a cheaper model and cap their request rate — without affecting production.
On-prem GPU protection: If you're running on-prem models as the primary target for Claude Code, rate-limit the on-prem endpoint and auto-burst to the cloud API when capacity is saturated.

# Limit Claude Code in CI to 500 requests/day on GPT-4
- id: ci-pipeline-limit
  when:
    models: ['openai-main/gpt-4']
    metadata:
      environment: ci
  limit_to: 500
  unit: requests_per_day

Cost Attribution: Know Exactly Who Is Spending What

Every Claude Code request processed by TrueFoundry is automatically attributed to the authenticated user. The analytics dashboard shows cost broken down by developer, team, model, and date — filterable by any metadata tag you pass via the X-TFY-METADATA header.

For teams using project-based cost attribution, tag Claude Code requests with project_id or feature metadata and every request automatically maps to the right cost center:

{ "env": { "ANTHROPIC_CUSTOM_HEADERS": "X-TFY-METADATA: {\"team\": \"platform\", \"project_id\": \"infra-2026\"}" } }

All traces export via OpenTelemetry to Grafana, Datadog, Splunk, or your existing observability stack.

Step 4: Deploy Across Your Whole Engineering Team

Configuring one developer's settings.json is easy. Enforcing a consistent proxy configuration across every developer in your organization requires a deployment strategy. TrueFoundry supports three approaches:

Option A: MDM-Pushed Managed Settings (Recommended for Enterprises)

Push a managed-settings.json file to every corporate device via your MDM solution (Jamf, Kandji, Mosyle, Intune) and lock it against modification at the OS level. This is Claude Code's endpoint-managed settings approach.

{ "model": "sonnet", "availableModels": ["sonnet", "haiku"], "env": { "ANTHROPIC_BASE_URL": "https://your-gateway.internal.corp", "ANTHROPIC_DEFAULT_OPUS_MODEL": "anthropic/claude-4-opus-20250514", "ANTHROPIC_DEFAULT_SONNET_MODEL": "anthropic/claude-4-sonnet-20250514", "ANTHROPIC_DEFAULT_HAIKU_MODEL": "anthropic/claude-3-5-haiku-20241022" } }

System-level paths:

macOS: /Library/Application Support/ClaudeCode/managed-settings.json
Linux: /etc/claude-code/managed-settings.json

This configuration is tamper-resistant, applies immediately at startup with no network dependency, and requires no developer action. Every machine that receives the MDM profile is automatically proxied through TrueFoundry.

Option B: Server-Managed Settings via Anthropic Admin Console

Configure settings centrally via the Claude Admin Console (Admin Settings → Claude Code → Managed settings). Settings are delivered from Anthropic's servers when developers authenticate with their organization credentials — no file deployment needed.

This approach requires no MDM infrastructure and works on BYOD machines. Settings are delivered at authentication time and are harder for users to override.

Option C: Project-Level settings.json in Version Control

Commit a .claude/settings.json to the root of every repository. Any developer who clones the repo and runs Claude Code in that directory automatically uses the project settings — including the TrueFoundry gateway URL and model configuration.

# Check into your monorepo or template repository .claude/settings.json

This is the lowest-friction option for teams with standardized repository structures. New developers inherit the proxy configuration the moment they clone.

Step 5: VS Code Extension and Claude Agent SDK

VS Code Extension

The Claude Code VS Code extension works seamlessly with TrueFoundry once you've configured the CLI. The extension is not standalone — it requires the Claude Code CLI to be installed and configured first.

# macOS/Linux: Launch VS Code from terminal to inherit shell environment code .

The extension automatically uses your CLI configuration (base URL, API keys, model aliases). No separate setup needed.

macOS/Linux note: GUI applications don't inherit shell environment variables by default. Always launch VS Code from a terminal where Claude Code is configured to ensure the extension picks up ANTHROPIC_BASE_URL.

Claude Agent SDK

The Claude Agent SDK (the successor to the Claude Code SDK) works with your existing .claude/settings.json via TrueFoundry. Specify setting_sources=["project"] to load your gateway configuration programmatically:

from claude_agent_sdk import query, ClaudeAgentOptions async for message in query( prompt="Analyze my codebase for security vulnerabilities", options=ClaudeAgentOptions( setting_sources=["project"], # Loads .claude/settings.json with TrueFoundry config max_turns=5, allowed_tools=["Read", "Grep", "Glob"] ) ): if message.type == "result": print(message.result)

All TrueFoundry configurations — Anthropic Direct, AWS Bedrock, Google Vertex AI — work identically with the Agent SDK.

DIY Claude Code Proxy vs. TrueFoundry AI Gateway

<div class="table-wrapper" style="overflow-x:auto; margin: 24px 0;"><table style="width:100%; border-collapse:collapse; font-size:14px; font-family:-apple-system,BlinkMacSystemFont,'Segoe UI',sans-serif;"><thead><tr style="background:#0a0f1e; color:#00d4c8;"><th style="padding:12px 16px; text-align:left; border-bottom:2px solid #00d4c8;">Capability</th><th style="padding:12px 16px; text-align:left; border-bottom:2px solid #00d4c8;">DIY Reverse Proxy</th><th style="padding:12px 16px; text-align:left; border-bottom:2px solid #00d4c8;">TrueFoundry AI Gateway</th></tr></thead><tbody><tr style="background:#f8fafc;"><td style="padding:12px 16px; border-bottom:1px solid #e2e8f0;"><strong>Setup time</strong></td><td style="padding:12px 16px; border-bottom:1px solid #e2e8f0;">Days to weeks</td><td style="padding:12px 16px; border-bottom:1px solid #e2e8f0;">Minutes — one env var change</td></tr><tr style="background:#ffffff;"><td style="padding:12px 16px; border-bottom:1px solid #e2e8f0;"><strong>Multi-provider routing</strong></td><td style="padding:12px 16px; border-bottom:1px solid #e2e8f0;">Custom build required</td><td style="padding:12px 16px; border-bottom:1px solid #e2e8f0;">Built-in: Anthropic, OpenAI, Gemini, Bedrock, Azure, on-prem</td></tr><tr style="background:#f8fafc;"><td style="padding:12px 16px; border-bottom:1px solid #e2e8f0;"><strong>Per-developer budget limits</strong></td><td style="padding:12px 16px; border-bottom:1px solid #e2e8f0;">Not included</td><td style="padding:12px 16px; border-bottom:1px solid #e2e8f0;">Hierarchical, configurable</td></tr><tr style="background:#ffffff;"><td style="padding:12px 16px; border-bottom:1px solid #e2e8f0;"><strong>Cost attribution dashboard</strong></td><td style="padding:12px 16px; border-bottom:1px solid #e2e8f0;">Custom build required</td><td style="padding:12px 16px; border-bottom:1px solid #e2e8f0;">Built-in with OTEL export</td></tr><tr style="background:#f8fafc;"><td style="padding:12px 16px; border-bottom:1px solid #e2e8f0;"><strong>Automatic failover</strong></td><td style="padding:12px 16px; border-bottom:1px solid #e2e8f0;">Custom retry logic per request</td><td style="padding:12px 16px; border-bottom:1px solid #e2e8f0;">Gateway-level, configurable per provider</td></tr><tr style="background:#ffffff;"><td style="padding:12px 16px; border-bottom:1px solid #e2e8f0;"><strong>Guardrails (PII, injection)</strong></td><td style="padding:12px 16px; border-bottom:1px solid #e2e8f0;">Not included</td><td style="padding:12px 16px; border-bottom:1px solid #e2e8f0;">Built-in</td></tr><tr style="background:#f8fafc;"><td style="padding:12px 16px; border-bottom:1px solid #e2e8f0;"><strong>RBAC / virtual accounts</strong></td><td style="padding:12px 16px; border-bottom:1px solid #e2e8f0;">Custom build required</td><td style="padding:12px 16px; border-bottom:1px solid #e2e8f0;">Built-in with SSO/SCIM</td></tr><tr style="background:#ffffff;"><td style="padding:12px 16px; border-bottom:1px solid #e2e8f0;"><strong>Semantic caching</strong></td><td style="padding:12px 16px; border-bottom:1px solid #e2e8f0;">Not included</td><td style="padding:12px 16px; border-bottom:1px solid #e2e8f0;">Built-in</td></tr><tr style="background:#f8fafc;"><td style="padding:12px 16px; border-bottom:1px solid #e2e8f0;"><strong>Ongoing maintenance</strong></td><td style="padding:12px 16px; border-bottom:1px solid #e2e8f0;">Your team owns it</td><td style="padding:12px 16px; border-bottom:1px solid #e2e8f0;">TrueFoundry-managed (SaaS) or self-hosted</td></tr><tr style="background:#ffffff;"><td style="padding:12px 16px;"><strong>Deployment modes</strong></td><td style="padding:12px 16px;">Self-hosted only</td><td style="padding:12px 16px;">SaaS, hybrid, or fully self-hosted VPC</td></tr></tbody></table></div><div style="background:#0a0f1e; border:1px solid #00d4c8; border-radius:12px; padding:40px; margin:40px 0; text-align:center; font-family:-apple-system,BlinkMacSystemFont,'Segoe UI',sans-serif;"><p style="color:#00d4c8; font-size:11px; font-weight:700; letter-spacing:2px; text-transform:uppercase; margin:0 0 12px 0;">Claude Code Proxy</p><h3 style="color:#ffffff; font-size:24px; font-weight:700; margin:0 0 12px 0; line-height:1.3;">One Gateway Endpoint. Every Model. Full Enterprise Control.</h3><p style="color:#94a3b8; font-size:15px; line-height:1.6; margin:0 0 28px 0; max-width:520px; margin-left:auto; margin-right:auto;">Route all Claude Code traffic through TrueFoundry AI Gateway — budget limits, cost attribution, multi-provider routing, and RBAC. One environment variable. Minutes to set up.</p><a href="https://www.truefoundry.com/book-demo" style="background:#00d4c8; color:#0a0f1e; padding:13px 28px; border-radius:8px; font-weight:700; font-size:14px; text-decoration:none; display:inline-block;">Book a Demo →</a></div>

Advanced: Opus Fast Mode via Virtual Model

Claude Opus 4.6 Fast Mode provides lower-latency responses but requires the speed: fast parameter and a special beta header. Configure it through TrueFoundry without touching individual developer machines:

Step 1: Create a Virtual Model in TrueFoundry pointing to claude-opus-4-6.

Step 2: In the virtual model's additional parameters, add:

{ "speed": "fast" }

Step 3: Add the required beta header in the virtual model configuration:

anthropic-beta: fast-mode-2026-02-01

Step 4: Reference the virtual model in settings.json:

{ "env": { "ANTHROPIC_BASE_URL": "{GATEWAY_BASE_URL}", "ANTHROPIC_AUTH_TOKEN": "your-truefoundry-api-key", "ANTHROPIC_DEFAULT_OPUS_MODEL": "your-account/your-fast-opus-virtual-model", "ANTHROPIC_DEFAULT_SONNET_MODEL": "anthropic/claude-4-sonnet-20250514", "ANTHROPIC_DEFAULT_HAIKU_MODEL": "anthropic/claude-3-5-haiku-20241022", "CLAUDE_CODE_DISABLE_EXPERIMENTAL_BETAS": "1" } }

To control Fast Mode access (since it consumes significantly more credits), create two virtual models — one that strips the speed field and one that passes it through — and distribute the appropriate model ID to users based on their access tier.

What Claude Code Can Govern Through TrueFoundry

Frequently Asked Questions

What is a Claude Code proxy and do I need one?A Claude Code proxy is a server that sits between Claude Code and model providers, giving you centralized controls that aren't available when Claude Code connects directly to Anthropic. You need one if: you have more than a handful of developers using Claude Code, you want cost visibility or per-developer budget limits, you need to use models from providers other than Anthropic, or you need to enforce security policies (PII detection, prompt injection protection) on Claude Code traffic. For individual developers, a proxy adds no material value. For teams, it's the infrastructure layer that makes Claude Code manageable at scale.

Does using a proxy change the Claude Code experience for developers?No. Developers use Claude Code exactly as before — the same commands, the same model aliases (/model opus, /model sonnet), the same VS Code extension. The only visible change is that they authenticate with a TrueFoundry API key instead of an Anthropic key, and the model names in settings.json follow the TrueFoundry format (provider-account/model-name). Everything else is identical.

Can I use GPT-5 or Gemini through Claude Code?Yes. Once Claude Code routes through TrueFoundry AI Gateway, you can point any Claude Code model alias at any provider configured in the gateway — including OpenAI's GPT-5, Google's Gemini 2.5 Pro and Flash, AWS Bedrock models, Azure OpenAI, xAI Grok, or your own on-prem models. See TrueFoundry Virtual Models documentation for routing configuration.

Does the gateway add latency?TrueFoundry AI Gateway adds approximately 3–4ms p95 overhead and handles 350+ RPS on a single vCPU. Claude Code response times are measured in seconds — 3–4ms of gateway overhead is below measurement noise.

How do I enforce the proxy across all developers without touching each machine?Three options: (1) MDM — push a managed-settings.json to every corporate device via Jamf, Kandji, or Intune and lock it at the OS level; (2) Anthropic Admin Console server-managed settings — configure centrally and deliver at authentication time; (3) commit .claude/settings.json to version control so every developer who clones the repo inherits the configuration. For most enterprises, MDM is the most robust because it's tamper-resistant and doesn't require developer action.

What happens if Anthropic's API goes down?If you've configured a Virtual Model with fallback providers, the gateway automatically retries on the next target when it receives a 429, 500, 502, or 503 from Anthropic. Claude Code never sees the error — it continues working on the fallback provider. This requires configuring fallback targets in your Virtual Model; Anthropic-only configurations would fail as usual during an outage.

Can TrueFoundry proxy MCP servers for Claude Code as well?Yes. The TrueFoundry MCP Gateway handles MCP server governance alongside model governance — centralized auth for MCP servers, RBAC at the tool level, pre-execution guardrails, and a full invocation audit trail. Claude Code can reach all governed MCP servers through a single authenticated gateway endpoint. See the enterprise security guide for Claude Code for the combined model + MCP proxy configuration.

Conclusion

The ANTHROPIC_BASE_URL environment variable is one of the most powerful levers in the Claude Code ecosystem — but most teams are leaving it unconfigured, accepting direct Anthropic API calls with no visibility, no controls, and no flexibility to use other models.

A Claude Code proxy changes the equation. Every request flows through a single enforcement point that applies governance uniformly — regardless of which developer sent it, which project they're working on, or which IDE they're using. Cost visibility goes from monthly-invoice-surprise to real-time dashboard. Budget overruns go from "discovered too late" to "blocked before they happen." Multi-model access goes from "code changes in every application" to "change one config in the gateway."

TrueFoundry AI Gateway is the enterprise Claude Code proxy — purpose-built for the Anthropic-compatible API format, deployed in minutes via ANTHROPIC_BASE_URL, and delivering enterprise controls (budget limits, RBAC, multi-provider routing, guardrails, OTEL observability) that would take months to build correctly from scratch.

One environment variable separates where you are from where you need to be.

<div style="background:#0a0f1e; border:1px solid #00d4c8; border-radius:12px; padding:48px; margin:48px 0; text-align:center; font-family:-apple-system,BlinkMacSystemFont,'Segoe UI',sans-serif;"><p style="color:#00d4c8; font-size:11px; font-weight:700; letter-spacing:2px; text-transform:uppercase; margin:0 0 16px 0;">Claude Code Proxy</p><h2 style="color:#ffffff; font-size:30px; font-weight:700; margin:0 0 16px 0; line-height:1.25;">Enterprise Claude Code Governance in Minutes</h2><p style="color:#94a3b8; font-size:15px; line-height:1.7; margin:0 auto 32px auto; max-width:560px;">Point Claude Code at TrueFoundry AI Gateway. Budget limits, cost attribution, GPT-5 and Gemini routing, RBAC, guardrails — applied to every developer, every request, centrally.</p><div style="display:flex; gap:16px; justify-content:center; flex-wrap:wrap; margin-bottom:32px;"><a href="https://www.truefoundry.com/book-demo" style="background:#00d4c8; color:#0a0f1e; padding:14px 28px; border-radius:8px; font-weight:700; font-size:15px; text-decoration:none; display:inline-block;">Book a Demo</a><a href="https://platform.live-demo.truefoundry.cloud" style="background:transparent; color:#00d4c8; padding:14px 28px; border-radius:8px; font-weight:600; font-size:15px; text-decoration:none; display:inline-block; border:1px solid #00d4c8;">Try Live Demo</a></div><div style="display:flex; gap:32px; justify-content:center; flex-wrap:wrap;"><span style="color:#64748b; font-size:13px;">✓ ~3–4ms Gateway Overhead</span><span style="color:#64748b; font-size:13px;">✓ 350+ RPS on 1 vCPU</span><span style="color:#64748b; font-size:13px;">✓ SaaS or Self-Hosted</span><span style="color:#64748b; font-size:13px;">✓ MDM-Ready Deployment</span></div></div>

Canva Cover Image Prompt

A dark enterprise illustration on deep navy background. A developer's terminal window in the center with Claude Code active. From the terminal, a single glowing teal arrow flows upward into a hexagonal control hub labeled "TrueFoundry AI Gateway" — the hub glows teal with circuit-board patterns. From the hub, multiple arrows fan out to different provider logos arranged in a semicircle: Anthropic (Claude), OpenAI (GPT), Google (Gemini), AWS (Bedrock). Each provider logo is connected with labeled lines: "Priority Routing", "Budget Cap", "Rate Limit". Small dashboard card in the corner showing a cost graph trending down. Clean geometric line style, no human figures, no text except labels. Premium SaaS dark-mode illustration quality.

‍

TrueFoundry AI Gateway delivers ~3–4 ms latency, handles 350+ RPS on 1 vCPU, scales horizontally with ease, and is production-ready, while LiteLLM suffers from high latency, struggles beyond moderate RPS, lacks built-in scaling, and is best for light or prototype workloads.

Built for Speed: ~10ms Latency, Even Under Load

Schedule your Demo Now