Claude Code Proxy: Route Claude, GPT-5 & Gemini Through TrueFoundry AI Gateway

Introduction
Claude Code is the most powerful AI coding assistant available today. Engineers who adopt it rarely go back. But when dozens or hundreds of engineers start using it at the same time, a new problem appears: Claude Code, by default, talks directly to Anthropic's API. Every developer authenticates with their own key, uses Anthropic models exclusively, and generates API spend that is completely invisible to the platform team until the monthly invoice arrives.
A Claude Code proxy is the answer. By pointing Claude Code at a proxy endpoint instead of directly at Anthropic, you gain a centralized control point for every model call across your engineering organization. That gives you visibility into who is spending what, budget caps that are enforced before they are exceeded, and access to models from any provider (GPT-5, Gemini 2.5 Pro, Llama via Bedrock) through the same interface Claude Code already knows. You can also deploy the gateway configuration once and have it apply to all developers without touching individual machines.
TrueFoundry AI Gateway is the enterprise-grade Claude Code proxy. It is a drop-in Anthropic-compatible endpoint that Claude Code connects to with a single environment variable change. Once connected, every Claude Code request flows through the gateway, giving you observability, cost controls, multi-model routing, and enterprise security policies that apply to the whole organization, not just the developers who remember to configure them.
This guide explains what a Claude Code proxy does, why TrueFoundry AI Gateway is the right one for enterprise engineering teams, and how to configure it for both the standard API key flow and the Claude Max subscription flow.
What Is a Claude Code Proxy?
Claude Code ships with a single configuration knob for changing its backend: the ANTHROPIC_BASE_URL environment variable. When it is set, Claude Code sends all its API requests (messages, model calls, streaming responses) to that URL instead of to https://api.anthropic.com.
That one variable is the foundation of every Claude Code proxy. A proxy is any server that:
- Accepts Anthropic-format API requests from Claude Code
- Adds controls, routing, or observability at the proxy layer
- Forwards requests to the actual model provider (Anthropic, OpenAI, Google, Bedrock, on-prem)
- Returns responses back to Claude Code in the format it expects
The simplest possible proxy is a reverse proxy with logging. The most sophisticated is an enterprise AI gateway that handles authentication, budget enforcement, model routing across providers, semantic caching, guardrails, and full audit trails - all transparently, with no changes to how Claude Code behaves for the developer.
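The four proxy responsibilities listed above can be sketched in a few lines. This is a minimal illustration, not TrueFoundry's implementation: it assumes the developer's proxy token arrives as a Bearer token in the Authorization header and that the upstream provider expects its key in an x-api-key header. The names rewrite_for_upstream, UpstreamRequest, and valid_tokens are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, Set

# Headers the proxy must not pass through unchanged.
STRIPPED = {"authorization", "x-api-key", "host"}


@dataclass
class UpstreamRequest:
    url: str
    headers: Dict[str, str]


def rewrite_for_upstream(
    path: str,
    headers: Dict[str, str],
    upstream_base: str,
    provider_key: str,
    valid_tokens: Set[str],
) -> UpstreamRequest:
    """Check the developer's token, then swap in the provider credential."""
    normalized = {name.lower(): value for name, value in headers.items()}
    token = normalized.get("authorization", "").removeprefix("Bearer ").strip()
    if token not in valid_tokens:
        raise PermissionError("unknown developer token")
    # Drop the developer credential so it never reaches the provider.
    forwarded = {k: v for k, v in normalized.items() if k not in STRIPPED}
    forwarded["x-api-key"] = provider_key  # provider key lives only on the proxy
    return UpstreamRequest(upstream_base.rstrip("/") + path, forwarded)
```

A real gateway would add logging, budget checks, and routing between the token check and the forward step; the sketch only shows where those controls would sit.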
Why do teams build or adopt a Claude Code proxy?
- Cost control: Multiple developers using Claude Code with individual Anthropic keys generate spend that is invisible until month-end. A proxy intercepts every request and enforces per-developer daily limits before costs exceed budget.
- Multi-model access: Claude Code's interface is powerful, but Claude models are not always the best or most cost-effective choice for every task. A proxy lets you route haiku-tier tasks to GPT-4o-mini or Gemini Flash, and opus-tier tasks to the best available model without any client-side changes.
- Enterprise security: Direct API keys on developer laptops are a security liability. A proxy centralizes credentials: developers authenticate to the proxy, and the proxy holds provider keys. No Anthropic key ever needs to live on a developer machine.
- Team-wide governance: Individual developers can configure their own ANTHROPIC_BASE_URL. But enforcing it across an entire team requires a centralized deployment mechanism: MDM, server-managed settings, or a shared project .claude/settings.json checked into version control.
Why TrueFoundry AI Gateway Is the Right Claude Code Proxy
There are three ways to proxy Claude Code: build your own, use a simple reverse proxy, or use a purpose-built AI gateway. Building your own means owning the maintenance, security, and reliability of a production API gateway. A simple reverse proxy adds logging but none of the controls. TrueFoundry AI Gateway gives you everything an enterprise engineering team actually needs without building or maintaining it.
TrueFoundry AI Gateway is a unified proxy layer between Claude Code and your model providers. It accepts the same Anthropic API format that Claude Code already speaks, so Claude Code never needs to know it's talking to a gateway rather than directly to Anthropic. Behind the gateway, you can connect any provider: Anthropic direct, AWS Bedrock, Google Vertex AI, Azure OpenAI, OpenAI, or your own on-prem models.
Here is what Claude Code actually sees:
Claude Code → ANTHROPIC_BASE_URL (TrueFoundry Gateway) → Anthropic / OpenAI / Gemini / Bedrock / On-prem
Every Claude Code request that flows through TrueFoundry automatically gains the gateway's observability, cost controls, multi-model routing, and enterprise security policies. The cost is ~3–4ms p95 gateway overhead, with the gateway handling 350+ RPS on a single vCPU. At Claude Code response times (seconds, not milliseconds), the gateway adds no perceptible latency.
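To put the overhead figure in proportion, here is a small worked calculation. The 4 ms value is the upper end of the p95 range quoted above; the response times of 1, 5, and 20 seconds are illustrative assumptions, not measurements.

```python
def overhead_fraction(gateway_ms: float, response_ms: float) -> float:
    """Share of total request time that is spent in the gateway."""
    return gateway_ms / (gateway_ms + response_ms)


# Assumed model response times in seconds; only the 4 ms comes from the text.
for response_s in (1, 5, 20):
    share = overhead_fraction(4.0, response_s * 1000)
    print(f"{response_s:>2}s response: gateway share of total time = {share:.3%}")
```

Even at the shortest assumed response time, the gateway accounts for well under one percent of the total.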
Step 1: Point Claude Code at TrueFoundry AI Gateway
The core configuration is a single environment variable:
export ANTHROPIC_BASE_URL="https://<your-truefoundry-gateway-url>"
For persistent configuration, which is what you want for production use, edit Claude Code's settings.json. Two paths are supported:
- Global (applies to all projects): ~/.claude/settings.json
- Project-specific (checked into version control): .claude/settings.json in your project directory
Standard API Key Configuration
Use this when developers authenticate with a TrueFoundry API key (the recommended enterprise pattern — no Anthropic keys on developer machines):
{
  "env": {
    "ANTHROPIC_BASE_URL": "{GATEWAY_BASE_URL}",
    "ANTHROPIC_AUTH_TOKEN": "your-truefoundry-api-key",
    "ANTHROPIC_MODEL": "anthropic/claude-4-sonnet-20250514",
    "ANTHROPIC_DEFAULT_OPUS_MODEL": "anthropic/claude-4-opus-20250514",
    "ANTHROPIC_DEFAULT_SONNET_MODEL": "anthropic/claude-4-sonnet-20250514",
    "ANTHROPIC_DEFAULT_HAIKU_MODEL": "anthropic/claude-3-5-haiku-20241022",
    "CLAUDE_CODE_DISABLE_EXPERIMENTAL_BETAS": "1",
    "ANTHROPIC_CUSTOM_HEADERS": "x-tfy-anthropic-beta: context-management-2025-06-27"
  }
}
What each field does:
- ANTHROPIC_BASE_URL: redirects all Claude Code requests to TrueFoundry
- ANTHROPIC_AUTH_TOKEN: TrueFoundry API key; authenticates the developer to the gateway (replaces the Anthropic API key)
- ANTHROPIC_MODEL: the default model for Claude Code sessions
- ANTHROPIC_DEFAULT_OPUS_MODEL, ANTHROPIC_DEFAULT_SONNET_MODEL, ANTHROPIC_DEFAULT_HAIKU_MODEL: map Claude Code's built-in model aliases (/model opus, /model sonnet, /model haiku) to your TrueFoundry-configured models
- CLAUDE_CODE_DISABLE_EXPERIMENTAL_BETAS: disables experimental Claude Code features for stable gateway behavior
- ANTHROPIC_CUSTOM_HEADERS: forwards the x-tfy-anthropic-beta header to Anthropic for beta features like context management
Important: Claude Code detects model capabilities (extended thinking, ToolSearch, beta tool blocks) by string-matching the model ID. Make sure ANTHROPIC_DEFAULT_OPUS_MODEL, ANTHROPIC_DEFAULT_SONNET_MODEL, and ANTHROPIC_DEFAULT_HAIKU_MODEL contain a recognizable Anthropic model ID like claude-opus-4-7, claude-sonnet-4-6, or claude-haiku-4-5. If you're using a TrueFoundry Virtual Model, ensure its display name contains the underlying model ID (e.g. your-account/claude-haiku-4-5) so string-matching succeeds.
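A quick pre-flight check can catch alias values that would fail this kind of matching. The sketch below is an assumption about how such a check might look, not Claude Code's actual detection logic: it does a plain substring test against the three example IDs named above, and the list would need to be extended for other model versions. The function name unrecognized_aliases is hypothetical.

```python
from typing import Dict, Mapping

# Example IDs taken from the note above; extend for the models you actually use.
RECOGNIZABLE_IDS = ("claude-opus-4-7", "claude-sonnet-4-6", "claude-haiku-4-5")

ALIAS_VARS = (
    "ANTHROPIC_DEFAULT_OPUS_MODEL",
    "ANTHROPIC_DEFAULT_SONNET_MODEL",
    "ANTHROPIC_DEFAULT_HAIKU_MODEL",
)


def unrecognized_aliases(env: Mapping[str, str]) -> Dict[str, str]:
    """Return alias variables whose value contains none of the known model IDs."""
    return {
        var: env[var]
        for var in ALIAS_VARS
        if var in env and not any(model_id in env[var] for model_id in RECOGNIZABLE_IDS)
    }
```

Running it over the env block of a settings.json before rollout flags any alias, such as a virtual model with an opaque display name, that contains none of the listed IDs.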
Claude Code Max Subscription Configuration
If your team uses Claude Code Max subscriptions, Claude Code reserves the Authorization header for Anthropic account authentication. Use x-tfy-api-key in ANTHROPIC_CUSTOM_HEADERS instead:
{
  "env": {
    "ANTHROPIC_BASE_URL": "{GATEWAY_BASE_URL}",
    "ANTHROPIC_CUSTOM_HEADERS": "x-tfy-api-key: your-truefoundry-api-key\nX-TFY-LOGGING-CONFIG: {\"enabled\": true}",
    "ANTHROPIC_DEFAULT_OPUS_MODEL": "anthropic/claude-4-opus-20250514",
    "ANTHROPIC_DEFAULT_SONNET_MODEL": "anthropic/claude-4-sonnet-20250514",
    "ANTHROPIC_DEFAULT_HAIKU_MODEL": "anthropic/claude-3-5-haiku-20241022"
  }
}
Why this pattern is good for Max users:
- You keep your Anthropic Max subscription for Claude Code's session auth: the Authorization header flows through to Anthropic as-is
- TrueFoundry authenticates separately via x-tfy-api-key: the gateway governs the request while Anthropic handles billing via your subscription
- You get centralized governance (visibility, quotas, RBAC, logs, guardrails) without changing your day-to-day Claude Code workflow
See TrueFoundry Claude Code documentation for the full integration guide, and Claude Code Max integration for the Max subscription variant.
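The ANTHROPIC_CUSTOM_HEADERS value in the Max configuration packs several headers into one newline-separated string, with one of them carrying JSON. Writing that by hand invites escaping mistakes, so a small helper can build it. This is a sketch; build_custom_headers is a hypothetical name, and the header names are the ones used in the configuration above.

```python
import json
from typing import Mapping


def build_custom_headers(headers: Mapping[str, object]) -> str:
    """Join headers as 'Name: value' lines; non-string values are JSON-encoded."""
    lines = []
    for name, value in headers.items():
        text = value if isinstance(value, str) else json.dumps(value)
        if "\n" in name or "\n" in text:
            raise ValueError(f"header {name!r} must not contain a newline")
        lines.append(f"{name}: {text}")
    return "\n".join(lines)


value = build_custom_headers(
    {
        "x-tfy-api-key": "your-truefoundry-api-key",
        "X-TFY-LOGGING-CONFIG": {"enabled": True},
    }
)
# json.dumps handles the \n and \" escaping needed inside settings.json.
print(json.dumps({"env": {"ANTHROPIC_CUSTOM_HEADERS": value}}, indent=2))
```

Letting json.dumps serialize the final settings keeps the newline and quote escaping consistent with the hand-written example.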
Step 2: Use GPT-5, Gemini, and Any Model Through Claude Code
This is where a Claude Code proxy goes from convenient to transformative. Once Claude Code routes through TrueFoundry, it can reach any model from any provider, not just Anthropic. You add provider accounts in the TrueFoundry gateway dashboard (OpenAI, Google Vertex AI, AWS Bedrock, Azure OpenAI, xAI, or your own on-prem deployment), and those models become available at the same gateway endpoint.
Pointing Claude Code Aliases at Non-Anthropic Models
To use GPT-5 for Claude Code's "opus" slot (your most capable model tier), simply update the model alias:
{
  "env": {
    "ANTHROPIC_BASE_URL": "{GATEWAY_BASE_URL}",
    "ANTHROPIC_AUTH_TOKEN": "your-truefoundry-api-key",
    "ANTHROPIC_DEFAULT_OPUS_MODEL": "openai-main/gpt-5",
    "ANTHROPIC_DEFAULT_SONNET_MODEL": "anthropic/claude-4-sonnet-20250514",
    "ANTHROPIC_DEFAULT_HAIKU_MODEL": "google-vertex/gemini-2.5-flash",
    "CLAUDE_CODE_DISABLE_EXPERIMENTAL_BETAS": "1"
  }
}
In this configuration:
- /model opus → GPT-5 (for complex architecture and planning tasks)
- /model sonnet → Claude Sonnet 4 (for standard coding tasks)
- /model haiku → Gemini 2.5 Flash (for fast, lightweight tasks like email validation, quick lookups)
The developer experience is identical. Developers still use /model opus or --model haiku. They don't need to know which provider is behind each alias, or manage credentials for OpenAI or Google.
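For platform teams reviewing a configuration, it can help to list which provider account sits behind each alias. The sketch below assumes the model identifiers follow the "<provider-account>/<model-name>" shape seen in the examples; that shape is inferred from this guide rather than from a documented format, and describe_aliases is a hypothetical helper.

```python
from typing import Dict, Mapping, Tuple

ALIAS_VARS = {
    "opus": "ANTHROPIC_DEFAULT_OPUS_MODEL",
    "sonnet": "ANTHROPIC_DEFAULT_SONNET_MODEL",
    "haiku": "ANTHROPIC_DEFAULT_HAIKU_MODEL",
}


def describe_aliases(env: Mapping[str, str]) -> Dict[str, Tuple[str, str]]:
    """Map each configured alias to (provider account, model name)."""
    resolved = {}
    for alias, var in ALIAS_VARS.items():
        if var in env:
            account, _, model = env[var].partition("/")
            resolved[alias] = (account, model)
    return resolved


env = {
    "ANTHROPIC_DEFAULT_OPUS_MODEL": "openai-main/gpt-5",
    "ANTHROPIC_DEFAULT_SONNET_MODEL": "anthropic/claude-4-sonnet-20250514",
    "ANTHROPIC_DEFAULT_HAIKU_MODEL": "google-vertex/gemini-2.5-flash",
}
for alias, (account, model) in describe_aliases(env).items():
    print(f"/model {alias} -> {model} via {account}")
```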
Using Virtual Models for Advanced Routing
TrueFoundry's Virtual Models let you create a single model identifier that routes requests across multiple providers with weight-based, priority-based, or latency-based routing. Point a Claude Code model alias at a virtual model, and the gateway handles the routing logic transparently.
Example: Priority-based fallback across providers
If your primary Anthropic account hits rate limits, automatically fall back to Bedrock Claude, then to GPT-4o, without any developer noticing:
routing_config:
  type: priority-based-routing
  load_balance_targets:
    - target: anthropic-main/claude-sonnet-4-20250514
      priority: 0
      fallback_status_codes: ["429", "500", "502", "503"]
    - target: bedrock-main/claude-sonnet-4-20250514
      priority: 1
      fallback_status_codes: ["429", "500"]
    - target: openai-main/gpt-4o
      priority: 2
Example: Weight-based A/B evaluation
Canary a new model for 10% of Claude Code traffic before committing the whole team:
routing_config:
  type: weight-based-routing
  load_balance_targets:
    - target: anthropic-main/claude-4-sonnet-20250514
      weight: 90
    - target: openai-main/gpt-5
      weight: 10
Then point Claude Code's sonnet alias at this virtual model. 10% of Claude Code sonnet requests then go to GPT-5, and the gateway dashboard shows cost and quality metrics for comparing the results.
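Both routing strategies reduce to a few lines of logic. The sketch below is one plausible reading of the two configurations, not TrueFoundry's implementation. It assumes a target hands off to the next priority only when its response status appears in its own fallback_status_codes, and that weights are relative probabilities. The send callable and both function names are hypothetical.

```python
import random
from typing import Callable, Dict, List, Tuple

Target = Dict[str, object]


def call_with_priority_fallback(
    targets: List[Target],
    send: Callable[[str], Tuple[int, str]],
) -> Tuple[str, int, str]:
    """Try targets in ascending priority; move on only for listed status codes."""
    if not targets:
        raise ValueError("no targets configured")
    outcome = None
    for entry in sorted(targets, key=lambda t: t["priority"]):
        status, body = send(entry["target"])
        outcome = (entry["target"], status, body)
        if str(status) not in entry.get("fallback_status_codes", []):
            break  # success, or an error this target does not hand off
    return outcome


def pick_weighted(targets: List[Target], rng: random.Random) -> str:
    """Choose one target with probability proportional to its weight."""
    names = [t["target"] for t in targets]
    weights = [t["weight"] for t in targets]
    return rng.choices(names, weights=weights, k=1)[0]
```

Passing the random generator in explicitly keeps the weighted pick reproducible in tests; a production router would also track health and latency, which this sketch leaves out.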
Step 3: Enterprise Controls That Apply to Every Claude Code Request
Once Claude Code routes through TrueFoundry, every request inherits enterprise-grade governance, not because developers configure it, but because it's enforced at the gateway layer.
Budget Limits: Stop Cost Overruns Before They Happen
TrueFoundry's hierarchical budget limiting fires before the token is consumed, not after the monthly bill arrives. Rules stack and combine:
Senior engineers get $50/day. All others default to $10/day. And total Opus spending across the entire organization is capped at $1000/month — so even if every developer is within their personal limit, the org-level model budget cannot be blown through.
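The way these three rules stack can be sketched as a pre-request check. This is an illustration of the described behavior, not TrueFoundry's rule engine: the dollar limits come from the paragraph above, while the Spend record, the estimated cost input, and the "opus" substring test for identifying Opus models are assumptions.

```python
from dataclasses import dataclass
from typing import Tuple

SENIOR_DAILY_USD = 50.0
DEFAULT_DAILY_USD = 10.0
ORG_OPUS_MONTHLY_USD = 1000.0


@dataclass
class Spend:
    user_today_usd: float       # what this developer has spent today
    org_opus_month_usd: float   # org-wide Opus spend this month


def allow_request(
    is_senior: bool, model: str, estimated_cost_usd: float, spend: Spend
) -> Tuple[bool, str]:
    """Apply the per-user daily limit first, then the org-level Opus cap."""
    user_limit = SENIOR_DAILY_USD if is_senior else DEFAULT_DAILY_USD
    if spend.user_today_usd + estimated_cost_usd > user_limit:
        return False, "user daily budget exceeded"
    if "opus" in model and spend.org_opus_month_usd + estimated_cost_usd > ORG_OPUS_MONTHLY_USD:
        return False, "org Opus monthly budget exceeded"
    return True, "ok"
```

The second check is what makes the rules stack: a senior engineer well inside the personal limit is still refused an Opus call once the organization's monthly Opus budget is used up, while a Sonnet call from the same person goes through.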
Rate Limiting: Protect On-Prem and Control Environments

Rate limiting at the gateway handles three Claude Code-specific scenarios:
- CI pipelines: Claude Code runs in CI should be rate-limited independently of interactive developer sessions. A test suite that calls Claude Code for code review shouldn't burn through the same quota as a developer's active coding session.
- Development vs. production models: Metadata-scoped rate limits let you route environment: dev requests to a cheaper model and cap their request rate, without affecting production.
- On-prem GPU protection: If you're running on-prem models as the primary target for Claude Code, rate-limit the on-prem endpoint and auto-burst to the cloud API when capacity is saturated.
# Limit Claude Code in CI to 500 requests/day on GPT-4
- id: ci-pipeline-limit
  when:
    models: ['openai-main/gpt-4']
    metadata:
      environment: ci
  limit_to: 500
  unit: requests_per_day
Cost Attribution: Know Exactly Who Is Spending What
Every Claude Code request processed by TrueFoundry is automatically attributed to the authenticated user. The analytics dashboard shows cost broken down by developer, team, model, and date - filterable by any metadata tag you pass via the X-TFY-METADATA header.
For teams using project-based cost attribution, tag Claude Code requests with project_id or feature metadata and every request automatically maps to the right cost center:
{
  "env": {
    "ANTHROPIC_CUSTOM_HEADERS": "X-TFY-METADATA: {\"team\": \"platform\", \"project_id\": \"infra-2026\"}"
  }
}
All traces export via OpenTelemetry to Grafana, Datadog, Splunk, or your existing observability stack.
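If you export request records for your own reporting, grouping cost by a metadata tag is a short aggregation. The record shape below (a metadata dict plus a cost_usd field) is a hypothetical export format chosen for illustration; the gateway's actual export schema may differ.

```python
from collections import defaultdict
from typing import Dict, Iterable, Mapping


def cost_by_tag(records: Iterable[Mapping], tag: str) -> Dict[str, float]:
    """Sum cost per value of one metadata tag; untagged requests are grouped."""
    totals: Dict[str, float] = defaultdict(float)
    for record in records:
        label = record.get("metadata", {}).get(tag, "untagged")
        totals[label] += record["cost_usd"]
    return dict(totals)


# Hypothetical exported records, tagged the same way as the header example.
records = [
    {"metadata": {"team": "platform", "project_id": "infra-2026"}, "cost_usd": 0.5},
    {"metadata": {"team": "platform"}, "cost_usd": 0.25},
    {"metadata": {"team": "data"}, "cost_usd": 1.0},
]
print(cost_by_tag(records, "team"))
print(cost_by_tag(records, "project_id"))
```

Keeping an explicit "untagged" bucket makes it visible how much spend is not yet mapped to a cost center.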
Step 4: Deploy Across Your Whole Engineering Team
Configuring one developer's settings.json is easy. Enforcing a consistent proxy configuration across every developer in your organization requires a deployment strategy. TrueFoundry supports three approaches:
Option A: MDM-Pushed Managed Settings (Recommended for Enterprises)
Push a managed-settings.json file to every corporate device via your MDM solution (Jamf, Kandji, Mosyle, Intune) and lock it against modification at the OS level. This is Claude Code's endpoint-managed settings approach.
{
  "model": "sonnet",
  "availableModels": ["sonnet", "haiku"],
  "env": {
    "ANTHROPIC_BASE_URL": "https://your-gateway.internal.corp",
    "ANTHROPIC_DEFAULT_OPUS_MODEL": "anthropic/claude-4-opus-20250514",
    "ANTHROPIC_DEFAULT_SONNET_MODEL": "anthropic/claude-4-sonnet-20250514",
    "ANTHROPIC_DEFAULT_HAIKU_MODEL": "anthropic/claude-3-5-haiku-20241022"
  }
}
System-level paths:
- macOS: /Library/Application Support/ClaudeCode/managed-settings.json
- Linux: /etc/claude-code/managed-settings.json
This configuration is tamper-resistant, applies immediately at startup with no network dependency, and requires no developer action. Every machine that receives the MDM profile is automatically proxied through TrueFoundry.
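After an MDM rollout, a fleet-audit script can confirm that each machine actually received the expected file. The sketch below uses the two system-level paths listed above; the audit rules (gateway URL matches, all three alias variables present) and the function names are assumptions about what a team might want to verify.

```python
import json
import platform
from pathlib import Path
from typing import List, Optional

MANAGED_PATHS = {
    "Darwin": Path("/Library/Application Support/ClaudeCode/managed-settings.json"),
    "Linux": Path("/etc/claude-code/managed-settings.json"),
}


def managed_settings_path(system: Optional[str] = None) -> Path:
    """Return the managed-settings path for the given (or current) OS."""
    system = system or platform.system()
    if system not in MANAGED_PATHS:
        raise ValueError(f"no managed-settings path known for {system!r}")
    return MANAGED_PATHS[system]


def audit_settings(text: str, expected_base_url: str) -> List[str]:
    """List problems found in a managed-settings.json document."""
    env = json.loads(text).get("env", {})
    problems = []
    if env.get("ANTHROPIC_BASE_URL") != expected_base_url:
        problems.append("ANTHROPIC_BASE_URL does not point at the gateway")
    for alias in ("OPUS", "SONNET", "HAIKU"):
        if f"ANTHROPIC_DEFAULT_{alias}_MODEL" not in env:
            problems.append(f"ANTHROPIC_DEFAULT_{alias}_MODEL is not set")
    return problems
```

On a managed machine you would read managed_settings_path() and pass its contents to audit_settings along with your gateway URL; an empty list means the file matches expectations.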
Option B: Server-Managed Settings via Anthropic Admin Console
Configure settings centrally via the Claude Admin Console (Admin Settings → Claude Code → Managed settings). Settings are delivered from Anthropic's servers when developers authenticate with their organization credentials - no file deployment needed.
This approach requires no MDM infrastructure and works on BYOD machines. Settings are delivered at authentication time and are harder for users to override.
Option C: Project-Level settings.json in Version Control
Commit a .claude/settings.json to the root of every repository. Any developer who clones the repo and runs Claude Code in that directory automatically uses the project settings, including the TrueFoundry gateway URL and model configuration.
# Check into your monorepo or template repository
.claude/settings.json
This is the lowest-friction option for teams with standardized repository structures. New developers inherit the proxy configuration the moment they clone.
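For teams with many repositories, the project file can be generated by a script instead of copied by hand. The sketch below writes .claude/settings.json under a repository root. The function name and the guard against credential-like keys are assumptions added for illustration, on the reasoning that a file checked into version control should carry the gateway URL and model aliases but no API keys.

```python
import json
import tempfile
from pathlib import Path
from typing import Mapping


def write_project_settings(
    repo_root: str, gateway_url: str, model_aliases: Mapping[str, str]
) -> Path:
    """Write .claude/settings.json with the gateway URL and model aliases."""
    for key in model_aliases:
        # Refuse anything that looks like a credential: this file gets committed.
        if "TOKEN" in key or "KEY" in key:
            raise ValueError(f"refusing to write credential-like key {key!r}")
    settings = {"env": {"ANTHROPIC_BASE_URL": gateway_url, **model_aliases}}
    path = Path(repo_root) / ".claude" / "settings.json"
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(settings, indent=2) + "\n", encoding="utf-8")
    return path


if __name__ == "__main__":
    # Demo against a throwaway directory so nothing in a real repo is touched.
    with tempfile.TemporaryDirectory() as repo:
        written = write_project_settings(
            repo,
            "https://your-gateway.internal.corp",
            {"ANTHROPIC_DEFAULT_SONNET_MODEL": "anthropic/claude-4-sonnet-20250514"},
        )
        print(written.read_text(encoding="utf-8"))
```

Developers would then supply their own gateway API key through their global settings or environment, outside the repository.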
Step 5: VS Code Extension and Claude Agent SDK
VS Code Extension
The Claude Code VS Code extension works seamlessly with TrueFoundry once you've configured the CLI. The extension is not standalone - it requires the Claude Code CLI to be installed and configured first.
# macOS/Linux: Launch VS Code from terminal to inherit shell environment
code .
The extension automatically uses your CLI configuration (base URL, API keys, model aliases). No separate setup needed.
macOS/Linux note: GUI applications don't inherit shell environment variables by default. Always launch VS Code from a terminal where Claude Code is configured to ensure the extension picks up ANTHROPIC_BASE_URL.
Claude Agent SDK
The Claude Agent SDK (the successor to the Claude Code SDK) works with your existing .claude/settings.json via TrueFoundry. Specify setting_sources=["project"] to load your gateway configuration programmatically:
import asyncio
from claude_agent_sdk import ClaudeAgentOptions, ResultMessage, query

async def main():
    options = ClaudeAgentOptions(
        setting_sources=["project"],  # Loads .claude/settings.json with TrueFoundry config
        max_turns=5,
        allowed_tools=["Read", "Grep", "Glob"],
    )
    async for message in query(
        prompt="Analyze my codebase for security vulnerabilities",
        options=options,
    ):
        # The final ResultMessage carries the overall result text
        if isinstance(message, ResultMessage):
            print(message.result)

asyncio.run(main())
All TrueFoundry configurations (Anthropic Direct, AWS Bedrock, Google Vertex AI) work identically with the Agent SDK.









