Claude Code Proxy: Route Claude, GPT-5 & Gemini Through TrueFoundry AI Gateway

Built for Speed: ~10ms Latency, Even Under Load

Blazingly fast way to build, track and deploy your models!

Handles 350+ RPS on just 1 vCPU — no tuning needed
Production-ready with full enterprise support

Get Started with Truefoundry Now Talk to the Expert

Introduction

Claude Code is the most powerful AI coding assistant available today. Engineers who adopt it rarely go back. But when dozens or hundreds of engineers start using it at the same time, a new problem appears: Claude Code, by default, talks directly to Anthropic's API. Every developer authenticates with their own key, uses Anthropic models exclusively, and generates API spend that is completely invisible to the platform team until the monthly invoice arrives.

A Claude Code proxy is the answer. By pointing Claude Code at a proxy endpoint instead of directly at Anthropic, you gain a centralized control point for every model call across your entire engineering organization: visibility into who is spending what, the ability to enforce budget caps before they're exceeded, access to models from any provider - GPT-5, Gemini 2.5 Pro, Llama via Bedrock - through the same interface Claude Code already knows, and the ability to deploy gateway configuration once and have it apply to all developers without touching individual machines.

TrueFoundry AI Gateway is the enterprise-grade Claude Code proxy. It is a drop-in Anthropic-compatible endpoint that Claude Code connects to with a single environment variable change. Once connected, every Claude Code request flows through the gateway giving you observability, cost controls, multi-model routing, and enterprise security policies that apply to the whole organization, not just the developers who remember to configure them.

This guide explains exactly what a Claude Code proxy does, why TrueFoundry AI Gateway is the right one for enterprise engineering teams, and how to configure it, including for both standard API key and Claude Max subscription flows.

What Is a Claude Code Proxy?

Claude Code ships with a single configuration knob for changing its backend: the ANTHROPIC_BASE_URL environment variable. When set, Claude Code sends all its API requests - messages, model calls, streaming responses to that URL instead of to https://api.anthropic.com.

That one variable is the foundation of every Claude Code proxy. A proxy is any server that:

Accepts Anthropic-format API requests from Claude Code
Adds controls, routing, or observability at the proxy layer
Forwards requests to the actual model provider (Anthropic, OpenAI, Google, Bedrock, on-prem)
Returns responses back to Claude Code in the format it expects

The simplest possible proxy is a reverse proxy with logging. The most sophisticated is an enterprise AI gateway that handles authentication, budget enforcement, model routing across providers, semantic caching, guardrails, and full audit trails - all transparently, with no changes to how Claude Code behaves for the developer.

Why do teams build or adopt a Claude Code proxy?

Cost control: Multiple developers using Claude Code with individual Anthropic keys generate spend that is invisible until month-end. A proxy intercepts every request and enforces per-developer daily limits before costs exceed budget.
Multi-model access: Claude Code's interface is powerful, but Claude models are not always the best or most cost-effective choice for every task. A proxy lets you route haiku-tier tasks to GPT-4o-mini or Gemini Flash, and opus-tier tasks to the best available model without any client-side changes.
Enterprise security: Direct API keys on developer laptops are a security liability. A proxy centralizes credentials: developers authenticate to the proxy, and the proxy holds provider keys. No Anthropic key ever needs to live on a developer machine.
Team-wide governance: Individual developers can configure their own ANTHROPIC_BASE_URL. But enforcing it across an entire team requires a centralized deployment mechanism - MDM, server-managed settings, or a shared project .claude/settings.json checked into version control.

Why TrueFoundry AI Gateway Is the Right Claude Code Proxy

There are three ways to proxy Claude Code: build your own, use a simple reverse proxy, or use a purpose-built AI gateway. Building your own means owning the maintenance, security, and reliability of a production API gateway. A simple reverse proxy adds logging but none of the controls. TrueFoundry AI Gateway gives you everything an enterprise engineering team actually needs without building or maintaining it.

TrueFoundry AI Gateway is a unified proxy layer between Claude Code and your model providers. It accepts the same Anthropic API format that Claude Code already speaks, so Claude Code never needs to know it's talking to a gateway rather than directly to Anthropic. Behind the gateway, you can connect any provider: Anthropic direct, AWS Bedrock, Google Vertex AI, Azure OpenAI, OpenAI, or your own on-prem models.

Here is what Claude Code actually sees:

Claude Code  →  ANTHROPIC_BASE_URL (TrueFoundry Gateway)  →  Anthropic / OpenAI / Gemini / Bedrock / On-prem

Every Claude Code request that flows through TrueFoundry gains, automatically:

Capability	What It Does for Claude Code Users	TrueFoundry Feature
Multi-provider model access	Use GPT-5, Gemini 2.5 Pro, Llama, or on-prem models through the same Claude Code interface	Virtual Models
Per-developer budget limits	Blocks requests when daily or monthly spend caps are hit — before cost overruns, not after	Budget Limiting
Rate limiting	Throttle per-developer, per-team, or per-environment request rates	Rate Limiting
Cost attribution	Dashboard showing exactly which developer, team, and model drove every dollar of spend	Analytics
RBAC and virtual keys	No Anthropic API keys on developer machines — team members authenticate with TrueFoundry keys scoped to their access level	Access Control
Automatic failover	If Anthropic hits a rate limit or outage, the gateway silently retries on the next configured provider	Load Balancing & Fallbacks
Guardrails	PII detection, prompt injection protection, and custom content policies applied before requests reach the model	Guardrails
Full audit trail	Every request logged with user, model, token count, cost, and latency — exportable via OpenTelemetry	OpenTelemetry Export

~3–4ms p95 gateway overhead, 350+ RPS on a single vCPU. At Claude Code response times (seconds, not milliseconds), the gateway adds no perceptible latency.

Step 1: Point Claude Code at TrueFoundry AI Gateway

The core configuration is a single environment variable:

export ANTHROPIC_BASE_URL="https://<your-truefoundry-gateway-url>"

For persistent configuration - which is what you want for production use - edit Claude Code's settings.json. Two paths are supported:

Global (applies to all projects): ~/.claude/settings.json
Project-specific (checked into version control): .claude/settings.json in your project directory

Standard API Key Configuration

Use this when developers authenticate with a TrueFoundry API key (the recommended enterprise pattern — no Anthropic keys on developer machines):‍

{
  "env": {
    "ANTHROPIC_BASE_URL": "{GATEWAY_BASE_URL}",
    "ANTHROPIC_AUTH_TOKEN": "your-truefoundry-api-key",
    "ANTHROPIC_MODEL": "anthropic/claude-4-sonnet-20250514",
    "ANTHROPIC_DEFAULT_OPUS_MODEL": "anthropic/claude-4-opus-20250514",
    "ANTHROPIC_DEFAULT_SONNET_MODEL": "anthropic/claude-4-sonnet-20250514",
    "ANTHROPIC_DEFAULT_HAIKU_MODEL": "anthropic/claude-3-5-haiku-20241022",
    "CLAUDE_CODE_DISABLE_EXPERIMENTAL_BETAS": "1",
    "ANTHROPIC_CUSTOM_HEADERS": "x-tfy-anthropic-beta: context-management-2025-06-27"
  }
}

What each field does:

ANTHROPIC_BASE_URL — redirects all Claude Code requests to TrueFoundry
ANTHROPIC_AUTH_TOKEN — TrueFoundry API key; authenticates the developer to the gateway (replaces Anthropic API key)
ANTHROPIC_MODEL — the default model for Claude Code sessions
ANTHROPIC_DEFAULT_OPUS_MODEL, ANTHROPIC_DEFAULT_SONNET_MODEL, ANTHROPIC_DEFAULT_HAIKU_MODEL — map Claude Code's built-in model aliases (/model opus, /model sonnet, /model haiku) to your TrueFoundry-configured models
CLAUDE_CODE_DISABLE_EXPERIMENTAL_BETAS — disables experimental Claude Code features for stable gateway behavior
ANTHROPIC_CUSTOM_HEADERS — forwards the x-tfy-anthropic-beta header to Anthropic for beta features like context management

Important: Claude Code detects model capabilities (extended thinking, ToolSearch, beta tool blocks) by string-matching the model ID. Make sure ANTHROPIC_DEFAULT_OPUS_MODEL, ANTHROPIC_DEFAULT_SONNET_MODEL, and ANTHROPIC_DEFAULT_HAIKU_MODEL contain a recognizable Anthropic model ID like claude-opus-4-7, claude-sonnet-4-6, or claude-haiku-4-5. If you're using a TrueFoundry Virtual Model, ensure its display name contains the underlying model ID - e.g. your-account/claude-haiku-4-5 — so string-matching succeeds.

Claude Code Max Subscription Configuration

If your team uses Claude Code Max subscriptions, Claude Code reserves the Authorization header for Anthropic account authentication. Use x-tfy-api-key in ANTHROPIC_CUSTOM_HEADERS instead:‍

{
  "env": {
    "ANTHROPIC_BASE_URL": "{GATEWAY_BASE_URL}",
    "ANTHROPIC_CUSTOM_HEADERS": "x-tfy-api-key: your-truefoundry-api-key\nX-TFY-LOGGING-CONFIG: {\"enabled\": true}",
    "ANTHROPIC_DEFAULT_OPUS_MODEL": "anthropic/claude-4-opus-20250514",
    "ANTHROPIC_DEFAULT_SONNET_MODEL": "anthropic/claude-4-sonnet-20250514",
    "ANTHROPIC_DEFAULT_HAIKU_MODEL": "anthropic/claude-3-5-haiku-20241022"
  }
}

Why this pattern is good for Max users:

You keep your Anthropic Max subscription for Claude Code's session auth - the Authorization header flows through to Anthropic as-is
TrueFoundry authenticates separately via x-tfy-api-key - the gateway governs the request while Anthropic handles billing via your subscription
You get centralized governance (visibility, quotas, RBAC, logs, guardrails) without changing your day-to-day Claude Code workflow

See TrueFoundry Claude Code documentation for the full integration guide, and Claude Code Max integration for the Max subscription variant.

Step 2: Use GPT-5, Gemini, and Any Model Through Claude Code

This is where a Claude Code proxy goes from convenient to transformative. Once Claude Code routes through TrueFoundry, it can reach any model from any provider not just Anthropic. You add provider accounts in the TrueFoundry gateway dashboard (OpenAI, Google Vertex AI, AWS Bedrock, Azure OpenAI, xAI, or your own on-prem deployment), and those models become available at the same gateway endpoint.

Pointing Claude Code Aliases at Non-Anthropic Models

To use GPT-5 for Claude Code's "opus" slot (your most capable model tier), simply update the model alias:

{
  "env": {
    "ANTHROPIC_BASE_URL": "{GATEWAY_BASE_URL}",
    "ANTHROPIC_AUTH_TOKEN": "your-truefoundry-api-key",
    "ANTHROPIC_DEFAULT_OPUS_MODEL": "openai-main/gpt-5",
    "ANTHROPIC_DEFAULT_SONNET_MODEL": "anthropic/claude-4-sonnet-20250514",
    "ANTHROPIC_DEFAULT_HAIKU_MODEL": "google-vertex/gemini-2.5-flash",
    "CLAUDE_CODE_DISABLE_EXPERIMENTAL_BETAS": "1"
  }
}

In this configuration:

/model opus → GPT-5 (for complex architecture and planning tasks)
/model sonnet → Claude Sonnet 4 (for standard coding tasks)
/model haiku → Gemini 2.5 Flash (for fast, lightweight tasks like email validation, quick lookups)

The developer experience is identical. Developers still use /model opus or --model haiku. They don't need to know which provider is behind each alias, or manage credentials for OpenAI or Google.

Using Virtual Models for Advanced Routing

TrueFoundry's Virtual Models let you create a single model identifier that routes requests across multiple providers with weight-based, priority-based, or latency-based routing. Point a Claude Code model alias at a virtual model, and the gateway handles the routing logic transparently.

Example: Priority-based fallback across providers

If your primary Anthropic account hits rate limits, automatically fall back to Bedrock Claude, then to GPT-4 - without any developer noticing:

routing_config:
  type: priority-based-routing
  load_balance_targets:
    - target: anthropic-main/claude-sonnet-4-20250514
      priority: 0
      fallback_status_codes: ["429", "500", "502", "503"]
    - target: bedrock-main/claude-sonnet-4-20250514
      priority: 1
      fallback_status_codes: ["429", "500"]
    - target: openai-main/gpt-4o
      priority: 2

Example: Weight-based A/B evaluation

Canary a new model for 10% of Claude Code traffic before committing the whole team:‍

routing_config:
  type: weight-based-routing
  load_balance_targets:
    - target: anthropic-main/claude-4-sonnet-20250514
      weight: 90
    - target: openai-main/gpt-5
      weight: 10

Then point Claude Code's sonnet alias at this virtual model. 10% of Claude Code sonnet requests go to GPT-5 with full cost and quality metrics in the gateway dashboard to compare the results.

Step 3: Enterprise Controls That Apply to Every Claude Code Request

Once Claude Code routes through TrueFoundry, every request inherits enterprise-grade governance not because developers configure it, but because it's enforced at the gateway layer.

Budget Limits: Stop Cost Overruns Before They Happen

TrueFoundry's hierarchical budget limiting fires before the token is consumed, not after the monthly bill arrives. Rules stack and combine:

Order	Rule ID	Filter	Budget	Per
1	`senior-eng-budget`	Subjects: `team:senior-engineers`	$50/day	User
2	`default-dev-budget`	(matches all)	$10/day	User
3	`opus-monthly-cap`	Models: `anthropic-main/claude-4-opus`	$1000/month	Shared

Senior engineers get $50/day. All others default to $10/day. And total Opus spending across the entire organization is capped at $1000/month — so even if every developer is within their personal limit, the org-level model budget cannot be blown through.

Rate Limiting: Protect On-Prem and Control Environments

TrueFoundry AI Gateway interface showing how to configure rate limitingrules through the Configtab

Rate limiting at the gateway handles three Claude Code-specific scenarios:

CI pipelines: Claude Code runs in CI should be rate-limited independently of interactive developer sessions. A test suite that calls Claude Code for code review shouldn't burn through the same quota as a developer's active coding session.
Development vs. production models: Metadata-scoped rate limits let you route environment: dev requests to a cheaper model and cap their request rate — without affecting production.
On-prem GPU protection: If you're running on-prem models as the primary target for Claude Code, rate-limit the on-prem endpoint and auto-burst to the cloud API when capacity is saturated.

# Limit Claude Code in CI to 500 requests/day on GPT-4
- id: ci-pipeline-limit
  when:
    models: ['openai-main/gpt-4']
    metadata:
      environment: ci
  limit_to: 500
  unit: requests_per_day

Cost Attribution: Know Exactly Who Is Spending What

Every Claude Code request processed by TrueFoundry is automatically attributed to the authenticated user. The analytics dashboard shows cost broken down by developer, team, model, and date - filterable by any metadata tag you pass via the X-TFY-METADATA header.

For teams using project-based cost attribution, tag Claude Code requests with project_id or feature metadata and every request automatically maps to the right cost center:‍

{
  "env": {
    "ANTHROPIC_CUSTOM_HEADERS": "X-TFY-METADATA: {\"team\": \"platform\", \"project_id\": \"infra-2026\"}"
  }
}

All traces export via OpenTelemetry to Grafana, Datadog, Splunk, or your existing observability stack.

Step 4: Deploy Across Your Whole Engineering Team

Configuring one developer's settings.json is easy. Enforcing a consistent proxy configuration across every developer in your organization requires a deployment strategy. TrueFoundry supports three approaches:

Option A: MDM-Pushed Managed Settings (Recommended for Enterprises)

Push a managed-settings.json file to every corporate device via your MDM solution (Jamf, Kandji, Mosyle, Intune) and lock it against modification at the OS level. This is Claude Code's endpoint-managed settings approach.‍

{
  "model": "sonnet",
  "availableModels": ["sonnet", "haiku"],
  "env": {
    "ANTHROPIC_BASE_URL": "https://your-gateway.internal.corp",
    "ANTHROPIC_DEFAULT_OPUS_MODEL": "anthropic/claude-4-opus-20250514",
    "ANTHROPIC_DEFAULT_SONNET_MODEL": "anthropic/claude-4-sonnet-20250514",
    "ANTHROPIC_DEFAULT_HAIKU_MODEL": "anthropic/claude-3-5-haiku-20241022"
  }
}

System-level paths:

macOS: /Library/Application Support/ClaudeCode/managed-settings.json
Linux: /etc/claude-code/managed-settings.json

This configuration is tamper-resistant, applies immediately at startup with no network dependency, and requires no developer action. Every machine that receives the MDM profile is automatically proxied through TrueFoundry.

Option B: Server-Managed Settings via Anthropic Admin Console

Configure settings centrally via the Claude Admin Console (Admin Settings → Claude Code → Managed settings). Settings are delivered from Anthropic's servers when developers authenticate with their organization credentials - no file deployment needed.

This approach requires no MDM infrastructure and works on BYOD machines. Settings are delivered at authentication time and are harder for users to override.

Option C: Project-Level settings.json in Version Control

Commit a .claude/settings.json to the root of every repository. Any developer who clones the repo and runs Claude Code in that directory automatically uses the project settings including the TrueFoundry gateway URL and model configuration.‍

# Check into your monorepo or template repository
.claude/settings.json

This is the lowest-friction option for teams with standardized repository structures. New developers inherit the proxy configuration the moment they clone.

Step 5: VS Code Extension and Claude Agent SDK

VS Code Extension

The Claude Code VS Code extension works seamlessly with TrueFoundry once you've configured the CLI. The extension is not standalone - it requires the Claude Code CLI to be installed and configured first.‍

# macOS/Linux: Launch VS Code from terminal to inherit shell environment
code .

The extension automatically uses your CLI configuration (base URL, API keys, model aliases). No separate setup needed.

macOS/Linux note: GUI applications don't inherit shell environment variables by default. Always launch VS Code from a terminal where Claude Code is configured to ensure the extension picks up ANTHROPIC_BASE_URL.

Claude Agent SDK

The Claude Agent SDK (the successor to the Claude Code SDK) works with your existing .claude/settings.json via TrueFoundry. Specify setting_sources=["project"] to load your gateway configuration programmatically:

from claude_agent_sdk import query, ClaudeAgentOptions

async for message in query(
    prompt="Analyze my codebase for security vulnerabilities",
    options=ClaudeAgentOptions(
        setting_sources=["project"],  # Loads .claude/settings.json with TrueFoundry config
        max_turns=5,
        allowed_tools=["Read", "Grep", "Glob"]
    )
):
    if message.type == "result":
        print(message.result)

All TrueFoundry configurations - Anthropic Direct, AWS Bedrock, Google Vertex AI, work identically with the Agent SDK.

DIY Claude Code Proxy vs. TrueFoundry AI Gateway

Capability	DIY Reverse Proxy	TrueFoundry AI Gateway
Setup time	Days to weeks	Minutes — one env var change
Multi-provider routing	Custom build required	Built-in: Anthropic, OpenAI, Gemini, Bedrock, Azure, on-prem
Per-developer budget limits	Not included	Hierarchical, configurable
Cost attribution dashboard	Custom build required	Built-in with OTEL export
Automatic failover	Custom retry logic per request	Gateway-level, configurable per provider
Guardrails (PII, injection)	Not included	Built-in
RBAC / virtual accounts	Custom build required	Built-in with SSO/SCIM
Semantic caching	Not included	Built-in
Ongoing maintenance	Your team owns it	TrueFoundry-managed (SaaS) or self-hosted
Deployment modes	Self-hosted only	SaaS, hybrid, or fully self-hosted VPC