The AI Gateway uses HTTP headers to control authentication, request routing, logging, retries, and metadata tagging. This page covers all available request and response headers, along with custom metadata and logging configuration.
## Request Headers

| Name | Description | Example |
|---|---|---|
| `Authorization` | Your TrueFoundry API key as a bearer token | `Authorization: Bearer TFY_API_KEY` |
| `x-tfy-metadata` | Stringified JSON where both keys and values must be strings. Used for request routing and metrics filtering | `x-tfy-metadata: {"custom_field":"value"}` |
| `x-tfy-provider-name` | Required for the responses API, file upload API, and batch APIs to route requests to the correct provider account | `x-tfy-provider-name: openai` |
| `x-tfy-strict-openai` | Boolean flag to enable strict OpenAI compatibility (set to `false` to receive Claude reasoning model responses with thinking tokens) | `x-tfy-strict-openai: true` |
| `x-tfy-retry-config` | JSON object to configure retry behavior for failed requests | `x-tfy-retry-config: {"attempts": 3, "onStatusCodes": [429, 500, 503]}` |
| `x-tfy-request-timeout` | Maximum time in milliseconds to wait for a response from a single model. If fallbacks or retries are configured, the timeout applies per model request, so each attempt (including fallbacks) gets its own timeout | `x-tfy-request-timeout: 60000` |
| `x-tfy-ttft-timeout-ms` | Maximum time in milliseconds to wait for the first token of a streaming response (time-to-first-token). If no token arrives within this window, the request is considered timed out and the gateway returns 408. For virtual models or routing config, the gateway falls back to the next model on 408 even if 408 is not included in the fallback status codes | `x-tfy-ttft-timeout-ms: 30000` |
| `x-tfy-logging-config` | Configuration for request logging | `x-tfy-logging-config: {"enabled": true}` |
| `x-tfy-mcp-headers` | Stringified JSON to pass custom headers to MCP servers. Format varies by API; see the MCP Gateway and Agent API docs. The Agent API only supports registered servers | `x-tfy-mcp-headers: {"truefoundry:...":{"Authorization":"Bearer TOKEN"}}` |
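Several of these headers carry stringified JSON, which is easy to get wrong when written by hand. A minimal sketch that builds them with `json.dumps` (the `build_gateway_headers` helper is illustrative, not part of any SDK):

```python
import json

def build_gateway_headers(api_key: str) -> dict:
    """Assemble common gateway request headers; JSON-valued headers are
    serialized with json.dumps so they are guaranteed to be valid."""
    return {
        "Authorization": f"Bearer {api_key}",
        "x-tfy-metadata": json.dumps({"custom_field": "value"}),
        "x-tfy-retry-config": json.dumps(
            {"attempts": 3, "onStatusCodes": [429, 500, 503]}
        ),
        "x-tfy-request-timeout": "60000",  # milliseconds, sent as a string
    }

headers = build_gateway_headers("TFY_API_KEY")
```

These can then be passed per request, for example via `extra_headers` in the OpenAI SDK.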
You can tag requests with custom metadata using the `X-TFY-METADATA` header. Metadata is a JSON object where both keys and values must be strings, with a maximum value length of 128 characters.
With metadata, you can:
- Enhance Observability: Filter request logs and create custom metrics dashboards grouped by metadata keys.
- Apply Conditional Configurations: Use metadata in the `when` block of gateway configurations to selectively apply rate limiting, model fallbacks, load balancing, and more.
```python
from openai import OpenAI

USE_STREAM = True

client = OpenAI(
    api_key="your_truefoundry_api_key",
    base_url="https://gateway.truefoundry.ai",
)

stream = client.chat.completions.create(
    messages=[
        {"role": "system", "content": "You are an AI bot."},
        {"role": "user", "content": "Enter your prompt here"},
    ],
    model="openai-main/gpt-4",
    stream=USE_STREAM,
    extra_headers={
        "X-TFY-METADATA": '{"application":"booking-bot", "environment":"staging", "customer_id":"123456"}',
    },
)

if USE_STREAM:
    for chunk in stream:
        if (
            chunk.choices
            and len(chunk.choices) > 0
            and chunk.choices[0].delta.content is not None
        ):
            print(chunk.choices[0].delta.content, end="")
else:
    print(stream.choices[0].message.content)
```
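Since the gateway requires string keys and values, with values capped at 128 characters, it can help to validate metadata client-side before sending. The helper below is an illustrative sketch, not part of any SDK:

```python
import json

def validate_tfy_metadata(metadata: dict) -> str:
    """Check the documented X-TFY-METADATA constraints and return the
    stringified JSON ready to be used as the header value."""
    for key, value in metadata.items():
        if not isinstance(key, str) or not isinstance(value, str):
            raise TypeError("metadata keys and values must be strings")
        if len(value) > 128:
            raise ValueError(f"value for {key!r} exceeds 128 characters")
    return json.dumps(metadata)
```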
### Filter Logs
Filter your request logs using one or more metadata keys to isolate specific requests. This is useful for debugging or analyzing usage patterns for a particular feature, environment, or user.
### Create Custom Metrics

Group metrics by metadata keys to create custom visualizations. For example, monitor cost and usage per customer by grouping on a `customer_id` key.
Metadata can be used in the `when` block of gateway configurations to selectively apply rules. For example, to rate limit requests from the `dev` environment:
```yaml
name: ratelimiting-config
type: gateway-rate-limiting-config
rules:
  - id: 'openai-gpt4-dev-env'
    when:
      models: ['openai-main/gpt4']
      metadata:
        env: dev
    limit_to: 1000
    unit: requests_per_day
```
You can also use metadata to configure Load Balancing and Fallbacks. Learn more in Virtual Models and Routing Config.
## Logging Configuration

Control whether individual requests are logged using the `X-TFY-LOGGING-CONFIG` header.
### Enable Logging

```python
from openai import OpenAI

client = OpenAI(
    api_key="your_truefoundry_api_key",
    base_url="https://gateway.truefoundry.ai",
    default_headers={
        "X-TFY-LOGGING-CONFIG": '{"enabled": true}'
    },
)
```
### Disable Logging

To prevent a request from being logged, set `enabled` to `false`:

```python
from openai import OpenAI

client = OpenAI(
    api_key="your_truefoundry_api_key",
    base_url="https://gateway.truefoundry.ai",
    default_headers={
        "X-TFY-LOGGING-CONFIG": '{"enabled": false}'
    },
)
```
### Server-Side Logging Mode (Self-Hosted Only)

Environment variable configuration is only available when running a self-hosted instance of TrueFoundry AI Gateway.

You can control logging behavior globally by setting the `REQUEST_LOGGING_MODE` environment variable:
| Mode | Description |
|---|---|
| `HEADER_CONTROLLED` | Logging depends on the `enabled` value in the `X-TFY-LOGGING-CONFIG` header. If the header is absent or `enabled` is `true`, the request is logged; if `enabled` is `false`, it is not. |
| `ALWAYS` | All requests are logged regardless of the `enabled` value. |
| `NEVER` | No requests are logged regardless of the `enabled` value. |
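For example, on a self-hosted deployment the mode could be set in the gateway process environment before startup (the exact mechanism depends on how you deploy, e.g. container env vars or a Kubernetes manifest):

```shell
# Log every request regardless of per-request headers (self-hosted only).
export REQUEST_LOGGING_MODE=ALWAYS
```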
To view logged requests, go to AI Gateway > Monitor > Requests in the TrueFoundry UI. See View Traces for details.
## Response Headers

| Name | Description |
|---|---|
| `x-tfy-resolved-model` | The final TrueFoundry model ID used to process the request (may differ from the requested model due to load balancing or fallbacks) |
| `x-tfy-applied-configurations` | Dictionary of applied configurations, including load balancing, fallback, model config, applied guardrails, and rate limiting |
| `server-timing` | For non-streaming requests only. Contains timing information for the different processing stages, including middlewares, guardrails, and model calls |
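To read these headers from client code, one option with the OpenAI Python SDK (v1+) is its raw-response interface, which exposes HTTP headers alongside the parsed body. A hedged sketch; `fetch_with_headers` and `resolved_model` are illustrative helpers, not SDK functions:

```python
def fetch_with_headers(api_key: str):
    """Make a completion call and return both the parsed completion and
    the raw HTTP response headers (requires the openai>=1.x package)."""
    from openai import OpenAI  # imported lazily so the module loads without it

    client = OpenAI(api_key=api_key, base_url="https://gateway.truefoundry.ai")
    raw = client.chat.completions.with_raw_response.create(
        model="openai-main/gpt-4",
        messages=[{"role": "user", "content": "Hello"}],
    )
    return raw.parse(), raw.headers  # (ChatCompletion, header mapping)

def resolved_model(headers) -> str:
    """Which model actually served the request, after any load balancing
    or fallbacks (may differ from the model you requested)."""
    return headers.get("x-tfy-resolved-model", "")
```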
### Server-Timing Breakdown

When inspecting network requests in your browser's developer tools, the `server-timing` header provides a detailed performance breakdown:
| Processing Stage | Duration | Description |
|---|---|---|
| Authentication | 0.9 ms | Authenticating the user |
| Input guardrails | 0.7 ms | Input validation and content filtering |
| Model call | 1350 ms | AI model response generation (the bulk of the time) |
| Output guardrails | 722.3 ms | Output validation and filtering |
| Logging | 1.1 ms | Logging the request |
| Total | 2080 ms | Complete request processing time (2.08 seconds) |
Stages like load balancing (0 ms), rate limiting (0 ms), and cost budget (0 ms) show zero duration because those configurations weren't triggered for this particular request.
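The `server-timing` header follows the standard `Server-Timing` format: comma-separated entries of `name;dur=<milliseconds>`. The metric names themselves are gateway-defined, but the header can be parsed generically; a minimal sketch (the sample stage names below are assumptions mirroring the breakdown above):

```python
def parse_server_timing(header: str) -> dict:
    """Parse a Server-Timing header value into {metric: milliseconds};
    entries without a dur= attribute are skipped."""
    timings = {}
    for entry in header.split(","):
        parts = [p.strip() for p in entry.strip().split(";")]
        name = parts[0]
        for attr in parts[1:]:
            if attr.lower().startswith("dur="):
                timings[name] = float(attr[4:])
    return timings

# Hypothetical header value with stage names echoing the table above:
sample = "authentication;dur=0.9, input_guardrails;dur=0.7, model_call;dur=1350"
```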