> ## Documentation Index
> Fetch the complete documentation index at: https://www.truefoundry.com/llms.txt
> Use this file to discover all available pages before exploring further.

# Custom Metadata and Headers

> Configure custom metadata and HTTP headers for authentication, observability, retries, and timeouts in the AI Gateway.

The AI Gateway uses HTTP headers to control authentication, request routing, retries, and metadata tagging. This page covers custom metadata and the available request and response headers.

## Custom Metadata

You can tag requests with custom metadata using the `X-TFY-METADATA` header. Metadata is a JSON object where both keys and values must be strings, with a maximum value length of 128 characters.

### Sending Metadata

<CodeGroup>
  ```python OpenAI SDK highlight={12-14} wrap lines theme={"dark"}
  from openai import OpenAI

  USE_STREAM = True
  client = OpenAI(api_key="your_truefoundry_api_key", base_url="https://gateway.truefoundry.ai")
  stream = client.chat.completions.create(
      messages=[
          {"role": "system", "content": "You are an AI bot."},
          {"role": "user", "content": "Enter your prompt here"},
      ],
      model="openai-main/gpt-4",
      stream=USE_STREAM,
      extra_headers={
          "X-TFY-METADATA": '{"application":"booking-bot", "environment":"staging", "customer_id":"123456"}',
      }
  )

  if USE_STREAM:
      for chunk in stream:
          if (
              chunk.choices
              and len(chunk.choices) > 0
              and chunk.choices[0].delta.content is not None
          ):
              print(chunk.choices[0].delta.content, end="")
  else:
      print(stream.choices[0].message.content)
  ```

  ```typescript Node.js highlight={6-8} wrap lines theme={"dark"}
  const OpenAI = require('openai');

  const openai = new OpenAI({
      apiKey: 'your_truefoundry_api_key',
      baseURL: 'https://gateway.truefoundry.ai',
      defaultHeaders: {
          "X-TFY-METADATA":
          '{"application":"booking-bot", "environment":"staging", "customer_id":"123456"}',
      },
  });

  async function main() {
      try {
          const stream = await openai.chat.completions.create({
              model: 'openai-main/gpt-4',
              messages: [
                  { role: 'system', content: 'You are an AI bot.' },
                  { role: 'user', content: 'Enter your prompt here' }
              ],
              stream: true
          });

          for await (const chunk of stream) {
              if (chunk.choices[0]?.delta?.content) {
                  process.stdout.write(chunk.choices[0].delta.content);
              }
          }
      } catch (error) {
          console.error('An error occurred:', error.message);
      }
  }

  main();
  ```
</CodeGroup>

### Automatic Metadata Injection

In addition to sending metadata explicitly via the `X-TFY-METADATA` header, TrueFoundry can automatically inject metadata into requests based on your access configuration. This ensures consistent tagging without requiring changes to client code.

<AccordionGroup>
  <Accordion title="Virtual account tags">
    When you assign **tags** to a virtual account, those tags are automatically added as metadata to every request authenticated with that account's token. This lets you tag all traffic from a specific application or service without modifying any client code.

    For example, if a virtual account has tags `{"application": "booking-bot", "environment": "prod"}`, every request made with that account's token will carry those metadata keys — enabling filtering, cost attribution, and routing based on those values.
  </Accordion>

  <Accordion title="Team tags via PATs">
    When a **Personal Access Token (PAT)** is associated with a team, and that team has tags configured, those team tags are automatically injected as metadata into every request made with that PAT. This provides automatic team-level attribution for all API usage.

    For example, if a PAT is linked to the "ML Platform" team which has tags `{"cost_center": "eng-ml", "department": "engineering"}`, every request through that PAT will include those metadata keys. Admins can [mandate team selection for PATs](/docs/generating-truefoundry-api-keys#admin-controls) to ensure every PAT is associated with a team.
  </Accordion>
</AccordionGroup>

<Tip>
  Combine automatic metadata injection with the [Metadata Validation](/docs/ai-gateway/metadata-validation) guardrail to enforce that every request carries the metadata your organization requires — whether injected automatically or sent by the client.
</Tip>

### Use Cases

<AccordionGroup>
  <Accordion title="Filter request logs and build custom metrics dashboards grouped by metadata keys">
    Tag every request with attributes like `application`, `environment`, or `customer_id` and slice your observability data by them.

    **Filter Logs**

    Filter your request logs using one or more metadata keys to isolate specific requests. This is useful for debugging or analyzing usage patterns for a particular feature, environment, or user.

    <Frame caption="Filtering requests using custom Metadata Keys">
      <img src="https://mintcdn.com/truefoundry/FeKcq2n1MMm83Par/images/Screenshot2025-07-21at6.57.29PM-min.png?fit=max&auto=format&n=FeKcq2n1MMm83Par&q=85&s=0992cf1c5ae5c79d1cecc4c7f15fe315" alt="TrueFoundry dashboard showing how to filter requests using custom metadata keys" width="3588" height="2004" data-path="images/Screenshot2025-07-21at6.57.29PM-min.png" />
    </Frame>

    **Create Custom Metrics**

    Group metrics by metadata keys to create custom visualizations. For example, monitor cost and usage per customer by grouping with a `customer_id` key.

    <Frame caption="Grouping metrics by a custom 'customer_id' metadata key">
      <img src="https://mintcdn.com/truefoundry/FeKcq2n1MMm83Par/images/Screenshot2025-07-21at6.29.09PM-min.png?fit=max&auto=format&n=FeKcq2n1MMm83Par&q=85&s=b45ab05b7fe4eaba48dbc662516b01de" alt="Dashboard view showing metrics grouped by customer_id metadata key with usage statistics per customer" width="3600" height="2004" data-path="images/Screenshot2025-07-21at6.29.09PM-min.png" />
    </Frame>
  </Accordion>

  <Accordion title="Selectively apply rate limits, fallbacks, and load balancing per metadata value">
    Use metadata in the `when` block of gateway configurations to target specific applications, environments, or customers — without changing client code. For example, to rate limit requests from the `dev` environment:

    ```yaml lines theme={"dark"}
    name: ratelimiting-config
    type: gateway-rate-limiting-config
    rules:
      - id: 'openai-gpt4-dev-env'
        when:
          models: ['openai-main/gpt4']
          metadata:
            env: dev
        limit_to: 1000
        unit: requests_per_day
    ```

    You can also use metadata to configure **Load Balancing** and **Fallbacks**. Learn more in [Virtual Models](/docs/ai-gateway/virtual-model#retries-and-fallbacks) and [Routing Config](/docs/ai-gateway/load-balancing-overview).
  </Accordion>
</AccordionGroup>

## Request Headers

| Name                    | Description                                                                                                                                                                                                                                                                                                                                                                                                                                                                  | Example                                                                   |
| ----------------------- | ---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | ------------------------------------------------------------------------- |
| `Authorization`         | Your TrueFoundry API key (PAT or VAT) as a bearer token. See [Authentication](/docs/ai-gateway/authentication).                                                                                                                                                                                                                                                                                                                                                              | `Authorization: Bearer TFY_API_KEY`                                       |
| `x-tfy-metadata`        | Stringified JSON where both keys and values must be strings. Used for request routing and metrics filtering. See [Custom Metadata](#custom-metadata) above.                                                                                                                                                                                                                                                                                                                  | `x-tfy-metadata: {"custom_field":"value"}`                                |
| `x-tfy-provider-name`   | Routes requests to the correct provider account. Required for [Responses API](/docs/ai-gateway/responses-api), [File API](/docs/ai-gateway/file-endpoints), and [Batch API](/docs/ai-gateway/batch-predictions-with-truefoundry-llm-gateway).                                                                                                                                                                                                                                | `x-tfy-provider-name: openai`                                             |
| `x-tfy-strict-openai`   | Boolean flag to enable strict OpenAI compatibility. Set to `false` to access Claude thinking/reasoning tokens — see [Reasoning Models](/docs/ai-gateway/reasoning-models-claude-only).                                                                                                                                                                                                                                                                                       | `x-tfy-strict-openai: true`                                               |
| `x-tfy-request-timeout` | Number in milliseconds specifying the maximum time to wait for a response from a single model. If fallbacks or retries are configured, the timeout is applied per model request (i.e., each attempt, including fallbacks, will have its own timeout). See [Virtual Models — Retries and fallbacks](/docs/ai-gateway/virtual-model#retries-and-fallbacks).                                                                                                                    | `x-tfy-request-timeout: 60000`                                            |
| `x-tfy-ttft-timeout-ms` | Number in milliseconds specifying the maximum time to wait for the first token in a streaming response (time-to-first-token). If no token is received within this window, the request is considered timed out and the gateway returns **408**. For [virtual models](/docs/ai-gateway/virtual-model) or [routing config](/docs/ai-gateway/load-balancing-overview), the gateway falls back to the next model on 408 even if 408 is not included in the fallback status codes. | `x-tfy-ttft-timeout-ms: 30000`                                            |
| `x-tfy-logging-config`  | Enable or disable logging for a specific request. See [Request Logging](/docs/ai-gateway/request-logging) for details.                                                                                                                                                                                                                                                                                                                                                       | `x-tfy-logging-config: {"enabled": true}`                                 |
| `x-tfy-mcp-headers`     | Stringified JSON to pass custom headers to MCP servers. Format varies by API — see [MCP Gateway](/docs/ai-gateway/mcp/mcp-gateway-auth-security#overriding-auth-with-x-tfy-mcp-headers) and [Agent API](/docs/ai-gateway/agents/use-mcp-server-in-code-agent#method-2:-using-x-tfy-mcp-headers-header-registered-servers-only) docs. Agent API only supports registered servers.                                                                                             | `x-tfy-mcp-headers: {"truefoundry:...":{"Authorization":"Bearer TOKEN"}}` |

## Response Headers

| Name                           | Description                                                                                                                                                                                                     |
| ------------------------------ | --------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| `x-tfy-resolved-model`         | The final TrueFoundry model ID used to process the request (may differ from the requested model due to load balancing or fallbacks). See [Virtual Models](/docs/ai-gateway/virtual-model).                      |
| `x-tfy-applied-configurations` | Dictionary of applied configurations including load balancing, fallback, model config, applied guardrails, and rate limiting                                                                                    |
| `server-timing`                | For non-streaming requests only. Contains timing information for different processing stages including middlewares, guardrails, and model calls. See [Server-Timing Breakdown](#server-timing-breakdown) below. |

### Server-Timing Breakdown

When inspecting network requests in your browser's developer tools, the `server-timing` header provides a detailed performance breakdown:

| Processing Stage  | Duration    | Description                                         |
| ----------------- | ----------- | --------------------------------------------------- |
| Authentication    | 0.9 ms      | Authenticating User                                 |
| Input guardrails  | 0.7 ms      | Input validation and content filtering              |
| Model call        | 1350 ms     | AI model response generation (bulk of the time)     |
| Output guardrails | 722.3 ms    | Output validation and filtering                     |
| Logging           | 1.1 ms      | Logging request                                     |
| **Total**         | **2080 ms** | **Complete request processing time (2.08 seconds)** |

Metrics like `load balancing (0 ms)`, `rate limiting (0 ms)`, and `cost budget (0 ms)` show zero duration because these configs weren't triggered for this particular request.

<Frame caption="Server-timing header in browser developer tools">
  <img src="https://mintcdn.com/truefoundry/QPyNP8qMPAoy2O9d/images/gateway-server-timing.png?fit=max&auto=format&n=QPyNP8qMPAoy2O9d&q=85&s=676cbae17c2be4809a7f6e5cfc915b2b" alt="Server-timing header in browser developer tools showing request processing breakdown" width="3024" height="1588" data-path="images/gateway-server-timing.png" />
</Frame>
