# API Access to Model Metrics

> Query Gateway model metrics for usage, cost, and performance analytics via API.

The **Gateway Model Metrics Query API** lets you query Gateway model and virtual-model metrics for usage, performance, cost, and user activity. You can retrieve either **distribution** (aggregated) or **timeseries** (time-bucketed) results, filtered and grouped by fields such as model, user, team, and custom metadata.

<Info>
  This page covers `datasource: "modelMetrics"`. For other datasources, see the sibling pages for [MCP](/docs/ai-gateway/fetch-mcp-metrics), [Guardrail](/docs/ai-gateway/fetch-guardrail-metrics), [Cache](/docs/ai-gateway/fetch-cache-metrics), [Routing](/docs/ai-gateway/fetch-routing-metrics), and [Agent](/docs/ai-gateway/fetch-agent-metrics) metrics.
</Info>

### Access control

* **Tenant admins:** Can query metrics for the entire organization (tenant-wide).
* **Users:** Can query their own data and their teams' data.
* **Virtual accounts:** Can query their own data and their teams' data; with tenant-admin permissions, they can access tenant-wide data.

The server applies RBAC automatically; callers don't pass any RBAC fields.

## Contents

| Section                                                                             | Description                                    |
| ----------------------------------------------------------------------------------- | ---------------------------------------------- |
| [Overview](/docs/ai-gateway/fetch-model-metrics)                                    | Authentication, quick start, and API reference |
| [Filtering](/docs/ai-gateway/fetch-model-metrics-filtering)                         | Filter operators, fields, and combinations     |
| [Distribution examples](/docs/ai-gateway/fetch-model-metrics-examples-distribution) | Aggregated (distribution) query examples       |
| [Timeseries examples](/docs/ai-gateway/fetch-model-metrics-examples-timeseries)     | Time-bucketed (timeseries) query examples      |
| [Response format](/docs/ai-gateway/fetch-model-metrics-response)                    | Response JSON structure and error responses    |

## Authentication

Authenticate with your TrueFoundry API key, either a Personal Access Token **(PAT)** or a Virtual Account Token **(VAT)**.

<Accordion title="Get your API key">
  To generate an API key:

  1. **Personal Access Token (PAT)**: Go to Access → Personal Access Tokens in your TrueFoundry dashboard
  2. **Virtual Account Token (VAT)**: Go to Access → Virtual Account Tokens (requires admin permissions)

  For detailed authentication setup, see our [Authentication guide](/docs/ai-gateway/authentication).
</Accordion>
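Rather than pasting the key into scripts, you can read it from an environment variable. A minimal sketch, assuming a variable named `TFY_API_KEY` (a hypothetical name; use whatever your environment defines) and a hypothetical helper `build_headers`:

```python theme={"dark"}
import os
from typing import Optional


def build_headers(api_key: Optional[str] = None) -> dict:
    """Return the headers the metrics query endpoint expects.

    Falls back to the hypothetical TFY_API_KEY environment variable
    when no key is passed explicitly. Works the same for PATs and VATs.
    """
    key = api_key or os.environ.get("TFY_API_KEY")
    if not key:
        raise ValueError("No API key: pass api_key or set TFY_API_KEY")
    return {
        "Authorization": f"Bearer {key}",
        "Content-Type": "application/json",
    }
```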

## Quick Start

<Warning>
  By default, the API returns metrics for **both models and virtual models**. To restrict results to one kind, add `{"fieldName": "virtualModelName", "operator": "IS_NULL", "value": true}` to `filters` for model-only metrics, or the same filter with `"value": false` for virtual-model-only metrics.
</Warning>
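If you build payloads programmatically, a small helper keeps this scoping in one place. A sketch only: `scope_filters` and its `"all"`, `"models"`, and `"virtual_models"` labels are hypothetical names, while the filter objects are the ones described in the warning:

```python theme={"dark"}
from typing import List


def scope_filters(scope: str = "all") -> List[dict]:
    """Return filters restricting results to models, virtual models, or both.

    "models"         -> direct model traffic only (virtualModelName IS NULL)
    "virtual_models" -> traffic routed through a virtual model only
    "all"            -> no extra filter (the API default)
    """
    if scope == "all":
        return []
    if scope == "models":
        return [{"fieldName": "virtualModelName", "operator": "IS_NULL", "value": True}]
    if scope == "virtual_models":
        return [{"fieldName": "virtualModelName", "operator": "IS_NULL", "value": False}]
    raise ValueError(f"unknown scope: {scope!r}")
```

Append the result to your own `filters` list; since filters are AND-combined, the scope narrows whatever else you filter on.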

<Note>
  The virtual-model column has two aliases. In `groupBy` and `aggregations[].column` use `virtualModel`. In `filters[].fieldName` and in response keys, the name is `virtualModelName`. They refer to the same underlying database column.
</Note>
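For example, a virtual-model breakdown uses `virtualModel` in `groupBy` but `virtualModelName` in the filter. An illustrative payload (the time window and the cost aggregation are arbitrary choices):

```python theme={"dark"}
import json

# Virtual-model-only cost breakdown: two aliases, one underlying column.
payload = {
    "startTs": "2026-04-21T00:00:00.000Z",
    "endTs": "2026-04-22T00:00:00.000Z",
    "datasource": "modelMetrics",
    "type": "distribution",
    "aggregations": [
        {"type": "sum", "column": "costInUSD"},
    ],
    # groupBy (and aggregations[].column) use the `virtualModel` alias...
    "groupBy": ["virtualModel"],
    # ...while filters[].fieldName uses `virtualModelName`.
    "filters": [
        {"fieldName": "virtualModelName", "operator": "IS_NULL", "value": False}
    ],
}

body = json.dumps(payload)  # ready to send as the request body
```

Rows in the response are keyed by `virtualModelName`, not `virtualModel`.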

### Distribution query

Aggregated model metrics (request count, input and output token totals, p99 latency, and cost), grouped by model and limited to direct model traffic:

```python theme={"dark"}
import requests

response = requests.post(
    "https://{your_control_plane_url}/api/svc/v1/llm-gateway/metrics/query",
    headers={
        "Authorization": "Bearer <your_api_key>",
        "Content-Type": "application/json"
    },
    json={
        "startTs": "2026-04-21T00:00:00.000Z",
        "endTs": "2026-04-22T00:00:00.000Z",
        "datasource": "modelMetrics",
        "type": "distribution",
        "aggregations": [
            {"type": "count", "column": "modelName"},
            {"type": "sum", "column": "inputTokens"},
            {"type": "sum", "column": "outputTokens"},
            {"type": "p99", "column": "latencyMs"},
            {"type": "sum", "column": "costInUSD"}
        ],
        "groupBy": ["modelName"],
        "filters": [
            {"fieldName": "virtualModelName", "operator": "IS_NULL", "value": true}
        ]
    }
)

response.raise_for_status()  # raise on 4xx/5xx instead of printing an error body
print(response.json())
```

### Timeseries query

A similar query, bucketed hourly via `interval`:

```python theme={"dark"}
import requests

response = requests.post(
    "https://{your_control_plane_url}/api/svc/v1/llm-gateway/metrics/query",
    headers={
        "Authorization": "Bearer <your_api_key>",
        "Content-Type": "application/json"
    },
    json={
        "startTs": "2026-04-21T00:00:00.000Z",
        "endTs": "2026-04-22T00:00:00.000Z",
        "datasource": "modelMetrics",
        "type": "timeseries",
        "interval": "1 hour",
        "aggregations": [
            {"type": "count", "column": "modelName"},
            {"type": "sum", "column": "inputTokens"},
            {"type": "p99", "column": "latencyMs"}
        ],
        "groupBy": ["modelName"],
        "filters": [
            {"fieldName": "virtualModelName", "operator": "IS_NULL", "value": true}
        ]
    }
)

response.raise_for_status()  # raise on 4xx/5xx instead of printing an error body
print(response.json())
```

## API reference

### Endpoint

```
POST https://{your_control_plane_url}/api/svc/v1/llm-gateway/metrics/query
```

Post JSON to this endpoint with `Authorization: Bearer <your_api_key>` and `Content-Type: application/json`.
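If you prefer not to depend on `requests`, the same call can be made with Python's standard library. A minimal sketch: `build_metrics_request` and `query_metrics` are hypothetical helper names, and `base_url` is assumed to include the scheme (for example `https://your-control-plane.example.com`):

```python theme={"dark"}
import json
import urllib.request

METRICS_PATH = "/api/svc/v1/llm-gateway/metrics/query"


def build_metrics_request(base_url: str, api_key: str, body: dict) -> urllib.request.Request:
    """Prepare (but do not send) a POST to the metrics query endpoint."""
    return urllib.request.Request(
        url=base_url.rstrip("/") + METRICS_PATH,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def query_metrics(base_url: str, api_key: str, body: dict, timeout: float = 30.0) -> dict:
    """Send the query and return the decoded JSON response.

    urlopen raises urllib.error.HTTPError for 4xx/5xx responses.
    """
    req = build_metrics_request(base_url, api_key, body)
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.load(resp)
```

Splitting request construction from sending keeps the payload easy to inspect or log before any network call is made.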

### Request parameters

<ParamField path="startTs" type="string" required>
  ISO 8601 timestamp marking the **inclusive** lower bound of the query window (e.g. `"2026-04-21T00:00:00.000Z"`).
</ParamField>

<ParamField path="endTs" type="string" required>
  ISO 8601 timestamp marking the **exclusive** upper bound of the query window (e.g. `"2026-04-22T00:00:00.000Z"`).
</ParamField>
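The examples on this page use UTC timestamps with millisecond precision and a `Z` suffix. One way to produce them for a rolling window is sketched below; `to_iso_z` and `utc_window` are hypothetical helpers. Because `startTs` is inclusive and `endTs` exclusive, back-to-back windows can share a boundary without counting any request twice.

```python theme={"dark"}
from datetime import datetime, timedelta, timezone
from typing import Optional, Tuple


def to_iso_z(ts: datetime) -> str:
    """Format an aware datetime like 2026-04-21T00:00:00.000Z (UTC, milliseconds)."""
    ts = ts.astimezone(timezone.utc)
    return ts.strftime("%Y-%m-%dT%H:%M:%S.") + f"{ts.microsecond // 1000:03d}Z"


def utc_window(hours: int, end: Optional[datetime] = None) -> Tuple[str, str]:
    """Return (startTs, endTs) covering the `hours` before `end` (default: now)."""
    end = end or datetime.now(timezone.utc)
    start = end - timedelta(hours=hours)
    return to_iso_z(start), to_iso_z(end)
```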

<ParamField path="datasource" type="string" required>
  The data source to query. Use `"modelMetrics"` for Gateway model metrics.
</ParamField>

<ParamField path="type" type="string" required>
  The type of query to execute:

  * `"distribution"`: returns aggregated rows (one row per `groupBy` combination).
  * `"timeseries"`: returns time-bucketed rows (one row per bucket per `groupBy` combination). Requires `interval`.
</ParamField>
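Because `timeseries` requires `interval`, a payload builder can catch a missing interval before the request is sent. A sketch; `build_query` is a hypothetical helper that performs only this client-side check and omits optional fields you don't pass:

```python theme={"dark"}
from typing import List, Optional


def build_query(
    query_type: str,
    start_ts: str,
    end_ts: str,
    aggregations: Optional[List[dict]] = None,
    group_by: Optional[List[str]] = None,
    filters: Optional[List[dict]] = None,
    interval: Optional[str] = None,
) -> dict:
    """Assemble a modelMetrics query body, checking the type/interval pairing."""
    if query_type not in ("distribution", "timeseries"):
        raise ValueError(f"type must be 'distribution' or 'timeseries', got {query_type!r}")
    if query_type == "timeseries" and not interval:
        raise ValueError("timeseries queries require an interval, e.g. '1 hour'")
    body = {
        "startTs": start_ts,
        "endTs": end_ts,
        "datasource": "modelMetrics",
        "type": query_type,
    }
    # Optional fields are only included when provided.
    if aggregations:
        body["aggregations"] = aggregations
    if group_by:
        body["groupBy"] = group_by
    if filters:
        body["filters"] = filters
    if interval:
        body["interval"] = interval
    return body
```

Leaving out `aggregations` is valid; the response then carries only the implicit `total`.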

<ParamField path="aggregations" type="array">
  Array of `{ type, column }` objects describing the aggregations to compute. When omitted, only the implicit `total = COUNT(*)` is returned.

  ```json theme={"dark"}
  "aggregations": [
      {"type": "count", "column": "modelName"},
      {"type": "sum", "column": "inputTokens"},
      {"type": "p99", "column": "latencyMs"}
  ]
  ```

  **Supported aggregation types**

  | Type                                                          | Description                                                   |
  | ------------------------------------------------------------- | ------------------------------------------------------------- |
  | `sum`                                                         | Sum of values                                                 |
  | `count`                                                       | Non-null count of the column                                  |
  | `countDistinct`                                               | Distinct count                                                |
  | `min`                                                         | Minimum value                                                 |
  | `max`                                                         | Maximum value                                                 |
  | `avg`                                                         | Average                                                       |
  | `p5`, `p10`, `p25`, `p50`, `p75`, `p90`, `p95`, `p99`, `p999` | Percentiles (approximate)                                     |
  | `rateSum`                                                     | `sum` normalised by the interval in seconds (timeseries only) |
  | `rateAvg`                                                     | `avg` normalised by the interval in seconds (timeseries only) |
  | `rateMin`                                                     | `min` normalised by the interval in seconds (timeseries only) |
  | `rateMax`                                                     | `max` normalised by the interval in seconds (timeseries only) |
  | `ratePerMinute`                                               | Value divided by the interval in minutes (timeseries only)    |

  **Supported aggregation columns**

  | Column                        | Notes                                             |
  | ----------------------------- | ------------------------------------------------- |
  | `costInUSD`                   | Cost incurred (USD)                               |
  | `inputTokens`                 | Number of input tokens                            |
  | `outputTokens`                | Number of output tokens                           |
  | `latencyMs`                   | Total request latency (ms)                        |
  | `timeToFirstTokenMs`          | Time to the first generated token (ms)            |
  | `interTokenLatencyMs`         | Latency between consecutive generated tokens (ms) |
  | `timePerOutputTokenLatencyMs` | Latency per output token (ms)                     |

  All scalar and percentile aggregation types apply to every column above.
</ParamField>
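The type table can be mirrored client-side to fail fast on typos and on rate types used in distribution queries. A sketch with hypothetical helper names; columns are deliberately not checked, because the quick start also counts `modelName`, which is not in the column table. `rate_sum` spells out the `rateSum` arithmetic: 7,200 input tokens in a `"1 hour"` bucket give 7200 / 3600 = 2 tokens per second.

```python theme={"dark"}
from typing import Iterable

SCALAR_TYPES = {"sum", "count", "countDistinct", "min", "max", "avg"}
PERCENTILE_TYPES = {"p5", "p10", "p25", "p50", "p75", "p90", "p95", "p99", "p999"}
RATE_TYPES = {"rateSum", "rateAvg", "rateMin", "rateMax", "ratePerMinute"}


def check_aggregations(aggregations: Iterable[dict], query_type: str) -> None:
    """Raise ValueError for unknown types, rate types outside timeseries, or missing columns."""
    for agg in aggregations:
        agg_type = agg.get("type")
        if agg_type in RATE_TYPES:
            if query_type != "timeseries":
                raise ValueError(f"{agg_type} is only valid in timeseries queries")
        elif agg_type not in SCALAR_TYPES | PERCENTILE_TYPES:
            raise ValueError(f"unsupported aggregation type: {agg_type!r}")
        if not agg.get("column"):
            raise ValueError(f"aggregation {agg!r} is missing a column")


def rate_sum(bucket_sum: float, interval_seconds: int) -> float:
    """The per-second value rateSum reports for one bucket: its sum / interval seconds."""
    return bucket_sum / interval_seconds
```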

<ParamField path="groupBy" type="array">
  Array of field names to group results by. Custom metadata keys are supported with a `metadata.` prefix (e.g. `"metadata.environment"`).

  ```json theme={"dark"}
  "groupBy": ["modelName", "team", "metadata.environment"]
  ```

  **Available group-by fields**

  | Field                  | Notes                                                                         |
  | ---------------------- | ----------------------------------------------------------------------------- |
  | `modelName`            | The underlying model name                                                     |
  | `virtualModel`         | The virtual-model name (when the request was routed through one)              |
  | `requestType`          | Type of request, e.g. `ChatCompletion`, `Embedding`                           |
  | `providerModelName`    | Underlying provider model name                                                |
  | `providerAccountType`  | Account type of the provider (e.g. `model`, `mcp-server`, `guardrail-config`) |
  | `errorCode`            | HTTP error code returned, when applicable                                     |
  | `userEmail`            | Group by user (response key: `createdBySubjectSlug`)                          |
  | `virtualaccount`       | Group by virtual account (response key: `createdBySubjectSlug`)               |
  | `team`                 | Unnests the `Teams` array                                                     |
  | `createdBySubjectType` | Distinguishes `user` vs `virtualaccount`                                      |
  | `metadata.<key>`       | Group by a custom metadata key                                                |

  <Note>
    When `groupBy` contains `userEmail` but not `virtualaccount`, the server automatically adds `WHERE CreatedBySubjectType = 'user'`. Likewise, `virtualaccount` without `userEmail` adds `WHERE CreatedBySubjectType = 'virtualaccount'`. When both appear, scope the results yourself with a `createdBySubjectType` filter if needed.
  </Note>
</ParamField>
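To predict which rows a grouped query will cover, the auto-scoping rule from the note can be restated in code. This is only a client-side mirror of the documented behavior (the server remains authoritative), and `implied_subject_type` is a hypothetical name:

```python theme={"dark"}
from typing import List, Optional


def implied_subject_type(group_by: List[str]) -> Optional[str]:
    """Return the CreatedBySubjectType the server is documented to inject.

    Returns None when no single subject type is implied (neither key,
    or both keys present); in that case, filter on createdBySubjectType
    yourself if you need a scope.
    """
    has_user = "userEmail" in group_by
    has_va = "virtualaccount" in group_by
    if has_user and not has_va:
        return "user"
    if has_va and not has_user:
        return "virtualaccount"
    return None
```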

<ParamField path="filters" type="array">
  Array of filter objects, AND-combined. See [Filtering](/docs/ai-gateway/fetch-model-metrics-filtering) for the full operator reference and the per-field allow-list.
</ParamField>

<ParamField path="interval" type="string">
  **Required for timeseries queries.** Bucket size as `<positive integer> <unit>`, where `<unit>` is one of `second`, `minute`, `hour`, `day`, `week`, `month`, `year` (with or without a trailing `s`). Examples: `"30 second"`, `"5 minute"`, `"1 hour"`, `"1 day"`. Compound expressions like `"1 hour 30 minute"` are rejected.
</ParamField>
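A client-side check of the `interval` grammar can reject bad values before a round trip. A sketch; `is_valid_interval` is hypothetical, and it is slightly stricter than the description above, since it assumes exactly one space and rejects leading zeros:

```python theme={"dark"}
import re

# <positive integer> <unit>, unit optionally ending in "s"; compound forms fail fullmatch.
_INTERVAL_RE = re.compile(r"([1-9][0-9]*) (second|minute|hour|day|week|month|year)s?")


def is_valid_interval(interval: str) -> bool:
    """Return True if `interval` matches the documented bucket-size grammar."""
    return _INTERVAL_RE.fullmatch(interval) is not None
```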

<ParamField path="intervalInSeconds" type="number" deprecated>
  **Deprecated alias for `interval`.** Accepts a positive integer number of seconds (e.g. `3600` for hourly). Prefer `interval` in new code. If both are provided, `interval` wins.
</ParamField>
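When migrating from `intervalInSeconds`, an equivalent `interval` string can be derived by picking the largest fixed-length unit that divides the value exactly. A sketch; `interval_from_seconds` is hypothetical, and it skips `month` and `year` because they have no fixed length in seconds:

```python theme={"dark"}
def interval_from_seconds(seconds: int) -> str:
    """Convert a deprecated intervalInSeconds value to an `interval` string."""
    if not isinstance(seconds, int) or seconds <= 0:
        raise ValueError("intervalInSeconds must be a positive integer")
    # Largest unit first, so 3600 becomes "1 hour" rather than "60 minute".
    for unit, size in (("week", 604800), ("day", 86400), ("hour", 3600), ("minute", 60)):
        if seconds % size == 0:
            return f"{seconds // size} {unit}"
    return f"{seconds} second"
```

For example, `3600` becomes `"1 hour"` and `5400` becomes `"90 minute"`, since compound forms like `"1 hour 30 minute"` are rejected.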
