The Gateway Cache Metrics Query API queries cache-eligible requests: cache lookups, hits, savings, and the model fields that go with them. It reads from the same underlying table as modelMetrics, restricted to rows where CacheLookupStatus is set. Results can be returned either as a distribution (aggregated) or as a timeseries, and both support filtering and grouping.
This page covers datasource: "cacheMetrics". For other datasources, see the sibling pages for Model, MCP, Guardrail, Routing, and Agent metrics.

Access control

  • Tenant admins: Can query metrics for the entire organization (tenant-wide).
  • Users: Can query their own data and their teams’ data.
  • Virtual accounts: Can query their own data and their teams’ data; with tenant-admin permissions, they can access tenant-wide data.
The server applies RBAC automatically; callers don’t pass any RBAC fields.

Contents

Section | Description
Overview | Authentication, quick start, and API reference
Filtering | Filter operators, fields, and combinations
Distribution examples | Aggregated (distribution) query examples
Timeseries examples | Time-bucketed (timeseries) query examples
Response format | Response JSON structure and error responses

Authentication

You need to authenticate with your TrueFoundry API key. You can use either a Personal Access Token (PAT) or Virtual Account Token (VAT).
To generate an API key:
  1. Personal Access Token (PAT): Go to Access → Personal Access Tokens in your TrueFoundry dashboard
  2. Virtual Account Token (VAT): Go to Access → Virtual Account Tokens (requires admin permissions)
For detailed authentication setup, see our Authentication guide.
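One way to keep the token out of source code is to read it from an environment variable and build the request headers once. The following is a minimal sketch: the environment variable name TFY_API_KEY and both helper names are hypothetical, chosen only for illustration.

```python
import os


def auth_headers(api_key: str) -> dict:
    """Build the two headers the metrics endpoint expects for a PAT or VAT."""
    if not api_key:
        raise ValueError("API key is empty")
    return {
        "Authorization": f"Bearer {api_key}",
        "Content-Type": "application/json",
    }


def api_key_from_env(var_name: str = "TFY_API_KEY") -> str:
    """Read the token from an environment variable (the name is an assumption)."""
    try:
        return os.environ[var_name]
    except KeyError:
        raise RuntimeError(f"Set {var_name} to a PAT or VAT before querying") from None
```

The resulting dictionary can be passed as the headers argument of requests.post in the examples that follow.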

Quick Start

By default, cache metrics include both models and virtual models. To restrict results to models only, add the filter {"fieldName": "virtualModelName", "operator": "IS_NULL", "value": true}. To restrict results to virtual models only, use the same filter with "value": false.
The server automatically adds WHERE "CacheLookupStatus" IS NOT NULL to every cache query; you do not (and should not) add it yourself. Because cache shares its underlying table with modelMetrics, every model field is reachable in addition to the cache-specific ones below.
The virtual-model column has two aliases. In groupBy and aggregations[].column use virtualModel. In filters[].fieldName and in response keys, the name is virtualModelName. They refer to the same underlying database column.
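The two aliases are easy to mix up, so it can help to wrap the scope filter in a small helper. This is a sketch only: virtual_model_scope and query_fragment are illustrative names, not part of the API.

```python
def virtual_model_scope(virtual_only: bool) -> dict:
    """Filter that keeps only virtual-model rows (True) or only model rows (False)."""
    return {
        "fieldName": "virtualModelName",  # filter-side alias
        "operator": "IS_NULL",
        "value": not virtual_only,  # IS_NULL true -> model-only rows
    }


# Group by virtual model (groupBy-side alias) while excluding plain model rows.
query_fragment = {
    "groupBy": ["virtualModel"],
    "aggregations": [{"type": "sum", "column": "potentialCostSavings"}],
    "filters": [virtual_model_scope(virtual_only=True)],
}
```

Note that the groupBy entry is virtualModel while the filter uses virtualModelName; per the text above, the response rows are keyed by virtualModelName.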

Distribution query

Cost savings and tokens read from cache, grouped by cache type and namespace:
import requests

response = requests.post(
    "https://{your_control_plane_url}/api/svc/v1/llm-gateway/metrics/query",
    headers={
        "Authorization": "Bearer <your_api_key>",
        "Content-Type": "application/json"
    },
    json={
        "startTs": "2026-04-21T00:00:00.000Z",
        "endTs": "2026-04-22T00:00:00.000Z",
        "datasource": "cacheMetrics",
        "type": "distribution",
        "aggregations": [
            {"type": "sum", "column": "potentialCostSavings"},
            {"type": "sum", "column": "cacheReadInputTokens"},
            {"type": "p50", "column": "cacheLookupLatencyMs"}
        ],
        "groupBy": ["cacheType", "cacheNamespace"],
        "filters": [
            {"fieldName": "virtualModelName", "operator": "IS_NULL", "value": true}
        ]
    }
)

print(response.json())

Timeseries query

Hourly cache savings and p99 lookup latency, grouped by cache type:
import requests

response = requests.post(
    "https://{your_control_plane_url}/api/svc/v1/llm-gateway/metrics/query",
    headers={
        "Authorization": "Bearer <your_api_key>",
        "Content-Type": "application/json"
    },
    json={
        "startTs": "2026-04-21T00:00:00.000Z",
        "endTs": "2026-04-22T00:00:00.000Z",
        "datasource": "cacheMetrics",
        "type": "timeseries",
        "interval": "1 hour",
        "aggregations": [
            {"type": "sum", "column": "potentialCostSavings"},
            {"type": "p99", "column": "cacheLookupLatencyMs"}
        ],
        "groupBy": ["cacheType"],
        "filters": [
            {"fieldName": "virtualModelName", "operator": "IS_NULL", "value": true}
        ]
    }
)

print(response.json())

API reference

Endpoint

POST https://{your_control_plane_url}/api/svc/v1/llm-gateway/metrics/query
Send a JSON body to this endpoint with the headers Authorization: Bearer <your_api_key> and Content-Type: application/json.
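The request can also be assembled with only the Python standard library, which makes the URL, headers, and body easy to inspect before anything is sent. This is a sketch: build_metrics_request is a hypothetical helper, and it assumes control_plane_url includes the scheme (for example https://...).

```python
import json
import urllib.request

METRICS_PATH = "/api/svc/v1/llm-gateway/metrics/query"


def build_metrics_request(control_plane_url: str, api_key: str, payload: dict) -> urllib.request.Request:
    """Prepare (but do not send) a POST to the metrics query endpoint."""
    url = control_plane_url.rstrip("/") + METRICS_PATH
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
```

Calling urllib.request.urlopen(request, timeout=30) would send it; the same URL, headers, and payload work unchanged with requests.post.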

Request parameters

startTs
string
required
ISO 8601 timestamp marking the inclusive lower bound of the query window.
endTs
string
required
ISO 8601 timestamp marking the exclusive upper bound of the query window.
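Because the window is half-open (startTs inclusive, endTs exclusive), consecutive windows can share a boundary without counting any request twice. The sketch below produces timestamps in the same shape as the examples on this page; the helper names are illustrative, and whether the server accepts other ISO 8601 variants is not covered here.

```python
from datetime import datetime, timedelta, timezone


def to_iso_z(moment: datetime) -> str:
    """Format a timezone-aware datetime as UTC ISO 8601 with milliseconds and a Z suffix."""
    utc = moment.astimezone(timezone.utc)
    return utc.isoformat(timespec="milliseconds").replace("+00:00", "Z")


def day_window(day_start: datetime) -> dict:
    """Half-open 24-hour window [day_start, day_start + 1 day)."""
    return {
        "startTs": to_iso_z(day_start),
        "endTs": to_iso_z(day_start + timedelta(days=1)),
    }


window = day_window(datetime(2026, 4, 21, tzinfo=timezone.utc))
# window == {"startTs": "2026-04-21T00:00:00.000Z", "endTs": "2026-04-22T00:00:00.000Z"}
```

The next day's window starts exactly at this window's endTs, so the two never overlap.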
datasource
string
required
The data source to query. Use "cacheMetrics" for Gateway cache metrics.
type
string
required
The type of query to execute:
  • "distribution": returns aggregated rows (one row per groupBy combination).
  • "timeseries": returns time-bucketed rows. Requires interval.
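The rule that timeseries queries need an interval can be checked client-side before the request is sent. A minimal sketch follows; build_cache_query is a hypothetical helper, and dropping interval for distribution queries is a choice made here, not documented server behaviour.

```python
def build_cache_query(query_type, start_ts, end_ts, interval=None, **extra):
    """Assemble a cacheMetrics request body, enforcing the interval rule client-side."""
    if query_type not in ("distribution", "timeseries"):
        raise ValueError(f"unknown query type: {query_type!r}")
    if query_type == "timeseries" and interval is None:
        raise ValueError("timeseries queries require an interval")
    body = {
        "startTs": start_ts,
        "endTs": end_ts,
        "datasource": "cacheMetrics",
        "type": query_type,
    }
    if query_type == "timeseries":
        body["interval"] = interval  # only sent for timeseries queries
    body.update(extra)  # aggregations, groupBy, filters
    return body
```

For example, build_cache_query("timeseries", start, end, interval="1 hour", groupBy=["cacheType"]) yields a body equivalent to the timeseries quick-start request without its aggregations and filters.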
aggregations
array
Array of { type, column } objects describing the aggregations to compute. When omitted, only the implicit total = COUNT(*) is returned.
Supported aggregation types
Type | Description
sum | Sum of values
count | Non-null count of the column
countDistinct | Distinct count
min | Minimum value
max | Maximum value
avg | Average
p5, p10, p25, p50, p75, p90, p95, p99, p999 | Percentiles (approximate)
rateSum, rateAvg, rateMin, rateMax | Rates normalised by the interval in seconds (timeseries only)
ratePerMinute | Value divided by the interval in minutes (timeseries only)
Supported aggregation columns
Column | Notes
cacheLookupLatencyMs | The cache-lookup latency itself (ms)
potentialCostSavings | Cost saved by cache hits (USD)
cacheCreationInputTokens | Tokens written into the cache
cacheReadInputTokens | Tokens read from the cache
costInUSD | Cost incurred (USD)
inputTokens | Number of input tokens
outputTokens | Number of output tokens
latencyMs | Total request latency (ms)
timeToFirstTokenMs | Time to the first generated token (ms)
interTokenLatencyMs | Latency between consecutive generated tokens (ms)
timePerOutputTokenLatencyMs | Latency per output token (ms)
All scalar and percentile aggregation types apply to every column above.
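A client can catch two common mistakes, an unknown aggregation type and a rate aggregation in a distribution query, before calling the API. The sketch below copies the type names from the table above; check_aggregations is a hypothetical helper, and it deliberately does not validate column names.

```python
SCALAR_TYPES = {"sum", "count", "countDistinct", "min", "max", "avg"}
PERCENTILE_TYPES = {"p5", "p10", "p25", "p50", "p75", "p90", "p95", "p99", "p999"}
TIMESERIES_ONLY_TYPES = {"rateSum", "rateAvg", "rateMin", "rateMax", "ratePerMinute"}


def check_aggregations(aggregations, query_type):
    """Return a list of problems found; an empty list means the types look valid."""
    problems = []
    for agg in aggregations:
        agg_type = agg.get("type")
        if agg_type in TIMESERIES_ONLY_TYPES:
            if query_type != "timeseries":
                problems.append(f"{agg_type} is only available in timeseries queries")
        elif agg_type not in SCALAR_TYPES | PERCENTILE_TYPES:
            problems.append(f"unknown aggregation type: {agg_type!r}")
    return problems
```

An empty result only means the request passes these local checks; the server remains the authority on what it accepts.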
groupBy
array
Array of field names to group results by. Custom metadata keys are supported with a metadata. prefix.
Available group-by fields
Field | Notes
cacheType | e.g. semantic, simple
cacheNamespace | Logical bucket within a cache type
modelName | The underlying model name
virtualModel | The virtual-model name
requestType | Type of request, e.g. ChatCompletion, Embedding
providerModelName | Underlying provider model name
providerAccountType | Account type of the provider
errorCode | HTTP error code returned, when applicable
userEmail | Group by user (response key: createdBySubjectSlug)
virtualaccount | Group by virtual account (response key: createdBySubjectSlug)
team | Unnests the Teams array
createdBySubjectType | Distinguishes user vs virtualaccount
metadata.<key> | Group by a custom metadata key
When groupBy contains userEmail but not virtualaccount, the server auto-injects WHERE CreatedBySubjectType = 'user'. When it contains virtualaccount but not userEmail, the server auto-injects WHERE CreatedBySubjectType = 'virtualaccount'. When both appear, scope the results yourself with createdBySubjectType if needed.
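The injection rule can be mirrored in a few lines, which is useful when explaining why a grouped result excludes some rows. This is a sketch of the rule as described above, not the server's implementation; implied_subject_type is a hypothetical name, and None stands for the cases where the text describes no injection.

```python
def implied_subject_type(group_by):
    """Subject type the server is described as injecting, or None when it injects nothing."""
    has_user = "userEmail" in group_by
    has_virtual_account = "virtualaccount" in group_by
    if has_user and not has_virtual_account:
        return "user"
    if has_virtual_account and not has_user:
        return "virtualaccount"
    return None  # both or neither present: no implicit scoping is described
```

For instance, grouping by ["userEmail", "cacheType"] implies the 'user' scope, so rows created by virtual accounts would not appear in that result.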
filters
array
Array of filter objects, AND-combined. See Filtering for the full operator reference and the per-field allow-list.
interval
string
Required for timeseries queries. Bucket size as <positive integer> <unit>, where <unit> is one of second, minute, hour, day, week, month, year (with or without a trailing s). Examples: "30 second", "5 minute", "1 hour", "1 day". Compound expressions like "1 hour 30 minute" are rejected.
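The interval grammar is small enough to check with a regular expression before sending a query. The pattern below is a client-side approximation under two assumptions not stated above: exactly one space separates the number from the unit, and leading zeros are not allowed.

```python
import re

# <positive integer> <unit>, with an optional trailing "s" on the unit.
_INTERVAL_RE = re.compile(r"[1-9][0-9]* (second|minute|hour|day|week|month|year)s?")


def is_valid_interval(interval: str) -> bool:
    """True when the string is a single '<positive integer> <unit>' expression."""
    return _INTERVAL_RE.fullmatch(interval) is not None
```

Using fullmatch rather than match is what rejects compound expressions such as "1 hour 30 minute", since the whole string must fit the single-unit pattern.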
intervalInSeconds
number
deprecated
Deprecated alias for interval. Accepts a positive integer number of seconds. Prefer interval in new code. If both are provided, interval wins.
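Existing callers that still send intervalInSeconds can be migrated mechanically. The sketch below applies the precedence rule stated above (interval wins) and rewrites a seconds value as an "N second" interval string; both helper names are hypothetical, and the equivalence of 3600 seconds and "3600 second" is assumed from the interval grammar.

```python
def effective_interval(body: dict):
    """Interval that takes effect when either or both fields are present."""
    if body.get("interval") is not None:
        return body["interval"]  # interval wins over the deprecated alias
    seconds = body.get("intervalInSeconds")
    if seconds is None:
        return None
    if isinstance(seconds, bool) or not isinstance(seconds, int) or seconds <= 0:
        raise ValueError("intervalInSeconds must be a positive integer")
    return f"{seconds} second"


def migrate_interval(body: dict) -> dict:
    """Return a copy of the body that uses only the non-deprecated field."""
    migrated = {k: v for k, v in body.items() if k != "intervalInSeconds"}
    interval = effective_interval(body)
    if interval is not None:
        migrated["interval"] = interval
    return migrated
```

The original dictionary is left untouched, so the migrated body can be compared against it while testing.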