API Access to Cache Metrics

The Gateway Cache Metrics Query API lets you query cache-eligible requests: cache lookups, hits, savings, and the model fields that go with them. Internally it reads the same underlying table as modelMetrics, restricted to rows where CacheLookupStatus is set. You can retrieve either distribution (aggregated) or timeseries results, with filtering and grouping.

This page covers datasource: "cacheMetrics". For other datasources, see the sibling pages for Model, MCP, Guardrail, Routing, and Agent metrics.

Access control

Tenant admins: Can query metrics for the entire organization (tenant-wide).
Users: Can query their own data and their teams’ data.
Virtual accounts: Can query their own data and their teams’ data; with tenant-admin permissions, they can access tenant-wide data.

The server applies RBAC automatically; callers don’t pass any RBAC fields.

Section	Description
Overview	Authentication, quick start, and API reference
Filtering	Filter operators, fields, and combinations
Distribution examples	Aggregated (distribution) query examples
Timeseries examples	Time-bucketed (timeseries) query examples
Response format	Response JSON structure and error responses

Authentication

Authenticate with your TrueFoundry API key, which can be either a Personal Access Token (PAT) or a Virtual Account Token (VAT).

Get your API key

To generate an API key:

Personal Access Token (PAT): Go to Access → Personal Access Tokens in your TrueFoundry dashboard
Virtual Account Token (VAT): Go to Access → Virtual Account Tokens (requires admin permissions)

For detailed authentication setup, see our Authentication guide.
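To keep the key out of source code, you can read it from an environment variable and build the two required headers from it. A minimal sketch, assuming a hypothetical variable name TFY_API_KEY (not defined by TrueFoundry; use whatever name your environment uses):

```python
import os

def auth_headers(env_var: str = "TFY_API_KEY") -> dict:
    """Build request headers from a PAT or VAT stored in an environment variable.

    TFY_API_KEY is a hypothetical variable name chosen for this example.
    """
    api_key = os.environ.get(env_var)
    if not api_key:
        raise RuntimeError(f"Set {env_var} to your TrueFoundry PAT or VAT")
    return {
        "Authorization": f"Bearer {api_key}",
        "Content-Type": "application/json",
    }
```

The same headers work for both token kinds; what differs is the data each token is allowed to see under the access-control rules above.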

Quick Start

By default, cache metrics include both models and virtual models. To restrict to one, use {"fieldName": "virtualModelName", "operator": "IS_NULL", "value": true} for model-only metrics, or value: false for virtual-model-only metrics.
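The three scoping choices map onto the filters list as follows. This is a small sketch; model_scope_filters and its scope names are hypothetical, and only the filter object itself comes from this page:

```python
from typing import Optional

def model_scope_filters(scope: Optional[str] = None) -> list:
    """Return a filters list that scopes cache metrics by model kind.

    scope=None      -> no filter: models and virtual models together (the default)
    scope="model"   -> only rows without a virtual model (IS_NULL true)
    scope="virtual" -> only rows served through a virtual model (IS_NULL false)
    """
    if scope is None:
        return []
    if scope == "model":
        is_null = True
    elif scope == "virtual":
        is_null = False
    else:
        raise ValueError(f"unknown scope: {scope!r}")
    return [{"fieldName": "virtualModelName", "operator": "IS_NULL", "value": is_null}]
```

Note that Python spells the JSON literals true and false as True and False; requests serializes them back to JSON.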

The server automatically adds WHERE "CacheLookupStatus" IS NOT NULL to every cache query; you do not (and should not) add it yourself. Because cache shares its underlying table with modelMetrics, every model field is reachable in addition to the cache-specific ones below.

The virtual-model column has two aliases. In groupBy and aggregations[].column use virtualModel. In filters[].fieldName and in response keys, the name is virtualModelName. They refer to the same underlying database column.
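For example, a request body that groups savings per virtual model and keeps only virtual-model traffic uses both spellings of the same column. This is an illustrative body, not a required shape:

```python
body = {
    "startTs": "2026-04-21T00:00:00.000Z",
    "endTs": "2026-04-22T00:00:00.000Z",
    "datasource": "cacheMetrics",
    "type": "distribution",
    "aggregations": [{"type": "sum", "column": "potentialCostSavings"}],
    # groupBy (and aggregations[].column) use the alias "virtualModel" ...
    "groupBy": ["virtualModel"],
    # ... while filters[].fieldName uses "virtualModelName" for the same column.
    "filters": [
        {"fieldName": "virtualModelName", "operator": "IS_NULL", "value": False}
    ],
}
```

In the response rows, the grouped value is keyed as virtualModelName.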

Distribution query

Cost savings, tokens read from cache, and median lookup latency for model-only traffic, grouped by cache type and namespace:

import requests

# Replace the placeholders with your control-plane host and a PAT or VAT.
response = requests.post(
    "https://{your_control_plane_url}/api/svc/v1/llm-gateway/metrics/query",
    headers={
        "Authorization": "Bearer <your_api_key>",
        "Content-Type": "application/json"
    },
    json={
        "startTs": "2026-04-21T00:00:00.000Z",  # inclusive
        "endTs": "2026-04-22T00:00:00.000Z",    # exclusive
        "datasource": "cacheMetrics",
        "type": "distribution",
        "aggregations": [
            {"type": "sum", "column": "potentialCostSavings"},
            {"type": "sum", "column": "cacheReadInputTokens"},
            {"type": "p50", "column": "cacheLookupLatencyMs"}
        ],
        "groupBy": ["cacheType", "cacheNamespace"],
        "filters": [
            # Model-only traffic; drop this filter to include virtual models too.
            {"fieldName": "virtualModelName", "operator": "IS_NULL", "value": True}
        ]
    },
    timeout=30,
)

response.raise_for_status()  # raise on 4xx/5xx instead of printing an error body
print(response.json())

Timeseries query

Hourly cache savings and p99 lookup latency for model-only traffic, grouped by cache type:

import requests

# Replace the placeholders with your control-plane host and a PAT or VAT.
response = requests.post(
    "https://{your_control_plane_url}/api/svc/v1/llm-gateway/metrics/query",
    headers={
        "Authorization": "Bearer <your_api_key>",
        "Content-Type": "application/json"
    },
    json={
        "startTs": "2026-04-21T00:00:00.000Z",
        "endTs": "2026-04-22T00:00:00.000Z",
        "datasource": "cacheMetrics",
        "type": "timeseries",
        "interval": "1 hour",  # required for timeseries queries
        "aggregations": [
            {"type": "sum", "column": "potentialCostSavings"},
            {"type": "p99", "column": "cacheLookupLatencyMs"}
        ],
        "groupBy": ["cacheType"],
        "filters": [
            {"fieldName": "virtualModelName", "operator": "IS_NULL", "value": True}
        ]
    },
    timeout=30,
)

response.raise_for_status()
print(response.json())

API reference

Endpoint

POST https://{your_control_plane_url}/api/svc/v1/llm-gateway/metrics/query

Post JSON to this endpoint with Authorization: Bearer <your_api_key> and Content-Type: application/json.
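If you prefer not to depend on requests, the same call can be made with the standard library. A sketch, assuming control_plane_host is the bare host name that replaces {your_control_plane_url}; build_metrics_request and query_metrics are hypothetical helper names:

```python
import json
import urllib.request

METRICS_PATH = "/api/svc/v1/llm-gateway/metrics/query"

def build_metrics_request(control_plane_host: str, api_key: str, body: dict) -> urllib.request.Request:
    """Prepare (but do not send) a POST to the metrics query endpoint."""
    return urllib.request.Request(
        url=f"https://{control_plane_host}{METRICS_PATH}",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )

def query_metrics(control_plane_host: str, api_key: str, body: dict, timeout: float = 30.0) -> dict:
    """Send the query and decode the JSON response; urlopen raises HTTPError on 4xx/5xx."""
    req = build_metrics_request(control_plane_host, api_key, body)
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

Splitting request construction from sending makes it easy to inspect the exact URL, headers, and JSON before hitting the network.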

Request parameters

startTs (string, required)

ISO 8601 timestamp marking the inclusive lower bound of the query window.

endTs (string, required)

ISO 8601 timestamp marking the exclusive upper bound of the query window.
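Because startTs is inclusive and endTs is exclusive, back-to-back windows that share a boundary never count a row twice. A sketch that produces such windows in the same UTC, millisecond format used in the examples; iso_utc and daily_windows are hypothetical helpers, and they expect timezone-aware datetimes:

```python
from datetime import datetime, timedelta, timezone
from typing import List, Tuple

def iso_utc(ts: datetime) -> str:
    """Format an aware datetime like 2026-04-21T00:00:00.000Z (UTC, milliseconds)."""
    ts = ts.astimezone(timezone.utc)
    return ts.strftime("%Y-%m-%dT%H:%M:%S.") + f"{ts.microsecond // 1000:03d}Z"

def daily_windows(first_day: datetime, days: int) -> List[Tuple[str, str]]:
    """Return consecutive [startTs, endTs) pairs, one per UTC day.

    Each window's endTs equals the next window's startTs; since endTs is
    exclusive, adjacent windows do not overlap.
    """
    start = first_day.astimezone(timezone.utc).replace(hour=0, minute=0, second=0, microsecond=0)
    windows = []
    for i in range(days):
        lo = start + timedelta(days=i)
        windows.append((iso_utc(lo), iso_utc(lo + timedelta(days=1))))
    return windows
```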

datasource (string, required)

The data source to query. Use "cacheMetrics" for Gateway cache metrics.

type (string, required)

The type of query to execute:

"distribution": returns aggregated rows (one row per groupBy combination).
"timeseries": returns time-bucketed rows. Requires interval.

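The interval requirement can be checked before sending anything. A sketch of a body builder; base_query is a hypothetical helper, and dropping interval from distribution bodies is a choice made here, not something this page specifies:

```python
from typing import Optional

def base_query(start_ts: str, end_ts: str, query_type: str, interval: Optional[str] = None) -> dict:
    """Build the common part of a cacheMetrics request body."""
    if query_type not in ("distribution", "timeseries"):
        raise ValueError(f"unsupported query type: {query_type!r}")
    if query_type == "timeseries" and not interval:
        raise ValueError("timeseries queries require interval")
    body = {
        "startTs": start_ts,
        "endTs": end_ts,
        "datasource": "cacheMetrics",
        "type": query_type,
    }
    # interval is only attached to timeseries bodies in this sketch.
    if query_type == "timeseries":
        body["interval"] = interval
    return body
```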
aggregations (array)

Array of { type, column } objects describing the aggregations to compute. When omitted, only the implicit total = COUNT(*) is returned.

Supported aggregation types

Type	Description
`sum`	Sum of values
`count`	Non-null count of the column
`countDistinct`	Distinct count
`min`	Minimum value
`max`	Maximum value
`avg`	Average
`p5`, `p10`, `p25`, `p50`, `p75`, `p90`, `p95`, `p99`, `p999`	Percentiles (approximate)
`rateSum`, `rateAvg`, `rateMin`, `rateMax`	Rates normalised by the interval in seconds (timeseries only)
`ratePerMinute`	Value divided by the interval in minutes (timeseries only)

Supported aggregation columns

Column	Notes
`cacheLookupLatencyMs`	The cache-lookup latency itself
`potentialCostSavings`	Cost saved by cache hits (USD)
`cacheCreationInputTokens`	Tokens written into the cache
`cacheReadInputTokens`	Tokens read from the cache
`costInUSD`	Cost incurred (USD)
`inputTokens`	Number of input tokens
`outputTokens`	Number of output tokens
`latencyMs`	Total request latency (ms)
`timeToFirstTokenMs`	Time to the first generated token (ms)
`interTokenLatencyMs`	Latency between consecutive generated tokens (ms)
`timePerOutputTokenLatencyMs`	Latency per output token (ms)

All scalar and percentile aggregation types apply to every column above.
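The two tables can be turned into a client-side pre-check that catches typos and misplaced rate aggregations before the request is sent. A sketch; the allow-lists are copied from the tables above, the server remains authoritative, and this check does not try to restrict which columns the rate types accept:

```python
# Allow-lists copied from the tables above.
SCALAR_TYPES = {"sum", "count", "countDistinct", "min", "max", "avg"}
PERCENTILE_TYPES = {"p5", "p10", "p25", "p50", "p75", "p90", "p95", "p99", "p999"}
RATE_TYPES = {"rateSum", "rateAvg", "rateMin", "rateMax", "ratePerMinute"}  # timeseries only
COLUMNS = {
    "cacheLookupLatencyMs", "potentialCostSavings", "cacheCreationInputTokens",
    "cacheReadInputTokens", "costInUSD", "inputTokens", "outputTokens",
    "latencyMs", "timeToFirstTokenMs", "interTokenLatencyMs",
    "timePerOutputTokenLatencyMs",
}

def check_aggregations(aggregations: list, query_type: str) -> None:
    """Raise ValueError for any aggregation the tables above do not allow."""
    for agg in aggregations:
        agg_type, column = agg.get("type"), agg.get("column")
        if agg_type in RATE_TYPES:
            if query_type != "timeseries":
                raise ValueError(f"{agg_type} is only valid in timeseries queries")
        elif agg_type not in SCALAR_TYPES | PERCENTILE_TYPES:
            raise ValueError(f"unsupported aggregation type: {agg_type!r}")
        if column not in COLUMNS:
            raise ValueError(f"unsupported aggregation column: {column!r}")
```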

groupBy (array)

Array of field names to group results by. Custom metadata keys are supported with a metadata. prefix.

Available group-by fields

Field	Notes
`cacheType`	e.g. `semantic`, `simple`
`cacheNamespace`	Logical bucket within a cache type
`modelName`	The underlying model name
`virtualModel`	The virtual-model name
`requestType`	Type of request, e.g. `ChatCompletion`, `Embedding`
`providerModelName`	Underlying provider model name
`providerAccountType`	Account type of the provider
`errorCode`	HTTP error code returned, when applicable
`userEmail`	Group by user (response key: `createdBySubjectSlug`)
`virtualaccount`	Group by virtual account (response key: `createdBySubjectSlug`)
`team`	Unnests the `Teams` array
`createdBySubjectType`	Distinguishes `user` vs `virtualaccount`
`metadata.<key>`	Group by a custom metadata key

When groupBy contains userEmail (without virtualaccount), the server auto-injects WHERE CreatedBySubjectType = 'user'; virtualaccount alone auto-injects CreatedBySubjectType = 'virtualaccount'. When both appear, neither filter is injected, so scope the query yourself with a createdBySubjectType filter if needed.
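To make the rule concrete, this function predicts which subject-type filter the server adds for a given groupBy list. It is a client-side restatement of the behavior described above, not server code, and injected_subject_type is a hypothetical name:

```python
from typing import List, Optional

def injected_subject_type(group_by: List[str]) -> Optional[str]:
    """Return the CreatedBySubjectType value the server injects, or None if none is."""
    has_user = "userEmail" in group_by
    has_va = "virtualaccount" in group_by
    if has_user and not has_va:
        return "user"
    if has_va and not has_user:
        return "virtualaccount"
    # Both or neither present: nothing is injected; add a filter yourself if needed.
    return None
```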

filters (array)

Array of filter objects, combined with AND. See Filtering for the full operator reference and the per-field allow-list.

interval (string)

Required for timeseries queries. Bucket size as <positive integer> <unit>, where <unit> is one of second, minute, hour, day, week, month, year (with or without a trailing s). Examples: "30 second", "5 minute", "1 hour", "1 day". Compound expressions like "1 hour 30 minute" are rejected.
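The grammar above fits a single regular expression, which is handy for validating user input before a request. A sketch; it assumes exactly one space and lowercase units, and it rejects leading zeros such as "05 minute", which may be stricter than the server:

```python
import re

# <positive integer> <unit>, unit optionally plural; compound expressions do not match.
_INTERVAL = re.compile(r"([1-9][0-9]*) (second|minute|hour|day|week|month|year)s?")

def is_valid_interval(value: str) -> bool:
    """Return True if value matches the interval grammar described above."""
    return _INTERVAL.fullmatch(value) is not None
```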

(deprecated) number

Deprecated alias for interval that accepts a positive integer number of seconds. Prefer interval in new code; if both are provided, interval wins.
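When migrating old code, a seconds value can be translated into an interval string. A sketch, assuming an "N hour" bucket behaves like an N × 3600-second bucket; if exact bucket alignment matters, the literal "<seconds> second" form is the safest translation:

```python
def seconds_to_interval(seconds: int) -> str:
    """Translate a positive number of seconds into an interval string.

    Uses the largest of day, hour, minute that divides evenly, else seconds.
    Week is skipped to stay conservative about bucket alignment; month and
    year have no fixed length in seconds.
    """
    if isinstance(seconds, bool) or not isinstance(seconds, int) or seconds <= 0:
        raise ValueError("seconds must be a positive integer")
    for unit, size in (("day", 86400), ("hour", 3600), ("minute", 60)):
        if seconds % size == 0:
            return f"{seconds // size} {unit}"
    return f"{seconds} second"
```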
