modelMetrics, restricted to rows where CacheLookupStatus is set. You can retrieve either distribution (aggregated) or timeseries results, and filter and group them.
Access control
- Tenant admins: Can query metrics for the entire organization (tenant-wide).
- Users: Can query their own data and their teams’ data.
- Virtual accounts: Can query their own data and their teams’ data; with tenant-admin permissions, they can access tenant-wide data.
Contents
| Section | Description |
|---|---|
| Overview | Authentication, quick start, and API reference |
| Filtering | Filter operators, fields, and combinations |
| Distribution examples | Aggregated (distribution) query examples |
| Timeseries examples | Time-bucketed (timeseries) query examples |
| Response format | Response JSON structure and error responses |
Authentication
You need to authenticate with your TrueFoundry API key. You can use either a Personal Access Token (PAT) or a Virtual Account Token (VAT).
Get your API key
To generate an API key:
- Personal Access Token (PAT): Go to Access → Personal Access Tokens in your TrueFoundry dashboard
- Virtual Account Token (VAT): Go to Access → Virtual Account Tokens (requires admin permissions)
Quick Start
The server automatically adds WHERE "CacheLookupStatus" IS NOT NULL to every cache query; you do not (and should not) add it yourself. Because cache shares its underlying table with modelMetrics, every model field is reachable in addition to the cache-specific ones below.
The virtual-model column has two aliases. In groupBy and aggregations[].column, use virtualModel. In filters[].fieldName and in response keys, the name is virtualModelName. They refer to the same underlying database column.
Distribution query
Cost savings and tokens read from cache, grouped by cache type and namespace.
Timeseries query
Hourly cache savings and p99 lookup latency, grouped by cache type.
API reference
Endpoint
Every request must carry the headers Authorization: Bearer <your_api_key> and Content-Type: application/json.
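The two headers named above can be assembled with a small helper. This is a minimal sketch: the function name `build_headers` is hypothetical, and the token value is a placeholder standing in for a real PAT or VAT.

```python
def build_headers(api_key: str) -> dict:
    """Return the Authorization and Content-Type headers for a metrics query.

    `api_key` is a Personal Access Token or a Virtual Account Token.
    """
    if not api_key:
        raise ValueError("api_key must be a non-empty PAT or VAT")
    return {
        "Authorization": f"Bearer {api_key}",
        "Content-Type": "application/json",
    }


# Placeholder token for illustration only; load the real one from a secret store.
headers = build_headers("example-token")
```

Keeping header construction in one place makes it harder to send a request without the Bearer prefix or with the wrong content type.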
Request parameters
ISO 8601 timestamp marking the inclusive lower bound of the query window.
ISO 8601 timestamp marking the exclusive upper bound of the query window.
The data source to query. Use "cacheMetrics" for Gateway cache metrics.
The type of query to execute:
- "distribution": returns aggregated rows (one row per groupBy combination).
- "timeseries": returns time-bucketed rows. Requires interval.
Array of { type, column } objects describing the aggregations to compute. When omitted, only the implicit total = COUNT(*) is returned.
Supported aggregation types
| Type | Description |
|---|---|
| sum | Sum of values |
| count | Non-null count of the column |
| countDistinct | Distinct count |
| min | Minimum value |
| max | Maximum value |
| avg | Average |
| p5, p10, p25, p50, p75, p90, p95, p99, p999 | Percentiles (approximate) |
| rateSum, rateAvg, rateMin, rateMax | Rates normalised by the interval in seconds (timeseries only) |
| ratePerMinute | Value divided by the interval in minutes (timeseries only) |
Supported aggregation columns
| Column | Notes |
|---|---|
| cacheLookupLatencyMs | The cache-lookup latency itself (ms) |
| potentialCostSavings | Cost saved by cache hits (USD) |
| cacheCreationInputTokens | Tokens written into the cache |
| cacheReadInputTokens | Tokens read from the cache |
| costInUSD | Cost incurred (USD) |
| inputTokens | Number of input tokens |
| outputTokens | Number of output tokens |
| latencyMs | Total request latency (ms) |
| timeToFirstTokenMs | Time to the first generated token (ms) |
| interTokenLatencyMs | Latency between consecutive generated tokens (ms) |
| timePerOutputTokenLatencyMs | Latency per output token (ms) |
All scalar and percentile aggregation types apply to every column above.
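A client-side pre-check of the aggregations array can catch typos before a request is sent. The sketch below mirrors only the aggregation types and columns listed in this section; the helper name `check_aggregations` is hypothetical, and the server remains the authority (it may, for instance, accept columns such as virtualModel that are not in the column table).

```python
SCALAR_TYPES = {"sum", "count", "countDistinct", "min", "max", "avg"}
PERCENTILE_TYPES = {"p5", "p10", "p25", "p50", "p75", "p90", "p95", "p99", "p999"}
TIMESERIES_ONLY_TYPES = {"rateSum", "rateAvg", "rateMin", "rateMax", "ratePerMinute"}
ALL_TYPES = SCALAR_TYPES | PERCENTILE_TYPES | TIMESERIES_ONLY_TYPES

COLUMNS = {
    "cacheLookupLatencyMs", "potentialCostSavings", "cacheCreationInputTokens",
    "cacheReadInputTokens", "costInUSD", "inputTokens", "outputTokens",
    "latencyMs", "timeToFirstTokenMs", "interTokenLatencyMs",
    "timePerOutputTokenLatencyMs",
}


def check_aggregations(aggregations, query_type):
    """Return a list of human-readable problems; an empty list means none found."""
    problems = []
    for i, agg in enumerate(aggregations):
        agg_type = agg.get("type")
        column = agg.get("column")
        if agg_type not in ALL_TYPES:
            problems.append(f"aggregations[{i}]: unknown type {agg_type!r}")
        elif agg_type in TIMESERIES_ONLY_TYPES and query_type != "timeseries":
            problems.append(f"aggregations[{i}]: {agg_type} is timeseries only")
        if column not in COLUMNS:
            problems.append(f"aggregations[{i}]: column {column!r} is not listed")
    return problems


# A rate aggregation in a distribution query is flagged as timeseries only.
issues = check_aggregations(
    [{"type": "rateSum", "column": "inputTokens"}], "distribution"
)
```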
Array of field names to group results by. Custom metadata keys are supported with a metadata. prefix.
Available group-by fields
| Field | Notes |
|---|---|
| cacheType | e.g. semantic, simple |
| cacheNamespace | Logical bucket within a cache type |
| modelName | The underlying model name |
| virtualModel | The virtual-model name |
| requestType | Type of request, e.g. ChatCompletion, Embedding |
| providerModelName | Underlying provider model name |
| providerAccountType | Account type of the provider |
| errorCode | HTTP error code returned, when applicable |
| userEmail | Group by user (response key: createdBySubjectSlug) |
| virtualaccount | Group by virtual account (response key: createdBySubjectSlug) |
| team | Unnests the Teams array |
| createdBySubjectType | Distinguishes user vs virtualaccount |
| metadata.<key> | Group by a custom metadata key |
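As a rough sketch, the two Quick Start queries might map onto aggregations and groupBy as follows. Only keys named in this reference are used (aggregations, groupBy, interval); the time-window bounds, data source, and query type are left out because their field names are not spelled out in this section. Using sum for the savings and token totals is an assumption about what the original examples computed.

```python
import json

# Distribution: cost savings and tokens read from cache,
# grouped by cache type and namespace.
distribution_fragment = {
    "aggregations": [
        {"type": "sum", "column": "potentialCostSavings"},
        {"type": "sum", "column": "cacheReadInputTokens"},
    ],
    "groupBy": ["cacheType", "cacheNamespace"],
}

# Timeseries: hourly cache savings and p99 lookup latency,
# grouped by cache type. Timeseries queries require an interval.
timeseries_fragment = {
    "aggregations": [
        {"type": "sum", "column": "potentialCostSavings"},
        {"type": "p99", "column": "cacheLookupLatencyMs"},
    ],
    "groupBy": ["cacheType"],
    "interval": "1 hour",
}

print(json.dumps(distribution_fragment, indent=2))
print(json.dumps(timeseries_fragment, indent=2))
```

These fragments are incomplete request bodies; merge them with the remaining required parameters before sending.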
When groupBy contains userEmail (without virtualaccount), the server auto-injects WHERE CreatedBySubjectType = 'user'. virtualaccount alone auto-injects 'virtualaccount'. When both appear, scope it yourself with createdBySubjectType if needed.
Array of filter objects, AND-combined. See Filtering for the full operator reference and the per-field allow-list.
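The subject-type scoping rule for userEmail and virtualaccount can be modelled as a small pure function, which is handy for predicting which rows a groupBy will cover. This is an illustrative sketch: `implied_subject_type` is a hypothetical helper, and returning None when both fields are present reflects a reading of "scope it yourself" as meaning that nothing is injected in that case.

```python
from typing import List, Optional


def implied_subject_type(group_by: List[str]) -> Optional[str]:
    """Return the CreatedBySubjectType value the server is described as
    auto-injecting for this groupBy, or None if no scoping is implied."""
    has_user = "userEmail" in group_by
    has_virtual_account = "virtualaccount" in group_by
    if has_user and not has_virtual_account:
        return "user"
    if has_virtual_account and not has_user:
        return "virtualaccount"
    # Both present (assumed: no auto-injection) or neither present.
    return None


scope = implied_subject_type(["cacheType", "userEmail"])
```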
Required for timeseries queries. Bucket size as <positive integer> <unit>, where <unit> is one of second, minute, hour, day, week, month, year (with or without a trailing s). Examples: "30 second", "5 minute", "1 hour", "1 day". Compound expressions like "1 hour 30 minute" are rejected.
Deprecated alias for interval. Accepts a positive integer number of seconds. Prefer interval in new code. If both are provided, interval wins.
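The interval format lends itself to a client-side check before a timeseries request is sent. The sketch below is one possible validator, not the server's parser: `parse_interval` is a hypothetical name, and it assumes exactly one space between the number and the unit, lowercase units, and no leading zeros, none of which this reference states explicitly.

```python
import re

# <positive integer> <unit>, with an optional trailing "s" on the unit.
_INTERVAL_PATTERN = re.compile(
    r"([1-9][0-9]*) (second|minute|hour|day|week|month|year)s?"
)


def parse_interval(text: str) -> tuple:
    """Split an interval string into (count, singular unit).

    Raises ValueError for anything that is not a single
    "<positive integer> <unit>" pair, including compound expressions.
    """
    match = _INTERVAL_PATTERN.fullmatch(text)
    if match is None:
        raise ValueError(f"invalid interval: {text!r}")
    return int(match.group(1)), match.group(2)


count, unit = parse_interval("5 minutes")
```

Because `fullmatch` must consume the whole string, a compound value such as "1 hour 30 minute" fails the check, matching the rejection described above.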