Distribution queries
Aggregated snapshots of cache metrics over a time window. Every example below posts JSON to:
POST https://{your_control_plane_url}/api/svc/v1/llm-gateway/metrics/query
with the headers Authorization: Bearer <your_api_key> and Content-Type: application/json. To keep the snippets short, only the JSON body is shown; the rest of the request is identical to the Overview Quick Start.
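For reference, here is one way to wrap a body in that POST request using only the Python standard library. This is a minimal sketch: the helper names `build_query_request` and `run_query` are hypothetical, and the response is returned as decoded JSON because this page does not describe its shape.

```python
import json
import urllib.request

# Path taken from the endpoint above; the base URL is your control-plane URL.
QUERY_PATH = "/api/svc/v1/llm-gateway/metrics/query"


def build_query_request(base_url: str, api_key: str, body: dict) -> urllib.request.Request:
    """Wrap a JSON query body in an authenticated POST request (not sent yet)."""
    return urllib.request.Request(
        url=base_url.rstrip("/") + QUERY_PATH,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def run_query(base_url: str, api_key: str, body: dict):
    """Send the request and decode the JSON response."""
    req = build_query_request(base_url, api_key, body)
    with urllib.request.urlopen(req, timeout=30) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

Usage: `run_query("https://<your_control_plane_url>", "<your_api_key>", body)`, where `body` is any of the JSON bodies below as a Python dict.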
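Every example below includes the filter {"fieldName": "virtualModelName", "operator": "IS_NULL", "value": true}, which keeps only rows with no virtual model name, so the stats cover underlying models. For virtual-model-only cache stats, set value to false and, where an example groups by model, swap groupBy: ["modelName"] for groupBy: ["virtualModel"].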
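The toggle described above can be captured in one helper so the filter value and the groupBy column always change together. A sketch; `model_scope` is a hypothetical name, and the column names are copied from the paragraph above.

```python
def model_scope(virtual_only: bool) -> tuple[dict, list[str]]:
    """Return the (filter, groupBy) pair for one side of the model split.

    virtual_only=False matches the examples on this page (no virtualModelName);
    virtual_only=True keeps only rows that do have a virtual model.
    """
    model_filter = {
        "fieldName": "virtualModelName",
        "operator": "IS_NULL",
        # IS_NULL true -> virtualModelName is null; false -> it is set.
        "value": not virtual_only,
    }
    group_by = ["virtualModel"] if virtual_only else ["modelName"]
    return model_filter, group_by
```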
Top namespaces by tokens served from cache
Sum of cacheReadInputTokens per namespace. This shows which namespaces serve the most tokens from cache:
```json
{
  "startTs": "2026-04-21T00:00:00.000Z",
  "endTs": "2026-04-22T00:00:00.000Z",
  "datasource": "cacheMetrics",
  "type": "distribution",
  "aggregations": [
    { "type": "sum", "column": "cacheReadInputTokens" }
  ],
  "groupBy": ["cacheNamespace"],
  "filters": [
    { "fieldName": "virtualModelName", "operator": "IS_NULL", "value": true }
  ]
}
```
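If you only want the top few namespaces, you can rank the per-namespace sums on the client. A sketch, assuming you have already pulled the results into a `{namespace: summed cacheReadInputTokens}` dict (the response format is not covered on this page); `top_namespaces` and the sample namespace names are hypothetical.

```python
import heapq


def top_namespaces(read_tokens_by_ns: dict[str, int], n: int = 5) -> list[tuple[str, int]]:
    """Return the n namespaces with the largest summed cacheReadInputTokens, largest first."""
    return heapq.nlargest(n, read_tokens_by_ns.items(), key=lambda kv: kv[1])
```

`heapq.nlargest` avoids sorting the whole dict, which only matters when there are many namespaces; `sorted(..., reverse=True)[:n]` gives the same result.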
Lookup latency percentiles by cache type
p50, p90, and p99 lookup latency grouped by cache type:
```json
{
  "startTs": "2026-04-21T00:00:00.000Z",
  "endTs": "2026-04-22T00:00:00.000Z",
  "datasource": "cacheMetrics",
  "type": "distribution",
  "aggregations": [
    { "type": "p50", "column": "cacheLookupLatencyMs" },
    { "type": "p90", "column": "cacheLookupLatencyMs" },
    { "type": "p99", "column": "cacheLookupLatencyMs" }
  ],
  "groupBy": ["cacheType"],
  "filters": [
    { "fieldName": "virtualModelName", "operator": "IS_NULL", "value": true }
  ]
}
```
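The percentile entries follow a regular pattern, so they can be generated rather than typed out. A sketch that rebuilds the body above; the helper names are hypothetical, and only p50, p90, and p99 are confirmed by this page, so other `pNN` values are an assumption.

```python
def percentile_aggs(column: str, percentiles=(50, 90, 99)) -> list[dict]:
    """One aggregation entry per percentile, using the pNN type names shown above."""
    return [{"type": f"p{p}", "column": column} for p in percentiles]


def latency_by_cache_type(start_ts: str, end_ts: str) -> dict:
    """Distribution query for lookup-latency percentiles grouped by cache type."""
    return {
        "startTs": start_ts,
        "endTs": end_ts,
        "datasource": "cacheMetrics",
        "type": "distribution",
        "aggregations": percentile_aggs("cacheLookupLatencyMs"),
        "groupBy": ["cacheType"],
        "filters": [
            {"fieldName": "virtualModelName", "operator": "IS_NULL", "value": True}
        ],
    }
```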
Sum of cost savings per underlying model:
```json
{
  "startTs": "2026-04-21T00:00:00.000Z",
  "endTs": "2026-04-22T00:00:00.000Z",
  "datasource": "cacheMetrics",
  "type": "distribution",
  "aggregations": [
    { "type": "sum", "column": "potentialCostSavings" }
  ],
  "groupBy": ["modelName"],
  "filters": [
    { "fieldName": "virtualModelName", "operator": "IS_NULL", "value": true }
  ]
}
```
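Like every example on this page, the query above pins a fixed UTC day. To query a rolling window instead, the timestamps can be generated in the same format (UTC, millisecond precision, trailing `Z`). A sketch; the helper names are hypothetical, and whether the API also accepts other ISO 8601 forms is not stated here.

```python
from datetime import datetime, timedelta, timezone


def to_ts(dt: datetime) -> str:
    """Format a timezone-aware datetime like the examples: UTC, milliseconds, trailing Z."""
    utc = dt.astimezone(timezone.utc)
    return utc.strftime("%Y-%m-%dT%H:%M:%S.") + f"{utc.microsecond // 1000:03d}Z"


def last_day_window(now: datetime) -> dict:
    """startTs/endTs covering the 24 hours that end at `now`."""
    return {"startTs": to_ts(now - timedelta(days=1)), "endTs": to_ts(now)}
```

Usage: `body = {**last_day_window(datetime.now(timezone.utc)), "datasource": "cacheMetrics", ...}`. Pass an aware datetime; a naive one would be interpreted as local time by `astimezone`.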
Restrict to a specific cache type and break savings down by model:
```json
{
  "startTs": "2026-04-21T00:00:00.000Z",
  "endTs": "2026-04-22T00:00:00.000Z",
  "datasource": "cacheMetrics",
  "type": "distribution",
  "aggregations": [
    { "type": "sum", "column": "potentialCostSavings" },
    { "type": "sum", "column": "cacheReadInputTokens" }
  ],
  "groupBy": ["modelName"],
  "filters": [
    { "fieldName": "cacheType", "operator": "IN", "value": ["semantic"] },
    { "fieldName": "virtualModelName", "operator": "IS_NULL", "value": true }
  ]
}
```
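To reuse an existing filter list with a cache-type restriction, you can prepend the IN filter without mutating the original list. A sketch under two assumptions: that multiple filters are combined with AND (as the example above implies but does not state), and that cache-type values other than `semantic` exist in your deployment. `restrict_cache_types` is a hypothetical helper.

```python
def restrict_cache_types(filters: list[dict], cache_types: list[str]) -> list[dict]:
    """Return a new filter list with an IN filter on cacheType in front."""
    in_filter = {"fieldName": "cacheType", "operator": "IN", "value": list(cache_types)}
    return [in_filter, *filters]
```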
Filter namespaces by prefix
Use STRING_STARTS_WITH on cacheNamespace to match namespaces by prefix, which is handy when prod and staging namespaces share a cache type:
```json
{
  "startTs": "2026-04-21T00:00:00.000Z",
  "endTs": "2026-04-22T00:00:00.000Z",
  "datasource": "cacheMetrics",
  "type": "distribution",
  "aggregations": [
    { "type": "sum", "column": "potentialCostSavings" }
  ],
  "groupBy": ["cacheNamespace"],
  "filters": [
    { "fieldName": "cacheNamespace", "operator": "STRING_STARTS_WITH", "value": "prod-" },
    { "fieldName": "virtualModelName", "operator": "IS_NULL", "value": true }
  ]
}
```
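Before running the query, you can preview which of your namespaces a prefix would pick up. A sketch that approximates STRING_STARTS_WITH locally with `str.startswith`; it assumes a case-sensitive match, which this page does not confirm, and the namespace names are hypothetical.

```python
def preview_prefix_match(namespaces: list[str], prefix: str) -> list[str]:
    """Local approximation of STRING_STARTS_WITH: keep names that begin with prefix."""
    # Case-sensitive, like str.startswith; the server's behavior may differ.
    return [ns for ns in namespaces if ns.startswith(prefix)]
```

Including the trailing hyphen in `"prod-"` keeps names such as `production-batch` out of the match.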
Group by cacheLookupStatus to see hits vs. misses per cache type:
```json
{
  "startTs": "2026-04-21T00:00:00.000Z",
  "endTs": "2026-04-22T00:00:00.000Z",
  "datasource": "cacheMetrics",
  "type": "distribution",
  "aggregations": [],
  "groupBy": ["cacheType", "cacheLookupStatus"],
  "filters": [
    { "fieldName": "virtualModelName", "operator": "IS_NULL", "value": true }
  ]
}
```
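Once you have per-group counts, a hit rate per cache type is hits divided by hits plus misses. A sketch under several assumptions not stated on this page: that the response carries a count for each (cacheType, cacheLookupStatus) group even with an empty aggregations list, that you have extracted those into tuples, and that the status values are `"hit"` and `"miss"`. The helper name and sample cache-type names are hypothetical.

```python
from collections import defaultdict


def hit_rates(rows: list[tuple[str, str, int]]) -> dict[str, float]:
    """Compute hits / (hits + misses) per cache type from (cacheType, status, count) rows."""
    hits: dict[str, int] = defaultdict(int)
    total: dict[str, int] = defaultdict(int)
    for cache_type, status, count in rows:
        if status not in ("hit", "miss"):
            continue  # ignore any other status values
        total[cache_type] += count
        if status == "hit":
            hits[cache_type] += count
    return {ct: hits[ct] / total[ct] for ct in total if total[ct] > 0}
```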
Compare cache-creation tokens to cache-read tokens per namespace:
```json
{
  "startTs": "2026-04-21T00:00:00.000Z",
  "endTs": "2026-04-22T00:00:00.000Z",
  "datasource": "cacheMetrics",
  "type": "distribution",
  "aggregations": [
    { "type": "sum", "column": "cacheCreationInputTokens" },
    { "type": "sum", "column": "cacheReadInputTokens" }
  ],
  "groupBy": ["cacheNamespace"],
  "filters": [
    { "fieldName": "virtualModelName", "operator": "IS_NULL", "value": true }
  ]
}
```
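One way to read the two sums together is as a read-to-creation ratio per namespace: roughly, how many tokens were served from cache for each token written to it during the window. A sketch, assuming you have extracted the two sums into a `{namespace: (creation_tokens, read_tokens)}` dict; the helper and sample names are hypothetical.

```python
from typing import Optional


def read_to_creation_ratio(sums: dict[str, tuple[int, int]]) -> dict[str, Optional[float]]:
    """Map namespace -> cacheReadInputTokens / cacheCreationInputTokens.

    A namespace with no creation tokens in the window gets None
    instead of raising ZeroDivisionError.
    """
    return {
        ns: (read / created if created else None)
        for ns, (created, read) in sums.items()
    }
```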