Cache Metrics: Distribution Examples

Top namespaces by tokens served from cache

Sums cacheReadInputTokens per cache namespace, surfacing which namespaces serve the most tokens from cache:

json={
    "startTs": "2026-04-21T00:00:00.000Z",
    "endTs": "2026-04-22T00:00:00.000Z",
    "datasource": "cacheMetrics",
    "type": "distribution",
    "aggregations": [
        {"type": "sum", "column": "cacheReadInputTokens"}
    ],
    "groupBy": ["cacheNamespace"],
    "filters": [
        {"fieldName": "virtualModelName", "operator": "IS_NULL", "value": true}
    ]
}
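Every example on this page uses a one-day window in which startTs and endTs are UTC ISO 8601 timestamps with millisecond precision and a trailing Z. Below is a minimal Python sketch for producing timestamps in that shape. The helper names `to_api_ts` and `day_window` are hypothetical and not part of any SDK.

```python
from datetime import datetime, timedelta, timezone


def to_api_ts(dt: datetime) -> str:
    """Format a datetime like '2026-04-21T00:00:00.000Z' (hypothetical helper)."""
    # Naive datetimes are assumed to already be in UTC.
    if dt.tzinfo is None:
        dt = dt.replace(tzinfo=timezone.utc)
    dt = dt.astimezone(timezone.utc)
    # Seconds precision from strftime, then milliseconds and the Z suffix.
    return dt.strftime("%Y-%m-%dT%H:%M:%S.") + f"{dt.microsecond // 1000:03d}Z"


def day_window(day: datetime) -> tuple[str, str]:
    """Return (startTs, endTs) covering the UTC calendar day of `day`'s date."""
    start = datetime(day.year, day.month, day.day, tzinfo=timezone.utc)
    end = start + timedelta(days=1)
    return to_api_ts(start), to_api_ts(end)


start_ts, end_ts = day_window(datetime(2026, 4, 21))
print(start_ts, end_ts)  # 2026-04-21T00:00:00.000Z 2026-04-22T00:00:00.000Z
```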

Lookup latency percentiles by cache type

p50, p90, and p99 lookup latency grouped by cache type:

json={
    "startTs": "2026-04-21T00:00:00.000Z",
    "endTs": "2026-04-22T00:00:00.000Z",
    "datasource": "cacheMetrics",
    "type": "distribution",
    "aggregations": [
        {"type": "p50", "column": "cacheLookupLatencyMs"},
        {"type": "p90", "column": "cacheLookupLatencyMs"},
        {"type": "p99", "column": "cacheLookupLatencyMs"}
    ],
    "groupBy": ["cacheType"],
    "filters": [
        {"fieldName": "virtualModelName", "operator": "IS_NULL", "value": true}
    ]
}
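To make the p50, p90, and p99 aggregations concrete, here is a minimal Python sketch that computes the same percentiles locally over a list of cacheLookupLatencyMs samples using the nearest-rank method. It is an illustration only: the backend may use a different percentile definition (for example linear interpolation or an approximate algorithm), so its results can differ on small samples. The latency values are made up.

```python
import math


def nearest_rank_percentile(samples: list[float], p: float) -> float:
    """Return the p-th percentile (0 < p <= 100) using the nearest-rank method."""
    if not samples:
        raise ValueError("samples must be non-empty")
    if not 0 < p <= 100:
        raise ValueError("p must be in (0, 100]")
    ordered = sorted(samples)
    # Nearest rank: smallest value with at least p% of samples at or below it.
    rank = math.ceil(p * len(ordered) / 100)
    return ordered[rank - 1]


# Hypothetical lookup latencies (ms) for a single cache type.
latencies_ms = [3.0, 4.0, 4.5, 5.0, 6.0, 7.0, 9.0, 12.0, 20.0, 85.0]
for p in (50, 90, 99):
    print(f"p{p}: {nearest_rank_percentile(latencies_ms, p)} ms")
```

In a sample this small, the single 85 ms lookup determines p99 on its own, which is why p99 is worth reading alongside p50 rather than instead of it.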

Savings by model

Sum of cost savings per underlying model:

json={
    "startTs": "2026-04-21T00:00:00.000Z",
    "endTs": "2026-04-22T00:00:00.000Z",
    "datasource": "cacheMetrics",
    "type": "distribution",
    "aggregations": [
        {"type": "sum", "column": "potentialCostSavings"}
    ],
    "groupBy": ["modelName"],
    "filters": [
        {"fieldName": "virtualModelName", "operator": "IS_NULL", "value": true}
    ]
}
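The payloads on this page share one skeleton and differ only in aggregations, groupBy, and extra filters. Below is a minimal Python sketch of a helper that assembles such a payload. `build_distribution_query` is a hypothetical name; the helper only reproduces the field names shown in these examples, including the virtualModelName IS_NULL filter that every example applies.

```python
from typing import Any, Optional


def build_distribution_query(
    start_ts: str,
    end_ts: str,
    aggregations: list[tuple[str, str]],
    group_by: list[str],
    extra_filters: Optional[list[dict[str, Any]]] = None,
) -> dict[str, Any]:
    """Assemble a cacheMetrics distribution payload (hypothetical helper)."""
    filters = list(extra_filters or [])
    # Every example on this page ends its filter list with this filter.
    filters.append({"fieldName": "virtualModelName", "operator": "IS_NULL", "value": True})
    return {
        "startTs": start_ts,
        "endTs": end_ts,
        "datasource": "cacheMetrics",
        "type": "distribution",
        "aggregations": [{"type": agg, "column": col} for agg, col in aggregations],
        "groupBy": list(group_by),
        "filters": filters,
    }


# Rebuilds the "Savings by model" payload above.
payload = build_distribution_query(
    "2026-04-21T00:00:00.000Z",
    "2026-04-22T00:00:00.000Z",
    aggregations=[("sum", "potentialCostSavings")],
    group_by=["modelName"],
)
```

The resulting dict is what goes in the `json=` argument used throughout these examples; the endpoint URL and authentication are not covered on this page.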

Semantic cache only

Restrict results to the semantic cache type with an IN filter, then break cost savings and cache-read tokens down by model:

json={
    "startTs": "2026-04-21T00:00:00.000Z",
    "endTs": "2026-04-22T00:00:00.000Z",
    "datasource": "cacheMetrics",
    "type": "distribution",
    "aggregations": [
        {"type": "sum", "column": "potentialCostSavings"},
        {"type": "sum", "column": "cacheReadInputTokens"}
    ],
    "groupBy": ["modelName"],
    "filters": [
        {"fieldName": "cacheType", "operator": "IN", "value": ["semantic"]},
        {"fieldName": "virtualModelName", "operator": "IS_NULL", "value": true}
    ]
}
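Because this query returns both sums per model, you can derive savings per million cache-read tokens to compare how much each model saves for the same cached volume. Below is a minimal Python sketch, assuming the results have already been extracted into a dict keyed by modelName with (potentialCostSavings, cacheReadInputTokens) values. That input shape and the model names are hypothetical, and the currency unit of potentialCostSavings is not stated on this page.

```python
from typing import Optional


def savings_per_million_read_tokens(
    rows: dict[str, tuple[float, int]],
) -> dict[str, Optional[float]]:
    """Map modelName -> potentialCostSavings per 1M cacheReadInputTokens.

    Returns None for models with no cache-read tokens in the window.
    """
    out: dict[str, Optional[float]] = {}
    for model, (savings, read_tokens) in rows.items():
        # Multiply before dividing to keep the arithmetic exact where possible.
        out[model] = savings * 1_000_000 / read_tokens if read_tokens else None
    return out


# Hypothetical per-model sums: (potentialCostSavings, cacheReadInputTokens).
rows = {"model-a": (12.5, 5_000_000), "model-b": (3.0, 250_000), "model-c": (0.0, 0)}
print(savings_per_million_read_tokens(rows))
```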

Filter namespaces by prefix

Use the STRING_STARTS_WITH operator on cacheNamespace to keep only namespaces with a given prefix. This is handy when prod and staging namespaces share a cache type:

json={
    "startTs": "2026-04-21T00:00:00.000Z",
    "endTs": "2026-04-22T00:00:00.000Z",
    "datasource": "cacheMetrics",
    "type": "distribution",
    "aggregations": [
        {"type": "sum", "column": "potentialCostSavings"}
    ],
    "groupBy": ["cacheNamespace"],
    "filters": [
        {"fieldName": "cacheNamespace", "operator": "STRING_STARTS_WITH", "value": "prod-"},
        {"fieldName": "virtualModelName", "operator": "IS_NULL", "value": true}
    ]
}
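To compare environments side by side, you can issue the same query once per prefix. Below is a minimal Python sketch that builds one payload per prefix from a shared base. `prefix_query` and `BASE_QUERY` are hypothetical names, and the "staging-" prefix is an assumed naming convention alongside the "prod-" prefix shown above.

```python
import copy
from typing import Any

# Shared payload without any namespace filter (mirrors the example above).
BASE_QUERY: dict[str, Any] = {
    "startTs": "2026-04-21T00:00:00.000Z",
    "endTs": "2026-04-22T00:00:00.000Z",
    "datasource": "cacheMetrics",
    "type": "distribution",
    "aggregations": [{"type": "sum", "column": "potentialCostSavings"}],
    "groupBy": ["cacheNamespace"],
    "filters": [{"fieldName": "virtualModelName", "operator": "IS_NULL", "value": True}],
}


def prefix_query(prefix: str) -> dict[str, Any]:
    """Return a copy of BASE_QUERY restricted to namespaces starting with `prefix`."""
    query = copy.deepcopy(BASE_QUERY)  # deep copy so BASE_QUERY is never mutated
    # Put the prefix filter first, matching the filter order in the example.
    query["filters"].insert(
        0, {"fieldName": "cacheNamespace", "operator": "STRING_STARTS_WITH", "value": prefix}
    )
    return query


queries = {prefix: prefix_query(prefix) for prefix in ("prod-", "staging-")}
```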

Hit vs miss breakdown

Group by cacheLookupStatus to see hits vs misses per cache type:

json={
    "startTs": "2026-04-21T00:00:00.000Z",
    "endTs": "2026-04-22T00:00:00.000Z",
    "datasource": "cacheMetrics",
    "type": "distribution",
    "aggregations": [],
    "groupBy": ["cacheType", "cacheLookupStatus"],
    "filters": [
        {"fieldName": "virtualModelName", "operator": "IS_NULL", "value": true}
    ]
}
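Once you have a count for each (cacheType, cacheLookupStatus) pair, the hit rate per cache type is hits divided by total lookups. Below is a minimal Python sketch, assuming the response has already been flattened into (cacheType, cacheLookupStatus, count) tuples and that hits carry the status value "hit". Both the row shape and the status strings are assumptions, as is the "exact" cache type used in the sample data.

```python
from collections import defaultdict


def hit_rate_by_cache_type(rows: list[tuple[str, str, int]]) -> dict[str, float]:
    """Compute hits / total lookups per cache type.

    Assumes rows of (cacheType, cacheLookupStatus, count) and a status of
    "hit" for cache hits; every other status counts toward the total only.
    """
    hits: dict[str, int] = defaultdict(int)
    totals: dict[str, int] = defaultdict(int)
    for cache_type, status, count in rows:
        totals[cache_type] += count
        if status == "hit":
            hits[cache_type] += count
    return {t: hits[t] / totals[t] for t in totals if totals[t] > 0}


# Hypothetical, already-flattened results.
rows = [
    ("semantic", "hit", 300),
    ("semantic", "miss", 700),
    ("exact", "hit", 450),
    ("exact", "miss", 50),
]
print(hit_rate_by_cache_type(rows))  # {'semantic': 0.3, 'exact': 0.9}
```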

Tokens written vs read

Compare cache-creation tokens to cache-read tokens per namespace:

json={
    "startTs": "2026-04-21T00:00:00.000Z",
    "endTs": "2026-04-22T00:00:00.000Z",
    "datasource": "cacheMetrics",
    "type": "distribution",
    "aggregations": [
        {"type": "sum", "column": "cacheCreationInputTokens"},
        {"type": "sum", "column": "cacheReadInputTokens"}
    ],
    "groupBy": ["cacheNamespace"],
    "filters": [
        {"fieldName": "virtualModelName", "operator": "IS_NULL", "value": true}
    ]
}
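Dividing cacheReadInputTokens by cacheCreationInputTokens for each namespace gives a rough reuse ratio: values well above 1 mean cached content is read many times per token written, while values near or below 1 suggest the namespace writes more than it reuses. Below is a minimal Python sketch, assuming the two sums have been extracted into a dict keyed by cacheNamespace. The input shape and namespace names are hypothetical.

```python
from typing import Optional


def reuse_ratio(sums: dict[str, tuple[int, int]]) -> dict[str, Optional[float]]:
    """Map namespace -> cache-read tokens / cache-creation tokens.

    `sums` maps cacheNamespace to (cacheCreationInputTokens, cacheReadInputTokens).
    Returns None when a namespace wrote no tokens in the window (its reads
    then come from entries created before startTs).
    """
    ratios: dict[str, Optional[float]] = {}
    for namespace, (created, read) in sums.items():
        ratios[namespace] = read / created if created else None
    return ratios


sums = {
    "prod-support": (10_000, 250_000),
    "staging-support": (8_000, 4_000),
    "prod-search": (0, 1_200),
}
print(reuse_ratio(sums))
```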
