Distribution queries

Distribution queries return aggregated snapshots of model metrics over a time window. Every example below posts JSON to:
POST https://{your_control_plane_url}/api/svc/v1/llm-gateway/metrics/query
with the headers Authorization: Bearer <your_api_key> and Content-Type: application/json. To keep the snippets short, only the JSON body is shown; the request wrapper is identical to the one in the Overview Quick Start.
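For readers who do not have the Overview Quick Start open, the following is a minimal sketch of what such a request wrapper could look like using only the Python standard library. The function names `build_metrics_request` and `run_metrics_query` are hypothetical, and the shape of the response is not assumed; only the URL path and the two headers come from the text above.

```python
import json
import urllib.request

# Path copied from the endpoint shown above.
QUERY_PATH = "/api/svc/v1/llm-gateway/metrics/query"


def build_metrics_request(control_plane_host: str, api_key: str, body: dict) -> urllib.request.Request:
    """Build (but do not send) the POST request for a metrics query.

    control_plane_host is the value substituted for {your_control_plane_url},
    without the https:// scheme.
    """
    return urllib.request.Request(
        url=f"https://{control_plane_host}{QUERY_PATH}",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def run_metrics_query(control_plane_host: str, api_key: str, body: dict):
    """Send the query and return the decoded JSON response (hypothetical helper)."""
    request = build_metrics_request(control_plane_host, api_key, body)
    with urllib.request.urlopen(request) as response:
        return json.load(response)
```

Separating request construction from sending keeps the headers and URL easy to inspect before any network call is made.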
By default, model metrics include both models and virtual models. The examples below restrict results to models with the filter {"fieldName": "virtualModelName", "operator": "IS_NULL", "value": true}. To target virtual models instead, set value to false and replace groupBy: ["modelName"] with groupBy: ["virtualModel"]. Note the naming difference: groupBy and aggregations use the alias virtualModel, while filters use virtualModelName.
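The switch between models and virtual models touches two fields at once, so it can help to derive both from a single flag. The sketch below is illustrative only; `scope_to` is a hypothetical helper, while the field names and operator are the ones described above.

```python
def scope_to(virtual: bool) -> tuple[list[dict], list[str]]:
    """Return (filters, groupBy) for either models or virtual models."""
    scope_filter = {
        "fieldName": "virtualModelName",  # filters use the full field name
        "operator": "IS_NULL",
        "value": not virtual,             # True -> models, False -> virtual models
    }
    # groupBy uses the alias virtualModel rather than virtualModelName.
    group_by = ["virtualModel"] if virtual else ["modelName"]
    return [scope_filter], group_by


model_filters, model_group_by = scope_to(virtual=False)
virtual_filters, virtual_group_by = scope_to(virtual=True)
```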
Request counts grouped by model:
json={
    "startTs": "2026-04-21T00:00:00.000Z",
    "endTs": "2026-04-22T00:00:00.000Z",
    "datasource": "modelMetrics",
    "type": "distribution",
    "aggregations": [
        {"type": "count", "column": "modelName"}
    ],
    "groupBy": ["modelName"],
    "filters": [
        {"fieldName": "virtualModelName", "operator": "IS_NULL", "value": true}
    ]
}
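The startTs and endTs values in these examples are UTC timestamps in ISO 8601 form with millisecond precision. As a rough sketch, they could be generated as shown below; `to_ts` and `day_window` are hypothetical helpers, and whether the API accepts other timestamp formats or treats endTs as inclusive is not stated here.

```python
from datetime import datetime, timedelta, timezone


def to_ts(moment: datetime) -> str:
    """Format a timezone-aware datetime like 2026-04-21T00:00:00.000Z."""
    utc = moment.astimezone(timezone.utc)
    # strftime has no millisecond directive, so milliseconds are appended by hand.
    return utc.strftime("%Y-%m-%dT%H:%M:%S.") + f"{utc.microsecond // 1000:03d}Z"


def day_window(day_start: datetime) -> dict:
    """Return startTs/endTs covering the 24 hours that begin at day_start."""
    return {
        "startTs": to_ts(day_start),
        "endTs": to_ts(day_start + timedelta(days=1)),
    }


window = day_window(datetime(2026, 4, 21, tzinfo=timezone.utc))
# window == {"startTs": "2026-04-21T00:00:00.000Z", "endTs": "2026-04-22T00:00:00.000Z"}
```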
Total input and output tokens per model:
json={
    "startTs": "2026-04-21T00:00:00.000Z",
    "endTs": "2026-04-22T00:00:00.000Z",
    "datasource": "modelMetrics",
    "type": "distribution",
    "aggregations": [
        {"type": "sum", "column": "inputTokens"},
        {"type": "sum", "column": "outputTokens"}
    ],
    "groupBy": ["modelName"],
    "filters": [
        {"fieldName": "virtualModelName", "operator": "IS_NULL", "value": true}
    ]
}
p50, p90, and p99 latency grouped by model:
json={
    "startTs": "2026-04-21T00:00:00.000Z",
    "endTs": "2026-04-22T00:00:00.000Z",
    "datasource": "modelMetrics",
    "type": "distribution",
    "aggregations": [
        {"type": "p50", "column": "latencyMs"},
        {"type": "p90", "column": "latencyMs"},
        {"type": "p99", "column": "latencyMs"}
    ],
    "groupBy": ["modelName"],
    "filters": [
        {"fieldName": "virtualModelName", "operator": "IS_NULL", "value": true}
    ]
}
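When the same column needs several percentile aggregations, the list can be generated rather than written out. This is a small sketch using only the three aggregation types shown above; `percentile_aggregations` is a hypothetical helper, and no other percentile types are assumed to exist.

```python
def percentile_aggregations(column: str, percentiles=("p50", "p90", "p99")) -> list[dict]:
    """Build one aggregation entry per percentile type for a single column."""
    return [{"type": p, "column": column} for p in percentiles]


latency_aggregations = percentile_aggregations("latencyMs")
```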
Group by model and a custom metadata key:
json={
    "startTs": "2026-04-21T00:00:00.000Z",
    "endTs": "2026-04-22T00:00:00.000Z",
    "datasource": "modelMetrics",
    "type": "distribution",
    "aggregations": [],
    "groupBy": ["modelName", "metadata.environment"],
    "filters": [
        {"fieldName": "virtualModelName", "operator": "IS_NULL", "value": true}
    ]
}
Group by multiple dimensions (model and subject, using userEmail):
json={
    "startTs": "2026-04-21T00:00:00.000Z",
    "endTs": "2026-04-22T00:00:00.000Z",
    "datasource": "modelMetrics",
    "type": "distribution",
    "aggregations": [],
    "groupBy": ["modelName", "userEmail"],
    "filters": [
        {"fieldName": "virtualModelName", "operator": "IS_NULL", "value": true}
    ]
}
Requests slower than 1 second, grouped by model:
json={
    "startTs": "2026-04-21T00:00:00.000Z",
    "endTs": "2026-04-22T00:00:00.000Z",
    "datasource": "modelMetrics",
    "type": "distribution",
    "aggregations": [],
    "groupBy": ["modelName"],
    "filters": [
        {"fieldName": "virtualModelName", "operator": "IS_NULL", "value": true},
        {"fieldName": "latencyMs", "operator": "GREATER_THAN", "value": 1000}
    ]
}
Requests within a latency band:
json={
    "startTs": "2026-04-21T00:00:00.000Z",
    "endTs": "2026-04-22T00:00:00.000Z",
    "datasource": "modelMetrics",
    "type": "distribution",
    "aggregations": [],
    "groupBy": ["modelName"],
    "filters": [
        {"fieldName": "virtualModelName", "operator": "IS_NULL", "value": true},
        {"fieldName": "latencyMs", "operator": "BETWEEN", "value": [500, 5000]}
    ]
}
Combine input and output token thresholds:
json={
    "startTs": "2026-04-21T00:00:00.000Z",
    "endTs": "2026-04-22T00:00:00.000Z",
    "datasource": "modelMetrics",
    "type": "distribution",
    "aggregations": [],
    "groupBy": ["modelName"],
    "filters": [
        {"fieldName": "virtualModelName", "operator": "IS_NULL", "value": true},
        {"fieldName": "inputTokens", "operator": "GREATER_THAN", "value": 100},
        {"fieldName": "outputTokens", "operator": "LESS_THAN_EQUAL", "value": 1000}
    ]
}
Filter to specific teams using array operators:
json={
    "startTs": "2026-04-21T00:00:00.000Z",
    "endTs": "2026-04-22T00:00:00.000Z",
    "datasource": "modelMetrics",
    "type": "distribution",
    "aggregations": [],
    "groupBy": ["team", "modelName"],
    "filters": [
        {"fieldName": "virtualModelName", "operator": "IS_NULL", "value": true},
        {"fieldName": "team", "operator": "ARRAY_HAS_ANY", "value": ["team-alpha", "team-beta"]}
    ]
}
Restrict by a custom metadata value:
json={
    "startTs": "2026-04-21T00:00:00.000Z",
    "endTs": "2026-04-22T00:00:00.000Z",
    "datasource": "modelMetrics",
    "type": "distribution",
    "aggregations": [],
    "groupBy": ["modelName"],
    "filters": [
        {"fieldName": "virtualModelName", "operator": "IS_NULL", "value": true},
        {"metadataKey": "environment", "operator": "IN", "value": ["production"]}
    ]
}
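Custom metadata is addressed differently depending on where it appears: the examples use metadataKey (not fieldName) in filters, and a metadata. prefix in groupBy. The sketch below captures that difference in three small helpers; all three function names are hypothetical.

```python
def field_filter(field_name: str, operator: str, value) -> dict:
    """Filter on a built-in field such as latencyMs or modelName."""
    return {"fieldName": field_name, "operator": operator, "value": value}


def metadata_filter(key: str, operator: str, value) -> dict:
    """Filter on a custom metadata key; uses metadataKey instead of fieldName."""
    return {"metadataKey": key, "operator": operator, "value": value}


def metadata_group_by(key: str) -> str:
    """groupBy entry for a custom metadata key, e.g. metadata.environment."""
    return f"metadata.{key}"


filters = [
    field_filter("virtualModelName", "IS_NULL", True),
    metadata_filter("environment", "IN", ["production"]),
]
group_by = ["modelName", metadata_group_by("environment")]
```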
Only requests routed through a virtual model. Note that groupBy and aggregations use virtualModel, while filters use virtualModelName:
json={
    "startTs": "2026-04-21T00:00:00.000Z",
    "endTs": "2026-04-22T00:00:00.000Z",
    "datasource": "modelMetrics",
    "type": "distribution",
    "aggregations": [
        {"type": "sum", "column": "inputTokens"},
        {"type": "sum", "column": "outputTokens"}
    ],
    "groupBy": ["virtualModel"],
    "filters": [
        {"fieldName": "virtualModelName", "operator": "IS_NULL", "value": false}
    ]
}
Combine multiple filter types:
json={
    "startTs": "2026-04-21T00:00:00.000Z",
    "endTs": "2026-04-22T00:00:00.000Z",
    "datasource": "modelMetrics",
    "type": "distribution",
    "aggregations": [],
    "groupBy": ["modelName"],
    "filters": [
        {"fieldName": "virtualModelName", "operator": "IS_NULL", "value": true},
        {"fieldName": "modelName", "operator": "IN", "value": ["gpt-4", "gpt-3.5-turbo"]},
        {"fieldName": "latencyMs", "operator": "BETWEEN", "value": [100, 10000]},
        {"fieldName": "inputTokens", "operator": "GREATER_THAN", "value": 50},
        {"fieldName": "outputTokens", "operator": "LESS_THAN", "value": 2000}
    ]
}