Distribution queries
Distribution queries return aggregated snapshots of model metrics over a time window. Every example below posts JSON to:
POST https://{your_control_plane_url}/api/svc/v1/llm-gateway/metrics/query
with the headers Authorization: Bearer <your_api_key> and Content-Type: application/json. To keep the snippets short, only the JSON body is shown; the request wrapper is identical to the one in the Overview Quick Start.
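For readers without the Quick Start at hand, the wrapper described above might look roughly like the sketch below. It uses only the Python standard library; the function name `build_metrics_request` and its parameters are illustrative assumptions, not part of the API, and the Quick Start remains the authoritative version.

```python
import json
import urllib.request


def build_metrics_request(base_url: str, api_key: str, body: dict) -> urllib.request.Request:
    """Wrap a query body in the POST request described above (nothing is sent here)."""
    # base_url stands in for https://{your_control_plane_url}
    url = f"{base_url.rstrip('/')}/api/svc/v1/llm-gateway/metrics/query"
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


# Sending is one more call. The response schema is not covered in this section,
# so the result is left as parsed JSON:
# with urllib.request.urlopen(build_metrics_request(url, key, body)) as resp:
#     result = json.load(resp)
```

Building the request separately from sending it makes it easy to inspect the URL, headers, and body before anything goes over the network.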
By default, model metrics include both models and virtual models. The examples below restrict results to models with the filter {"fieldName": "virtualModelName", "operator": "IS_NULL", "value": true}. To target virtual models instead, set value to false and replace groupBy: ["modelName"] with groupBy: ["virtualModel"] (virtualModel is the alias used in groupBy and aggregations; filters use virtualModelName).
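The switch from models to virtual models can be expressed as a small transformation on a query body. This is a sketch only: `to_virtual_model_query` is a hypothetical helper, and renaming the `modelName` column inside aggregations is an assumption based on the note that `virtualModel` is the alias used there.

```python
import copy


def to_virtual_model_query(body: dict) -> dict:
    """Return a copy of a model-scoped query retargeted at virtual models."""
    out = copy.deepcopy(body)  # leave the caller's body untouched

    # Flip the scoping filter: IS_NULL false keeps only virtual-model rows.
    for f in out.get("filters", []):
        if f.get("fieldName") == "virtualModelName" and f.get("operator") == "IS_NULL":
            f["value"] = False

    # groupBy uses the alias virtualModel rather than modelName.
    out["groupBy"] = [
        "virtualModel" if g == "modelName" else g for g in out.get("groupBy", [])
    ]

    # Assumption: aggregation columns follow the same alias.
    for agg in out.get("aggregations", []):
        if agg.get("column") == "modelName":
            agg["column"] = "virtualModel"
    return out
```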
Request counts grouped by model:
json={
  "startTs": "2026-04-21T00:00:00.000Z",
  "endTs": "2026-04-22T00:00:00.000Z",
  "datasource": "modelMetrics",
  "type": "distribution",
  "aggregations": [
    {"type": "count", "column": "modelName"}
  ],
  "groupBy": ["modelName"],
  "filters": [
    {"fieldName": "virtualModelName", "operator": "IS_NULL", "value": true}
  ]
}
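Every example uses the same one-day window, written as UTC timestamps with millisecond precision and a trailing Z. A helper along the following lines could generate those two fields for any day; `day_window` is a hypothetical name, and whether the API treats endTs as inclusive or exclusive is not stated here.

```python
from datetime import date, datetime, time, timedelta, timezone


def _iso_ms(dt: datetime) -> str:
    """Format an aware UTC datetime like 2026-04-21T00:00:00.000Z."""
    return dt.isoformat(timespec="milliseconds").replace("+00:00", "Z")


def day_window(day: date) -> dict:
    """Return startTs/endTs covering one UTC day, midnight to the next midnight."""
    start = datetime.combine(day, time.min, tzinfo=timezone.utc)
    end = start + timedelta(days=1)
    return {"startTs": _iso_ms(start), "endTs": _iso_ms(end)}


# Merge the window into a query body alongside the other fields.
body = {**day_window(date(2026, 4, 21)), "datasource": "modelMetrics", "type": "distribution"}
```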
Total input and output tokens per model:
json={
  "startTs": "2026-04-21T00:00:00.000Z",
  "endTs": "2026-04-22T00:00:00.000Z",
  "datasource": "modelMetrics",
  "type": "distribution",
  "aggregations": [
    {"type": "sum", "column": "inputTokens"},
    {"type": "sum", "column": "outputTokens"}
  ],
  "groupBy": ["modelName"],
  "filters": [
    {"fieldName": "virtualModelName", "operator": "IS_NULL", "value": true}
  ]
}
Latency percentiles by model
p50, p90, and p99 latency grouped by model:
json={
  "startTs": "2026-04-21T00:00:00.000Z",
  "endTs": "2026-04-22T00:00:00.000Z",
  "datasource": "modelMetrics",
  "type": "distribution",
  "aggregations": [
    {"type": "p50", "column": "latencyMs"},
    {"type": "p90", "column": "latencyMs"},
    {"type": "p99", "column": "latencyMs"}
  ],
  "groupBy": ["modelName"],
  "filters": [
    {"fieldName": "virtualModelName", "operator": "IS_NULL", "value": true}
  ]
}
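Since the three percentile aggregations differ only in their type, the list can be generated rather than typed out. The sketch below uses a hypothetical helper, `percentile_aggregations`, and deliberately accepts only p50, p90, and p99, because those are the only percentile types shown on this page; other percentiles may or may not be supported.

```python
# Percentile types that appear in the example above.
SHOWN_PERCENTILES = (50, 90, 99)


def percentile_aggregations(column: str, percentiles=SHOWN_PERCENTILES) -> list:
    """Build one {"type": "pNN", "column": ...} entry per requested percentile."""
    for p in percentiles:
        if p not in SHOWN_PERCENTILES:
            raise ValueError(f"p{p} is not shown in the examples; support is unknown")
    return [{"type": f"p{p}", "column": column} for p in percentiles]


aggregations = percentile_aggregations("latencyMs")
```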
Multi-dimensional grouping
Group by multiple dimensions, here model and subject (userEmail):
json={
  "startTs": "2026-04-21T00:00:00.000Z",
  "endTs": "2026-04-22T00:00:00.000Z",
  "datasource": "modelMetrics",
  "type": "distribution",
  "aggregations": [],
  "groupBy": ["modelName", "userEmail"],
  "filters": [
    {"fieldName": "virtualModelName", "operator": "IS_NULL", "value": true}
  ]
}
Filter high-latency requests
Requests slower than 1 second, grouped by model:
json={
  "startTs": "2026-04-21T00:00:00.000Z",
  "endTs": "2026-04-22T00:00:00.000Z",
  "datasource": "modelMetrics",
  "type": "distribution",
  "aggregations": [],
  "groupBy": ["modelName"],
  "filters": [
    {"fieldName": "virtualModelName", "operator": "IS_NULL", "value": true},
    {"fieldName": "latencyMs", "operator": "GREATER_THAN", "value": 1000}
  ]
}
Requests within a latency band (500 to 5000 ms):
json={
  "startTs": "2026-04-21T00:00:00.000Z",
  "endTs": "2026-04-22T00:00:00.000Z",
  "datasource": "modelMetrics",
  "type": "distribution",
  "aggregations": [],
  "groupBy": ["modelName"],
  "filters": [
    {"fieldName": "virtualModelName", "operator": "IS_NULL", "value": true},
    {"fieldName": "latencyMs", "operator": "BETWEEN", "value": [500, 5000]}
  ]
}
Combine input and output token thresholds (more than 100 input tokens, at most 1000 output tokens):
json={
  "startTs": "2026-04-21T00:00:00.000Z",
  "endTs": "2026-04-22T00:00:00.000Z",
  "datasource": "modelMetrics",
  "type": "distribution",
  "aggregations": [],
  "groupBy": ["modelName"],
  "filters": [
    {"fieldName": "virtualModelName", "operator": "IS_NULL", "value": true},
    {"fieldName": "inputTokens", "operator": "GREATER_THAN", "value": 100},
    {"fieldName": "outputTokens", "operator": "LESS_THAN_EQUAL", "value": 1000}
  ]
}
Filter to specific teams with the ARRAY_HAS_ANY array operator:
json={
  "startTs": "2026-04-21T00:00:00.000Z",
  "endTs": "2026-04-22T00:00:00.000Z",
  "datasource": "modelMetrics",
  "type": "distribution",
  "aggregations": [],
  "groupBy": ["team", "modelName"],
  "filters": [
    {"fieldName": "virtualModelName", "operator": "IS_NULL", "value": true},
    {"fieldName": "team", "operator": "ARRAY_HAS_ANY", "value": ["team-alpha", "team-beta"]}
  ]
}
Virtual-model metrics only
Only requests routed through a virtual model. Note that groupBy and aggregations use the alias virtualModel, while filters use virtualModelName:
json={
  "startTs": "2026-04-21T00:00:00.000Z",
  "endTs": "2026-04-22T00:00:00.000Z",
  "datasource": "modelMetrics",
  "type": "distribution",
  "aggregations": [
    {"type": "sum", "column": "inputTokens"},
    {"type": "sum", "column": "outputTokens"}
  ],
  "groupBy": ["virtualModel"],
  "filters": [
    {"fieldName": "virtualModelName", "operator": "IS_NULL", "value": false}
  ]
}
Complex filter combination
Combine multiple filter types (IN, BETWEEN, and comparison operators) in one query:
json={
  "startTs": "2026-04-21T00:00:00.000Z",
  "endTs": "2026-04-22T00:00:00.000Z",
  "datasource": "modelMetrics",
  "type": "distribution",
  "aggregations": [],
  "groupBy": ["modelName"],
  "filters": [
    {"fieldName": "virtualModelName", "operator": "IS_NULL", "value": true},
    {"fieldName": "modelName", "operator": "IN", "value": ["gpt-4", "gpt-3.5-turbo"]},
    {"fieldName": "latencyMs", "operator": "BETWEEN", "value": [100, 10000]},
    {"fieldName": "inputTokens", "operator": "GREATER_THAN", "value": 50},
    {"fieldName": "outputTokens", "operator": "LESS_THAN", "value": 2000}
  ]
}
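When filter lists grow this long, a small builder can catch malformed entries before the request is sent. The following is a sketch under stated assumptions: `make_filter` and `KNOWN_OPERATORS` are hypothetical names, the operator set is limited to those appearing in the examples on this page (the API may accept more), and the value-shape checks mirror how the examples use each operator rather than any documented validation rules.

```python
# Operators that appear in the examples on this page; the API may accept others.
KNOWN_OPERATORS = {
    "IS_NULL", "IN", "BETWEEN", "ARRAY_HAS_ANY",
    "GREATER_THAN", "LESS_THAN", "LESS_THAN_EQUAL",
}


def make_filter(field_name: str, operator: str, value) -> dict:
    """Build one filter object, rejecting value shapes the examples never use."""
    if operator not in KNOWN_OPERATORS:
        raise ValueError(f"operator not shown in the examples: {operator}")
    if operator == "IS_NULL" and not isinstance(value, bool):
        raise ValueError("IS_NULL expects true or false")
    if operator == "BETWEEN" and (not isinstance(value, list) or len(value) != 2):
        raise ValueError("BETWEEN expects [low, high]")
    if operator in ("IN", "ARRAY_HAS_ANY") and not isinstance(value, list):
        raise ValueError(f"{operator} expects a list of values")
    return {"fieldName": field_name, "operator": operator, "value": value}


# The filter list from the example above, rebuilt with the helper.
filters = [
    make_filter("virtualModelName", "IS_NULL", True),
    make_filter("modelName", "IN", ["gpt-4", "gpt-3.5-turbo"]),
    make_filter("latencyMs", "BETWEEN", [100, 10000]),
    make_filter("inputTokens", "GREATER_THAN", 50),
    make_filter("outputTokens", "LESS_THAN", 2000),
]
```

Serializing `filters` with `json.dumps` yields the same array as the handwritten body, with Python's `True` rendered as JSON `true`.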