Distribution queries

Distribution queries return aggregated snapshots of routing rule applications over a time window. Every example below posts JSON to:
POST https://{your_control_plane_url}/api/svc/v1/llm-gateway/metrics/query
with the headers Authorization: Bearer <your_api_key> and Content-Type: application/json. To keep the snippets short, only the JSON body is shown; the surrounding request is identical to the one in the Overview Quick Start.
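For readers who do not have the Overview Quick Start at hand, here is a minimal sketch of how such a request could be assembled with Python's standard library. The helper name `build_metrics_request`, the example host, and the example key are placeholders introduced for illustration; only the path and the two headers come from the text above. The sketch builds the request without sending it, and makes no assumption about the response shape.

```python
import json
import urllib.request


def build_metrics_request(control_plane_url: str, api_key: str, body: dict) -> urllib.request.Request:
    """Build (but do not send) the POST request for a metrics query."""
    # Path taken from the endpoint shown above; the host is supplied by the caller.
    url = f"{control_plane_url.rstrip('/')}/api/svc/v1/llm-gateway/metrics/query"
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


# Hypothetical host and key, for illustration only.
example_body = {
    "startTs": "2026-04-21T00:00:00.000Z",
    "endTs": "2026-04-22T00:00:00.000Z",
    "datasource": "configMetrics",
    "type": "distribution",
    "groupBy": ["configType", "status"],
}
req = build_metrics_request("https://control-plane.example.com", "example-key", example_body)
```

To actually send it, pass `req` to `urllib.request.urlopen` and decode the response as JSON.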
With no aggregations specified, the query relies on the implicit total count. Group by configType and status to see the breakdown of rule applications:
json={
    "startTs": "2026-04-21T00:00:00.000Z",
    "endTs": "2026-04-22T00:00:00.000Z",
    "datasource": "configMetrics",
    "type": "distribution",
    "groupBy": ["configType", "status"]
}
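The startTs and endTs values in these bodies appear to be UTC timestamps with millisecond precision and a trailing Z. A small sketch for producing strings in that same shape is below; the helper names `format_ts` and `day_window` are hypothetical, and whether the API accepts other timestamp formats is not stated here.

```python
from datetime import datetime, timedelta, timezone


def format_ts(dt: datetime) -> str:
    """Format an aware datetime as e.g. 2026-04-21T00:00:00.000Z."""
    utc = dt.astimezone(timezone.utc)
    return utc.isoformat(timespec="milliseconds").replace("+00:00", "Z")


def day_window(start: datetime, days: int = 1) -> dict:
    """Return the startTs/endTs pair for a window of whole days."""
    return {
        "startTs": format_ts(start),
        "endTs": format_ts(start + timedelta(days=days)),
    }


# Reproduces the window used throughout the examples in this section.
window = day_window(datetime(2026, 4, 21, tzinfo=timezone.utc))
```

The resulting dictionary can be merged into any of the request bodies in place of the hard-coded timestamps.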
Min, average, and p99 of attempts per loadbalance rule, useful when tuning rule fan-out:
json={
    "startTs": "2026-04-21T00:00:00.000Z",
    "endTs": "2026-04-22T00:00:00.000Z",
    "datasource": "configMetrics",
    "type": "distribution",
    "aggregations": [
        {"type": "min", "column": "loadbalanceTargetAttemptCount"},
        {"type": "avg", "column": "loadbalanceTargetAttemptCount"},
        {"type": "p99", "column": "loadbalanceTargetAttemptCount"}
    ],
    "groupBy": ["loadbalanceRuleId"]
}
Pin requestedModel to a single value and see which targetModel values those requests land on, along with the outcome (status):
json={
    "startTs": "2026-04-21T00:00:00.000Z",
    "endTs": "2026-04-22T00:00:00.000Z",
    "datasource": "configMetrics",
    "type": "distribution",
    "aggregations": [
        {"type": "sum", "column": "loadbalanceTargetAttemptCount"}
    ],
    "groupBy": ["targetModel", "status"],
    "filters": [
        {"fieldName": "requestedModel", "operator": "IN", "value": ["gpt-4"]}
    ]
}
Filter to a single ratelimitRuleId and group by status to see allowed vs blocked counts for that rule:
json={
    "startTs": "2026-04-21T00:00:00.000Z",
    "endTs": "2026-04-22T00:00:00.000Z",
    "datasource": "configMetrics",
    "type": "distribution",
    "groupBy": ["status"],
    "filters": [
        {"fieldName": "ratelimitRuleId", "operator": "IN", "value": ["<rule-id>"]}
    ]
}
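Restrict to rows that carry a loadbalanceRuleId (the NOT_IN filter excludes the empty ID), broken down by rule and configType; add a status filter as well to isolate failed outcomes: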
json={
    "startTs": "2026-04-21T00:00:00.000Z",
    "endTs": "2026-04-22T00:00:00.000Z",
    "datasource": "configMetrics",
    "type": "distribution",
    "aggregations": [],
    "groupBy": ["loadbalanceRuleId", "configType"],
    "filters": [
        {"fieldName": "loadbalanceRuleId", "operator": "NOT_IN", "value": [""]}
    ]
}
How many distinct target models a rule is fanning into:
json={
    "startTs": "2026-04-21T00:00:00.000Z",
    "endTs": "2026-04-22T00:00:00.000Z",
    "datasource": "configMetrics",
    "type": "distribution",
    "aggregations": [
        {"type": "countDistinct", "column": "targetModel"}
    ],
    "groupBy": ["loadbalanceRuleId"]
}
Routing volume per team and configType, restricted here to team-alpha:
json={
    "startTs": "2026-04-21T00:00:00.000Z",
    "endTs": "2026-04-22T00:00:00.000Z",
    "datasource": "configMetrics",
    "type": "distribution",
    "aggregations": [
        {"type": "sum", "column": "loadbalanceTargetAttemptCount"}
    ],
    "groupBy": ["team", "configType"],
    "filters": [
        {"fieldName": "team", "operator": "ARRAY_HAS_ANY", "value": ["team-alpha"]}
    ]
}
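All of the bodies in this section share the same skeleton: a time window, datasource configMetrics, type distribution, a groupBy list, and optional aggregations and filters. The sketch below builds that skeleton programmatically. The function names `make_filter` and `distribution_body` are hypothetical helpers, not part of any client library, and the operators shown (IN, NOT_IN, ARRAY_HAS_ANY) are only the ones that appear in this section; others may exist.

```python
def make_filter(field_name: str, operator: str, values: list) -> dict:
    """Build one filter entry in the shape used by the examples above."""
    return {"fieldName": field_name, "operator": operator, "value": list(values)}


def distribution_body(start_ts, end_ts, group_by, aggregations=None, filters=None) -> dict:
    """Assemble a distribution query body against the configMetrics datasource."""
    body = {
        "startTs": start_ts,
        "endTs": end_ts,
        "datasource": "configMetrics",
        "type": "distribution",
        "groupBy": list(group_by),
    }
    # Optional keys are omitted when empty, as in the first example of this section.
    if aggregations:
        body["aggregations"] = list(aggregations)
    if filters:
        body["filters"] = list(filters)
    return body


# Rebuilds the per-team routing volume query shown above.
team_body = distribution_body(
    "2026-04-21T00:00:00.000Z",
    "2026-04-22T00:00:00.000Z",
    group_by=["team", "configType"],
    aggregations=[{"type": "sum", "column": "loadbalanceTargetAttemptCount"}],
    filters=[make_filter("team", "ARRAY_HAS_ANY", ["team-alpha"])],
)
```

Keeping the skeleton in one place makes it harder to mistype a field name when adapting these queries.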