
View By Selector
You can pivot all charts on this tab using the View by selector. This changes the grouping dimension for every chart on the page:

| View by | Groups metrics by | When to use |
|---|---|---|
| Models | Model name (default) | Compare performance across different LLM models |
| Virtual Models | Virtual model / model alias | Evaluate model routing configurations |
| Users | Username of the caller | Debug user-specific issues or track per-user consumption |
| Virtual Accounts | Virtual account | Monitor usage by application or API key |
| Teams | Team name | Track costs per team for chargebacks or budget management |
| Metadata | Custom metadata keys sent in request headers | Create custom views (e.g. by tenant, environment, or feature) |
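Metadata grouping depends on metadata being attached to each request. As a minimal sketch, assuming the gateway accepts metadata as a JSON-encoded request header (the header name `x-metadata` here is an assumption; check your gateway's documentation for the actual header it expects):

```python
import json

def build_metadata_headers(metadata: dict) -> dict:
    """Serialize custom metadata into a request header.

    NOTE: the header name "x-metadata" is hypothetical; substitute the
    header your gateway actually reads metadata from.
    """
    return {"x-metadata": json.dumps(metadata)}

# Sent alongside the usual auth headers on each LLM call, this makes
# keys such as "tenant_name" available as Metadata grouping dimensions.
headers = build_metadata_headers({"tenant_name": "acme", "environment": "prod"})
```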
Top-Level Counters
Four headline metrics are displayed at the top:

- Total Input Tokens — total tokens sent to models in the selected time range.
- Total Output Tokens — total tokens generated by models.
- Total Count of Requests — number of LLM API calls.
- Total Cost of Tokens — aggregate cost in USD.
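These counters are simple aggregates over the requests in the selected time range. An illustrative sketch (the record field names `input_tokens`, `output_tokens`, and `cost_usd` are assumptions, not the gateway's actual schema):

```python
def summarize(requests: list[dict]) -> dict:
    """Aggregate per-request records into the four headline counters.

    Field names are hypothetical; substitute whatever your exported
    request data actually uses.
    """
    return {
        "total_input_tokens": sum(r["input_tokens"] for r in requests),
        "total_output_tokens": sum(r["output_tokens"] for r in requests),
        "total_requests": len(requests),
        "total_cost_usd": round(sum(r["cost_usd"] for r in requests), 6),
    }

records = [
    {"input_tokens": 1200, "output_tokens": 300, "cost_usd": 0.0045},
    {"input_tokens": 800, "output_tokens": 150, "cost_usd": 0.0021},
]
# summarize(records) -> 2000 input tokens, 450 output tokens, 2 requests
```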
Performance Charts
Requests Per Second
Shows the throughput over time, broken down by the selected dimension. Use this to identify traffic patterns, peak hours, and how load is distributed across models or users.
Request Failure Rate
Displays the percentage of requests that failed over time. A sudden spike here is an early warning of provider outages, quota exhaustion, or misconfiguration.
Request Failures Breakdown
A stacked bar chart showing the distribution of failures by error type across time. This makes it easy to see whether failures are dominated by a single error code or spread across multiple types.
Request Failure Rate By Error Type
Breaks down the failure rate by HTTP status code (4xx, 5xx, etc.). This helps distinguish between client-side errors (e.g. malformed requests) and provider-side issues (e.g. rate limiting, server errors).
Latency Charts
Request Latency
The end-to-end time taken to process a request, from the moment the gateway receives it until the complete response is returned. Displayed with P50, P75, P90, and P99 percentile selectors. Use this to identify models with consistently high latency or detect latency regressions over time.
Time To First Token (TTFT)
The time elapsed until the first token of a response is received. This is the most important latency metric for streaming use cases; it directly impacts the perceived responsiveness of your application.
Inter Token Latency (ITL)
The average time between consecutive tokens in a streaming response. High ITL means your users experience stuttering or pauses in the response stream.
Time Per Output Token (TPOT)
The average time to generate each output token. This normalizes latency by output length, making it useful for comparing models that produce different response sizes.
Cost and Token Charts
Cost of Inference
Shows cost over time, broken down by the selected dimension. Use this to track spending trends, identify cost spikes, and compare the cost-effectiveness of different models.
Input Tokens
Input token volume over time. Helps you understand how prompt sizes are trending and which models or users are sending the most context.
Output Tokens
Output token volume over time. Useful for identifying models that generate verbose responses or users whose usage patterns lead to high output costs.
Filtering
Click the Filter button in the top bar to narrow down the data. You can filter by metadata fields like user email, model name, and more. Active filters are shown as tags below the View by selector, and you can clear them at any time.
Exporting Data
Click the export icon in the top-right corner to download aggregated metrics data. You can choose which dimensions to group the data by (Models, Virtual Models, Users, Virtual Accounts, Teams) and also include any custom metadata keys. The data can be downloaded as a CSV or fetched via API.
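A downloaded export can be post-processed for reporting. A minimal sketch, assuming hypothetical CSV columns `model`, `requests`, and `cost_usd` (check an actual export for its real column names):

```python
import csv
import io
from collections import defaultdict

# Hypothetical export snippet; a real export will have different columns.
exported_csv = """model,requests,cost_usd
gpt-4o,120,3.40
gpt-4o,80,2.10
claude-3-5-sonnet,50,1.75
"""

def cost_by_model(raw: str) -> dict:
    """Sum exported cost rows per model."""
    totals = defaultdict(float)
    for row in csv.DictReader(io.StringIO(raw)):
        totals[row["model"]] += float(row["cost_usd"])
    return dict(totals)
```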
Common Use Cases
- Compare models: Switch to the Models view and look at latency, cost, and error rates side by side. If a model has high P99 latency, it may be causing tail latency issues in your application.
- Debug user issues: Switch to the Users view and filter for a specific user. Check if they are hitting higher error rates or experiencing worse latency than average.
- Track team spending: Switch to the Teams view to see cost breakdowns for internal chargebacks or budget management.
- Evaluate routing changes: Switch to the Virtual Models view to see if a routing rule change shifted traffic as expected and whether the new target model performs better.
- Custom tenant analytics: Switch to the Metadata view and group by a custom key like `tenant_name` to build per-customer cost and usage reports.
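For the model-comparison use case, the P50/P75/P90/P99 cut-offs shown on the latency charts can also be sanity-checked offline against raw latency samples. A sketch using the nearest-rank percentile method (the dashboard's exact interpolation method may differ):

```python
def percentile(samples: list[float], p: float) -> float:
    """Nearest-rank percentile: the smallest sample such that at least
    p percent of all samples are less than or equal to it."""
    if not samples:
        raise ValueError("no samples")
    ordered = sorted(samples)
    # ceil(p/100 * n) as an integer rank, clamped to at least 1
    rank = max(1, -(-len(ordered) * p // 100))
    return ordered[int(rank) - 1]

latencies_ms = [120, 135, 150, 160, 180, 210, 250, 400, 900, 2400]
p50 = percentile(latencies_ms, 50)  # 180: the typical request
p99 = percentile(latencies_ms, 99)  # 2400: dominated by the slowest tail
```

A large gap between P50 and P99, as in this sample, is the tail-latency pattern the Models view helps you spot.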