Breaking Down AI Gateway Usage: Customer and User-Level Analytics

Built for Speed: ~10ms Latency, Even Under Load

Blazingly fast way to build, track and deploy your models!

Handles 350+ RPS on just 1 vCPU — no tuning needed
Production-ready with full enterprise support


As LLM usage scales across teams and features such as chat, embedding, reranking, and real-time inference, token-based billing introduces cost complexity. Yet many organizations lack visibility into core questions: who uses the most tokens, which features are the costliest, and how usage is distributed across teams or customers. Without detailed attribution, controlling spend or evaluating impact becomes difficult.

TrueFoundry changes the narrative by embedding metadata tagging directly into every LLM call. Whether you’re a multi-tenant SaaS provider tracking customer spend or an internal platform team monitoring feature consumption, TrueFoundry delivers a transparent view of usage data. Engineering, finance, and product stakeholders all gain instant access to detailed dashboards that map cost back to the right customer, team, or use case.

In this article, you’ll discover how granular tracking and cost attribution empower smarter decisions and unlock the full potential of your LLM investments.

How TrueFoundry Tracks LLM Usage and Costs

TrueFoundry provides detailed observability for every LLM request, enabling fine-grained cost attribution and usage analysis across teams, features, and customers. Each request is automatically logged with comprehensive metadata, including:

Model name
Timestamp
Input and output token counts
Temperature and max tokens
Latency and cost
Request type (e.g., chat, completion)
Custom metadata (e.g., tags)

Tracking LLM Usage Across Multiple Dimensions

When initializing the TrueFoundry client, developers can pass custom tags, such as customer_id, business_unit, or feature_name. These tags are stored alongside each request and are queryable via dashboards and APIs. This enables organizations to:

Attribute costs per tenant in a multi-tenant SaaS environment using customer_id
Track usage by business unit or department using organizational tags
Analyze token consumption by product feature, such as chatbots, recommendation engines, or analytics modules
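
As a rough illustration of this kind of breakdown, the sketch below sums tokens and cost per tag value over a handful of logged requests. The record shape and field names (input_tokens, output_tokens, cost_usd) are assumptions made for the example, not the gateway's actual log schema.

```python
from collections import defaultdict

# Hypothetical request log records; the field names are assumptions for illustration.
records = [
    {"customer_id": "acme", "feature_name": "chatbot",
     "input_tokens": 1200, "output_tokens": 300, "cost_usd": 0.012},
    {"customer_id": "acme", "feature_name": "search",
     "input_tokens": 800, "output_tokens": 0, "cost_usd": 0.004},
    {"customer_id": "globex", "feature_name": "chatbot",
     "input_tokens": 500, "output_tokens": 150, "cost_usd": 0.005},
]


def usage_by_tag(records, tag):
    """Sum tokens and cost for each distinct value of the given tag."""
    totals = defaultdict(lambda: {"tokens": 0, "cost_usd": 0.0})
    for record in records:
        # Requests that lack the tag are grouped under "untagged".
        bucket = totals[record.get(tag, "untagged")]
        bucket["tokens"] += record["input_tokens"] + record["output_tokens"]
        bucket["cost_usd"] += record["cost_usd"]
    return dict(totals)


per_customer = usage_by_tag(records, "customer_id")
per_feature = usage_by_tag(records, "feature_name")
```

The same function serves any tag, so adding a new dimension such as business_unit only requires tagging requests with it.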


TrueFoundry LLM Usage Analytics

Feeling in the dark about where your LLM spending and usage are going? TrueFoundry’s usage analytics shines a spotlight on every token and dollar, transforming uncertainty into actionable insights.

TrueFoundry equips you with:

Custom metadata tagging: Automatically tag each LLM request with fields like customer_id, business_unit, or feature_name for precise attribution.
Multi-dimensional usage breakdown: View usage and cost by model, user, team, or custom tag to identify high-consumption workloads at a glance.
Interactive dashboards: Access real-time graphs for requests, input/output tokens, latencies, error rates, and cost trends across all models.
Granular cost attribution: Drill into token counts, cost per request, and total spend per customer or feature to optimize budgets and show ROI.
Queryable analytics API: Export and query raw usage data or integrate with external BI tools for custom reporting, alerts, and deeper analysis.


Real-Time Insights and Optimization

Tagged metadata supports flexible filtering and grouping, allowing cross-functional teams to break down usage by any custom dimension. For example:

A product team can monitor which features generate the most token usage and correlate that with user engagement.
Finance teams can allocate costs precisely to internal teams or clients using tagged usage data.
Engineering leads can track performance and optimize high-cost prompts or services based on token and latency trends.

Benefits of Granular Attribution

Transparent Chargebacks: Enables automated, usage-based internal or external billing to drive accountability across teams or clients.
Improved ROI Analysis: Helps product and analytics teams evaluate the return on AI investment by mapping token usage to business outcomes.
Predictable Budgeting: Supports precise forecasting and budget enforcement with spend monitoring and alerting based on tag-level trends.
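
To make the budgeting point concrete, here is a minimal sketch of tag-level spend monitoring. It assumes spend has already been aggregated per tag; the budget figures, the 80% warning ratio, and the status labels are all hypothetical.

```python
def budget_alerts(spend_by_tag, budget_by_tag, warn_ratio=0.8):
    """Return (tag, status) pairs for tags at or above the warning ratio of their budget."""
    alerts = []
    for tag, budget in sorted(budget_by_tag.items()):
        spent = spend_by_tag.get(tag, 0.0)
        if spent >= budget:
            alerts.append((tag, "over_budget"))
        elif spent >= warn_ratio * budget:
            alerts.append((tag, "approaching_budget"))
    return alerts


# Hypothetical month-to-date spend and monthly budgets in USD.
spend = {"team:frontend": 950.0, "team:search": 400.0, "team:support": 1200.0}
budgets = {"team:frontend": 1000.0, "team:search": 1000.0, "team:support": 1000.0}

alerts = budget_alerts(spend, budgets)
```

In this example the frontend team is flagged as approaching its budget and the support team as over it, while the search team produces no alert.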

By combining deep request-level visibility with custom tagging, TrueFoundry enables organizations to operationalize LLM observability, cost control, and performance optimization in a scalable, transparent manner.

Key Metrics for Evaluating Gateway

Criteria	What should you evaluate?	Priority	TrueFoundry
Latency	Adds <10ms p95 overhead for time-to-first-token?	Must Have	✅ Supported
Data Residency	Keeps logs within your region (EU/US)?	Depends on use case	✅ Supported
Latency-Based Routing	Automatically reroutes based on real-time latency/failures?	Must Have	✅ Supported
Key Rotation & Revocation	Rotate or revoke keys without downtime?	Must Have	✅ Supported


Driving Strategic Actions with LLM Usage Analytics

TrueFoundry transforms detailed LLM usage data into actionable insights, enabling product, engineering, and finance teams to make informed decisions that optimize performance and control costs.

Strategic Decisions Enabled by Usage Breakdowns

Tiered Pricing Models

With comprehensive visibility into token consumption patterns, organizations can design pricing tiers that reflect actual usage. By analyzing historical data, teams can:

Set base plans aligned with average monthly token usage.
Offer discounted overage rates to customers who use tokens efficiently.
Introduce premium tiers for heavy users requiring larger quotas.

Example: A SaaS provider might establish a Standard tier capped at 200,000 tokens per month and a Professional tier at 1 million tokens. As customers' needs evolve, they can transition between tiers seamlessly, ensuring fair and predictable pricing.
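
One way to use historical data for tier design is to check which tier would have covered a customer's typical month. The sketch below uses the two caps from the example; the 20% headroom factor and the "Custom" fallback label are assumptions added for illustration.

```python
from statistics import mean

# Tier caps taken from the example; anything larger falls through to a hypothetical "Custom" tier.
TIER_CAPS = [("Standard", 200_000), ("Professional", 1_000_000)]


def recommend_tier(monthly_tokens, headroom=1.2):
    """Pick the smallest tier whose cap covers average monthly usage plus headroom."""
    target = mean(monthly_tokens) * headroom
    for name, cap in TIER_CAPS:
        if target <= cap:
            return name
    return "Custom"


light_user = recommend_tier([120_000, 150_000, 140_000])
growing_user = recommend_tier([180_000, 210_000, 240_000])
heavy_user = recommend_tier([900_000, 1_100_000])
```

Here the first customer fits the Standard tier, the second is better served by Professional, and the third exceeds both caps.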

User Quota Enforcement

TrueFoundry’s AI Gateway has built-in support for usage quotas, enforced through rate-limiting rules that apply to users, teams, and virtual accounts. Controlling consumption at each of these levels helps prevent cost overruns and enables safe experimentation.

Quotas can be applied to:

Individual users: for example, restrict bob@email.com to 1,000 requests per day.
Teams: for example, limit the frontend team to 5,000 requests per day.
Virtual accounts: for example, cap the virtual account va-james at 1,500 requests per day.

These constraints are configured using a gateway-rate-limiting-config YAML file, where each rule defines the subject, threshold, and unit of measurement. Rules are evaluated in sequence, and the first applicable rule triggers enforcement.

Sample Configuration:

name: ratelimiting-config
type: gateway-rate-limiting-config
rules:
  - id: "rule-id"
    when:
      subjects: ["team:frontend"] # or ["user:email"] or ["virtualaccount:name"]
    limit_to: 5000
    unit: requests_per_day

All matching rules are taken into account, and if any are exceeded, the corresponding rule ID is returned to the user, providing clarity on which quota was triggered.
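
To reason about how such rules behave, it can help to model them in a few lines. The sketch below is an illustration only, not the gateway's implementation: it checks every rule that matches a request's subjects and returns the ID of the first one whose limit is already used up. The in-memory counter dictionary and the rule IDs are hypothetical, and only per-day request limits are modeled.

```python
def exceeded_rule(rules, subjects, counts):
    """Return the id of the first matching rule whose limit is used up, else None.

    rules    -- list of dicts shaped like the sample configuration
    subjects -- subjects attached to the request, e.g. {"team:frontend"}
    counts   -- requests already counted today per rule id (hypothetical store)
    """
    for rule in rules:
        matches = any(subject in subjects for subject in rule["when"]["subjects"])
        if matches and counts.get(rule["id"], 0) >= rule["limit_to"]:
            return rule["id"]
    return None


rules = [
    {"id": "bob-daily", "when": {"subjects": ["user:bob@email.com"]},
     "limit_to": 1000, "unit": "requests_per_day"},
    {"id": "frontend-daily", "when": {"subjects": ["team:frontend"]},
     "limit_to": 5000, "unit": "requests_per_day"},
]
counts = {"bob-daily": 999, "frontend-daily": 5000}

# Bob is under his personal limit, but his team has used up its quota.
blocked_by = exceeded_rule(rules, {"user:bob@email.com", "team:frontend"}, counts)
allowed = exceeded_rule(rules, {"user:bob@email.com"}, counts)
```

Returning the rule ID rather than a bare rejection is what lets a caller see which quota was hit.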

This enforcement mechanism enables you to:

Prevent unexpected usage spikes by capping traffic at the user, team, or virtual account level.
Offer tiered plans with predefined limits for freemium or trial accounts.
Trigger alerts as thresholds approach, allowing stakeholders to take corrective action.

With quota enforcement configured at the gateway layer, TrueFoundry ensures fine-grained control without requiring changes to downstream models or infrastructure. This makes it ideal for running pilots, offering trials, and building scalable, cost-controlled multi-tenant AI services.

Identifying Under-Optimized Customers or Features

By combining cost data with performance metrics, TrueFoundry helps identify inefficiencies. These insights also help teams tune an LLM router, so requests can be directed toward the model that best balances latency, cost, and output quality. Teams can:

Flag customer segments or features with high token spend but low engagement.
Analyze prompt templates and workflows that drive excessive consumption.
Prioritize optimization efforts or refactor code paths to improve ROI.

Example: If a translation feature incurs high token costs without generating additional revenue, teams can iterate on model prompts or switch to a more efficient model to balance performance and price.
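
A simple way to surface such cases is to compare cost against an engagement measure per feature. The sketch below assumes cost and engagement counts have already been aggregated; the feature names, figures, and the 0.50 USD threshold are hypothetical.

```python
def flag_inefficient(features, max_cost_per_engagement):
    """Return (feature, cost per engagement) pairs above the threshold, costliest first."""
    flagged = []
    for name, stats in features.items():
        engagements = stats["engagements"]
        # A feature with spend but no engagement is treated as infinitely expensive.
        cost_per = stats["cost_usd"] / engagements if engagements else float("inf")
        if cost_per > max_cost_per_engagement:
            flagged.append((name, cost_per))
    return sorted(flagged, key=lambda item: item[1], reverse=True)


# Hypothetical monthly aggregates per feature tag.
features = {
    "translation": {"cost_usd": 900.0, "engagements": 300},
    "chatbot": {"cost_usd": 1200.0, "engagements": 24000},
    "summaries": {"cost_usd": 50.0, "engagements": 0},
}

flagged = flag_inefficient(features, max_cost_per_engagement=0.5)
```

The chatbot has the highest absolute spend here but is not flagged, because its cost per engagement is low; that distinction is the point of combining cost with usage signals.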

Cross-Functional Impact

Go-to-Market Teams

Sales and marketing teams leverage TrueFoundry’s usage reports to align value propositions with customer outcomes. They can:

Justify premium pricing by demonstrating how token usage correlates with business results.
Craft targeted upsell campaigns for accounts trending toward higher consumption.
Provide customers with transparent usage reports, building trust and reducing churn.

Finance and Operations

Finance teams gain forecasting accuracy by analyzing tagged usage trends over time. With this data, they can:

Project AI spend based on month-over-month growth rates.
Implement internal chargeback models to align costs with revenue centers.
Plan infrastructure capacity to match demand, avoiding both over-provisioning and performance bottlenecks.
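
As a worked example of projecting spend from month-over-month growth, the sketch below averages the observed growth rates and compounds the latest month forward. It assumes at least two months of nonzero history; the figures are hypothetical.

```python
def project_spend(monthly_spend, months_ahead):
    """Project future spend by compounding the average month-over-month growth rate."""
    rates = [curr / prev - 1 for prev, curr in zip(monthly_spend, monthly_spend[1:])]
    avg_rate = sum(rates) / len(rates)
    projections = []
    spend = monthly_spend[-1]
    for _ in range(months_ahead):
        spend *= 1 + avg_rate
        projections.append(round(spend, 2))
    return avg_rate, projections


# Hypothetical spend in USD for the last three months (10% growth each month).
history = [1000.0, 1100.0, 1210.0]
avg_rate, projections = project_spend(history, months_ahead=2)
```

A plain average of growth rates is sensitive to one unusual month, so a longer history or a median may be a better fit for noisy data.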

By translating detailed usage breakdowns into clear, actionable insights, TrueFoundry empowers every team in an organization to optimize costs, improve feature performance, and scale AI initiatives with confidence.

Implementing Tagging and Usage Tracking in TrueFoundry

Implementing granular usage tracking with TrueFoundry involves three core steps: applying metadata tags on every call, integrating that data with your analytics or billing tools, and embedding best practices to align insights with business goals.

Implement Tagging and Usage Tracking

Tagging and metadata tracking in TrueFoundry enable granular observability into how LLM infrastructure is being used across environments, teams, features, and customers.

Add Metadata to LLM API Requests

TrueFoundry allows you to attach custom metadata to each LLM request using the X-TFY-METADATA header. This metadata is stored alongside each call and can be used for logging, filtering, and attribution.

Example:

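import json

# `client` is assumed to be an OpenAI-compatible client already configured for the gateway
metadata = {
    "tfy_log_request": "true",      # Enables request logging
    "environment": "staging",       # Tracks deployment environment
    "feature": "countdown-bot"      # Identifies the calling feature
}

client.chat.completions.create(
    # ... other parameters ...
    extra_headers={
        # Serialize the full metadata dictionary so every tag reaches the gateway
        "X-TFY-METADATA": json.dumps(metadata)
    }
)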

This ensures that each API call carries rich context for analytics, cost attribution, and debugging.

Apply Tags to ML Runs

If you're using TrueFoundry’s ML platform for training or experimentation, you can tag each run to organize experiments by framework, task, or business objective.

Example:

import truefoundry.ml as tfm

client = tfm.get_client()
run = client.create_run(ml_repo="my-classification-project")
run.set_tags({"nlp.framework": "Spark NLP"})
run.end()

These tags help you categorize runs in dashboards, search past experiments, and enforce governance policies.

Best Practices for Tagging

Use consistent formats, such as snake_case for tag keys and values
Validate tag inputs via CI or pre-commit hooks
Audit and retire outdated tags periodically to maintain clean logs
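
A validation step of this kind could look like the sketch below, which might run in CI or a pre-commit hook. It is an assumption-laden example: it enforces snake_case on keys only and merely requires values to be non-empty strings, since identifiers such as customer IDs often follow their own format.

```python
import re

# Lowercase words separated by single underscores, starting with a letter.
SNAKE_CASE = re.compile(r"^[a-z][a-z0-9]*(_[a-z0-9]+)*$")


def invalid_tags(tags):
    """Return a list of human-readable problems found in a tag dictionary."""
    problems = []
    for key, value in tags.items():
        if not SNAKE_CASE.match(key):
            problems.append(f"key {key!r} is not snake_case")
        if not isinstance(value, str) or not value:
            problems.append(f"value for {key!r} must be a non-empty string")
    return problems


ok = invalid_tags({"customer_id": "acme", "feature_name": "chatbot"})
bad = invalid_tags({"CustomerID": "acme", "feature_name": ""})
```

An empty result means the tags pass; otherwise each message names the offending key, which keeps hook output easy to act on.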

Integrating with Billing Dashboards and Analytics Tools

Once tagging is enabled, TrueFoundry provides multiple ways to visualize and analyze LLM usage across your organization. The built-in analytics dashboard offers real-time insights into token consumption, latency percentiles (P50, P90, P99), error rates, and costs. These metrics are broken down by user, model, and request type, allowing teams to monitor API health and quickly identify high-cost or high-latency patterns.

For advanced analysis, TrueFoundry supports integration with tools such as Tableau, Looker, and Grafana. You can connect your usage dataset to build dashboards that highlight tokens per customer, cost per feature, and usage trends over time.

Finance and operations teams can export usage data via the Usage API to centralized data warehouses such as Snowflake, BigQuery, or Redshift. This enables chargeback reporting, comparison of AI spend across departments, and financial forecasting.
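
Before loading usage data into a warehouse, nested tags usually need flattening into columns. The sketch below shows that step only; it does not call the Usage API, whose endpoints are not covered here, and the record shape (including the nested metadata field and the model name) is a hypothetical stand-in for exported data.

```python
import csv
import io

BASE_COLUMNS = ["timestamp", "model", "input_tokens", "output_tokens", "cost_usd"]


def records_to_csv(records, tag_keys):
    """Flatten records with a nested "metadata" dict into CSV text, one column per tag key."""
    columns = BASE_COLUMNS + list(tag_keys)
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=columns, lineterminator="\n")
    writer.writeheader()
    for record in records:
        row = {col: record.get(col, "") for col in BASE_COLUMNS}
        for key in tag_keys:
            # Missing tags become empty cells rather than failing the export.
            row[key] = record.get("metadata", {}).get(key, "")
        writer.writerow(row)
    return buffer.getvalue()


# Hypothetical exported records.
records = [
    {"timestamp": "2025-01-01T00:00:00Z", "model": "example-model",
     "input_tokens": 1200, "output_tokens": 300, "cost_usd": 0.012,
     "metadata": {"customer_id": "acme", "feature_name": "chatbot"}},
    {"timestamp": "2025-01-01T00:05:00Z", "model": "example-model",
     "input_tokens": 500, "output_tokens": 150, "cost_usd": 0.005,
     "metadata": {"customer_id": "globex"}},
]

csv_text = records_to_csv(records, ["customer_id", "feature_name"])
```

Fixing the column list up front keeps the output schema stable even when individual requests carry different tags.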

TrueFoundry also integrates with observability platforms, including Datadog, Prometheus, CloudWatch, and New Relic. These integrations provide unified monitoring of system performance and LLM usage metrics.

Grafana users can create real-time dashboards that visualize CPU, GPU, and network usage at the job or deployment level. This ensures full visibility into both model behavior and the underlying infrastructure.

Aligning Data with Business Goals

Raw metrics become valuable only when tied to meaningful business goals. With TrueFoundry’s tagging and monitoring capabilities, teams can define performance indicators that reflect actual value. Collaborate with product, finance, and analytics stakeholders to define key performance indicators (KPIs) such as cost per interaction, tokens per conversion, or revenue generated per thousand tokens.
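
The three KPIs named here reduce to simple ratios. The sketch below computes them from monthly totals; the input figures are hypothetical, and the function assumes nonzero interactions, conversions, and tokens.

```python
def usage_kpis(cost_usd, tokens, interactions, conversions, revenue_usd):
    """Compute cost per interaction, tokens per conversion, and revenue per 1,000 tokens."""
    return {
        "cost_per_interaction": cost_usd / interactions,
        "tokens_per_conversion": tokens / conversions,
        "revenue_per_1k_tokens": revenue_usd / (tokens / 1000),
    }


# Hypothetical monthly totals for one feature.
kpis = usage_kpis(
    cost_usd=500.0,
    tokens=2_000_000,
    interactions=10_000,
    conversions=400,
    revenue_usd=6_000.0,
)
```

With these inputs, each interaction costs 0.05 USD, each conversion consumes 5,000 tokens, and every thousand tokens corresponds to 3 USD of revenue.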

These KPIs should be incorporated into business reviews, product roadmaps, and financial planning sessions to ensure LLM spend aligns with strategic outcomes. Usage data can guide investment decisions, identify underperforming features, and highlight opportunities for model optimization.

Maintain a shared glossary of tags, features, and KPIs to help onboard new team members and avoid cross-functional confusion. Provide dashboard access to teams outside engineering, including sales, marketing, and support. This allows them to:

Monitor usage spikes or anomalies
Validate optimization efforts, such as prompt tuning that reduces token consumption
Propose and evaluate experiments, such as switching to a smaller model for less critical use cases
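
For teams that want a starting point for spotting usage spikes, the sketch below flags days whose token count sits well above the trailing average. The 7-day window and the three-standard-deviation threshold are assumptions chosen for illustration, and the daily totals are hypothetical.

```python
from statistics import mean, pstdev


def usage_spikes(daily_tokens, window=7, threshold=3.0):
    """Return indexes of days exceeding the trailing mean by `threshold` standard deviations."""
    spikes = []
    for i in range(window, len(daily_tokens)):
        history = daily_tokens[i - window:i]
        mu, sigma = mean(history), pstdev(history)
        # Skip flat histories, where the standard deviation is zero.
        if sigma > 0 and daily_tokens[i] > mu + threshold * sigma:
            spikes.append(i)
    return spikes


# Hypothetical daily token totals (in thousands) with one obvious spike.
daily = [100, 110, 95, 105, 100, 98, 102, 101, 400, 99]
spike_days = usage_spikes(daily)
```

Only the day with 400 is flagged; the following day is compared against a window that now includes the spike, so it is not.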

When tied to clear goals, usage data becomes a strategic asset. By aligning tagging, tracking, and analysis with organizational priorities, TrueFoundry helps companies scale LLM adoption responsibly while maximizing return on investment.

Conclusion

TrueFoundry turns LLM usage from a hidden expense into a driver of innovation and growth. With every API call tagged by customer, team, or feature, your organization gains a clear view of token spend and performance. Seamless integration with analytics and billing tools ensures that finance and operations teams work with up-to-date data. By aligning usage metrics with business goals, product managers prioritize high-impact features and engineering optimizes costly workflows. The result is smarter budgeting, clearer ROI, and faster decision-making across your entire organization. Adopt TrueFoundry’s granular usage breakdown today to unlock the full potential of your LLM investments.

