Lunary Integration with TrueFoundry AI Gateway

The TrueFoundry AI Gateway gives you a single OpenAI-compatible endpoint in front of 1,000+ LLMs, with smart routing, fallbacks, rate limiting, and cost controls — all at roughly 3–4 ms of added latency and 350+ RPS on a single vCPU. That centralization is exactly what makes observability tractable: every request already passes through one control plane, so there's one natural place to emit telemetry from.
Lunary is where that telemetry becomes insight. Instead of scrolling raw logs, you get a structured trace view: the span hierarchy of a request, the exact prompt and completion, token counts, latency per step, and the session it belongs to. Because the gateway speaks OpenTelemetry, the export is vendor-neutral — you're sending standard OTLP spans, not locking into a proprietary agent or SDK. If you later add another OTEL-compatible backend, the gateway can fan out to it without re-instrumentation.
In short: TrueFoundry handles the routing, governance, and reliability of your LLM traffic; Lunary gives you the open-source lens to inspect, debug, and optimize it. Here's how to set it up.
What is Lunary?
Lunary is an open-source observability platform built specifically for LLM and agent applications. It's designed to capture traces, inspect prompts and responses, monitor agent behavior, and track cost across production GenAI workloads. The features that matter most for a gateway integration are:
- Agent and LLM tracing — end-to-end visibility into chains, agents, tool calls, and model responses, so you can see the full shape of a multi-step request rather than a single flat log line.
- OpenTelemetry ingestion — Lunary accepts OTLP/HTTP traces, which is precisely what lets it receive spans directly from the TrueFoundry gateway alongside spans from your SDKs and custom instrumentation.
- Session-level monitoring — track conversations, users, and performance across chatbot and RAG workflows, not just isolated calls.
- Evaluation and analytics — analyze runs, categorize outputs, and iterate on prompts with measurable feedback loops.
Because it's open source, Lunary is a strong fit for teams that want full control over their observability stack — whether running it as a managed cloud project or self-hosted. If you're still comparing options, it's worth reading our roundup of LLM observability tools to see where Lunary sits relative to other platforms.
Prerequisites
Before you start, make sure you have:
- A TrueFoundry account with at least one model provider configured. If you're new, follow the Gateway Quick Start Guide first.
- A Lunary account — sign up at lunary.ai.
- Your Lunary project public key (also called the Project ID / Public Key), copied from your Lunary project settings.
One note before you begin: this integration exports OTEL traces only. OTEL metrics are not included in this configuration, so leave the metrics exporter toggle disabled unless you're configuring a separate metrics destination.
Step-by-Step Integration Guide
The whole integration is configuration, not code — you're pointing the gateway's OpenTelemetry exporter at Lunary's OTLP endpoint and authenticating with your public key.
Step 1: Get Your Lunary Public Key
- Log into the Lunary dashboard.
- Open your project and go to Settings → API Keys (the project keys section).
- Copy the Project ID / Public Key. It authorizes trace ingestion into your project, so treat it as a credential and store it securely.
Lunary projects include both a public and a private key. For OTLP trace export from TrueFoundry, you want the public key — it goes into the Authorization header in Step 3.
Step 2: Configure OTEL Trace Export in TrueFoundry
- In the TrueFoundry dashboard, go to Settings → Organisation → OTEL Config (under AI Gateway).
- Click edit on the OTEL Config section to open the exporter form, if it isn't already open.
- Enable the OTEL Traces Exporter Configuration toggle.
- Select HTTP Configuration.
- Enter the Lunary traces endpoint: https://api.lunary.ai/v1/traces
- Set Encoding to Proto.
Step 3: Add the Authorization Header
Enable Headers and add your Lunary authentication:
- Header: Authorization
- Value: Bearer <LUNARY_PUBLIC_KEY>
Replace <LUNARY_PUBLIC_KEY> with the public key you copied in Step 1, then click Save to apply the configuration.
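The values entered in Steps 2 and 3 can be written down as a small Python sketch, which is handy if you want to keep them in version control or sanity-check the key before pasting it into the form. This is an illustration only: the `OtelTraceExportConfig` class and `build_lunary_export_config` function are hypothetical names, not part of any TrueFoundry or Lunary SDK, and the gateway itself is still configured through the dashboard.

```python
from dataclasses import dataclass

# Endpoint taken from Step 2 of this guide.
LUNARY_TRACES_ENDPOINT = "https://api.lunary.ai/v1/traces"


@dataclass(frozen=True)
class OtelTraceExportConfig:
    """Hypothetical mirror of the fields in the OTEL Config form."""
    endpoint: str
    protocol: str
    encoding: str
    headers: dict


def build_lunary_export_config(public_key: str) -> OtelTraceExportConfig:
    """Assemble the form values, rejecting an empty or placeholder key."""
    key = public_key.strip()
    if not key or key.startswith("<"):
        raise ValueError("replace the placeholder with your Lunary public key")
    return OtelTraceExportConfig(
        endpoint=LUNARY_TRACES_ENDPOINT,
        protocol="HTTP",
        encoding="Proto",
        headers={"Authorization": f"Bearer {key}"},
    )


if __name__ == "__main__":
    # "pk-example" is a made-up key used only for demonstration.
    print(build_lunary_export_config("pk-example"))
```

The placeholder check catches the most common copy-paste mistake, which is saving the literal `<LUNARY_PUBLIC_KEY>` text as the header value.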
Step 4: Verify the Integration
- Send a few requests through the TrueFoundry AI Gateway — via the Playground or any API call.
- In TrueFoundry, open Monitor and confirm traces are being generated for those requests.
- In Lunary, open Explore → Traces and confirm new spans appear with the expected hierarchy, token usage, and latency details.
Lunary ingests OTLP spans asynchronously, so allow a short delay after your first gateway request before checking the Traces view. If spans don't appear immediately, give it a moment and resend.
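If you script this verification, the "wait a moment, then resend" advice can be sketched as a small polling loop. Everything here is an assumption for illustration: `traces_visible` and `resend` are callables you would supply yourself (for example, a manual check of the Traces view and a function that sends another request through the gateway), and the attempt count and delay are arbitrary defaults rather than values recommended by Lunary or TrueFoundry.

```python
import time
from typing import Callable


def wait_for_traces(
    traces_visible: Callable[[], bool],
    resend: Callable[[], None],
    attempts: int = 5,
    delay_seconds: float = 2.0,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Check for traces up to `attempts` times, pausing and resending between checks.

    Returns True as soon as `traces_visible()` reports success, otherwise
    False once all attempts are used up.
    """
    for attempt in range(attempts):
        if traces_visible():
            return True
        # No point sleeping or resending after the final check.
        if attempt < attempts - 1:
            sleep(delay_seconds)
            resend()
    return False


if __name__ == "__main__":
    # Stand-in callables: pretend the trace shows up on the third check.
    outcomes = iter([False, False, True])
    found = wait_for_traces(
        traces_visible=lambda: next(outcomes),
        resend=lambda: print("resending a request through the gateway"),
        delay_seconds=0.1,
    )
    print("traces found:", found)
```

Passing `sleep` as a parameter keeps the helper easy to test without real delays.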
Configuration Summary
Configuration Values
- Traces endpoint: https://api.lunary.ai/v1/traces
- Authentication: Authorization: Bearer <LUNARY_PUBLIC_KEY>
- Protocol: HTTP
- Encoding: Proto
For SDK-based instrumentation, self-hosted Lunary, or additional OpenTelemetry options, see Lunary's documentation.
What You Unlock
Once traces are flowing, the combination of gateway control and open-source observability gives you a few capabilities that are hard to get otherwise.
Trace-Level LLM Observability
Every request that passes through the gateway becomes a structured trace in Lunary, with prompt, completion, model, and span hierarchy in one view. This is the difference between guessing why a response was slow or wrong and seeing the trace for that specific call.
Cost and Token Visibility
Spans carry token usage and latency, so you can attribute spend and performance down to individual requests and sessions. That makes it far easier to spot the prompt template or model choice that's quietly inflating your bill.
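To make the attribution idea concrete, here is a minimal sketch that rolls token usage and latency up by session. The span shape is an assumption: the field names `session_id`, `prompt_tokens`, `completion_tokens`, and `latency_ms` are illustrative and do not reflect Lunary's actual export schema or the gateway's span attribute names, so map them to whatever your data contains.

```python
from collections import defaultdict


def summarize_by_session(spans):
    """Aggregate request count, token usage, and latency per session.

    `spans` is an iterable of dicts using the hypothetical field names
    described above.
    """
    totals = defaultdict(lambda: {
        "requests": 0,
        "prompt_tokens": 0,
        "completion_tokens": 0,
        "total_tokens": 0,
        "latency_ms": 0.0,
    })
    for span in spans:
        entry = totals[span["session_id"]]
        entry["requests"] += 1
        entry["prompt_tokens"] += span["prompt_tokens"]
        entry["completion_tokens"] += span["completion_tokens"]
        entry["total_tokens"] += span["prompt_tokens"] + span["completion_tokens"]
        entry["latency_ms"] += span["latency_ms"]
    return dict(totals)


if __name__ == "__main__":
    # Made-up sample data for demonstration only.
    sample = [
        {"session_id": "s1", "prompt_tokens": 100, "completion_tokens": 20, "latency_ms": 350.0},
        {"session_id": "s1", "prompt_tokens": 50, "completion_tokens": 10, "latency_ms": 150.0},
        {"session_id": "s2", "prompt_tokens": 10, "completion_tokens": 5, "latency_ms": 80.0},
    ]
    for session, stats in summarize_by_session(sample).items():
        print(session, stats)
```

Sorting the result by `total_tokens` is usually enough to surface the sessions or prompt templates that dominate spend.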
Agent and MCP Monitoring
For multi-step agents and tool-calling workflows, the trace view preserves the call hierarchy — including MCP and model call details — so a complex agent run reads as a coherent tree instead of scattered log lines.
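The "coherent tree" is what you get when spans are grouped by their parent. The sketch below shows the idea on plain dictionaries. The keys `span_id`, `parent_id`, and `name`, as well as the span names in the sample, are assumptions chosen for illustration and are not the gateway's actual attribute names.

```python
def render_span_tree(spans):
    """Return indented lines showing the parent/child hierarchy of spans.

    Root spans are those whose `parent_id` is None or missing. Children
    keep the order in which they appear in the input.
    """
    children = {}
    for span in spans:
        children.setdefault(span.get("parent_id"), []).append(span)

    lines = []

    def walk(parent_id, depth):
        for span in children.get(parent_id, []):
            lines.append("  " * depth + span["name"])
            walk(span["span_id"], depth + 1)

    walk(None, 0)
    return lines


if __name__ == "__main__":
    # Made-up agent run: one root span, a model call, and a tool call
    # that triggers a follow-up model call.
    sample = [
        {"span_id": "a", "parent_id": None, "name": "agent.run"},
        {"span_id": "b", "parent_id": "a", "name": "llm.chat"},
        {"span_id": "c", "parent_id": "a", "name": "mcp.tool_call"},
        {"span_id": "d", "parent_id": "c", "name": "llm.chat"},
    ]
    print("\n".join(render_span_tree(sample)))
```

A flat log would show these four entries in arrival order; grouping by parent is what makes the nesting of the tool call and its follow-up model call visible.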
Vendor-Neutral by Design
Because the export is standard OTLP over HTTP, you're not locked into a single backend. The same OpenTelemetry pipeline that feeds Lunary can feed any other OTEL-compatible destination, which keeps your AI gateway observability strategy portable.
Centralized Control, Decentralized Insight
The gateway stays the single enforcement point for routing, rate limits, and governance, while your observability lives in an open-source tool your team fully controls. You get governance without sacrificing transparency.
Conclusion
LLM observability shouldn't require threading instrumentation through every service or committing to a closed platform. With the Lunary integration with TrueFoundry AI Gateway, you route all your LLM traffic through one OpenAI-compatible control plane and export standard OpenTelemetry traces to an open-source backend you control. A few minutes of configuration is enough to capture prompts, completions, token usage, latency, and full agent hierarchies.
Route your LLM traffic through the TrueFoundry AI Gateway and start sending traces to Lunary today.