Built for Speed: ~10ms Latency, Even Under Load

Blazingly fast way to build, track and deploy your models!

Handles 350+ RPS on just 1 vCPU — no tuning needed
Production-ready with full enterprise support

Get Started with Truefoundry Now Talk to the Expert

‍

Lunary Integration with TrueFoundry AI Gateway

The TrueFoundry AI Gateway gives you a single OpenAI-compatible endpoint in front of 1,000+ LLMs, with smart routing, fallbacks, rate limiting, and cost controls — all at roughly 3–4 ms of added latency and 350+ RPS on a single vCPU. That centralization is exactly what makes observability tractable: every request already passes through one control plane, so there's one natural place to emit telemetry from.

Lunary is where that telemetry becomes insight. Instead of scrolling raw logs, you get a structured trace view: the span hierarchy of a request, the exact prompt and completion, token counts, latency per step, and the session it belongs to. Because the gateway speaks OpenTelemetry, the export is vendor-neutral — you're sending standard OTLP spans, not locking into a proprietary agent or SDK. If you later add another OTEL-compatible backend, the gateway can fan out to it without re-instrumentation.

In short: TrueFoundry handles the routing, governance, and reliability of your LLM traffic; Lunary gives you the lens to inspect, debug, and optimize it. Here's how to set it up.

What is Lunary?

Lunary is an observability platform built specifically for LLM and agent applications. It's designed to capture traces, inspect prompts and responses, monitor agent behavior, and track cost across production GenAI workloads. The features that matter most for a gateway integration are:

Agent and LLM tracing — end-to-end visibility into chains, agents, tool calls, and model responses, so you can see the full shape of a multi-step request rather than a single flat log line.
OpenTelemetry ingestion — Lunary accepts OTLP/HTTP traces, which is precisely what lets it receive spans directly from the TrueFoundry gateway alongside spans from your SDKs and custom instrumentation.
Session-level monitoring — track conversations, users, and performance across chatbot and RAG workflows, not just isolated calls.
Evaluation and analytics — analyze runs, categorize outputs, and iterate on prompts with measurable feedback loops.

Lunary is a strong fit for teams that want a dedicated, structured view of their LLM and agent traffic, run as a managed cloud project. If you're still comparing options, it's worth reading our roundup of LLM observability tools to see where Lunary sits relative to other platforms.

Prerequisites

Before you start, make sure you have:

A TrueFoundry account with at least one model provider configured. If you're new, follow the Gateway Quick Start Guide first.
A Lunary account — sign up at lunary.ai.
Your Lunary project public key (also called the Project ID / Public Key), copied from your Lunary project settings.

One note before you begin: this integration supports both OTEL traces and metrics. Traces go to Lunary's /v1/traces endpoint; if you also want metrics, Lunary exposes a /v1/metrics endpoint you can point the metrics exporter at.

Step-by-Step Integration Guide

The whole integration is configuration, not code — you're pointing the gateway's OpenTelemetry exporter at Lunary's OTLP endpoint and authenticating with your public key.

Step 1: Get Your Lunary Public Key

Log into the Lunary dashboard.
Open your project and go to Settings → API Keys (the project keys section).
Copy the Project ID / Public Key. Treat it like a credential used for trace ingestion and store it securely.

Lunary projects include both a public and a private key. For OTLP trace export from TrueFoundry, the public key is recommended, but either key works — it goes into the Authorization header in Step 3.

Step 2: Configure OTEL Trace Export in TrueFoundry

In the TrueFoundry dashboard, go to Settings → Organisation → OTEL Config (under AI Gateway).
Click edit on the OTEL Config section to open the exporter form, if it isn't already open.
Enable the OTEL Traces Exporter Configuration toggle.
Select HTTP Configuration.
Enter the Lunary traces endpoint: https://api.lunary.ai/v1/traces‍
Set Encoding to Proto (JSON encoding also works if you prefer it).

Step 3: Add the Authorization Header

Enable Headers and add your Lunary authentication:

Header Value Authorization Bearer <LUNARY_PUBLIC_KEY>

Replace <LUNARY_PUBLIC_KEY> with the public key you copied in Step 1, then click Save to apply the configuration.

Step 4: Verify the Integration

Send a few requests through the TrueFoundry AI Gateway — via the Playground or any API call.
In TrueFoundry, open Monitor and confirm traces are being generated for those requests.
In Lunary, open Explore → Traces and confirm new spans appear with the expected hierarchy, token usage, and latency details.

Lunary ingests OTLP spans asynchronously, so allow a short delay after your first gateway request before checking the Traces view. If spans don't appear immediately, give it a moment and resend.

Configuration Summary

Configuration Values

‍Traces endpoint : https://api.lunary.ai/v1/traces

Authentication : Authorization Bearer <LUNARY_PUBLIC_KEY>

‍Protocol : HTTP

‍Encoding : Proto

For SDK-based instrumentation, self-hosted Lunary, or additional OpenTelemetry options, see Lunary's documentation.

What You Unlock

Once traces are flowing, the combination of gateway control and trace-level observability gives you a few capabilities that are hard to get otherwise.

Trace-Level LLM Observability

Every request that passes through the gateway becomes a structured trace in Lunary — prompt, completion, model, and span hierarchy in one view. This is the difference between guessing why a response was slow or wrong and actually seeing the LLM tracing for that specific call.

Cost and Token Visibility

Spans carry token usage and latency, so you can attribute spend and performance down to individual requests and sessions. That makes it far easier to spot the prompt template or model choice that's quietly inflating your bill.

Agent and MCP Monitoring

For multi-step agents and tool-calling workflows, the trace view preserves the call hierarchy — including MCP and model call details — so a complex agent run reads as a coherent tree instead of scattered log lines.

Vendor-Neutral by Design

Because the export is standard OTLP over HTTP, you're not locked into a single backend. The same OpenTelemetry pipeline that feeds Lunary can feed any other OTEL-compatible destination, which keeps your AI gateway observability strategy portable.

Centralized Control, Decentralized Insight

The gateway stays the single enforcement point for routing, rate limits, and governance, while your observability lives in a dedicated tool focused on LLM and agent traffic. You get governance without sacrificing transparency.

Conclusion

LLM observability shouldn't require ripping instrumentation into every service or committing to a closed platform. With the Lunary integration with TrueFoundry AI Gateway, you route all your LLM traffic through one OpenAI-compatible control plane and export standard OpenTelemetry traces to Lunary — capturing prompts, completions, token usage, latency, and full agent hierarchies in a few minutes of configuration.

Route your LLM traffic through the TrueFoundry AI Gateway and start sending traces to Lunary today.

‍

TrueFoundry AI Gateway delivers ~3–4 ms latency, handles 350+ RPS on 1 vCPU, scales horizontally with ease, and is production-ready, while LiteLLM suffers from high latency, struggles beyond moderate RPS, lacks built-in scaling, and is best for light or prototype workloads.

Built for Speed: ~10ms Latency, Even Under Load

Schedule your Demo Now