Exporting TrueFoundry AI Gateway Traces to SigNoz via OTLP

The TrueFoundry AI Gateway generates OpenTelemetry spans for every LLM request and publishes them asynchronously over NATS. SigNoz Cloud accepts these spans over OTLP/HTTP at a regional ingestion endpoint authenticated via a per-workspace ingestion key. Connecting the two requires configuring the gateway's OTEL exporter with the SigNoz ingestion URL and adding the signoz-ingestion-key header. No application-level instrumentation changes are required.
This post covers the OTEL trace generation path inside the TrueFoundry AI Gateway, the ingestion and storage pipeline that SigNoz Cloud uses, and the configuration surface for wiring the two systems together. It is not a setup guide; it is an explanation of how the integration works at the architecture level.
How the TrueFoundry AI Gateway Generates and Exports Traces
The TrueFoundry AI Gateway is built on the Hono framework and runs as a stateless pod. A single pod on 1 vCPU and 1 GB RAM handles over 250 requests per second with approximately 3 ms of added latency. That throughput is possible because the gateway makes zero external calls in the request path. Authentication runs against cached public keys downloaded once from the identity provider. Authorization checks run against an in-memory map of users to models synced via NATS. Model routing logic runs entirely in memory against a local copy of the routing configuration.
OTEL trace generation follows the same zero-external-call principle. When a request completes, the gateway publishes the span data asynchronously to NATS. The OTEL exporter reads from this async path and forwards the span to the configured external endpoint. The exporter never touches the request path. A slow or unreachable OTEL backend never stalls a request and never adds latency visible to the client.
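The shape of this fire-and-forget publish can be sketched in TypeScript (the gateway is built on Hono, a TypeScript framework). The subject name, payload shape, and connection details here are illustrative assumptions, not the gateway's actual internals:

```typescript
import { connect, JSONCodec, NatsConnection } from "nats";

const jc = JSONCodec();

// One long-lived connection, established at startup. The subject name
// "otel.spans" is a placeholder; the gateway's real subject is not public.
let nc: NatsConnection;

export async function initTelemetry(): Promise<void> {
  nc = await connect({ servers: "nats://nats:4222" });
}

// Called after the response is already on the wire. publish() buffers
// locally and returns immediately, so a slow consumer can never add
// latency to the request path.
export function publishSpan(span: Record<string, unknown>): void {
  try {
    nc.publish("otel.spans", jc.encode(span));
  } catch {
    // Telemetry is best-effort: drop on failure, never surface to the caller.
  }
}
```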
The gateway generates spans at several points in the request lifecycle. Inbound HTTP handling, authentication, model resolution, and the outbound provider call each produce spans that are assembled into a trace. The spans carry a specific set of attributes. The tfy.input attribute contains the full request body sent to the LLM. The tfy.output attribute contains the full response body. The tfy.input_short_hand attribute contains a condensed summary of the input with boolean flags for file, image, and audio content. The tfy.span_type attribute identifies the operation type as ChatCompletion, AgentResponse, or MCPGateway, depending on which gateway path handled the request. Standard gen_ai.* semantic conventions are also included, covering prompt and completion token counts, model identifiers, and finish reasons.
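As a concrete illustration, here is how those attributes would look when set through the standard OpenTelemetry JS API. The attribute keys come from the description above; the placeholder values, the span lifecycle, and the exact gen_ai.* key names are assumptions for the sketch:

```typescript
import { trace } from "@opentelemetry/api";

const tracer = trace.getTracer("tfy-llm-gateway");

// Placeholder bodies; in the gateway these are the real LLM payloads.
const requestBody = { model: "gpt-4o", messages: [{ role: "user", content: "hi" }] };
const responseBody = { choices: [{ message: { content: "hello" }, finish_reason: "stop" }] };

const span = tracer.startSpan("ChatCompletion");
span.setAttributes({
  "tfy.span_type": "ChatCompletion",
  "tfy.input": JSON.stringify(requestBody),
  "tfy.output": JSON.stringify(responseBody),
  "tfy.input_short_hand": "hi [file:false image:false audio:false]",
  // gen_ai.* semantic-convention keys; the gateway's exact names may differ
  "gen_ai.request.model": "gpt-4o",
  "gen_ai.usage.input_tokens": 8,
  "gen_ai.usage.output_tokens": 2,
});
span.end();
```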
The gateway also exposes an Exclude Request Data toggle in the OTEL configuration. When enabled, the exporter strips tfy.input, tfy.output, and tfy.input_short_hand from spans before forwarding them. This is the correct setting for teams that need full trace visibility for latency and error analysis but must not transmit prompt or response content to an external platform.
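A minimal sketch of that stripping step, assuming a plain attribute map (the function and constant names are hypothetical):

```typescript
// Attribute keys taken from the description above; everything else is hypothetical.
const CONTENT_ATTRIBUTES = ["tfy.input", "tfy.output", "tfy.input_short_hand"];

function stripRequestData(
  attributes: Record<string, unknown>,
  excludeRequestData: boolean,
): Record<string, unknown> {
  if (!excludeRequestData) return attributes;
  return Object.fromEntries(
    Object.entries(attributes).filter(([key]) => !CONTENT_ATTRIBUTES.includes(key)),
  );
}
```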
Export is additive. Enabling a SigNoz OTEL destination does not replace or interrupt TrueFoundry's own internal trace storage. Both paths receive the same spans from the same async NATS publish.
What SigNoz Does with the Traces
SigNoz is built around a custom OpenTelemetry Collector that accepts telemetry data and writes it to ClickHouse. The collector ingests data over standard OTLP protocols, provides protocol translation for integration with existing monitoring tools, and enriches data with metadata before writing it to storage.
The ingestion pipeline follows the OpenTelemetry Collector architecture of receivers, processors, and exporters. The OTLP receiver accepts trace spans over both gRPC on port 4317 and HTTP on port 4318. A batch processor groups spans for efficiency before export. The signozspanmetrics processor generates RED metrics (Rate, Errors, Duration) directly from span data, so that request rate, error rate, and latency percentiles are available as queryable metrics without any separate instrumentation. Trace spans are written to the distributed_signoz_index_v3 table in ClickHouse. Generated metrics flow to the signoz_metrics.distributed_samples_v4 table.
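Conceptually, the signozspanmetrics processor computes something like the following from each window of spans. This is a toy TypeScript rendering of the idea, not the processor's actual Go implementation:

```typescript
// Toy model of a span as the processor sees it.
interface SpanRecord {
  service: string;
  operation: string;
  durationMs: number;
  isError: boolean;
}

// Derive Rate, Errors, and Duration for one window of spans.
function redMetrics(spans: SpanRecord[], windowSeconds: number) {
  const durations = spans.map((s) => s.durationMs).sort((a, b) => a - b);
  const pct = (q: number) =>
    durations.length
      ? durations[Math.min(durations.length - 1, Math.floor(q * durations.length))]
      : 0;
  return {
    ratePerSecond: spans.length / windowSeconds,
    errorRate: spans.length ? spans.filter((s) => s.isError).length / spans.length : 0,
    p50Ms: pct(0.5),
    p99Ms: pct(0.99),
  };
}
```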
SigNoz Cloud uses a per-tenant architecture. Each tenant gets its own SigNoz instance, its own ClickHouse for storing telemetry data, its own OTel collector for ingestion, and its own regional endpoint. Shared infrastructure handles initial ingestion: an OpenTelemetry gateway receives telemetry, batches it, and forwards it to a Redpanda streaming buffer, from which the per-tenant SigNoz instance consumes and writes it to ClickHouse. This means the ingest.<region>.signoz.cloud:443 endpoint is a shared ingestion layer; the signoz-ingestion-key header routes the data to the correct per-tenant pipeline after ingestion.
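The routing step can be pictured as a lookup from ingestion key to tenant pipeline. SigNoz's real implementation is internal; this sketch only illustrates the idea that the header, not the hostname, selects the tenant:

```typescript
// Hypothetical key-to-tenant map; real resolution happens inside SigNoz Cloud.
const keyToTenant = new Map<string, string>([["<ingestion-key>", "tenant-42"]]);

// The shared gateway batches incoming OTLP data onto a per-tenant topic in
// the Redpanda buffer; the tenant's own collector consumes from there.
function routeBatch(headers: Record<string, string>, payload: Uint8Array) {
  const tenant = keyToTenant.get(headers["signoz-ingestion-key"]);
  if (!tenant) throw new Error("unknown ingestion key");
  return { topic: `traces.${tenant}`, payload };
}
```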
The SigNoz query service reads from ClickHouse using optimized SQL queries that leverage the columnar storage format for aggregation over large volumes of span data. The Traces explorer surfaces individual spans, filterable by service name, operation, status, and duration.

The Metrics explorer accepts PromQL-compatible queries against the generated RED metrics. Because the gateway emits standard gen_ai.* attributes alongside tfy.* attributes, SigNoz can surface both generic HTTP performance data and LLM-specific data in the same trace view.

The Integration Surface
The TrueFoundry AI Gateway connects to SigNoz through two OTEL exporter configurations: one for traces and one for metrics. Both use OTLP over HTTP with protobuf encoding. The signoz-ingestion-key header is required on both exporters. The endpoint format includes the region as a subdomain and connects on port 443 over HTTPS. The configuration is accessible under AI Gateway → Controls → Settings → OTEL Config in the TrueFoundry dashboard, as shown below.

OTEL Traces Exporter Configuration

OTEL Metrics Exporter Configuration
The region is derived from the SigNoz Cloud ingestion URL visible in the account settings. For an ingestion URL of https://ingest.in2.signoz.cloud the region is in2 and the traces endpoint becomes https://ingest.in2.signoz.cloud:443/v1/traces. The ingestion key is a workspace-scoped credential available under Settings → Ingestion Settings in the SigNoz Cloud dashboard.
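The gateway is configured through the dashboard rather than code, but the equivalent standalone OTLP/HTTP exporter setup makes the wiring concrete. This sketch uses the stock OpenTelemetry JS exporters with the in2 region from the example above and a placeholder key:

```typescript
import { OTLPTraceExporter } from "@opentelemetry/exporter-trace-otlp-proto";
import { OTLPMetricExporter } from "@opentelemetry/exporter-metrics-otlp-proto";

// Substitute your own region and workspace-scoped ingestion key.
const region = "in2";
const ingestionKey = process.env.SIGNOZ_INGESTION_KEY ?? "";

const traceExporter = new OTLPTraceExporter({
  url: `https://ingest.${region}.signoz.cloud:443/v1/traces`,
  headers: { "signoz-ingestion-key": ingestionKey },
});

const metricExporter = new OTLPMetricExporter({
  url: `https://ingest.${region}.signoz.cloud:443/v1/metrics`,
  headers: { "signoz-ingestion-key": ingestionKey },
});
```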
Unlike self-hosted observability targets, where traffic stays inside the cluster over plain HTTP, SigNoz Cloud is an external endpoint reachable over the public internet. The gateway exporter establishes a TLS connection to port 443 and sends spans in protobuf-encoded OTLP/HTTP batches. The NATS-based async export path in the gateway means that even if the SigNoz ingestion endpoint is temporarily unreachable, the gateway continues processing requests normally. Spans that cannot be delivered are dropped at the exporter level and logged without surfacing errors to the caller.
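The consume-and-forward side, with the drop-on-failure semantics just described, might look like this. The loop, endpoint handling, and logging are illustrative assumptions rather than the gateway's actual exporter code:

```typescript
// Forward protobuf-encoded OTLP batches; drop (and log) anything undeliverable.
async function exportLoop(
  batches: AsyncIterable<Uint8Array>,
  endpoint: string,
  ingestionKey: string,
): Promise<void> {
  for await (const batch of batches) {
    try {
      const res = await fetch(endpoint, {
        method: "POST",
        headers: {
          "Content-Type": "application/x-protobuf",
          "signoz-ingestion-key": ingestionKey,
        },
        body: batch, // an encoded ExportTraceServiceRequest
      });
      if (!res.ok) console.warn(`span batch rejected: HTTP ${res.status}`);
    } catch (err) {
      console.warn("span batch dropped:", err); // never retried into the request path
    }
  }
}
```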
Both exporters can be enabled independently. Enabling only the traces exporter sends span data to SigNoz while keeping metrics export disabled. Enabling both sends span data to the distributed_signoz_index_v3 table and metric data to the signoz_metrics.distributed_samples_v4 table from which SigNoz generates the RED metric views in the Metrics explorer.
Filtering by Service in SigNoz
The gateway sets service.name to tfy-llm-gateway on all spans as a resource attribute. In the SigNoz Traces explorer filtering by service.name = tfy-llm-gateway isolates all gateway traffic. Filtering by tfy.span_type = ChatCompletion further narrows to LLM inference requests. The tfy.data_routing.destination attribute identifies which model or virtual model handled the request and can be used to group latency distributions by model.
Architecture Summary
A request enters the TrueFoundry AI Gateway and is processed entirely in memory. The gateway forwards the request to the LLM provider and streams the response back to the client. After the response completes, the gateway publishes a span to NATS containing the full request and response attributes, token counts, and latency. The OTEL exporter picks up the span from NATS asynchronously and forwards it over OTLP/HTTP, with the signoz-ingestion-key header, to the SigNoz regional ingestion endpoint on port 443. The shared SigNoz OTel gateway receives the span and batches it into Redpanda. The per-tenant SigNoz collector consumes from Redpanda, applies the signozspanmetrics processor to generate RED metrics, and writes spans to distributed_signoz_index_v3 and metrics to signoz_metrics.distributed_samples_v4 in the per-tenant ClickHouse instance. The SigNoz query service reads from ClickHouse and surfaces the trace in the Traces explorer and the generated metrics in the Metrics explorer.
No sidecars are required. No SDK changes are required. No instrumentation code is added to any application. The only configuration change is adding two OTEL exporter endpoints in the TrueFoundry AI Gateway OTEL Config settings. The gateway already generates and publishes the spans. The exporter configuration and the ingestion key determine where they go.
The architectural principle that makes this integration work is the complete separation between the request path and the telemetry export path. The gateway publishes telemetry to NATS after the response is already on the wire. The SigNoz ingestion endpoint can be slow, temporarily unavailable, or geographically distant without any effect on the latency or reliability of the gateway. OTEL portability means the same span data that flows to SigNoz today can flow to any other OTLP-compatible backend without changing the gateway configuration beyond the endpoint URL and the authentication header.