Exportando rastreamentos do TrueFoundry AI Gateway para o SigNoz via OTLP

Built for Speed: ~10ms Latency, Even Under Load
Blazingly fast way to build, track and deploy your models!
- Handles 350+ RPS on just 1 vCPU — no tuning needed
- Production-ready with full enterprise support
The TrueFoundry AI Gateway generates OpenTelemetry spans for every LLM request and publishes them asynchronously over NATS. SigNoz Cloud accepts these spans over OTLP/HTTP at a regional ingestion endpoint authenticated via a per-workspace ingestion key. Connecting the two requires configuring the gateway's OTEL exporter with the SigNoz ingestion URL and adding the signoz-ingestion-key header. No application-level instrumentation changes are required.
This post covers the OTEL trace generation path inside the TrueFoundry AI Gateway and the ingestion and storage pipeline that SigNoz Cloud uses and the configuration surface for wiring the two systems together. It is not a setup guide. It is an explanation of how the integration works at the architecture level.
How the TrueFoundry AI Gateway Generates and Exports Traces
The TrueFoundry AI Gateway is built on the Hono framework and runs as a stateless pod. A single pod on 1 vCPU and 1 GB RAM handles over 250 requests per second with approximately 3 ms of added latency. That throughput is possible because the gateway makes zero external calls in the request path. Authentication runs against cached public keys downloaded once from the identity provider. Authorization checks run against an in-memory map of users to models synced via NATS. Model routing logic runs entirely in memory against a local copy of the routing configuration.
OTEL trace generation follows the same zero-external-call principle. When a request completes the gateway publishes the span data asynchronously to NATS. The OTEL exporter reads from this async path and forwards the span to the configured external endpoint. The exporter never touches the request path. A slow or unreachable OTEL backend never stalls a request and never adds latency visible to the client.
The gateway generates spans at several points in the request lifecycle. Inbound HTTP handling and authentication and model resolution and the outbound provider call each produce spans that are assembled into a trace. The spans carry a specific set of attributes. The tfy.input attribute contains the full request body sent to the LLM. The tfy.output attribute contains the full response body. The tfy.input_short_hand attribute contains a condensed summary of the input with boolean flags for file and image and audio content. The tfy.span_type attribute identifies the operation type as ChatCompletion or AgentResponse or MCPGateway depending on which gateway path handled the request. Standard gen_ai.* semantic conventions are also included covering prompt token counts and completion token counts and model identifiers and finish reasons.
The gateway also exposes an Exclude Request Data toggle in the OTEL configuration. When enabled the exporter strips tfy.input and tfy.output and tfy.input_short_hand from spans before forwarding them. This is the correct setting for teams that need full trace visibility for latency and error analysis but must not transmit prompt or response content to an external platform.
Export is additive. Enabling a SigNoz OTEL destination does not replace or interrupt TrueFoundry's own internal trace storage. Both paths receive the same spans from the same async NATS publish.
What SigNoz Does with the Traces
SigNoz is built around a custom OpenTelemetry Collector that accepts telemetry data and writes it to ClickHouse. The collector is configured to ingest data over standard OTLP protocols and provides protocol translation for seamless integration with existing monitoring tools and processes data with metadata enrichment before writing to storage.
The ingestion pipeline follows the OpenTelemetry Collector architecture with receivers and processors and exporters. The OTLP receiver accepts trace spans over both gRPC on port 4317 and HTTP on port 4318. A batch processor groups spans for efficiency before export. The signozspanmetrics processor generates RED metrics (Rate and Error and Duration) directly from span data so that request rate and error rate and latency percentiles are available as queryable metrics without any separate instrumentation. Trace spans are written to the distributed_signoz_index_v3 table in ClickHouse. Generated metrics flow to the signoz_metrics.distributed_samples_v4 table.
SigNoz Cloud uses a per-tenant architecture. Each tenant gets their own SigNoz instance and their own ClickHouse for storing telemetry data and their own OTel collector for ingestion and their own regional endpoint. Shared infrastructure handles initial ingestion: an OpenTelemetry gateway receives telemetry and batches it and forwards it to a Redpanda streaming buffer before the per-tenant SigNoz instance consumes and writes it to ClickHouse. This means that the ingest.<region>.signoz.cloud:443 endpoint is a shared ingestion layer. The signoz-ingestion-key header routes the data to the correct per-tenant pipeline after ingestion.
The SigNoz query service reads from ClickHouse using optimized SQL queries that leverage the columnar storage format for aggregation over large volumes of span data. The Traces explorer surfaces individual spans filterable by service name and operation and status and duration as shown below.

The Metrics explorer accepts PromQL-compatible queries against the generated RED metrics. Because the gateway emits standard gen_ai.* attributes alongside tfy.* attributes SigNoz can surface both generic HTTP performance data and LLM-specific data in the same trace view.

The Integration Surface
The TrueFoundry AI Gateway connects to SigNoz through two OTEL exporter configurations: one for traces and one for metrics. Both use HTTP and Proto encoding. The signoz-ingestion-key header is required on both exporters. The endpoint format includes the region as a subdomain and connects on port 443 over HTTPS. The configuration is accessible under AI Gateway → Controls → Settings → OTEL Config in the TrueFoundry dashboard as shown below.

OTEL Traces Exporter Configuration

OTEL Metrics Exporter Configuration
The region is derived from the SigNoz Cloud ingestion URL visible in the account settings. For an ingestion URL of https://ingest.in2.signoz.cloud the region is in2 and the traces endpoint becomes https://ingest.in2.signoz.cloud:443/v1/traces. The ingestion key is a workspace-scoped credential available under Settings → Ingestion Settings in the SigNoz Cloud dashboard.
Unlike self-hosted observability targets where traffic stays inside the cluster over plain HTTP SigNoz Cloud is an external endpoint reachable over the public internet. The gateway exporter establishes a TLS connection to port 443 and sends spans in protobuf-encoded OTLP/HTTP batches. The NATS-based async export path in the gateway means that even if the SigNoz ingestion endpoint is temporarily unreachable the gateway continues processing requests normally. Spans that cannot be delivered are dropped at the exporter level and logged without surfacing errors to the caller.
Both exporters can be enabled independently. Enabling only the traces exporter sends span data to SigNoz while keeping metrics export disabled. Enabling both sends span data to the distributed_signoz_index_v3 table and metric data to the signoz_metrics.distributed_samples_v4 table from which SigNoz generates the RED metric views in the Metrics explorer.
Filtering by Service in SigNoz
The gateway sets service.name to tfy-llm-gateway on all spans as a resource attribute. In the SigNoz Traces explorer filtering by service.name = tfy-llm-gateway isolates all gateway traffic. Filtering by tfy.span_type = ChatCompletion further narrows to LLM inference requests. The tfy.data_routing.destination attribute identifies which model or virtual model handled the request and can be used to group latency distributions by model.
Architecture Summary
A request enters the TrueFoundry AI Gateway and is processed entirely in memory. The gateway forwards the request to the LLM provider and streams the response back to the client. After the response completes the gateway publishes a span to NATS containing the full request and response attributes and token counts and latency. The OTEL exporter picks up the span from NATS asynchronously and forwards it over OTLP/HTTP with the signoz-ingestion-key cabeçalho para o endpoint de ingestão regional do SigNoz na porta 443. O gateway OTel compartilhado do SigNoz recebe o span e o agrupa no Redpanda. O por-inquilino SigNoz coletor consome do Redpanda e aplica o signozspanmetrics processador para gerar métricas RED e então escreve os spans em distributed_signoz_index_v3 e métricas em signoz_metrics.distributed_samples_v4 na instância ClickHouse por-inquilino. O serviço de consulta do SigNoz lê do ClickHouse e exibe o rastreamento no explorador de rastreamentos e as métricas geradas no explorador de métricas.
Nenhum sidecar é necessário. Nenhuma alteração no SDK é necessária. Nenhum código de instrumentação é adicionado a qualquer aplicação. A única alteração de configuração é adicionar dois endpoints de exportador OTEL nas configurações de Configuração OTEL do TrueFoundry AI Gateway. O gateway já gera e publica os spans. A configuração do exportador e a chave de ingestão determinam para onde eles vão.
O princípio arquitetônico que faz esta integração funcionar é a separação completa entre o caminho da requisição e o caminho de exportação da telemetria. O gateway publica a telemetria no NATS depois que a resposta já está sendo transmitida. O SigNoz endpoint de ingestão pode ser lento, temporariamente indisponível ou geograficamente distante sem qualquer efeito na latência ou confiabilidade do gateway. A portabilidade OTEL significa que os mesmos dados de span que fluem para o SigNoz hoje podem fluir para qualquer outro backend compatível com OTLP sem alterar a configuração do gateway além do URL do endpoint e do cabeçalho de autenticação.
TrueFoundry AI Gateway delivers ~3–4 ms latency, handles 350+ RPS on 1 vCPU, scales horizontally with ease, and is production-ready, while LiteLLM suffers from high latency, struggles beyond moderate RPS, lacks built-in scaling, and is best for light or prototype workloads.
The fastest way to build, govern and scale your AI













.webp)






.webp)

.webp)
.webp)





.png)



