Production LLM systems behave like distributed systems. One user request can trigger multiple model calls tool calls and retries. Without a single execution boundary telemetry becomes fragmented and debugging becomes guesswork.

This post shows how to connect TrueFoundry AI Gateway with Elastic Cloud so gateway traces flow into Elastic Observability using OpenTelemetry. You will configure an OTLP endpoint and an API key in the gateway.

The missing control plane in LLM architectures

When applications talk directly to model providers there is no consistent place to enforce policy and capture traces. A gateway creates that consistent surface so governance routing and telemetry generation are centralized.

TrueFoundry AI Gateway

TrueFoundry AI Gateway establishes a single governed entry point for model and agent requests. Applications and agents talk to the gateway proxy instead of talking directly to providers. This architecture makes routing decisions and telemetry generation consistent across every request.

The gateway can export traces using standard OpenTelemetry protocols so you can send the same trace stream to the observability platform your teams already use.

Elastic Cloud

Elastic Cloud is a managed service for the Elastic Stack that supports search observability and security workflows. It can analyze logs metrics and traces at scale which makes it a natural destination for gateway traces.

TrueFoundry AI Gateway supports exporting OpenTelemetry traces to external platforms like Elastic Cloud so you can use Elastic for observability while keeping TrueFoundry as the unified LLM access layer.

OpenTelemetry as the integration layer

This integration uses OpenTelemetry end to end. The gateway exports OTEL traces and Elastic Cloud ingests them through its managed OTLP endpoint.

Integrating with Elastic Cloud

Step 1 Get your Elastic Cloud endpoint and API key

In the Elastic Cloud console open your deployment or serverless project then go to Add data then Applications then OpenTelemetry. Copy the managed OTLP endpoint URL and copy the API key value shown for authentication headers. Elastic Cloud Hosted deployments require version 9.2 or later for the managed OTLP endpoint.

Step 2 Open the AI Gateway OTEL config in TrueFoundry

In the TrueFoundry dashboard go to AI Gateway then Controls then Settings. Scroll to the OTEL Config section and open the editor for the exporter configuration.

Step 3 Configure the Elastic Cloud endpoint

Enable the OTEL Traces Exporter. Set Config Type to http. Set the traces endpoint to the managed OTLP endpoint you copied from Elastic Cloud. Choose Json or Proto encoding.

A minimal config looks like this.

Config type: http Traces endpoint: https://<your motlp endpoint> Encoding: Json or Proto

Step 4 Add the required header

Add an HTTP header named Authorization with the value in the ApiKey format. The ApiKey prefix is required.

Authorization: ApiKey <your api key>

Step 5 Save configuration

Save the OTEL export configuration. After this all gateway traces will be exported to Elastic Cloud automatically.

Step 6 View traces in Elastic

Send a few requests through the gateway. Then open Kibana and go to Observability then APM then Services and look for the service named tfy-llm-gateway. From there you can inspect traces and transactions for each request.

Operational notes

Encoding choice

Elastic Cloud managed OTLP endpoint supports Json and Proto. Json is easier to read during debugging. Proto is more efficient for high volume data.

Adding resource attributes

You can set Additional Resource Attributes in the exporter configuration to attach consistent tags to every exported trace. This is useful for environment and tenant level filtering in Elastic.

Troubleshooting

If you see an authentication error that mentions an ApiKey prefix then the Authorization header is not formatted correctly and should start with ApiKey. If you see HTTP 429 then your deployment may be hitting ingest rate limits and you should consider plan changes or sampling adjustments.

What you can do with this integration

When AI Gateway exports traces to Elastic Cloud you get one place to analyze gateway traces with the same observability workflows you already use for the rest of your stack. Elastic brings logs metrics traces and APM views together in one platform so your LLM traffic is not isolated from application and infrastructure signals.

You can debug a single user request end to end by opening the trace in Elastic. The Traces UI shows distributed tracing so you can see the full path of execution. The service map helps you understand service dependencies. Transaction details give timing and request metadata so you can spot the slow step quickly.

You can detect regressions earlier by watching trends instead of single incidents. Elastic Observability provides dashboards and analysis features that help teams move from raw telemetry to insights. It also includes anomaly detection style capabilities that can surface unusual patterns across signals.

You can run LLM specific monitoring workflows inside Elastic. Elastic highlights LLM observability use cases such as tracking latency errors prompts responses usage and costs. With AI Gateway as the execution boundary you can make this coverage consistent across every model call that flows through the gateway.

You can make traces easier to filter and group by adding resource attributes in the gateway exporter config. This is useful for environment metadata and tenant tags so teams can slice traces by production staging or business unit inside Elastic.

Conclusion

TrueFoundry AI Gateway gives you a consistent execution boundary for all LLM traffic. Elastic Cloud gives you a mature observability surface for traces and service level workflows. With OpenTelemetry connecting them you can debug and operate LLM systems with the same rigor you expect from any production distributed system.

Built for Speed: ~10ms Latency, Even Under Load

Blazingly fast way to build, track and deploy your models!

Handles 350+ RPS on just 1 vCPU — no tuning needed
Production-ready with full enterprise support

Get Started with Truefoundry Now Talk to the Expert

TrueFoundry AI Gateway delivers ~3–4 ms latency, handles 350+ RPS on 1 vCPU, scales horizontally with ease, and is production-ready, while LiteLLM suffers from high latency, struggles beyond moderate RPS, lacks built-in scaling, and is best for light or prototype workloads.

Built for Speed: ~10ms Latency, Even Under Load

Schedule your Demo Now

The fastest way to build, govern and scale your AI

Book a Demo

TrueFoundry AI Gateway integration with Elastic

The missing control plane in LLM architectures

TrueFoundry AI Gateway

Elastic Cloud

OpenTelemetry as the integration layer

Integrating with Elastic Cloud

Step 1 Get your Elastic Cloud endpoint and API key

Step 2 Open the AI Gateway OTEL config in TrueFoundry

Step 3 Configure the Elastic Cloud endpoint

Step 4 Add the required header

Step 5 Save configuration

Step 6 View traces in Elastic

Operational notes

Encoding choice

Adding resource attributes

Troubleshooting

What you can do with this integration

Conclusion

Built for Speed: ~10ms Latency, Even Under Load

Enabling the Large Language Models Revolution: GPUs on Kubernetes

Data Residency in the Age of Agentic AI: How AI Gateways Enable Sovereign Scale and Compliance

Prompting, RAG or Fine-tuning - the right choice?

Prompt Engineering: Learning to Interact with LLMs

TrueFoundry AI Gateway integration with Elastic

The missing control plane in LLM architectures

TrueFoundry AI Gateway

Elastic Cloud

OpenTelemetry as the integration layer

Integrating with Elastic Cloud

Step 1 Get your Elastic Cloud endpoint and API key

Step 2 Open the AI Gateway OTEL config in TrueFoundry

Step 3 Configure the Elastic Cloud endpoint

Step 4 Add the required header

Step 5 Save configuration

Step 6 View traces in Elastic

Operational notes

Encoding choice

Adding resource attributes

Troubleshooting

What you can do with this integration

Conclusion

Built for Speed: ~10ms Latency, Even Under Load

Discover More

Enabling the Large Language Models Revolution: GPUs on Kubernetes

Data Residency in the Age of Agentic AI: How AI Gateways Enable Sovereign Scale and Compliance

Prompting, RAG or Fine-tuning - the right choice?

Prompt Engineering: Learning to Interact with LLMs

Subscribe to our newsletter