TrueFoundry + Seldon: One Control Plane for Enterprise AI

Built for Speed: ~10ms Latency, Even Under Load

Blazingly fast way to build, track and deploy your models!

Handles 350+ RPS on just 1 vCPU — no tuning needed
Production-ready with full enterprise support

Get Started with Truefoundry Now Talk to the Expert

TL;DR TrueFoundry and Seldon are coming together into one platform. Seldon’s real-time ML serving joins TrueFoundry’s AI Deploy and AI Gateway, which gives enterprises a single control plane for both classic ML and agents. Your production models keep running on the same Kubernetes you use today, and you gain a clear path to LLMs and AI agents on top of them.

Most enterprise AI teams now operate on both sides of a single line. They run classic ML models in production for things like fraud scoring, churn prediction, and recommendations. At the same time they are building agentic applications that reason, call tools, and act on their own. Those two worlds used to move at different speeds. They don’t anymore. Both are becoming business-critical at once, and running them as two separate stacks, with two vendors and two governance models, gets expensive and brittle fast.

That is the gap TrueFoundry and Seldon are closing together. Seldon spent more than a decade perfecting real-time ML serving for some of the most demanding enterprises in the world. TrueFoundry built the control plane around modern AI, with deployment, an AI Gateway, and governance for LLMs and agents. We are merging the two into one platform, so teams get a single place to run predictive models and agents on the Kubernetes foundation they already trust.

‍

Two teams, one architecture

This merge works because neither side has to give up its design. Seldon and TrueFoundry made the same architectural choice years ago. Both run as a control plane on the customer’s own Kubernetes, inside their VPC, on-prem, or air-gapped. Both are cloud-agnostic. Both lean on the same standard components for traffic, autoscaling, and telemetry, and both hand a team its own namespace instead of asking for cluster-wide admin.

Seldon took that shared foundation and went deep on one layer, serving classic ML in real time at scale. We took the same foundation and went wide, from deployment up through a gateway that governs models, agents, and tools. So the two platforms were never on a collision course. They were solving neighbouring parts of the same problem, the same way. Bringing them together just connects two layers that already speak the same language.

‍

What Seldon brings

Seldon has been the backbone for real-time ML inference at banks, telecoms, insurers, retailers, and healthcare companies for over a decade. That reputation comes from going deep on the hardest parts of production ML serving:

Core 2 pipelines stitch several models, transformers, and routers into one served application. The caller gets a synchronous response, while a Kafka dataflow moves data between the steps inside. That is real application orchestration at the serving layer, not just single-model endpoints.
MLServer is a standards-based, multi-model runtime built on the V2 Open Inference Protocol, with adaptive batching and multi-model loading.
Alibi Detect and Alibi Explain handle drift and outlier detection and model explanations, the model-quality signals that risk and clinical teams rely on.
Model Performance Monitoring tracks how a model behaves once it is live.

Together that is a mature serving and monitoring layer with deep roots in regulated, low-latency settings, the kind where a bad prediction turns into a customer problem within minutes.

‍

What TrueFoundry brings

TrueFoundry built the control plane around the model, the layers that turn a served model into a governed, cost-aware, agent-ready application:

AI Deploy runs and scales ML and GenAI workloads on Kubernetes, with the runtime flexibility to serve models on backends like FastAPI, Triton, and vLLM.
The AI Gateway puts 1,000+ LLMs behind a single OpenAI-compatible API, with unified access controls, guardrails, and request-level observability. It adds roughly 3 to 4 ms of overhead and handles 350+ RPS on a single vCPU, so it sits in the hot path without becoming the bottleneck.
The Agent Gateway extends that control into how agents actually operate: the MCP servers they call, the tools they use, and the handoffs between them. Agents from LangGraph, CrewAI, AutoGen, or a custom framework can be deployed and governed in one place.

Our customers already run business-critical AI through the Gateway at the scale of more than a trillion tokens a day.

‍

Better together: one control plane

Here is the part that matters. When Seldon’s serving layer and TrueFoundry’s deploy and gateway layers come together, an enterprise gets one control plane that covers both halves of its AI.

Deploy runs the workloads. The Gateway governs and routes them. Your classic ML models and your agents sit on the same Kubernetes, under one set of access controls, one observability stack, and one view of cost. A single request can hit a fraud model, an LLM, and an agent that calls a tool, and every hop is logged and governed the same way.

Layer	What runs here	What it gives you
TrueFoundry Deploy, with Seldon’s serving	Real-time ML, Core 2 pipelines, batch, and GenAI workloads	Production-grade serving and monitoring on your own Kubernetes
AI Gateway and Agent Gateway	1,000+ LLMs, MCP servers, tools, and agents	One place to connect, govern, and observe every model and agent call

‍

Where enterprise AI spend is going

It helps to look at where the money is moving. Predictive ML is mostly built and runs steadily. The new investment in enterprise AI is going into LLMs and agents, and that spend flows through the Gateway. Every model call, every agent step, every tool invocation passes through it, which is why our customers already push more than a trillion tokens a day across it.

So the combined platform does two useful things at once. It keeps the ML you have already built running, with no disruption. And it puts you on the surface where the next wave of AI spend lands, without standing up a separate platform to get there.

‍

What this means for you

If you run Seldon today, your real-time ML keeps running on the Kubernetes you already operate. Models served over the V2 protocol stay portable, and Core 2 pipelines map onto Deploy’s own primitives rather than being rewritten. On top of that, you gain the AI Gateway and the agent layer without standing up anything new.

If you run TrueFoundry today, nothing in your stack changes, and Deploy gets stronger. It picks up Seldon’s serving and monitoring lineage, including drift and outlier detection, model explanations, and performance monitoring that feed straight into the platform you already use.

For both, it is one stack instead of two. One place to deploy a model, route a call, watch for drift, govern an agent, and account for the cost, across classic ML and GenAI, on the infrastructure you already run.

‍

Conclusion

Enterprises should not have to run one platform for their models and another for their agents. With TrueFoundry and Seldon together, they don’t have to. Seldon’s real-time ML serving and TrueFoundry’s AI Gateway now sit on one control plane for enterprise AI, on the Kubernetes you already run. Your production ML keeps going, and the path to agents and LLMs is right there on top of it.

See how TrueFoundry’s AI Gateway unifies ML and agents → Explore the AI Gateway

TrueFoundry AI Gateway delivers ~3–4 ms latency, handles 350+ RPS on 1 vCPU, scales horizontally with ease, and is production-ready, while LiteLLM suffers from high latency, struggles beyond moderate RPS, lacks built-in scaling, and is best for light or prototype workloads.

Built for Speed: ~10ms Latency, Even Under Load

Schedule your Demo Now