
Building the Infrastructure Layer That Enterprise AI Has Been Missing

By Harshita Anand

Updated: May 12, 2026


In 2022, before ChatGPT had entered the cultural vocabulary, our founders Nikunj Bajaj, Abhishek and Anuraag were already building. Not reacting to a trend, not chasing a moment, but building from a conviction that the enterprise world was about to hit an inflection point it wasn't prepared for.

That conviction came from the inside. Before founding TrueFoundry, Nikunj had spent time at Meta, where the experience of working with machine learning infrastructure at scale fundamentally shifted how he thought about the problem.

"The way you build machine learning models at Meta is foundationally different from how you do it using the public cloud ecosystem outside," Nikunj shared on the Code Story Podcast. "Meta thinks of machine learning as a special case of software engineering and generative AI as a special case of machine learning."

That mental model, with software at the base, ML in the middle, and GenAI at the top, all running through a unified interface on shared infrastructure, is not how most enterprises operate. Most organizations run two, sometimes three, parallel stacks: one for software, one for ML, and now increasingly a separate one for GenAI. The result is fragmentation, redundancy, and a system that breaks under its own weight as it scales.

TrueFoundry was founded to fix exactly that.

The Bet That Took a Year to Build

When people talk about MVPs, they usually mean something scrappy: a quick prototype to test a hypothesis. TrueFoundry's version of that looked very different.

"We spent more than a year heads-down developing the platform," Nikunj explained. "We were building the core infrastructure on top of which enterprises can start building their machine learning and generative AI applications and start launching them to production."

The technical bet at the center of that work was Kubernetes. The team believed that just as software workloads had converged on Kubernetes for orchestration, ML workloads would follow. At the time, Kubeflow was the dominant tool for running ML on Kubernetes, but its contributions were declining as Google shifted its investment toward Vertex. The TrueFoundry team saw that gap and moved into it deliberately, building their entire ML and GenAI stack to run natively on Kubernetes. That decision gave them something invaluable: infrastructure that could run anywhere, on AWS, GCP, Azure, or on-premises, without being locked to any single cloud provider.

It was a patient, principled start. And it set the foundation for everything that followed.

Adapting to a World That Wouldn't Stop Moving

One of the most striking things about TrueFoundry's product journey is how consistently the world around them shifted, and how deliberately they adapted.

In 2022, large language models became genuinely useful for the first time. In 2023, enterprises discovered that useful responses required grounding in their own data, and RAG (Retrieval Augmented Generation) became the dominant paradigm. By 2024, agents were becoming real, and organizations started thinking seriously about putting AI into production workflows. In 2025, MCP (Model Context Protocol) and agent-to-agent communication emerged as the new frontier.

Every year, the operating model changed. And TrueFoundry's approach was to hold the foundational architecture constant while adapting the layer that sits above it: the developer experience, the interfaces, the connective tissue between components.

"We adapt to the modus operandi," Nikunj said. "We build out the UX layer around it, but we bring everything back to the same grounding foundational principle about how you run these workloads on the same underlying infrastructure."

The product today reflects that philosophy. TrueFoundry's AI Gateway sits in the middle of every API call an enterprise makes to its LLMs and agents. It encompasses an LLM gateway, an MCP gateway, and an agent gateway: a unified control plane for observability, governance, cost management, and compliance across the entire agentic stack. Alongside it, the AI Deployments product lets enterprises run custom models, host MCP servers, and orchestrate agents on their own compute, all through a Kubernetes-native interface.
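The gateway pattern described above can be sketched in a few lines. This is a minimal, hypothetical illustration of the idea of a single control point routing traffic to LLM, MCP, and agent backends while recording observability data; the backend names and log fields are assumptions for the sketch, not TrueFoundry's actual API.

```python
import time
from dataclasses import dataclass, field

# Hypothetical backend registry: in a real gateway these would be HTTP
# upstreams (model APIs, MCP servers, agent runtimes), not local lambdas.
BACKENDS = {
    "llm": lambda payload: f"llm-response:{payload}",
    "mcp": lambda payload: f"mcp-response:{payload}",
    "agent": lambda payload: f"agent-response:{payload}",
}

@dataclass
class Gateway:
    """Single control point: every call is routed, timed, and logged."""
    log: list = field(default_factory=list)

    def call(self, kind: str, payload: str) -> str:
        if kind not in BACKENDS:
            raise ValueError(f"unknown backend kind: {kind}")
        start = time.perf_counter()
        result = BACKENDS[kind](payload)
        # Centralized observability: one log stream for every backend type.
        self.log.append({"kind": kind, "latency_s": time.perf_counter() - start})
        return result
```

Because every request flows through one object, policies such as cost limits or compliance checks can be enforced in a single place rather than in each application.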

The Mistake They're Willing to Talk About

Not everything went to plan. In 2024, TrueFoundry, like most of the industry, believed the LLM proxy layer enterprises were building internally would stay thin. The logic made sense at the time: model APIs were largely consistent, the layer was lightweight, and teams were comfortable building it themselves.

"We believed this is a layer that will be built more in-house," Nikunj admitted. "That mistake made us lose some of the development we could have made in the year between 2024 and 2025."

What changed was the complexity. Model API signatures started diverging. New protocols like MCP emerged. Agentic applications went to production, which meant uptime mattered. And what had started as a thin proxy layer suddenly needed to become an enterprise-grade control plane with guardrails, compliance rules, cost tracking, and centralized observability across every agent in the organization.
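To make the shift concrete: the paragraph above describes a thin proxy growing into a control plane with guardrails and cost tracking. Here is a minimal sketch of those two features, assuming illustrative per-model prices and a toy banned-term guardrail; none of the names or numbers come from TrueFoundry's product.

```python
# Illustrative pricing and guardrail configuration (assumptions for the sketch).
PRICE_PER_1K_TOKENS = {"model-a": 0.002, "model-b": 0.010}
BANNED_TERMS = {"ssn", "credit card"}

class GuardrailViolation(Exception):
    """Raised when a prompt trips a compliance rule."""

class ControlPlane:
    def __init__(self):
        self.spend_usd = 0.0  # running total across all tracked calls

    def check_guardrails(self, prompt: str) -> None:
        # Block prompts containing terms the organization has disallowed.
        lowered = prompt.lower()
        for term in BANNED_TERMS:
            if term in lowered:
                raise GuardrailViolation(f"blocked term: {term!r}")

    def record_cost(self, model: str, tokens: int) -> float:
        # Attribute spend per call so budgets can be enforced centrally.
        cost = PRICE_PER_1K_TOKENS[model] * tokens / 1000
        self.spend_usd += cost
        return cost
```

The point of the sketch is the architectural shift: once these checks live in one shared layer, every team's agents inherit them for free instead of reimplementing them per application.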

When TrueFoundry saw that shift happening, they moved fast. Within six months, they rebuilt and expanded their AI gateway into one of the most capable products in the market. Today, the gateway runs across 17 regions globally, achieves over four nines of uptime, introduces less than five milliseconds of latency, and handles tens of thousands of requests per second for production-critical enterprise applications.

Speed of response turned a missed window into a market-leading position.

The Team Behind It All

Ask Nikunj what he is most proud of at TrueFoundry, and he doesn't mention the product metrics or the customer list. He talks about the team.

The company started with three co-founders who had grown up together, Nikunj, Anuraag, and Abhishek, whose complementary backgrounds in ML, infrastructure, and strategy gave the company the right shape from day one. Abhishek had led Meta's video infrastructure organization. Anuraag had used machine learning to build trading strategies at WorldQuant and led geographic expansions for the firm. Together, they brought the technical depth and operational range that building enterprise infrastructure demands.

Now approaching 90 to 100 people, TrueFoundry still requires every new hire to go through a founder interview. The criteria haven't changed: hard skills, yes, but more importantly, genuine alignment with the mission and the kind of ownership mindset that makes early-stage companies work.

"We think that's the highest value a founder can create in a company," Nikunj said.

Where This Is All Going

The near-term challenge Nikunj sees for the industry is moving from hundreds of small, low-risk AI agents (personal productivity tools and functional experiments) to agents that sit in the critical path of real business operations. That transition requires a level of control, reliability, and governance that most enterprises haven't built yet.

TrueFoundry's longer-term ambition is even larger. The analogy Nikunj reaches for is Databricks or Snowflake: companies that unlock value by centralizing an organization's data. TrueFoundry wants to do the same for compute. A single platform where agents are developed, deployed, and orchestrated. A central control plane for all the compute flowing through an enterprise's AI systems.

It's a big vision. But it's one that was defined clearly in 2022 and has only become more relevant since.

Listen to the full conversation with Nikunj Bajaj on the Code Story Podcast, Season 12, Episode 16, available on Spotify and Apple Podcasts.
