Architecting TrueFoundry on Azure: Control Plane and Compute Integration
Wiring up a Generative AI platform on Microsoft Azure means stitching together distinct compute, identity, and AI primitives. You provision raw capacity via Azure Kubernetes Service (AKS) and Spot VMs, handle identity via Entra ID, and route requests to Azure OpenAI. Friction hits when your infrastructure teams have to orchestrate these connections manually for every new model deployment.
TrueFoundry deploys as an infrastructure overlay inside your Azure subscription. We handle the deployment lifecycle, identity federation, and autoscaling. This post breaks down the exact integration patterns we use to connect TrueFoundry with Azure, covering the split-plane deployment, network boundaries, and workload identity mechanics.
Deployment Model: Split-Plane Architecture
We use a split-plane architecture to isolate workload execution from platform management. If you build platforms on Amazon EKS, this model looks familiar: you separate the control surface from the data plane.
- The Control Plane: Acts as the API server and metadata store. It holds deployment manifests, RBAC configurations, and telemetry data.
- The Compute Plane: Runs inside your AKS cluster. It consists of the TrueFoundry agent, local controllers, and your actual model weights and GPUs.
We connect the two planes over a secure, outbound-only gRPC stream or WebSocket. The cluster-side agent initiates the connection to the Control Plane to pull manifests and push logs, so you never open an inbound port on your VNET Network Security Groups; the VNET keeps denying external ingress from the internet by default.
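To make the direction of that connection concrete, here is a minimal sketch of the outbound-only agent pattern in Python. The message shapes and control-plane URL are illustrative assumptions, not TrueFoundry's actual wire protocol:

```python
# Minimal sketch of an outbound-only cluster agent (illustrative only).
import asyncio
import json

import websockets  # pip install websockets


async def run_agent(control_plane_url: str, cluster_token: str) -> None:
    # The agent dials OUT to the control plane; no inbound NSG rule needed.
    async with websockets.connect(control_plane_url) as ws:
        # Authenticate over the already-established stream.
        await ws.send(json.dumps({"type": "hello", "token": cluster_token}))
        async for message in ws:
            manifest = json.loads(message)
            # Apply the pulled manifest locally, then push status back up.
            await ws.send(json.dumps({"type": "status", "ack": manifest.get("id")}))


if __name__ == "__main__":
    # Hypothetical URL: the real control-plane address is deployment-specific.
    asyncio.run(run_agent("wss://control-plane.example.com/agent", "REDACTED"))
```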

Fig 1: The Split-Plane Architecture isolates data processing within the customer VNET.
Networking Topology and Traffic Flow
We configure the compute plane networking using Azure CNI for direct pod-level IP assignment. Your compute resources stay in private subnets.
Ingress and Egress
- Inbound Traffic: Application traffic hits an Azure Application Gateway or an internal Azure Load Balancer (Standard SKU). The gateway terminates TLS and passes the traffic to the Istio Ingress Gateway running inside AKS.
- Outbound Traffic: AKS worker nodes route outbound calls through an Azure NAT Gateway. They use this path to pull images from Azure Container Registry and poll the Control Plane.
Private Endpoint Integration
For strict compliance boundaries, we route traffic over Azure Private Link. Connections from your inference pods to Azure OpenAI, Key Vault, and Blob Storage route entirely over the Microsoft backbone.
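From the application's perspective, Private Link is transparent: the pod calls the usual Azure OpenAI hostname, and private DNS resolves it to the Private Endpoint's VNET IP. A minimal sketch using the openai and azure-identity packages; the resource name and deployment name are placeholders:

```python
# Inference pod reaching Azure OpenAI over a Private Endpoint,
# authenticating with Entra rather than a static API key.
from azure.identity import DefaultAzureCredential, get_bearer_token_provider
from openai import AzureOpenAI

token_provider = get_bearer_token_provider(
    DefaultAzureCredential(), "https://cognitiveservices.azure.com/.default"
)

client = AzureOpenAI(
    azure_endpoint="https://my-aoai-resource.openai.azure.com",  # placeholder
    azure_ad_token_provider=token_provider,
    api_version="2024-02-01",
)

response = client.chat.completions.create(
    model="gpt-4o",  # your Azure OpenAI deployment name
    messages=[{"role": "user", "content": "ping"}],
)
print(response.choices[0].message.content)
```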

Fig 2: Network traffic flow detailing ingress and private connectivity to Azure PaaS.
Identity Federation: Entra Workload ID
Hardcoded secrets and long-lived Service Principal credentials introduce severe rotation overhead. We authenticate workloads dynamically using Microsoft Entra Workload ID. If you manage AWS environments, this is the Azure equivalent of AWS IAM Roles for Service Accounts (IRSA).
When you deploy a pipeline, we execute this sequence:
- Service Account Creation: We provision a Kubernetes Service Account in the workload's namespace.
- Federation: We link this Service Account to a user-assigned Managed Identity in Entra ID.
- Token Exchange: The pod requests a signed token from the AKS OIDC Issuer. The Azure SDK swaps this token for an Entra access token via the OpenID Connect endpoint.
- Resource Access: The pod uses this token to pull models from Blob Storage or hit Azure OpenAI.
Application code simply uses DefaultAzureCredential; because the pod only ever holds a token for its own identity, the blast radius is scoped strictly to the RBAC permissions granted to that specific Managed Identity.
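In practice, the in-pod code needs no Azure-specific wiring beyond the credential object. A sketch of a model pull from Blob Storage under Workload ID (account, container, and blob names are placeholders):

```python
# With Entra Workload ID enabled, DefaultAzureCredential picks up the
# projected OIDC token automatically -- no secret is mounted in the pod.
from azure.identity import DefaultAzureCredential
from azure.storage.blob import BlobClient

credential = DefaultAzureCredential()

blob = BlobClient(
    account_url="https://mymodelstore.blob.core.windows.net",  # placeholder
    container_name="models",
    blob_name="llama3/weights.safetensors",
    credential=credential,
)

# Stream the model weights to local disk for the inference server to load.
with open("/models/weights.safetensors", "wb") as f:
    f.write(blob.download_blob().readall())
```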

Fig 3: The Entra Workload ID authentication flow.
Compute Orchestration: Spot VM Integration
Running steady-state inference exclusively on On-Demand VMs locks in a high cost baseline. We integrate directly with AKS node pools to orchestrate Azure Spot Virtual Machines (similar to utilizing Amazon EC2 Spot Instances).
We manage Spot capacity using the following logic:
- Provisioning: We build secondary node pools with priority=Spot and eviction-policy=Delete.
- Eviction Handling: Our controller polls the Azure Instance Metadata Service (IMDS). When we detect an eviction notice (Azure gives roughly 30 seconds of warning), we cordon and drain the node; the Kubernetes Cluster Autoscaler then scales the On-Demand fallback pool to absorb the rescheduled pods. A sketch of this watch loop follows the list.
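The eviction watch reduces to polling IMDS Scheduled Events for a Preempt event. A stripped-down sketch (the real controller also handles drain and autoscaler coordination; the endpoint and API version below are Azure's documented Scheduled Events interface):

```python
import time

import requests

# Azure IMDS Scheduled Events endpoint (link-local address, documented by Azure).
IMDS_URL = "http://169.254.169.254/metadata/scheduledevents"
PARAMS = {"api-version": "2020-07-01"}
HEADERS = {"Metadata": "true"}


def watch_for_preemption(poll_seconds: int = 10) -> None:
    """Block until this VM receives a Spot eviction (Preempt) notice."""
    while True:
        resp = requests.get(IMDS_URL, params=PARAMS, headers=HEADERS, timeout=5)
        for event in resp.json().get("Events", []):
            if event.get("EventType") == "Preempt":
                # ~30-second warning: hand off to cordon/drain logic here.
                print("Spot eviction notice received for this node")
                return
        time.sleep(poll_seconds)
```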
For teams running batch inference or fault-tolerant API serving, this setup—much like running Karpenter on AWS—can reduce compute instance costs by up to 80% depending on workload flexibility.
The AI Gateway: Unifying Models
Managing distinct API keys and Token-Per-Minute (TPM) limits across multiple Azure regions creates operational drag. The TrueFoundry AI Gateway abstracts this. Similar to routing requests through Amazon Bedrock, developers hit a single internal API endpoint.
- Smart Routing: We load balance requests across Azure regions. If East US rate-limits your request, the Gateway retries against West Europe.
- Failover: If an Azure PaaS outage occurs, the Gateway can fail traffic over to a Llama 3 or Mistral instance hosted directly on your AKS compute plane (a simplified sketch of this routing logic follows the list).
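To illustrate the behavior (not the Gateway's actual implementation), here is a hypothetical ordered-fallback loop over OpenAI-compatible backends; every URL and model name below is a placeholder:

```python
# Hypothetical client-side view of what the Gateway does internally:
# try regional Azure OpenAI backends in order, then fall back to a
# self-hosted OpenAI-compatible endpoint running on AKS.
from openai import APIConnectionError, APIStatusError, OpenAI

BACKENDS = [
    ("https://aoai-eastus.example.internal/v1", "gpt-4o"),
    ("https://aoai-westeurope.example.internal/v1", "gpt-4o"),
    ("http://llama3.inference.svc.cluster.local/v1", "llama-3-70b"),
]


def route_completion(prompt: str) -> str:
    last_error: Exception | None = None
    for base_url, model in BACKENDS:
        client = OpenAI(base_url=base_url, api_key="gateway-managed")
        try:
            resp = client.chat.completions.create(
                model=model, messages=[{"role": "user", "content": prompt}]
            )
            return resp.choices[0].message.content
        except (APIStatusError, APIConnectionError) as exc:
            last_error = exc  # 429 or regional outage: try the next backend
    raise RuntimeError("all backends exhausted") from last_error
```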
Infrastructure as Code Compatibility
We align with standard GitOps and IaC practices. You provision the underlying Azure environment using our maintained Terraform modules.
Your Terraform state manages the VNETs, the AKS cluster, the OIDC issuers, and the underlying PostgreSQL databases. The TrueFoundry overlay simply maps to these native resources, keeping your infrastructure auditable and compliant.
Summary
Deploying TrueFoundry on Azure isolates your compute and data execution while we manage the application lifecycle. You maintain direct authority over your VNETs, NSGs, and data residency perimeters. We handle the orchestration. By abstracting the complex wiring between AKS, Entra ID, and Azure OpenAI, we let your engineering teams focus on shipping models instead of fighting infrastructure.