Multi-Cloud GPU Orchestration with TrueFoundry: A Reference Architecture for Hyperscalers and Specialized Clouds

Published: July 4, 2026


GPU capacity is one of the hardest constraints for AI teams right now. Provisioning Amazon EC2 P5 instances or Azure ND H100 v5-series VMs can run into quota limits, regional capacity constraints, or commercial commitments that are difficult to absorb. That has made specialized GPU clouds — CoreWeave, Lambda, Fluidstack, and several others — viable production targets, not just overflow capacity. Each of these providers offers a managed Kubernetes path for GPU workloads: CoreWeave Kubernetes Service (CKS), Lambda Managed Kubernetes (MK8s) on a 1-Click Cluster, and Fluidstack managed Kubernetes.

Running across these alongside a hyperscaler footprint creates real operational complexity: separate dashboards, separate identity systems, separate observability, separate deployment flows. TrueFoundry's role here is to attach each of these Kubernetes clusters to a single Control Plane, present them as deploy targets in one UI, and provide a consistent K8s operational layer on top — without replacing the cluster-native automation each provider already ships.

This post walks through what that actually looks like: the architecture, what attaching a cluster requires, where the platform's automation ends and the provider's begins, and the practical patterns we recommend.

The Architecture: One Control Plane, Many Compute Planes

TrueFoundry uses a split-plane architecture. The Control Plane (TrueFoundry-managed or self-hosted) holds metadata, RBAC, deployment manifests, and the UI. The Compute Plane is your own Kubernetes cluster. Multiple Compute Planes can connect to a single Control Plane — meaning an EKS cluster, an AKS cluster, a CKS cluster, an MK8s cluster, and a Fluidstack managed K8s cluster can all report to the same dashboard.

The tfy-agent runs in each cluster and opens a secure outbound WebSocket to the Control Plane. The agent streams cluster state, while the agent proxy lets the Control Plane apply Kubernetes changes through that outbound connection without requiring an inbound endpoint on the cluster. Workloads, data-plane traffic, and provider credentials remain inside each cluster's cloud account, and traffic does not flow between Compute Planes through the Control Plane — they remain independent clouds with independent identities.

Figure 1. Multi-cloud architecture. One Control Plane, one outbound agent connection per cluster, and no data-plane traffic between clusters through the platform. Each Compute Plane keeps its own identity boundary, storage, and provider-managed components.

What "Attaching a Cluster" Actually Requires

The agent install itself is a single helm command — but it sits at the end of a setup that has real prerequisites. For any cluster (hyperscaler or specialized) to attach cleanly, the cluster needs:

Kubernetes 1.28+ with headroom for roughly 250 nodes / 4,096 pods, depending on the intended workload profile
Outbound egress to the container registries TrueFoundry pulls from: public.ecr.aws, quay.io, ghcr.io, tfy.jfrog.io, docker.io/natsio, nvcr.io, registry.k8s.io
A wildcard domain (e.g. *.lambda-pool.example.com) and a TLS certificate — cert-manager + Let's Encrypt is the documented pattern
A working load balancer or ingress path. Managed hyperscale K8s usually provides this through the cloud load balancer integration; on bare-metal or specialized clusters, confirm the provider-supported ingress and IP allocation path
Persistent storage support for volumes and artifacts, typically through the provider's CSI-backed block, filesystem, or object-storage integrations
A reachable container registry and artifact store for image pulls, build outputs, and workflow artifacts
Node labels on generic / specialized clusters: truefoundry.com/nodepool=<pool-name> on every node, and truefoundry.com/gpu_type=<GPU_TYPE> on GPU nodes (TrueFoundry auto-discovers node pools on EKS/GKE/AKS only)
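
Most of these checks can be scripted before you click Attach Existing Cluster. What follows is a minimal bash sketch, not TrueFoundry tooling. It assumes the following:

- kubectl already targets the cluster you are attaching.
- GNU `sort -V` is available.
- The public curlimages/curl image can be pulled.

The helper names (version_at_least, check_k8s_version, check_egress_in_cluster, label_pool) are hypothetical. The label keys and registry hosts come from the list above. The GPU type value you pass is whatever your TrueFoundry setup expects.

```bash
#!/usr/bin/env bash
# Preflight sketch for attaching a generic or specialized cluster.
# Assumes kubectl already targets that cluster; set KUBECTL to use a wrapper instead.
KUBECTL="${KUBECTL:-kubectl}"

# Succeeds if version $1 >= version $2 (uses `sort -V` from GNU coreutils).
version_at_least() {
  [ "$(printf '%s\n%s\n' "$2" "$1" | sort -V | head -n 1)" = "$2" ]
}

# Reads the API server's gitVersion from /version and requires Kubernetes 1.28+.
check_k8s_version() {
  local v
  v=$("$KUBECTL" get --raw /version | sed -n 's/.*"gitVersion": *"v\([0-9.]*\).*/\1/p')
  if version_at_least "$v" 1.28; then
    echo "ok   kubernetes $v"
  else
    echo "FAIL kubernetes ${v:-unknown} (need 1.28+)"
    return 1
  fi
}

# Probes HTTPS reachability from a throwaway pod, so the cluster's own egress path is
# tested rather than your laptop's. Extra hosts (e.g. your Control Plane host) can be
# passed as arguments. Docker Hub image pulls are served from registry-1.docker.io.
# If the curl image itself cannot be pulled, that is already an egress signal.
check_egress_in_cluster() {
  "$KUBECTL" run tfy-egress-check --rm -i --restart=Never \
    --image=curlimages/curl --command -- sh -c '
      for h in "$@"; do
        if curl -sS -o /dev/null --max-time 10 "https://$h"; then echo "ok   $h"; else echo "FAIL $h"; fi
      done' sh public.ecr.aws quay.io ghcr.io tfy.jfrog.io docker.io registry-1.docker.io \
      nvcr.io registry.k8s.io "$@"
}

# Applies truefoundry.com/nodepool to every listed node, plus truefoundry.com/gpu_type
# when a GPU type is given. Usage: label_pool <pool-name> <gpu-type-or-""> <node>...
label_pool() {
  local pool="$1" gpu_type="$2" node
  shift 2
  for node in "$@"; do
    "$KUBECTL" label node "$node" "truefoundry.com/nodepool=${pool}" --overwrite
    if [ -n "$gpu_type" ]; then
      "$KUBECTL" label node "$node" "truefoundry.com/gpu_type=${gpu_type}" --overwrite
    fi
  done
}
```

For example, `label_pool gpu-pool <GPU_TYPE> node-a node-b` labels two GPU nodes. Passing an empty GPU type labels CPU nodes with the pool label only.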

On managed hyperscale K8s, TrueFoundry's OpenTofu/Terraform modules cover most of this. On specialized clouds, you use the provider's managed K8s offering directly — provision through their console, prepare the prerequisites, then attach. The exact agent install command is generated by the platform UI when you click Attach Existing Cluster; it typically follows this shape:

helm repo add truefoundry https://truefoundry.github.io/infra-charts/
helm upgrade --install tfy-agent truefoundry/tfy-agent \
  --set tenantName=my-org \
  --set clusterName=lambda-h100-pool \
  --set controlPlaneURL=https://<YOUR_CONTROL_PLANE> \
  --set clusterTokenSecret=<YOUR_CLUSTER_TOKEN_SECRET>

Specialized Cloud Specifics: the Addon-Overlap Problem

This is the part many architecture posts gloss over. The specialized clouds named earlier already provide Kubernetes components that overlap with parts of TrueFoundry's default addon stack. If both sides install the same component, you can get conflicts that range from duplicated dashboards to unsupported GPU operator deployments.

CoreWeave is explicit about this: CoreWeave manages the NVIDIA GPU Operator on CKS clusters and warns against double-installing it. The platform-managed deployment is the only supported one. Disable TrueFoundry's GPU Operator addon when attaching a CKS cluster.

Concretely:

CoreWeave CKS includes a CoreWeave-managed NVIDIA GPU Operator on recent clusters, Cilium networking, storage integrations, DPU-based infrastructure, and CoreWeave observability. When attaching, disable TrueFoundry's GPU Operator addon and review any observability overlap.
Lambda MK8s provides GPU and InfiniBand/RDMA support, shared persistent storage through the lambda-shared StorageClass, NVIDIA DCGM Grafana dashboards, and automated node remediation. Disable TrueFoundry's GPU Operator addon if Lambda is already managing GPU enablement. The provider's DCGM dashboard is separate from TrueFoundry's observability and can run alongside it.
Fluidstack managed Kubernetes advertises support for GPU Operator and Network Operator, Ray, Volcano, and Kueue for batch scheduling, Atlas-managed storage, and cluster-health observability. Disable TrueFoundry's GPU Operator addon when the provider is already managing it. The provider's batch scheduling stack is complementary to, rather than a replacement for, workflow orchestration.

The Attach Existing Cluster form has a Cluster Addons section where you toggle off any addon the provider already supplies. This is a one-time decision per cluster.
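
Confirm from the cluster itself whether a GPU Operator is already running before you set that toggle. Here is a minimal bash sketch, assuming kubectl targets the cluster. The upstream NVIDIA GPU Operator registers the `clusterpolicies.nvidia.com` CRD, so its presence is a reasonable signal. The helper names are hypothetical. A missing CRD does not prove the provider is not enabling GPUs some other way, so check the provider's documentation as well.

```bash
#!/usr/bin/env bash
# Sketch: decide the GPU Operator addon toggle from what the cluster already runs.
KUBECTL="${KUBECTL:-kubectl}"

# The NVIDIA GPU Operator installs the ClusterPolicy CRD (clusterpolicies.nvidia.com).
gpu_operator_present() {
  "$KUBECTL" get crd clusterpolicies.nvidia.com >/dev/null 2>&1
}

# Prints a recommendation for the Cluster Addons section of the attach form.
gpu_operator_addon_advice() {
  if gpu_operator_present; then
    echo "disable: a GPU Operator is already installed on this cluster"
  else
    echo "review: no GPU Operator CRD found; check how the provider enables GPUs first"
  fi
}
```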

What TrueFoundry Adds Across Clusters

Once a cluster is attached, the platform layers a consistent operational experience on top of whatever K8s the provider gave you:

One UI for every cluster. Every deployment, every job, every service, every workspace — visible across all attached clusters in the same dashboard.
Consistent deployment manifest format. Author a service or job once; target a different cluster by changing the cluster_name field — provided the prerequisites on the destination cluster match (matching GPU type, registry access, secrets, storage class).
GitOps-versioned delivery via ArgoCD, deployed in each cluster, with deployment configuration stored in Git when GitOps is enabled.
Per-cluster observability, with Prometheus-based metrics surfaced in the platform UI and optional Grafana for deeper cluster-level dashboards. (Note: this is consolidated operational visibility, not a replacement for a federated long-term metrics backend.)
Argo Workflows in each cluster for batch jobs and training runs, with run history and step-level observability presented consistently across clusters.
Autoscaling for services within each cluster, including request-rate-based scaling, time-based rules, queue-based patterns, and scale-to-zero for suitable workloads.
Capacity-type placement within a cluster: workloads can target spot, on-demand, or spot with on-demand fallback, where the underlying cloud and node provisioning setup support it.
Workspace-based RBAC and SSO at the platform layer, with consistent permission models regardless of which cluster a workload runs in.
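
Retargeting a workload then comes down to a one-field edit plus prerequisite checks. The bash sketch below rests on several assumptions:

- The manifest is a YAML file with a single `cluster_name:` line, as described above.
- The rest of the manifest's shape is illustrative only.
- The helper names are hypothetical.
- sed is used in place of a YAML-aware tool such as yq.

The second helper checks only one prerequisite: that at least one node carries the required `truefoundry.com/gpu_type` label. Registry access, secrets, and storage classes still need their own checks.

```bash
#!/usr/bin/env bash
# Point KUBECTL at the destination cluster's context before running gpu_type_available.
KUBECTL="${KUBECTL:-kubectl}"

# Prints the manifest with its cluster_name value replaced; fails if the field is missing.
# Usage: retarget_manifest <manifest.yaml> <target-cluster>
retarget_manifest() {
  local src="$1" target="$2"
  if ! grep -Eq '^[[:space:]]*cluster_name:' "$src"; then
    echo "no cluster_name field in $src" >&2
    return 1
  fi
  sed -E "s|^([[:space:]]*cluster_name:[[:space:]]*).*|\1${target}|" "$src"
}

# Succeeds only if at least one node in the current context has the given GPU type label.
# Usage: gpu_type_available <GPU_TYPE>
gpu_type_available() {
  [ -n "$("$KUBECTL" get nodes -l "truefoundry.com/gpu_type=$1" -o name)" ]
}
```

For example, `retarget_manifest service.yaml eks-p5-pool > service.eks.yaml` writes a copy aimed at another cluster. Both file names are placeholders.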

What TrueFoundry Does Not Do (Yet)

To be completely clear about scope: these are real capabilities customers ask for, but they are not platform features today:

Cross-cluster scheduling. When you deploy a job, you target a specific cluster. The platform does not automatically pick the cheapest cluster, route based on real-time capacity, or rebalance running workloads across clusters.
Cross-cluster failover. If a Lambda 1-Click Cluster hits hardware issues or a CoreWeave region runs out of capacity, the platform does not automatically retry the job on an EKS reservation. In-cluster placement and fallback policies can help where the underlying cluster supports them; cross-cluster failover is a different problem and is not provided.
Pooled GPU capacity. Your H100 capacity on CoreWeave, Lambda, and AWS is tracked as three separate clusters, not as one pooled quota. The UI shows all three; the scheduler treats them independently.
Automatic cost-aware routing. Choosing where to run a job based on real-time provider pricing is not a platform feature today.

If your use case genuinely requires cross-cluster orchestration, make that decision in an orchestration layer on top of TrueFoundry's per-cluster deployment API. An example is a training job that should move to the cheapest available cloud. That layer can be your own CI/CD logic, a small custom scheduler, or a workflow tool such as Argo Workflows or Temporal. The platform gives you consistent deployment primitives across clusters; the routing decision stays with you.
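
As a sketch of how small that orchestration layer can start, the bash functions below do two things. First, they rank candidate clusters from an inventory file you maintain yourself. Each line holds a cluster name, GPU type, your price per GPU-hour, and the GPUs currently free. Second, they try each ranked cluster in order.

The inventory format, the function names, and the `deploy_fn` callback are all hypothetical. The callback stands in for whatever deploys to a single cluster in your setup, for example a CI job that calls TrueFoundry's per-cluster deployment API. Keeping prices and free capacity current is the hard part, and it stays outside this script.

```bash
#!/usr/bin/env bash
# Hypothetical inventory format, one cluster per line ('#' lines are comments):
#   <cluster-name> <gpu-type> <price-per-gpu-hour> <free-gpus>

# Prints clusters that have the GPU type and enough free GPUs, cheapest first.
# Usage: rank_clusters <gpu-type> <gpus-needed> < inventory.txt
rank_clusters() {
  awk -v t="$1" -v n="$2" \
    '$1 !~ /^#/ && NF >= 4 && $2 == t && $4 >= n { print $3, $1 }' |
    sort -g | cut -d' ' -f2
}

# Tries each cluster in order until the deploy callback succeeds; prints the winner.
# Usage: deploy_with_fallback <deploy-fn> <cluster>...
deploy_with_fallback() {
  local deploy_fn="$1" cluster
  shift
  for cluster in "$@"; do
    if "$deploy_fn" "$cluster"; then
      printf '%s\n' "$cluster"
      return 0
    fi
    echo "deploy to $cluster failed; trying next candidate" >&2
  done
  echo "no candidate cluster accepted the job" >&2
  return 1
}
```

For example, `deploy_with_fallback my_deploy $(rank_clusters H100 8 < clusters.txt)`, where `my_deploy` and `clusters.txt` are yours. Ranking and fallback are kept separate so you can swap the ranking rule without touching the retry loop.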

A Real Workflow: Attaching a Lambda 1-Click Cluster

Here is the sequence, spelled out so expectations are set correctly:

Provision a 1-Click Cluster on Lambda with Managed Kubernetes enabled. Choose the size (from 16 to 2,000+ GPUs) and the reservation term.
Get cluster-admin access through Lambda's authentication flow and verify that kubectl get nodes lists your GPU workers as Ready.
Prepare the in-cluster prerequisites:
- Configure the wildcard DNS record and the ingress path.
- Install or verify TLS certificate automation.
- Confirm the lambda-shared StorageClass exists if you need shared storage.
- Verify egress to the container registries TrueFoundry requires.
In TrueFoundry, click Attach Existing Cluster and fill in the cluster details. Disable the GPU Operator addon when Lambda already provides GPU enablement.
Run the generated helm command against your cluster. Wait for the agent and any selected addons to come up; this typically takes 5–10 minutes once the prerequisites are in place.
Configure tolerations on the TrueFoundry workspace that targets this cluster. MK8s nodes carry GPU taints that workloads must tolerate; setting the tolerations at the workspace level applies them to every job submitted to that cluster.
Confirm the cluster shows as connected, then deploy a small test workload before moving production workloads. A single-GPU vLLM service or a single-node training run works well.
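
For the toleration step, read the taints the GPU nodes actually carry rather than guessing them. The bash sketch below assumes kubectl targets the MK8s cluster. It lists distinct node taints and turns each `key=value:Effect` string into a standard Kubernetes toleration entry.

The helper names are hypothetical. The tolerations themselves are entered in the workspace's toleration settings in TrueFoundry. Treat the output as reference values, not a file to apply.

```bash
#!/usr/bin/env bash
KUBECTL="${KUBECTL:-kubectl}"

# Prints each distinct node taint as key=value:Effect (value is empty if the taint has none).
list_node_taints() {
  "$KUBECTL" get nodes -o jsonpath='{range .items[*]}{range .spec.taints[*]}{.key}={.value}:{.effect}{"\n"}{end}{end}' | sort -u
}

# Converts one key=value:Effect taint into a Kubernetes toleration (a YAML list item).
# Taints without a value become operator: Exists; values are quoted so YAML keeps them strings.
taint_to_toleration() {
  local taint="$1" key value effect
  effect="${taint##*:}"
  taint="${taint%:*}"
  key="${taint%%=*}"
  if [ "$key" = "$taint" ]; then value=""; else value="${taint#*=}"; fi
  printf '%s\n' "- key: $key"
  if [ -n "$value" ]; then
    printf '  operator: Equal\n  value: "%s"\n' "$value"
  else
    printf '  operator: Exists\n'
  fi
  printf '  effect: %s\n' "$effect"
}
```

For example, `list_node_taints | while read -r t; do taint_to_toleration "$t"; done` prints one toleration per distinct taint. Filter out taints unrelated to GPU scheduling by hand.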

On an already-provisioned cluster, the whole process typically takes 30–60 minutes, mostly spent on DNS propagation, certificate issuance, and the helm install.

A Practical Comparison

Capability	Hyperscalers (EKS / AKS / GKE)	Specialized (CKS / MK8s / Fluidstack)	What TrueFoundry adds
GPU availability	Capacity quotas, reservation planning, and regional availability constraints	Bare-metal H100 / H200 / B200 / GB200 capacity, subject to provider availability and commercial terms	All attached clusters visible in one UI; deploy targets selectable per workload
Pricing	Published rates, reserved discounts	Provider-specific pricing, often optimized for large GPU clusters	Per-cluster deployment; cross-cluster cost routing is not platform-native
Spot / preemption	Cloud-native spot, on-demand, and reservation constructs; fallback depends on node provisioning setup	Provider-managed (varies)	Configured per cluster; cross-cluster failover is user-side
Storage	Native EBS / EFS, Persistent Disk / Filestore, Azure Disk / Azure Files, plus object-store integrations where configured	Provider-managed storage integrations such as CoreWeave storage, Lambda shared filesystem, and Fluidstack Atlas-managed storage	Use the provider-supported Kubernetes storage abstraction; validate access modes and performance per workload
GPU drivers / Operator	Installed by TrueFoundry's GPU Operator addon when the cluster does not already provide it	Often provider-managed — disable TrueFoundry's addon when the provider already manages GPU enablement	Per-cluster addon toggle in the attach form
Observability	Per-cluster Prometheus + Grafana surfaced in platform UI	TrueFoundry observability plus provider-native dashboards where available, such as Lambda's DCGM Grafana	Consolidated cluster visibility (not federated long-term metrics)
Identity	EKS IRSA / GKE Workload Identity / Azure WI configured per provider's standard	Provider-specific RBAC and IAM	Workspace-based RBAC and SSO at the platform layer; native K8s RBAC stays per-cluster

Conclusion

A multi-cloud GPU strategy is real, and it grows more viable as compute becomes a constraint on AI roadmaps. The practical path is to treat every cluster as a standard Kubernetes attachment, with the platform providing a consistent operational layer on top. That applies whether the cluster is EKS, AKS, GKE, CKS, MK8s, or Fluidstack managed Kubernetes.

What you get:
- One UI across every cluster you attach.
- Consistent deployment and GitOps patterns regardless of cloud.
- Manifest-level workload portability, where prerequisites match.
- Complete data separation, with workloads and data staying in your own cloud accounts.

What you don't get today: automatic cross-cluster scheduling, capacity pooling, or cost-aware routing. Those decisions stay with you, built on the platform's per-cluster deployment API.

If you are evaluating this stack, a natural starting point is to attach two clusters, one from a hyperscaler and one from a specialized cloud. Run a representative job on each to validate day-to-day operations before scaling out.
