Prerequisite
- Ensure that you have a custom registry set up with authentication configured.
- Push the following images to the registry:
quay.io/argoproj/argo-rollouts:v1.8.2
quay.io/argoproj/workflow-controller:v3.6.5
quay.io/argoproj/argocli:v3.6.5
quay.io/argoproj/argocd:v2.14.10
public.ecr.aws/docker/library/redis:7.4.2-alpine
tfy.jfrog.io/tfy-private-images/deltafusion-ingestor:v0.84.0
tfy.jfrog.io/tfy-private-images/deltafusion-query-server:v0.84.1
tfy.jfrog.io/tfy-private-images/mlfoundry-server:v0.84.0
tfy.jfrog.io/tfy-private-images/servicefoundry-server:v0.84.3
tfy.jfrog.io/tfy-private-images/sfy-manifest-service:v0.79.0
tfy.jfrog.io/tfy-private-images/tfy-controller:v0.83.0
tfy.jfrog.io/tfy-private-images/tfy-k8s-controller:v0.84.2
tfy.jfrog.io/tfy-private-images/tfy-llm-gateway:v0.84.1
tfy.jfrog.io/tfy-private-images/tfy-otel-collector:dc2bad620009411812aba0fb7b90aec8d4874b0c
tfy.jfrog.io/tfy-private-images/truefoundry-frontend-app:v0.84.1
tfy.jfrog.io/tfy-images/sfy-builder:v0.8.19
tfy.jfrog.io/tfy-images/tfy-buildkit:0.1.2
tfy.jfrog.io/tfy-images/truefoundry-bootstrap:0.1.5
tfy.jfrog.io/tfy-mirror/bitnami/postgresql:16.6.0-debian-12-r2
tfy.jfrog.io/tfy-mirror/nats:2.11.6-alpine
tfy.jfrog.io/tfy-mirror/natsio/nats-server-config-reloader:0.18.2
tfy.jfrog.io/tfy-mirror/natsio/prometheus-nats-exporter:0.17.3
tfy.jfrog.io/tfy-images/sds-server:c3bb65485f56faaa236f4ee02074c6da7ab269a8
tfy.jfrog.io/tfy-images/tfy-agent:2618fdfb58371b083c8eacabc4afba4b66c8696e
tfy.jfrog.io/tfy-images/tfy-agent-proxy:9b1cee43f0aae843c11bcc97c4d1e7f565d913dd
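One way to push the images listed above is to pull each one, retag it for the custom registry, and push it. The sketch below is a dry run that only prints the `docker` commands. The host `registry.example.com` and the rule of keeping the source path after the registry host are assumptions; adjust the destination paths so they match the image repositories referenced in your values files.

```shell
#!/bin/sh
# Dry-run sketch: print the docker commands needed to mirror images.
# ASSUMPTION: registry.example.com is a placeholder for <CUSTOM_REGISTRY>.
CUSTOM_REGISTRY="${CUSTOM_REGISTRY:-registry.example.com}"

# dest_ref SRC PREFIX -> PREFIX/<source path without the registry host>
# (hypothetical helper; the destination layout is an assumption)
dest_ref() {
  printf '%s/%s\n' "$2" "${1#*/}"
}

# print_mirror_cmds SRC: print the pull/tag/push commands for one image
print_mirror_cmds() {
  dst="$(dest_ref "$1" "$CUSTOM_REGISTRY")"
  echo "docker pull $1"
  echo "docker tag $1 $dst"
  echo "docker push $dst"
}

# A few images from the list above; extend with the remaining entries.
for image in \
  quay.io/argoproj/argocd:v2.14.10 \
  public.ecr.aws/docker/library/redis:7.4.2-alpine \
  tfy.jfrog.io/tfy-images/tfy-agent:2618fdfb58371b083c8eacabc4afba4b66c8696e
do
  print_mirror_cmds "$image"
done
```

Once the printed commands look right, they can be piped to `sh` after running `docker login` against the custom registry.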
- Ensure the following Helm charts are also pushed to your custom registry as OCI charts:
- repo: https://argoproj.github.io/argo-helm/
  chart: argo-cd
  version: 7.8.26
- repo: https://argoproj.github.io/argo-helm/
  chart: argo-rollouts
  version: 2.39.5
- repo: https://argoproj.github.io/argo-helm/
  chart: argo-workflows
  version: 0.45.12
- repo: https://truefoundry.github.io/infra-charts/
  chart: truefoundry
  version: 0.80.6
- repo: https://truefoundry.github.io/infra-charts/
  chart: tfy-agent
  version: 0.2.80
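A chart from a classic Helm repository can be copied into an OCI registry by pulling the archive and pushing it with `helm push`. The sketch below is a dry run that only prints the commands. `OCI_PREFIX` stands in for `<CUSTOM_REGISTRY>/<HELM_NS>` and its default value is a made-up example; the sub-paths mirror the `repoURL` values used by the Application manifests in this guide.

```shell
#!/bin/sh
# Dry-run sketch: print helm commands that copy charts into an OCI registry.
# ASSUMPTION: OCI_PREFIX is a placeholder for <CUSTOM_REGISTRY>/<HELM_NS>.
OCI_PREFIX="${OCI_PREFIX:-registry.example.com/helm}"

# chart_archive CHART VERSION -> archive file name written by `helm pull`
chart_archive() {
  printf '%s-%s.tgz\n' "$1" "$2"
}

# print_chart_cmds REPO CHART VERSION OCI_PATH
print_chart_cmds() {
  echo "helm pull $2 --repo $1 --version $3"
  echo "helm push $(chart_archive "$2" "$3") oci://$OCI_PREFIX/$4"
}

print_chart_cmds https://argoproj.github.io/argo-helm/ argo-cd 7.8.26 argoproj/argo-helm
print_chart_cmds https://argoproj.github.io/argo-helm/ argo-rollouts 2.39.5 argoproj/argo-helm
print_chart_cmds https://argoproj.github.io/argo-helm/ argo-workflows 0.45.12 argoproj/argo-helm
print_chart_cmds https://truefoundry.github.io/infra-charts/ tfy-agent 0.2.80 tfy-agent
```

The truefoundry chart follows the same pattern, using the chart version you intend to deploy. `helm push` with an `oci://` target assumes a Helm release with OCI support and a prior `helm registry login`.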
Installation Steps
- Install Argo CD using a custom values file, saved as argocd-values.yaml:
global:
  image:
    repository: <CUSTOM_REGISTRY>/argoproj/argocd
redis:
  image:
    repository: <CUSTOM_REGISTRY>/library/redis
server:
  extraArgs:
    - "--insecure"
    - '--application-namespaces="*"'
controller:
  extraArgs:
    - '--application-namespaces="*"'
notifications:
  enabled: false
dex:
  enabled: false
helm repo add argo https://argoproj.github.io/argo-helm
helm repo update argo
helm upgrade --install argocd argo/argo-cd \
  --namespace argocd \
  --create-namespace \
  --version 7.8.26 \
  -f argocd-values.yaml
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: argocd
  finalizers:
    - resources-finalizer.argocd.argoproj.io
  annotations:
    argocd.argoproj.io/sync-wave: "-1"
    helm.sh/hook: pre-install
    helm.sh/hook-weight: "-20"
  labels:
    truefoundry.com/infra-component: "argocd"
spec:
  destination:
    namespace: argocd
    server: https://kubernetes.default.svc
  project: default
  source:
    chart: argo-cd
    repoURL: <CUSTOM_REGISTRY>/<HELM_NS>/argoproj/argo-helm
    targetRevision: 7.8.26
    helm:
      values: |-
        global:
          image:
            repository: <CUSTOM_REGISTRY>/argoproj/argocd
        redis:
          image:
            repository: <CUSTOM_REGISTRY>/library/redis
          resources:
            requests:
              cpu: 100m
              memory: 256Mi
              ephemeral-storage: 256Mi
            limits:
              cpu: 400m
              memory: 512Mi
              ephemeral-storage: 512Mi
        extraObjects:
          - apiVersion: argoproj.io/v1alpha1
            kind: AppProject
            metadata:
              name: tfy-apps
            spec:
              clusterResourceWhitelist:
                - group: '*'
                  kind: '*'
              destinations:
                - namespace: '*'
                  server: '*'
              sourceRepos:
                - '*'
              sourceNamespaces:
                - "*"
        notifications:
          enabled: false
        dex:
          enabled: false
        configs:
          cm:
            resource.customizations.ignoreDifferences.storage.k8s.io_CSIDriver: |
              jqPathExpressions:
                - '.spec.seLinuxMount'
        applicationSet:
          resources:
            requests:
              cpu: 100m
              memory: 50Mi
              ephemeral-storage: 256Mi
            limits:
              cpu: 200m
              memory: 100Mi
              ephemeral-storage: 512Mi
        server:
          extraArgs:
            - --insecure
            - '--application-namespaces="*"'
          resources:
            requests:
              cpu: 200m
              memory: 256Mi
              ephemeral-storage: 256Mi
            limits:
              cpu: 400m
              memory: 1024Mi
              ephemeral-storage: 512Mi
        controller:
          metrics:
            enabled: true
            serviceMonitor:
              enabled: true
          extraArgs:
            - '--application-namespaces="*"'
          resources:
            requests:
              cpu: 1.3
              memory: 550Mi
              ephemeral-storage: 256Mi
            limits:
              cpu: 2.6
              memory: 1100Mi
              ephemeral-storage: 512Mi
        redisSecretInit:
          resources:
            requests:
              cpu: 100m
              memory: 256Mi
              ephemeral-storage: 256Mi
            limits:
              cpu: 400m
              memory: 512Mi
              ephemeral-storage: 512Mi
        repoServer:
          resources:
            requests:
              cpu: 0.6
              memory: 256Mi
              ephemeral-storage: 256Mi
            limits:
              cpu: 1.2
              memory: 1024Mi
              ephemeral-storage: 512Mi
  syncPolicy:
    automated: {}
    syncOptions:
      - CreateNamespace=true
      - ServerSideApply=true
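The Application manifests in this guide use angle-bracket placeholders such as `<CUSTOM_REGISTRY>` and `<HELM_NS>`. A minimal sketch of filling them in with `sed` before applying a manifest is shown below. The default values are made-up examples, and the helper name `render_manifest` is hypothetical.

```shell
#!/bin/sh
# Sketch: substitute the angle-bracket placeholders in a manifest.
# ASSUMPTION: the values are examples and contain no '|', '&' or '\'
# characters, which would need escaping in the sed expressions below.
CUSTOM_REGISTRY="${CUSTOM_REGISTRY:-registry.example.com}"
HELM_NS="${HELM_NS:-helm}"
REGISTRY_NS="${REGISTRY_NS:-images}"

# render_manifest: read a manifest on stdin, write the rendered one to stdout
render_manifest() {
  sed \
    -e "s|<CUSTOM_REGISTRY>|$CUSTOM_REGISTRY|g" \
    -e "s|<HELM_NS>|$HELM_NS|g" \
    -e "s|<REGISTRY_NS>|$REGISTRY_NS|g"
}

# Example: render a single line taken from the manifest above.
printf '%s\n' 'repoURL: <CUSTOM_REGISTRY>/<HELM_NS>/argoproj/argo-helm' | render_manifest
```

Assuming the manifest is saved as `argocd-app.yaml` (a hypothetical file name) and Argo CD runs in the `argocd` namespace, the rendered output could then be applied with `render_manifest < argocd-app.yaml | kubectl apply -n argocd -f -`.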
- Install argo-rollouts helm chart
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: argo-rollout
  finalizers:
    - resources-finalizer.argocd.argoproj.io
  labels:
    truefoundry.com/infra-component: "argo-rollout"
spec:
  destination:
    namespace: argo-rollouts
    server: https://kubernetes.default.svc
  project: tfy-apps
  source:
    chart: argo-rollouts
    repoURL: <CUSTOM_REGISTRY>/<HELM_NS>/argoproj/argo-helm
    targetRevision: 2.39.5
    helm:
      values: |-
        controller:
          image:
            registry: <CUSTOM_REGISTRY>/<REGISTRY_NS>
          metrics:
            enabled: true
            serviceMonitor:
              enabled: true
          resources:
            requests:
              cpu: 100m
              memory: 256Mi
              ephemeral-storage: 256Mi
            limits:
              cpu: 200m
              memory: 512Mi
              ephemeral-storage: 512Mi
  syncPolicy:
    automated: {}
    syncOptions:
      - CreateNamespace=true
      - ServerSideApply=true
- Install argo-workflows helm chart
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: argo-workflows
  finalizers:
    - resources-finalizer.argocd.argoproj.io
  labels:
    truefoundry.com/infra-component: "argo-workflows"
spec:
  destination:
    namespace: argo-workflows
    server: https://kubernetes.default.svc
  project: tfy-apps
  source:
    chart: argo-workflows
    repoURL: <CUSTOM_REGISTRY>/<HELM_NS>/argoproj/argo-helm
    targetRevision: 0.45.12
    helm:
      values: |-
        controller:
          image:
            registry: <CUSTOM_REGISTRY>/<REGISTRY_NS>
          serviceMonitor:
            enabled: true
          workflowDefaults:
            spec:
              activeDeadlineSeconds: 432000
              ttlStrategy:
                secondsAfterCompletion: 3600
          metricsConfig:
            enabled: true
          resources:
            requests:
              cpu: 100m
              memory: 256Mi
              ephemeral-storage: 256Mi
            limits:
              cpu: 400m
              memory: 512Mi
              ephemeral-storage: 512Mi
        executor:
          image:
            registry: <CUSTOM_REGISTRY>/<REGISTRY_NS>
          resources:
            requests:
              cpu: 100m
              memory: 256Mi
              ephemeral-storage: 256Mi
            limits:
              cpu: 400m
              memory: 512Mi
              ephemeral-storage: 512Mi
        server:
          image:
            registry: <CUSTOM_REGISTRY>/<REGISTRY_NS>
          resources:
            requests:
              cpu: 100m
              memory: 256Mi
              ephemeral-storage: 256Mi
            limits:
              cpu: 400m
              memory: 512Mi
              ephemeral-storage: 512Mi
  syncPolicy:
    automated: {}
    syncOptions:
      - CreateNamespace=true
      - ServerSideApply=true
- Install TrueFoundry helm chart
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: truefoundry
  finalizers:
    - resources-finalizer.argocd.argoproj.io
  labels:
    truefoundry.com/infra-component: "truefoundry"
spec:
  destination:
    namespace: truefoundry
    server: https://kubernetes.default.svc
  project: tfy-apps
  source:
    targetRevision: 0.84.8
    repoURL: "<CUSTOM_REGISTRY>/<HELM_NS>/tfy-helm"
    chart: truefoundry
    helm:
      values: |-
        global:
          image:
            registry: <CUSTOM_REGISTRY>
          tenantName: <TENANT_NAME>
          controlPlaneURL: https://<CONTROL_PLANE_HOST>
          tfyApiKey: <TFY_API_KEY>
          database:
            host: <DATABASE_HOST>
            name: <DATABASE_NAME>
            username: <DATABASE_USERNAME>
            password: <DATABASE_PASSWORD>
          config:
            defaultCloudProvider: aws
            storageConfiguration:
              awsS3BucketName: ""
              awsRegion: ""
              awsAccessKeyId: ""
              awsSecretAccessKey: ""
              awsEndpointURL: ""
        tags:
          llmGateway: true
          llmGatewayRequestLogging: true
        tfy-clickhouse:
          enabled: false
        tfy-buildkitd-service:
          enabled: false
        tfy-configs:
          configs:
            imageMutationPolicyOverride:
              images:
                - source_registry: tfy.jfrog.io/tfy-images
                  destination_registry: <CUSTOM_REGISTRY>/<REGISTRY_NS>
                - source_registry: tfy.jfrog.io/tfy-mirror
                  destination_registry: <CUSTOM_REGISTRY>/<REGISTRY_NS>
                - source_registry: public.ecr.aws/truefoundrycloud
                  destination_registry: <CUSTOM_REGISTRY>/<REGISTRY_NS>
        extraResources:
          enabled: true
          manifests:
            - apiVersion: networking.k8s.io/v1
              kind: Ingress
              metadata:
                name: truefoundry-truefoundry-frontend-app-ingress
                annotations:
                  nginx.ingress.kubernetes.io/rewrite-target: /
              spec:
                ingressClassName: nginx
                rules:
                  - host: <CONTROL_PLANE_HOST>
                    http:
                      paths:
                        - path: /
                          pathType: ImplementationSpecific
                          backend:
                            service:
                              name: truefoundry-truefoundry-frontend-app
                              port:
                                number: 5000
            - apiVersion: networking.k8s.io/v1
              kind: Ingress
              metadata:
                name: truefoundry-truefoundry-backend-app-ingress
                annotations:
                  nginx.ingress.kubernetes.io/use-regex: 'true'
                  nginx.ingress.kubernetes.io/rewrite-target: /$2
              spec:
                ingressClassName: nginx
                rules:
                  - host: <CONTROL_PLANE_HOST>
                    http:
                      paths:
                        - path: /api/svc/socket\.io(/|$)(.*)
                          pathType: ImplementationSpecific
                          backend:
                            service:
                              name: truefoundry-servicefoundry-server
                              port:
                                number: 3000
                        - path: /api/llm(/|$)(.*)
                          pathType: ImplementationSpecific
                          backend:
                            service:
                              name: truefoundry-tfy-llm-gateway
                              port:
                                number: 8787
                        - path: /api/proxy-server(/|$)(.*)
                          pathType: ImplementationSpecific
                          backend:
                            service:
                              name: truefoundry-tfy-controller
                              port:
                                number: 8123
                        - path: /api/s3proxy(/|$)(.*)
                          pathType: ImplementationSpecific
                          backend:
                            service:
                              name: truefoundry-s3proxy
                              port:
                                number: 8080
                        - path: /api/otel(/|$)(.*)
                          pathType: ImplementationSpecific
                          backend:
                            service:
                              name: truefoundry-tfy-otel-collector
                              port:
                                number: 4318
                        - path: /flyteidl\.service\.AdminService(/|$)(.*)
                          pathType: ImplementationSpecific
                          backend:
                            service:
                              name: truefoundry-tfy-workflow-admin-server
                              port:
                                number: 8089
  syncPolicy:
    automated: {}
    syncOptions:
      - CreateNamespace=true
      - ServerSideApply=true
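The backend Ingress in the manifest above combines regex paths with the `rewrite-target: /$2` annotation, so the prefix matched by the path is dropped and only the second capture group is forwarded to the service. The sketch below emulates that substitution with `sed` to show the effect. The request paths are made-up examples, and `rewrite_path` is a hypothetical helper, not part of ingress-nginx.

```shell
#!/bin/sh
# Sketch: emulate the path rewrite configured on the backend Ingress.
# rewrite_path PATTERN REQUEST_PATH
#   PATTERN is one of the regex paths from the manifest; the replacement
#   mirrors rewrite-target "/$2" (a slash plus the second capture group).
#   A path that does not match the pattern is printed unchanged.
rewrite_path() {
  printf '%s\n' "$2" | sed -E "s#^${1}#/\\2#"
}

rewrite_path '/api/llm(/|$)(.*)' /api/llm/v1/chat       # prints /v1/chat
rewrite_path '/api/llm(/|$)(.*)' /api/llm               # prints /
rewrite_path '/api/otel(/|$)(.*)' /api/otel/v1/traces   # prints /v1/traces
```

The `(/|$)` group is what keeps a path such as `/api/llmfoo` from matching the `/api/llm` rule: the prefix must be followed by a slash or the end of the path.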
- Set up the ingress for the control plane.
- Log in to the TrueFoundry UI and head over to the platform section from the left sidebar.
- Click on `Attach Existing Cluster` and select `Generic Cluster`.
- Fill in the form by entering the cluster name, and deselect all the components, since the required ones are already installed.
- Skip the helm installation steps and close the form.
- A cluster entry will be created in the UI. Use the three-dot menu on the cluster entry to copy the cluster token; the token is used in the next application.
- Install tfy-agent application
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: tfy-agent
  finalizers:
    - resources-finalizer.argocd.argoproj.io
  labels:
    truefoundry.com/infra-component: "tfy-agent"
spec:
  destination:
    namespace: tfy-agent
    server: https://kubernetes.default.svc
  project: tfy-apps
  source:
    targetRevision: 0.2.80
    repoURL: <CUSTOM_REGISTRY>/<HELM_NS>/tfy-agent
    chart: tfy-agent
    helm:
      values: |-
        config:
          clusterToken: <CLUSTER_TOKEN>
          tenantName: <TENANT_NAME>
          controlPlaneURL: https://<CONTROL_PLANE_HOST>
        tfyAgent:
          image:
            repository: <CUSTOM_REGISTRY>/<REGISTRY_NS>/tfy-agent
          resources:
            limits:
              cpu: 500m
              memory: 1024Mi
              ephemeral-storage: 256Mi
            requests:
              cpu: 300m
              memory: 512Mi
              ephemeral-storage: 128Mi
        tfyAgentProxy:
          image:
            repository: <CUSTOM_REGISTRY>/<REGISTRY_NS>/tfy-agent-proxy
          resources:
            limits:
              cpu: 500m
              memory: 512Mi
              ephemeral-storage: 256Mi
            requests:
              cpu: 50m
              memory: 128Mi
              ephemeral-storage: 128Mi
        sdsServer:
          image:
            repository: <CUSTOM_REGISTRY>/<REGISTRY_NS>/sds-server
          resources:
            limits:
              cpu: 200m
              ephemeral-storage: 20M
              memory: 50M
            requests:
              cpu: 100m
              ephemeral-storage: 10M
              memory: 30M
  syncPolicy:
    automated: {}
    syncOptions:
      - CreateNamespace=true
      - ServerSideApply=true
- Ensure that your cluster is connected once the pods come up.
- Configure nodes to integrate with TrueFoundry
- CPU Nodes for Infra components - A few CPU nodes are required to run the compute plane infra components. No additional configuration is needed for these.
- CPU Nodes for workloads - You can configure a few CPU nodes to deploy CPU workloads using TrueFoundry. Add the following label to these nodes:
  truefoundry.com/nodepool: cpu-pool # unique identifier for cpu nodes
- GPU Nodes for workloads - You can configure a few GPU nodes to deploy GPU workloads using TrueFoundry. Add the following labels to these nodes:
  truefoundry.com/nodepool: a100-gpu-pool # unique identifier for specific gpu type nodes
  truefoundry.com/gpu_type: A10G # Possible values given below.
Possible Values for truefoundry.com/gpu_type:
- A10G
- A10_12GB
- A10_24GB
- A10_4GB
- A10_8GB
- A100_40GB
- A100_80GB
- H100_80GB
- H100_94GB
- H200
- L4
- L40S
- P100
- P4
- T4
- V100
- From the platform page, click on the sync icon on the cluster entry to sync the cluster with the control plane.
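The node labels described above can be applied with `kubectl label nodes`. The sketch below is a dry run that checks a GPU type against the list of possible values and prints the corresponding command. The node names are made-up examples, and the helper names are hypothetical.

```shell
#!/bin/sh
# Dry-run sketch: validate a GPU type and print kubectl label commands.
GPU_TYPES="A10G A10_12GB A10_24GB A10_4GB A10_8GB A100_40GB A100_80GB H100_80GB H100_94GB H200 L4 L40S P100 P4 T4 V100"

# is_valid_gpu_type TYPE: succeed only if TYPE is in the list above
is_valid_gpu_type() {
  case " $GPU_TYPES " in
    *" $1 "*) return 0 ;;
    *) return 1 ;;
  esac
}

# print_label_cmd NODE POOL [GPU_TYPE]
print_label_cmd() {
  if [ -n "${3:-}" ]; then
    is_valid_gpu_type "$3" || { echo "unknown gpu_type: $3" >&2; return 1; }
    echo "kubectl label nodes $1 truefoundry.com/nodepool=$2 truefoundry.com/gpu_type=$3"
  else
    echo "kubectl label nodes $1 truefoundry.com/nodepool=$2"
  fi
}

# ASSUMPTION: cpu-node-1 and gpu-node-1 are placeholder node names.
print_label_cmd cpu-node-1 cpu-pool
print_label_cmd gpu-node-1 a100-gpu-pool A100_80GB
```

Replace the node names with the output of `kubectl get nodes`, then run the printed commands once they look right.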