
Prerequisites

  1. Ensure that you have a custom registry set up, with authentication configured.
  2. Push the following images to your custom registry:
quay.io/argoproj/argo-rollouts:v1.8.2
quay.io/argoproj/workflow-controller:v3.6.5
quay.io/argoproj/argocli:v3.6.5
quay.io/argoproj/argocd:v2.14.10
public.ecr.aws/docker/library/redis:7.4.2-alpine
tfy.jfrog.io/tfy-private-images/deltafusion-ingestor:v0.84.0
tfy.jfrog.io/tfy-private-images/deltafusion-query-server:v0.84.1
tfy.jfrog.io/tfy-private-images/mlfoundry-server:v0.84.0
tfy.jfrog.io/tfy-private-images/servicefoundry-server:v0.84.3
tfy.jfrog.io/tfy-private-images/sfy-manifest-service:v0.79.0
tfy.jfrog.io/tfy-private-images/tfy-controller:v0.83.0
tfy.jfrog.io/tfy-private-images/tfy-k8s-controller:v0.84.2
tfy.jfrog.io/tfy-private-images/tfy-llm-gateway:v0.84.1
tfy.jfrog.io/tfy-private-images/tfy-otel-collector:dc2bad620009411812aba0fb7b90aec8d4874b0c
tfy.jfrog.io/tfy-private-images/truefoundry-frontend-app:v0.84.1
tfy.jfrog.io/tfy-images/sfy-builder:v0.8.19
tfy.jfrog.io/tfy-images/tfy-buildkit:0.1.2
tfy.jfrog.io/tfy-images/truefoundry-bootstrap:0.1.5
tfy.jfrog.io/tfy-mirror/bitnami/postgresql:16.6.0-debian-12-r2
tfy.jfrog.io/tfy-mirror/nats:2.11.6-alpine
tfy.jfrog.io/tfy-mirror/natsio/nats-server-config-reloader:0.18.2
tfy.jfrog.io/tfy-mirror/natsio/prometheus-nats-exporter:0.17.3
tfy.jfrog.io/tfy-images/sds-server:c3bb65485f56faaa236f4ee02074c6da7ab269a8
tfy.jfrog.io/tfy-images/tfy-agent:2618fdfb58371b083c8eacabc4afba4b66c8696e
tfy.jfrog.io/tfy-images/tfy-agent-proxy:9b1cee43f0aae843c11bcc97c4d1e7f565d913dd
  3. Ensure the following Helm charts are also pushed to your custom registry as OCI charts:
- repo: https://argoproj.github.io/argo-helm/
  chart: argo-cd
  version: 7.8.26
- repo: https://argoproj.github.io/argo-helm/
  chart: argo-rollouts
  version: 2.39.5
- repo: https://argoproj.github.io/argo-helm/
  chart: argo-workflows
  version: 0.45.12
- repo: https://truefoundry.github.io/infra-charts/
  chart: truefoundry
  version: 0.84.8
- repo: https://truefoundry.github.io/infra-charts/
  chart: tfy-agent
  version: 0.2.80
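A minimal sketch of the mirroring step, assuming crane (from go-containerregistry) and Helm 3.8+ are installed and you are already logged in to the custom registry. CUSTOM_REGISTRY, REGISTRY_NS, and HELM_NS stand in for the placeholders used in the manifests below; their defaults are hypothetical. Only three image pairs are shown, each with a destination chosen to match the image setting used later in this guide, so check the layout for the remaining images against your own values. The truefoundry chart is pushed at the targetRevision used by the TrueFoundry Application. With the default DRY_RUN=1 the script only prints the commands.

```shell
#!/bin/sh
# Sketch: mirror prerequisite images and Helm charts into a custom registry.
# Assumes crane and helm (3.8+) are installed and you are logged in to the
# destination registry. The default values below are hypothetical.
set -eu

CUSTOM_REGISTRY="${CUSTOM_REGISTRY:-registry.example.com}"
REGISTRY_NS="${REGISTRY_NS:-tfy}"
HELM_NS="${HELM_NS:-charts}"
DRY_RUN="${DRY_RUN:-1}"   # 1 = print the commands only, 0 = run them

# Print or run a command depending on DRY_RUN.
run() {
  if [ "$DRY_RUN" = "1" ]; then echo "$*"; else "$@"; fi
}

# mirror_image SRC DST: copy one image reference to the custom registry.
mirror_image() {
  run crane copy "$1" "$2"
}

# mirror_chart REPO_URL CHART VERSION OCI_PATH: download CHART-VERSION.tgz and
# push it as an OCI artifact under oci://CUSTOM_REGISTRY/HELM_NS/OCI_PATH.
mirror_chart() {
  run helm pull "$2" --repo "$1" --version "$3"
  run helm push "$2-$3.tgz" "oci://${CUSTOM_REGISTRY}/${HELM_NS}/$4"
}

# Example images; each destination matches an image setting used later on.
mirror_image quay.io/argoproj/argocd:v2.14.10 \
  "${CUSTOM_REGISTRY}/argoproj/argocd:v2.14.10"
mirror_image quay.io/argoproj/argo-rollouts:v1.8.2 \
  "${CUSTOM_REGISTRY}/${REGISTRY_NS}/argoproj/argo-rollouts:v1.8.2"
mirror_image tfy.jfrog.io/tfy-images/tfy-agent:2618fdfb58371b083c8eacabc4afba4b66c8696e \
  "${CUSTOM_REGISTRY}/${REGISTRY_NS}/tfy-agent:2618fdfb58371b083c8eacabc4afba4b66c8696e"

# Charts; each OCI path matches the repoURL of the Application that uses it.
mirror_chart https://argoproj.github.io/argo-helm argo-cd 7.8.26 argoproj/argo-helm
mirror_chart https://argoproj.github.io/argo-helm argo-rollouts 2.39.5 argoproj/argo-helm
mirror_chart https://argoproj.github.io/argo-helm argo-workflows 0.45.12 argoproj/argo-helm
mirror_chart https://truefoundry.github.io/infra-charts truefoundry 0.84.8 tfy-helm
mirror_chart https://truefoundry.github.io/infra-charts tfy-agent 0.2.80 tfy-agent
```

Run it once with DRY_RUN=1 to review the commands, then with DRY_RUN=0 to execute them.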

Installation Steps

  1. Install Argo CD with a custom values file, argocd-values.yaml:
global:
  image:
    repository: <CUSTOM_REGISTRY>/argoproj/argocd
redis:
  image:
    repository: <CUSTOM_REGISTRY>/library/redis
server:
  extraArgs:
    - "--insecure"
    - '--application-namespaces="*"'
controller:
  extraArgs:
    - '--application-namespaces="*"'
notifications:
  enabled: false
dex:
  enabled: false
Run the following commands to install the Argo CD Helm chart:
helm repo add argo https://argoproj.github.io/argo-helm
helm repo update argo
helm upgrade --install argocd argo/argo-cd \
  --namespace argocd \
  --create-namespace \
  --version 7.8.26 \
  -f argocd-values.yaml
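Argo CD generally needs the custom registry registered as an OCI-enabled Helm repository before the Applications below can pull charts from it. A sketch of that registration, assuming Argo CD's declarative repository Secrets (keys type, name, url, enableOCI, username, password, and the label argocd.argoproj.io/secret-type=repository). The Secret names and the REGISTRY_USER and REGISTRY_PASSWORD variables are hypothetical, and DRY_RUN=1 (the default) only prints the commands.

```shell
#!/bin/sh
# Sketch: wait for Argo CD, then register the custom registry's OCI Helm
# repositories as repository Secrets. Names and defaults are hypothetical.
set -eu

CUSTOM_REGISTRY="${CUSTOM_REGISTRY:-registry.example.com}"
HELM_NS="${HELM_NS:-charts}"
REGISTRY_USER="${REGISTRY_USER:-changeme}"
REGISTRY_PASSWORD="${REGISTRY_PASSWORD:-changeme}"
DRY_RUN="${DRY_RUN:-1}"   # 1 = print the commands only, 0 = run them

run() {
  if [ "$DRY_RUN" = "1" ]; then echo "$*"; else "$@"; fi
}

# register_oci_repo NAME REPO_PATH: create a Helm repository Secret with
# enableOCI=true for CUSTOM_REGISTRY/HELM_NS/REPO_PATH.
register_oci_repo() {
  run kubectl -n argocd create secret generic "$1" \
    --from-literal=type=helm \
    --from-literal=name="$1" \
    --from-literal=url="${CUSTOM_REGISTRY}/${HELM_NS}/$2" \
    --from-literal=enableOCI=true \
    --from-literal=username="${REGISTRY_USER}" \
    --from-literal=password="${REGISTRY_PASSWORD}"
  # Argo CD discovers repository Secrets through this label.
  run kubectl -n argocd label secret "$1" argocd.argoproj.io/secret-type=repository
}

# Wait until every Deployment in the argocd namespace reports Available.
run kubectl -n argocd wait --for=condition=Available deployment --all --timeout=300s

# One Secret per repoURL used by the Applications in this guide.
register_oci_repo argo-helm-oci argoproj/argo-helm
register_oci_repo tfy-helm-oci tfy-helm
register_oci_repo tfy-agent-oci tfy-agent
```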
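Then apply the following Application for Argo CD. It lets Argo CD manage its own installation from the custom registry and creates the tfy-apps AppProject used by the Applications below.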
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: argocd
  finalizers:
    - resources-finalizer.argocd.argoproj.io
  annotations:
    argocd.argoproj.io/sync-wave: "-1"
    helm.sh/hook: pre-install
    helm.sh/hook-weight: "-20"
  labels:
    truefoundry.com/infra-component: "argocd"
spec:
  destination:
    namespace: argocd
    server: https://kubernetes.default.svc
  project: default
  source:
    chart: argo-cd
    repoURL: <CUSTOM_REGISTRY>/<HELM_NS>/argoproj/argo-helm
    targetRevision: 7.8.26
    helm:
      values: |-
        global:
          image:
            repository: <CUSTOM_REGISTRY>/argoproj/argocd
        extraObjects:
        - apiVersion: argoproj.io/v1alpha1
          kind: AppProject
          metadata:
            name: tfy-apps
          spec:
            clusterResourceWhitelist:
            - group: '*'
              kind: '*'
            destinations:
            - namespace: '*'
              server: '*'
            sourceRepos:
            - '*'
            sourceNamespaces:
            - "*"
        notifications:
          enabled: false
        dex:
          enabled: false
        configs:
          cm:
            resource.customizations.ignoreDifferences.storage.k8s.io_CSIDriver: |
              jqPathExpressions:
              - '.spec.seLinuxMount'
        applicationSet:
          resources:
            requests:
              cpu: 100m
              memory: 50Mi
              ephemeral-storage: 256Mi
            limits:
              cpu: 200m
              memory: 100Mi
              ephemeral-storage: 512Mi
        server:
          extraArgs:
            - --insecure
            - '--application-namespaces="*"'
          resources:
            requests:
              cpu: 200m
              memory: 256Mi
              ephemeral-storage: 256Mi
            limits:
              cpu: 400m
              memory: 1024Mi
              ephemeral-storage: 512Mi
        controller:
          metrics:
            enabled: true
            serviceMonitor:
              enabled: true
          extraArgs:
            - '--application-namespaces="*"'
          resources:
            requests:
              cpu: 1.3
              memory: 550Mi
              ephemeral-storage: 256Mi
            limits:
              cpu: 2.6
              memory: 1100Mi
              ephemeral-storage: 512Mi
        redis:
          image:
            repository: <CUSTOM_REGISTRY>/library/redis
          resources:
            requests:
              cpu: 100m
              memory: 256Mi
              ephemeral-storage: 256Mi
            limits:
              cpu: 400m
              memory: 512Mi
              ephemeral-storage: 512Mi
        redisSecretInit:
          resources:
            requests:
              cpu: 100m
              memory: 256Mi
              ephemeral-storage: 256Mi
            limits:
              cpu: 400m
              memory: 512Mi
              ephemeral-storage: 512Mi
        repoServer:
          resources:
            requests:
              cpu: 0.6
              memory: 256Mi
              ephemeral-storage: 256Mi
            limits:
              cpu: 1.2
              memory: 1024Mi
              ephemeral-storage: 512Mi
  syncPolicy:
    automated: {}
    syncOptions:
      - CreateNamespace=true
      - ServerSideApply=true
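One way to apply an Application manifest and block until Argo CD reports it Synced and Healthy is sketched below. It assumes kubectl 1.23 or newer (for --for=jsonpath) and a hypothetical file name, argocd-application.yaml; the same helper can be reused for the Applications that follow. DRY_RUN=1 (the default) only prints the commands.

```shell
#!/bin/sh
# Sketch: apply an Application into the argocd namespace and wait for it.
# The manifest file name is hypothetical.
set -eu

DRY_RUN="${DRY_RUN:-1}"   # 1 = print the commands only, 0 = run them

run() {
  if [ "$DRY_RUN" = "1" ]; then echo "$*"; else "$@"; fi
}

# apply_and_wait FILE APP_NAME [TIMEOUT]
apply_and_wait() {
  local timeout="${3:-600s}"
  run kubectl apply -n argocd -f "$1"
  # --for=jsonpath requires kubectl 1.23 or newer.
  run kubectl -n argocd wait "application/$2" \
    --for=jsonpath='{.status.sync.status}'=Synced --timeout="$timeout"
  run kubectl -n argocd wait "application/$2" \
    --for=jsonpath='{.status.health.status}'=Healthy --timeout="$timeout"
}

apply_and_wait argocd-application.yaml argocd
```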
  2. Install the argo-rollouts Helm chart by applying the following Application:
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: argo-rollout
  finalizers:
    - resources-finalizer.argocd.argoproj.io
  labels:
    truefoundry.com/infra-component: "argo-rollout"
spec:
  destination:
    namespace: argo-rollouts
    server: https://kubernetes.default.svc
  project: tfy-apps
  source:
    chart: argo-rollouts
    repoURL: <CUSTOM_REGISTRY>/<HELM_NS>/argoproj/argo-helm
    targetRevision: 2.39.5
    helm:
      values: |-
        controller:
          image:
            registry: <CUSTOM_REGISTRY>/<REGISTRY_NS>
          metrics:
            enabled: true
            serviceMonitor:
              enabled: true
          resources:
            requests:
              cpu: 100m
              memory: 256Mi
              ephemeral-storage: 256Mi
            limits:
              cpu: 200m
              memory: 512Mi
              ephemeral-storage: 512Mi
  syncPolicy:
    automated: {}
    syncOptions:
      - CreateNamespace=true
      - ServerSideApply=true
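The argo-rollouts Application, like the ones after it, uses project tfy-apps, which the argocd Application creates through extraObjects; Argo CD will not sync an Application whose project does not exist yet. A small polling sketch, with a hypothetical manifest file name and default retry settings (30 attempts, 10 seconds apart):

```shell
#!/bin/sh
# Sketch: wait for the tfy-apps AppProject, then apply the argo-rollouts
# Application. The manifest file name is hypothetical.
set -eu

DRY_RUN="${DRY_RUN:-1}"   # 1 = print the commands only, 0 = run them

run() {
  if [ "$DRY_RUN" = "1" ]; then echo "$*"; else "$@"; fi
}

# wait_for_project NAME [ATTEMPTS] [INTERVAL_SECONDS]: poll until it exists.
wait_for_project() {
  local attempts="${2:-30}" interval="${3:-10}" i=0
  while [ "$i" -lt "$attempts" ]; do
    i=$((i + 1))
    if run kubectl -n argocd get appproject "$1" >/dev/null 2>&1; then
      return 0
    fi
    sleep "$interval"
  done
  echo "AppProject $1 was not created" >&2
  return 1
}

wait_for_project tfy-apps && run kubectl apply -n argocd -f argo-rollouts-application.yaml
```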
  3. Install the argo-workflows Helm chart by applying the following Application:
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: argo-workflows
  finalizers:
    - resources-finalizer.argocd.argoproj.io
  labels:
    truefoundry.com/infra-component: "argo-workflows"
spec:
  destination:
    namespace: argo-workflows
    server: https://kubernetes.default.svc
  project: tfy-apps
  source:
    chart: argo-workflows
    repoURL: <CUSTOM_REGISTRY>/<HELM_NS>/argoproj/argo-helm
    targetRevision: 0.45.12
    helm:
      values: |-
        controller:
          image:
            registry: <CUSTOM_REGISTRY>/<REGISTRY_NS>
          serviceMonitor:
            enabled: true
          workflowDefaults:
            spec:
              activeDeadlineSeconds: 432000
              ttlStrategy:
                secondsAfterCompletion: 3600
          metricsConfig:
            enabled: true
          resources:
            requests:
              cpu: 100m
              memory: 256Mi
              ephemeral-storage: 256Mi
            limits:
              cpu: 400m
              memory: 512Mi
              ephemeral-storage: 512Mi
        executor:
          image:
            registry: <CUSTOM_REGISTRY>/<REGISTRY_NS>
          resources:
            requests:
              cpu: 100m
              memory: 256Mi
              ephemeral-storage: 256Mi
            limits:
              cpu: 400m
              memory: 512Mi
              ephemeral-storage: 512Mi
        server:
          image:
            registry: <CUSTOM_REGISTRY>/<REGISTRY_NS>
          resources:
            requests:
              cpu: 100m
              memory: 256Mi
              ephemeral-storage: 256Mi
            limits:
              cpu: 400m
              memory: 512Mi
              ephemeral-storage: 512Mi
  syncPolicy:
    automated: {}
    syncOptions:
      - CreateNamespace=true
      - ServerSideApply=true
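Because every image is overridden to point at the custom registry, a quick check after a sync is to list any pod image that still comes from somewhere else. The sketch below assumes CUSTOM_REGISTRY matches the prefix you configured; foreign_images and pod_images are hypothetical helper names, and only regular (non-init) containers are inspected.

```shell
#!/bin/sh
# Sketch: report pod images that do not come from the custom registry.
# The default registry value is hypothetical.
set -eu

CUSTOM_REGISTRY="${CUSTOM_REGISTRY:-registry.example.com}"

# foreign_images PREFIX: read image references on stdin and print each
# distinct one that does not start with PREFIX.
foreign_images() {
  local prefix="$1" image
  while IFS= read -r image; do
    [ -z "$image" ] && continue
    case "$image" in
      "$prefix"*) ;;
      *) echo "$image" ;;
    esac
  done | sort -u
}

# pod_images NAMESPACE: print the image of every regular container in NAMESPACE.
pod_images() {
  kubectl get pods -n "$1" \
    -o jsonpath='{range .items[*]}{range .spec.containers[*]}{.image}{"\n"}{end}{end}'
}

# Usage against a live cluster (empty output means everything is mirrored):
#   pod_images argo-workflows | foreign_images "${CUSTOM_REGISTRY}/"
```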
  4. Install the TrueFoundry Helm chart by applying the following Application:
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: truefoundry
  finalizers:
    - resources-finalizer.argocd.argoproj.io
  labels:
    truefoundry.com/infra-component: "truefoundry"
spec:
  destination:
    namespace: truefoundry
    server: https://kubernetes.default.svc
  project: tfy-apps
  source:
    targetRevision: 0.84.8
    repoURL: "<CUSTOM_REGISTRY>/<HELM_NS>/tfy-helm"
    chart: truefoundry
    helm:
      values: |-
        global:
          image:
            registry: <CUSTOM_REGISTRY>
          tenantName: <TENANT_NAME>
          controlPlaneURL: https://<CONTROL_PLANE_HOST>
          tfyApiKey: <TFY_API_KEY>
          database:
            host: <DATABASE_HOST>
            name: <DATABASE_NAME>
            username: <DATABASE_USERNAME>
            password: <DATABASE_PASSWORD>
          config:
            defaultCloudProvider: aws
            storageConfiguration:
              awsS3BucketName: ""
              awsRegion: ""
              awsAccessKeyId: ""
              awsSecretAccessKey: ""
              awsEndpointURL: ""
        tags:
          llmGateway: true
          llmGatewayRequestLogging: true
        tfy-clickhouse:
          enabled: false
        tfy-buildkitd-service:
          enabled: false
        tfy-configs:
          configs:
            imageMutationPolicyOverride:
              images:
                - source_registry: tfy.jfrog.io/tfy-images
                  destination_registry: <CUSTOM_REGISTRY>/<REGISTRY_NS>
                - source_registry: tfy.jfrog.io/tfy-mirror
                  destination_registry: <CUSTOM_REGISTRY>/<REGISTRY_NS>
                - source_registry: public.ecr.aws/truefoundrycloud
                  destination_registry: <CUSTOM_REGISTRY>/<REGISTRY_NS>
        extraResources:
          enabled: true
          manifests:
          - apiVersion: networking.k8s.io/v1
            kind: Ingress
            metadata:
              name: truefoundry-truefoundry-frontend-app-ingress
              annotations:
                nginx.ingress.kubernetes.io/rewrite-target: /
            spec:
              ingressClassName: nginx
              rules:
                - host: <CONTROL_PLANE_HOST>
                  http:
                    paths:
                      - path: /
                        pathType: ImplementationSpecific
                        backend:
                          service:
                            name: truefoundry-truefoundry-frontend-app
                            port:
                              number: 5000
          - apiVersion: networking.k8s.io/v1
            kind: Ingress
            metadata:
              name: truefoundry-truefoundry-backend-app-ingress
              annotations:
                nginx.ingress.kubernetes.io/use-regex: 'true'
                nginx.ingress.kubernetes.io/rewrite-target: /$2
            spec:
              ingressClassName: nginx
              rules:
                - host: <CONTROL_PLANE_HOST>
                  http:
                    paths:
                      - path: /api/svc/socket\.io(/|$)(.*)
                        pathType: ImplementationSpecific
                        backend:
                          service:
                            name: truefoundry-servicefoundry-server
                            port:
                              number: 3000
                      - path: /api/llm(/|$)(.*)
                        pathType: ImplementationSpecific
                        backend:
                          service:
                            name: truefoundry-tfy-llm-gateway
                            port:
                              number: 8787
                      - path: /api/proxy-server(/|$)(.*)
                        pathType: ImplementationSpecific
                        backend:
                          service:
                            name: truefoundry-tfy-controller
                            port:
                              number: 8123
                      - path: /api/s3proxy(/|$)(.*)
                        pathType: ImplementationSpecific
                        backend:
                          service:
                            name: truefoundry-s3proxy
                            port:
                              number: 8080
                      - path: /api/otel(/|$)(.*)
                        pathType: ImplementationSpecific
                        backend:
                          service:
                            name: truefoundry-tfy-otel-collector
                            port:
                              number: 4318
                      - path: /flyteidl\.service\.AdminService(/|$)(.*)
                        pathType: ImplementationSpecific
                        backend:
                          service:
                            name: truefoundry-tfy-workflow-admin-server
                            port:
                              number: 8089
  syncPolicy:
    automated: {}
    syncOptions:
      - CreateNamespace=true
      - ServerSideApply=true
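The TrueFoundry Application carries the most placeholders (<TENANT_NAME>, <TFY_API_KEY>, the database settings, and so on). A sketch of a pre-apply check that lists any <UPPER_CASE> token left in a manifest file; the helper names and the file name in the usage line are hypothetical.

```shell
#!/bin/sh
# Sketch: fail before applying a manifest that still contains <PLACEHOLDER> tokens.
set -eu

# unresolved_placeholders FILE: print each distinct <UPPER_CASE> token in FILE.
unresolved_placeholders() {
  # "|| true" keeps going when grep finds nothing (matters if pipefail is on).
  grep -oE '<[A-Z][A-Z_]*>' "$1" | sort -u || true
}

# check_manifest FILE: return 1 and list the tokens if any placeholder is left.
check_manifest() {
  local left
  left="$(unresolved_placeholders "$1")"
  if [ -n "$left" ]; then
    echo "Unresolved placeholders in $1:" >&2
    echo "$left" >&2
    return 1
  fi
}

# Usage:
#   check_manifest truefoundry-application.yaml && kubectl apply -n argocd -f truefoundry-application.yaml
```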
  5. Set up ingress for the control plane. The Ingress resources above use the nginx ingress class, so make sure an NGINX ingress controller is running and that <CONTROL_PLANE_HOST> resolves to it.
  6. Log in to the TrueFoundry UI and go to the Platform section in the left sidebar.
  7. Click "Attach Existing Cluster" and select "Generic Cluster".
  8. Fill in the form with the cluster name and deselect all components, since the required ones are already installed.
  9. Skip the Helm installation steps and close the form.
  10. A cluster entry is now created in the UI. Use the three-dot menu on the cluster entry to copy the cluster token; it is used in the next step.
  11. Install the tfy-agent application, replacing <CLUSTER_TOKEN> with the token from the previous step:
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: tfy-agent
  finalizers:
    - resources-finalizer.argocd.argoproj.io
  labels:
    truefoundry.com/infra-component: "tfy-agent"
spec:
  destination:
    namespace: tfy-agent
    server: https://kubernetes.default.svc
  project: tfy-apps
  source:
    targetRevision: 0.2.80
    repoURL: <CUSTOM_REGISTRY>/<HELM_NS>/tfy-agent
    chart: tfy-agent
    helm:
      values: |-
        config:
          clusterToken: <CLUSTER_TOKEN>
          tenantName: <TENANT_NAME>
          controlPlaneURL: https://<CONTROL_PLANE_HOST>
        tfyAgent:
          image:
            repository: <CUSTOM_REGISTRY>/<REGISTRY_NS>/tfy-agent
          resources:
            limits:
              cpu: 500m
              memory: 1024Mi
              ephemeral-storage: 256Mi
            requests:
              cpu: 300m
              memory: 512Mi
              ephemeral-storage: 128Mi
        tfyAgentProxy:
          image:
            repository: <CUSTOM_REGISTRY>/<REGISTRY_NS>/tfy-agent-proxy
          resources:
            limits:
              cpu: 500m
              memory: 512Mi
              ephemeral-storage: 256Mi
            requests:
              cpu: 50m
              memory: 128Mi
              ephemeral-storage: 128Mi
        sdsServer:
          image:
            repository: <CUSTOM_REGISTRY>/<REGISTRY_NS>/sds-server
          resources:
            limits:
              cpu: 200m
              ephemeral-storage: 20M
              memory: 50M
            requests:
              cpu: 100m
              ephemeral-storage: 10M
              memory: 30M
  syncPolicy:
    automated: {}
    syncOptions:
      - CreateNamespace=true
      - ServerSideApply=true
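To avoid writing the cluster token into a file, one option is to substitute it at apply time. This sketch assumes the token is exported as CLUSTER_TOKEN and contains no |, & or \ characters (sed would interpret them); the template file name in the usage lines is hypothetical.

```shell
#!/bin/sh
# Sketch: render the tfy-agent Application with the token from $CLUSTER_TOKEN.
# Assumes the token contains no '|', '&' or '\' characters.
set -eu

# render_agent_manifest TEMPLATE: print TEMPLATE with <CLUSTER_TOKEN> replaced.
render_agent_manifest() {
  : "${CLUSTER_TOKEN:?set CLUSTER_TOKEN to the token copied from the UI}"
  sed "s|<CLUSTER_TOKEN>|${CLUSTER_TOKEN}|g" "$1"
}

# Usage against a live cluster:
#   render_agent_manifest tfy-agent-application.yaml | kubectl apply -n argocd -f -
#   kubectl -n tfy-agent wait --for=condition=Ready pod --all --timeout=300s
```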
  12. Once the pods come up, ensure that your cluster shows as connected.
  13. Configure nodes to integrate with TrueFoundry:
    1. CPU nodes for infra components: a few CPU nodes are required to run the compute-plane infra components. No additional configuration is needed for these.
    2. CPU nodes for workloads: you can configure a few CPU nodes to deploy CPU workloads using TrueFoundry. Add the following label to these nodes:
    truefoundry.com/nodepool: cpu-pool # unique identifier for cpu nodes

    3. GPU nodes for workloads: you can configure a few GPU nodes to deploy GPU workloads using TrueFoundry. Add the following labels to these nodes:
    truefoundry.com/nodepool: a10g-gpu-pool # unique identifier for nodes of a specific GPU type
    truefoundry.com/gpu_type: A10G # possible values are listed below

Possible values for truefoundry.com/gpu_type:
  • A10G
  • A10_12GB
  • A10_24GB
  • A10_4GB
  • A10_8GB
  • A100_40GB
  • A100_80GB
  • H100_80GB
  • H100_94GB
  • H200
  • L4
  • L40S
  • P100
  • P4
  • T4
  • V100
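The labels can be applied with kubectl label. The sketch below also rejects a truefoundry.com/gpu_type value that is not in the list above; node names and pool names are hypothetical, and DRY_RUN=1 (the default) only prints the commands.

```shell
#!/bin/sh
# Sketch: label CPU and GPU nodes for TrueFoundry, validating gpu_type.
# Node and pool names below are hypothetical.
set -eu

DRY_RUN="${DRY_RUN:-1}"   # 1 = print the commands only, 0 = run them
GPU_TYPES="A10G A10_12GB A10_24GB A10_4GB A10_8GB A100_40GB A100_80GB H100_80GB H100_94GB H200 L4 L40S P100 P4 T4 V100"

run() {
  if [ "$DRY_RUN" = "1" ]; then echo "$*"; else "$@"; fi
}

# is_valid_gpu_type TYPE: succeed only for an exact match in GPU_TYPES.
is_valid_gpu_type() {
  local t
  for t in $GPU_TYPES; do
    [ "$t" = "$1" ] && return 0
  done
  return 1
}

# label_cpu_node NODE POOL
label_cpu_node() {
  run kubectl label nodes "$1" "truefoundry.com/nodepool=$2" --overwrite
}

# label_gpu_node NODE POOL GPU_TYPE
label_gpu_node() {
  if ! is_valid_gpu_type "$3"; then
    echo "unsupported truefoundry.com/gpu_type: $3" >&2
    return 1
  fi
  run kubectl label nodes "$1" "truefoundry.com/nodepool=$2" "truefoundry.com/gpu_type=$3" --overwrite
}

label_cpu_node cpu-node-1 cpu-pool
label_gpu_node gpu-node-1 a10g-gpu-pool A10G
```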
  14. From the Platform page, click the sync icon on the cluster entry to sync the cluster with the control plane.