# Deploy AI Gateway FAQs

> Frequently asked questions for deploying TrueFoundry control plane and AI Gateway, including advanced configuration.

## FAQs

<AccordionGroup>
  <Accordion title="How to add multiple gateway planes to the control plane?">
    You can add multiple gateway planes to the control plane by following the steps below:

    <Steps>
      <Step title="Create Kubernetes Secrets for License Key and Image Pull Secret">
        We will create two secrets in this step:

        1. Store the License Key
        2. Store the Image Pull Secret

        <AccordionGroup>
          <Accordion title="Create Kubernetes Secret for License Key">
            We need to create a [Kubernetes secret](https://github.com/truefoundry/infra-charts/blob/main/charts/truefoundry/README.md#using-k8s-secret-for-required-fields) containing the license key.

            <Info>
              The same license key is used for all gateway planes as for the
              control plane.
            </Info>

            ```yaml truefoundry-creds.yaml lines theme={"dark"}
            apiVersion: v1
            kind: Secret
            metadata:
              name: truefoundry-creds
            type: Opaque
            stringData:
              TFY_API_KEY: <TFY_API_KEY>
            ```

            Apply the secret to the Kubernetes cluster (assuming you are installing the gateway plane in the `truefoundry` namespace):

            ```bash lines theme={"dark"}
            kubectl apply -f truefoundry-creds.yaml -n truefoundry
            ```
          </Accordion>

          <Accordion title="Create Kubernetes Secret for Image Pull Secret">
            We need to create an [Image Pull Secret](https://github.com/truefoundry/infra-charts/blob/main/charts/truefoundry/README.md#using-k8s-secret-for-required-fields) to enable pulling the TrueFoundry images from the private registry.

            <Info>
              The same image pull secret is used for all gateway planes as for the
              control plane. Use your own credentials if you pull TrueFoundry images
              from your own registry.
            </Info>

            ```yaml truefoundry-image-pull-secret.yaml lines theme={"dark"}
            apiVersion: v1
            kind: Secret
            metadata:
              name: truefoundry-image-pull-secret
            type: kubernetes.io/dockerconfigjson
            data:
              .dockerconfigjson: <IMAGE_PULL_SECRET> # Provided by TrueFoundry team
            ```

            Apply the secret to the Kubernetes cluster (assuming you are installing the gateway plane in the `truefoundry` namespace):

            ```bash lines theme={"dark"}
            kubectl apply -f truefoundry-image-pull-secret.yaml -n truefoundry
            ```
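
            If you pull images from your own registry, the `.dockerconfigjson` value has to be the base64 encoding of a Docker config JSON document. The sketch below shows one way to build that value in the shell. `make_dockerconfigjson` is a hypothetical helper (not part of TrueFoundry tooling), the registry and credentials are placeholders, and it assumes the username and password contain no characters that need JSON escaping.

            ```bash
            # Hypothetical helper: prints the base64 value for `.dockerconfigjson`.
            # Assumes user and password contain no double quotes or backslashes.
            make_dockerconfigjson() {
              local registry="$1" user="$2" pass="$3" auth
              # Docker's "auth" field is base64("user:password")
              auth=$(printf '%s:%s' "$user" "$pass" | base64 | tr -d '\n')
              printf '{"auths":{"%s":{"username":"%s","password":"%s","auth":"%s"}}}' \
                "$registry" "$user" "$pass" "$auth" | base64 | tr -d '\n'
              printf '\n'
            }

            # Example with placeholder values
            make_dockerconfigjson "registry.example.com" "my-user" "my-password"
            ```

            If you would rather not build the value by hand, `kubectl create secret docker-registry` creates a secret of the same type directly from a server, username, and password.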
          </Accordion>
        </AccordionGroup>
      </Step>

      <Step title="Create Helm chart Values file for gateway plane">
        Create a values file as given below and replace the following values:

        * `CONTROL_PLANE_URL`: URL that you will map to the control plane dashboard.
        * `TENANT_NAME`: Tenant name provided by TrueFoundry team.
        * `GATEWAY_ENDPOINT_HOST`: The domain where you will expose the gateway endpoint (e.g., `gateway.example.com`)

        ```yaml truefoundry-gateway-values.yaml wrap expandable lines theme={"dark"}
        global:
          # This is the reference to the secrets we created in the previous step
          imagePullSecrets:
            - name: "truefoundry-image-pull-secret"

          # Choose the resource tier as per your needs
          resourceTier: medium # or small or large
          controlPlaneURL: <CONTROL_PLANE_URL> # eg. https://example-company.truefoundry.cloud
          tenantName: <TENANT_NAME>

        ingress:
          enabled: true
          annotations: {}
          ingressClassName: nginx
          tls: []
          hosts:
            - <GATEWAY_ENDPOINT_HOST>

        # Optional: Istio configuration (if using Istio instead of standard ingress)
        # istio:
        #   virtualservice:
        #     hosts:
        #       - <GATEWAY_ENDPOINT_HOST>
        #     enabled: true
        #     retries:
        #       enabled: true
        #       retryOn: gateway-error
        #     gateways:
        #       - istio-system/tfy-wildcard
        #     annotations: {}
        ```
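
        Before installing, it can help to confirm that every placeholder in the values file was replaced. The sketch below assumes placeholders follow the `<UPPER_SNAKE_CASE>` convention used above; `find_placeholders` is a hypothetical helper and is not part of the chart.

        ```bash
        # Hypothetical pre-flight check: list <UPPER_SNAKE_CASE> tokens that are
        # still present on non-comment lines of a values file.
        find_placeholders() {
          local file="$1"
          # Drop lines that are entirely comments (keeping original line numbers),
          # then keep only the lines that still contain a placeholder token.
          grep -n -v -E '^[[:space:]]*#' "$file" | grep -E '<[A-Z][A-Z0-9_]*>'
        }
        ```

        Running `find_placeholders truefoundry-gateway-values.yaml` prints each offending line with its line number. No output and a non-zero exit status mean nothing is left to replace.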
      </Step>

      <Step title="Install Helm chart for gateway plane">
        ```bash wrap lines theme={"dark"}
        helm upgrade --install tfy-llm-gateway oci://tfy.jfrog.io/tfy-helm/tfy-llm-gateway -n truefoundry --create-namespace -f truefoundry-gateway-values.yaml
        ```
      </Step>
    </Steps>
  </Accordion>

  <Accordion title="Can I use my Artifactory as a mirror to pull images?">
    Yes. You can configure your Artifactory to mirror our registry.

    <Note>
      Credentials for accessing the TrueFoundry private registry are required and
      will be provided during onboarding.
    </Note>

    **1. Registry Configuration**

    * **URL**: `https://tfy.jfrog.io/`

    **2. Update Helm values**

    ```yaml wrap lines theme={"dark"}
    global:
      image:
        registry: <YOUR_REGISTRY> # Replace with your registry
    postgresql:
      image:
        registry: <YOUR_REGISTRY> # Replace with your registry, use this if `devMode` is enabled
    ```
  </Accordion>

  <Accordion title="Can I copy images to my own private registry?">
    Yes. We provide a [script](https://github.com/truefoundry/infra-charts/blob/main/scripts/clone_images_to_your_registry.sh) that uses the `truefoundry` Helm Chart to identify and copy required images to your private registry.

    <Note>
      Credentials for accessing the TrueFoundry private registry are required and
      will be provided during onboarding.
    </Note>

    <Tabs>
      <Tab title="Generic Registry">
        **1. Install required dependencies**

        * [Skopeo](https://github.com/containers/skopeo/blob/main/install.md)
          * Used to perform the image copy operation.
        * [Helm](https://helm.sh/docs/intro/install/)
          * Used to get the list of images from the TrueFoundry Helm Chart.

        **2. Add TrueFoundry Helm Chart repository**

        ```bash wrap lines theme={"dark"}
        helm repo add truefoundry https://truefoundry.github.io/infra-charts
        helm repo update
        ```

        **3. Authenticate to the TrueFoundry source registry**

        ```bash wrap lines theme={"dark"}
        skopeo login -u <USERNAME> -p <PASSWORD> https://tfy.jfrog.io/
        ```

        <Note>
          Replace `<USERNAME>` with the TrueFoundry registry username.\
          Replace `<PASSWORD>` with the TrueFoundry registry password.
        </Note>

        **4. Authenticate to your destination registry**

        ```bash wrap lines theme={"dark"}
        skopeo login -u <USERNAME> -p <PASSWORD> <YOUR_REGISTRY>
        ```

        <Note>
          Replace `<USERNAME>` with your registry username.\
          Replace `<PASSWORD>` with your registry password.\
          Replace `<YOUR_REGISTRY>` with the URL of your registry.

          Skopeo will use authentication details for a registry that was previously authenticated with `docker login`.

          Alternatively, you can use the `--dest-user` and `--dest-password` flags to provide the username and password for the destination registry.
        </Note>

        **5. Run Clone Image Script**

        ```bash wrap lines theme={"dark"}
        export TRUEFOUNDRY_HELM_CHART_VERSION=<TRUEFOUNDRY_HELM_CHART_VERSION>
        export TRUEFOUNDRY_HELM_VALUES_FILE=<TRUEFOUNDRY_HELM_VALUES_FILE>
        export DEST_REGISTRY=<YOUR_DESTINATION_REGISTRY>

        # Dry-run example
        curl -s https://raw.githubusercontent.com/truefoundry/infra-charts/main/scripts/clone_images_to_your_registry.sh | bash -s -- --helm-chart truefoundry --helm-version $TRUEFOUNDRY_HELM_CHART_VERSION --helm-values $TRUEFOUNDRY_HELM_VALUES_FILE --dest-registry $DEST_REGISTRY --dry-run

        # Live example
        curl -s https://raw.githubusercontent.com/truefoundry/infra-charts/main/scripts/clone_images_to_your_registry.sh | bash -s -- --helm-chart truefoundry --helm-version $TRUEFOUNDRY_HELM_CHART_VERSION --helm-values $TRUEFOUNDRY_HELM_VALUES_FILE --dest-registry $DEST_REGISTRY
        ```

        <Note>
          Replace `<TRUEFOUNDRY_HELM_CHART_VERSION>` with the version of the TrueFoundry
          Helm chart you want to use. You can find the latest version in the
          [changelog](https://docs.truefoundry.com/changelog).

          Replace `<TRUEFOUNDRY_HELM_VALUES_FILE>` with the path to the values file you created in the [Installation Instructions](#installation-instructions).

          Replace `<YOUR_DESTINATION_REGISTRY>` with the URL of your registry.
        </Note>

        **6. Update the Helm values file to use your registry**

        ```yaml wrap lines theme={"dark"}
        global:
          image:
            registry: <YOUR_REGISTRY> # Replace with your registry
        postgresql:
          image:
            registry: <YOUR_REGISTRY> # Replace with your registry, use this if `devMode` is enabled
        ```
      </Tab>

      <Tab title="AWS ECR Registry">
        **1. Install required dependencies**

        * [Skopeo](https://github.com/containers/skopeo/blob/main/install.md)
          * Used to perform the image copy operation.
        * [Helm](https://helm.sh/docs/intro/install/)
          * Used to get the list of images from the TrueFoundry Helm Chart.
        * [AWS CLI](https://docs.aws.amazon.com/cli/latest/userguide/getting-started-install.html)
          * Used to perform AWS ECR actions to validate and create repositories.

        **2. Add TrueFoundry Helm Chart repository**

        ```bash wrap lines theme={"dark"}
        helm repo add truefoundry https://truefoundry.github.io/infra-charts
        helm repo update
        ```

        **3. Authenticate to the TrueFoundry source registry**

        ```bash wrap lines theme={"dark"}
        skopeo login -u <USERNAME> -p <PASSWORD> https://tfy.jfrog.io/
        ```

        <Note>
          Replace `<USERNAME>` with the TrueFoundry registry username.\
          Replace `<PASSWORD>` with the TrueFoundry registry password.
        </Note>

        **4. Authenticate to your destination registry**

        ```bash wrap lines theme={"dark"}
        # Set your AWS profile
        export AWS_PROFILE=<AWS_PROFILE>

        # Authenticate to ECR using the profile
        aws ecr get-login-password --region <AWS_REGION> | skopeo login --username AWS --password-stdin <YOUR_ECR_REGISTRY>
        ```

        <Note>
          Replace `<AWS_PROFILE>` with your AWS profile name.\
          Replace `<AWS_REGION>` with the region of your ECR registry (e.g., `us-east-2`).\
          Replace `<YOUR_ECR_REGISTRY>` with the URL of your ECR registry (e.g., `123456789012.dkr.ecr.us-east-2.amazonaws.com`).

          Skopeo will use authentication details for a registry that was previously authenticated with `docker login`.
        </Note>
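
        The `--region` passed to `aws ecr get-login-password` should match the region embedded in the registry hostname. The sketch below derives it from the registry URL, assuming the standard `<account>.dkr.ecr.<region>.amazonaws.com` hostname form. `ecr_region_from_registry` is a hypothetical helper name.

        ```bash
        # Hypothetical helper: print the AWS region embedded in an ECR registry URL.
        ecr_region_from_registry() {
          local ref="${1#*://}"            # drop an optional scheme such as https://
          local host="${ref%%/*}"          # drop an optional repository path
          local rest="${host#*.dkr.ecr.}"  # strip "<account>.dkr.ecr."
          if [ "$rest" = "$host" ]; then
            echo "not an ECR registry: $1" >&2
            return 1
          fi
          printf '%s\n' "${rest%%.*}"      # keep everything up to the next dot
        }

        # Example with the sample registry used in this guide
        ecr_region_from_registry "123456789012.dkr.ecr.us-east-2.amazonaws.com/truefoundry"
        ```

        The printed value can then be supplied to `--region` in the login command.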

        **5. Run Clone Image Script**

        * This script creates required ECR repositories and copies images.
        * Optionally append a path to your registry URL to namespace repositories (e.g., `123456789012.dkr.ecr.us-east-2.amazonaws.com/truefoundry`).

        ```bash wrap lines theme={"dark"}
        export TRUEFOUNDRY_HELM_CHART_VERSION=<TRUEFOUNDRY_HELM_CHART_VERSION>
        export TRUEFOUNDRY_HELM_VALUES_FILE=<TRUEFOUNDRY_HELM_VALUES_FILE>
        export DEST_REGISTRY=<YOUR_DESTINATION_REGISTRY>

        # Dry-run example
        curl -s https://raw.githubusercontent.com/truefoundry/infra-charts/main/scripts/clone_images_to_your_registry.sh | bash -s -- --helm-chart truefoundry --helm-version $TRUEFOUNDRY_HELM_CHART_VERSION --helm-values $TRUEFOUNDRY_HELM_VALUES_FILE --dest-registry $DEST_REGISTRY --dry-run

        # Live example
        curl -s https://raw.githubusercontent.com/truefoundry/infra-charts/main/scripts/clone_images_to_your_registry.sh | bash -s -- --helm-chart truefoundry --helm-version $TRUEFOUNDRY_HELM_CHART_VERSION --helm-values $TRUEFOUNDRY_HELM_VALUES_FILE --dest-registry $DEST_REGISTRY
        ```

        <Note>
          Replace `<TRUEFOUNDRY_HELM_CHART_VERSION>` with the TrueFoundry Helm chart version. Find the latest version in the [changelog](https://docs.truefoundry.com/changelog).

          Replace `<TRUEFOUNDRY_HELM_VALUES_FILE>` with the path to your values file from [Installation Instructions](#installation-instructions).

          Replace `<YOUR_DESTINATION_REGISTRY>` with your ECR registry URL (e.g., `123456789012.dkr.ecr.us-east-2.amazonaws.com/truefoundry`).
        </Note>

        **6. Update the Helm values file to use your registry**

        ```yaml wrap lines theme={"dark"}
        global:
          image:
            registry: <YOUR_REGISTRY> # Replace with your registry
        postgresql:
          image:
            registry: <YOUR_REGISTRY> # Replace with your registry, use this if `devMode` is enabled
        ```
      </Tab>
    </Tabs>
  </Accordion>

  <Accordion title="How to install in an air-gapped / restricted network environment?">
    An air-gapped environment is isolated from the internet. Because the control plane and gateway plane ship as a single Helm chart (`truefoundry`), you only need to make the container images available in your private registry and point the Helm values at it.

    1. **Copy images** to your private registry: set up a [registry mirror](#can-i-use-my-artifactory-as-a-mirror-to-pull-images) or [copy images directly](#can-i-copy-images-to-my-own-private-registry), as described in the FAQs above.
    2. **Update Helm values** to point to your private registry (see the Helm value overrides in the same FAQs).
    3. **Continue with the standard installation** from the [overview](/docs/platform/deploy-control-plane-and-gateway-plane) and choose your cloud install guide (AWS, GCP, Azure, or on-prem).
  </Accordion>

  <Accordion title="How to integrate with AWS bedrock models from a different AWS account?">
    You can integrate with AWS Bedrock models from a different AWS account by following the steps below:

    1. Add the following IAM policy to the control plane IAM role so that it can assume the IAM role in the AWS account that has the Bedrock models:

    ```json lines theme={"dark"}
    {
      "Statement": [
        {
          "Action": "sts:AssumeRole",
          "Effect": "Allow",
          "Resource": "*"
        }
      ],
      "Version": "2012-10-17"
    }
    ```

    2. In the IAM role in the destination AWS account (which has Bedrock access), add the following trust policy to allow the control plane IAM role to assume it:

    ```json lines theme={"dark"}
    {
      "Statement": [
        {
          "Effect": "Allow",
          "Principal": {
            "AWS": "<CONTROL_PLANE_IAM_ROLE_ARN>"
          },
          "Action": "sts:AssumeRole"
        }
      ],
      "Version": "2012-10-17"
    }
    ```

    3. You can now use the IAM role of the destination AWS account while [integrating AWS Bedrock models](/docs/ai-gateway/aws-bedrock) in the TrueFoundry AI Gateway.
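
    The policy in step 1 lets the control plane role assume any role. If you prefer to scope it down, `Resource` can name the destination role instead of `*`. The sketch below renders such a policy. `render_assume_role_policy` is a hypothetical helper and the ARN is a placeholder; this is an optional variant, not a required part of the setup.

    ```bash
    # Hypothetical helper: print an IAM policy that allows sts:AssumeRole only on
    # the given role ARN (a tighter variant of the wildcard policy in step 1).
    render_assume_role_policy() {
      local role_arn="$1"
      printf '{\n'
      printf '  "Version": "2012-10-17",\n'
      printf '  "Statement": [\n'
      printf '    {\n'
      printf '      "Effect": "Allow",\n'
      printf '      "Action": "sts:AssumeRole",\n'
      printf '      "Resource": "%s"\n' "$role_arn"
      printf '    }\n'
      printf '  ]\n'
      printf '}\n'
    }

    # Placeholder ARN for the role in the account that has Bedrock access
    render_assume_role_policy "arn:aws:iam::123456789012:role/bedrock-access-role"
    ```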
  </Accordion>

  <Accordion title="Do we need any NFS volumes in Kubernetes for the AI Gateway or Control Plane?">
    No. TrueFoundry only needs block storage for installation and operation. The storage should be provisioned through a CSI driver, and only `ReadWriteOnce` access is required.
  </Accordion>

  <Accordion title="What is the structure of access logs">
    The AI Gateway writes access logs to standard output in one of two formats:

    1. logfmt
    2. json

    The format is selected with an environment variable on the AI Gateway installation. The default is logfmt.

    ### Log format

    Standard log format structure:

    ```wrap lines theme={"dark"}
    time="%START_TIME%" level=%LEVEL% ip=%IP_ADDRESS% tenant=%TENANT_NAME% user=%SUBJECT_TYPE%:%SUBJECT_SLUG% model=%MODEL_ID% method=%METHOD% path=%PATH% status=%STATUS_CODE% time_taken=%DURATION%ms trace_id=%TRACE_ID%
    ```

    | Log operator  | Details                                                                                  |
    | ------------- | ---------------------------------------------------------------------------------------- |
    | START\_TIME   | ISO timestamp for request start, e.g. 2025-08-12 13:34:50                                |
    | LEVEL         | info\|warn\|error                                                                        |
    | IP\_ADDRESS   | IP address of the caller, e.g. ::ffff:10.99.55.142                                       |
    | TENANT\_NAME  | Name of the tenant, e.g. truefoundry                                                     |
    | SUBJECT\_TYPE | user\|virtualaccount                                                                     |
    | SUBJECT\_SLUG | Email or virtual account name, e.g. `tfy-user@truefoundry.com` or `demo-virtualaccount` |
    | MODEL\_ID     | Model ID, e.g. openai-default/gpt-5                                                      |
    | METHOD        | GET\|POST\|PUT                                                                           |
    | PATH          | Path of the request, e.g. /api/inference/openai/chat/completions                         |
    | STATUS\_CODE  | HTTP status code, e.g. 200\|400\|401\|403\|429\|500                                      |
    | DURATION      | Duration of the request in milliseconds, e.g. 12                                         |
    | TRACE\_ID     | Trace ID of the request                                                                  |

    **Example:**

    ```wrap lines theme={"dark"}
    time="2025-08-12 13:34:50" level=info ip=::ffff:10.99.55.142 tenant=truefoundry user=virtualaccount:demo-virtualaccount model=openai-default/gpt-5 method=POST path=/api/inference/openai/chat/completions status=200 time_taken=53ms trace_id=587b2a946c13f62f9160674a8c983ce3
    ```
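
    Because each logfmt line is a flat list of `key=value` pairs, individual fields can be pulled out with standard shell tools. The sketch below assumes values contain no escaped quotes. `logfmt_field` is a hypothetical helper, not something shipped with the gateway.

    ```bash
    # Hypothetical helper: print the value of one key from a logfmt line.
    # Handles bare values and double-quoted values without escaped quotes.
    logfmt_field() {
      local key="$1" line=" $2" rest
      case "$line" in
        *" $key="*) rest="${line#*" $key="}" ;;  # text after " key="
        *) return 1 ;;                           # key not present
      esac
      if [ "${rest#\"}" != "$rest" ]; then
        rest="${rest#\"}"                        # quoted value: drop opening quote
        printf '%s\n' "${rest%%\"*}"             # cut at the closing quote
      else
        printf '%s\n' "${rest%% *}"              # bare value: ends at next space
      fi
    }

    line='time="2025-08-12 13:34:50" level=info ip=::ffff:10.99.55.142 tenant=truefoundry user=virtualaccount:demo-virtualaccount model=openai-default/gpt-5 method=POST path=/api/inference/openai/chat/completions status=200 time_taken=53ms trace_id=587b2a946c13f62f9160674a8c983ce3'
    logfmt_field status "$line"      # prints 200
    logfmt_field time_taken "$line"  # prints 53ms
    ```

    The leading space added to the line keeps a short key such as `time` from matching inside a longer one such as `time_taken`.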
  </Accordion>

  <Accordion title="How to use SSO directly without using TrueFoundry Auth Server?">
    By default, the control plane uses the TrueFoundry Auth Server for user authentication. However, you can configure it to use your own external identity provider instead. We support both OIDC and SAML-compliant identity providers. [Read more](https://www.truefoundry.com/docs/deploy-control-plane-with-external-oauth)
  </Accordion>

  <Accordion title="Requests to the gateway are timing out after a certain duration">
    If your LLM requests are timing out after a certain duration, the first thing to check is the **traces** in the TrueFoundry dashboard. Look at the request duration — if you see requests consistently timing out at exactly 60 seconds, the issue is almost certainly the **load balancer**, not the TrueFoundry AI Gateway. **The TrueFoundry gateway does not impose any request timeout.**

    <img src="https://mintcdn.com/truefoundry/yR_clVDeJDlQkXKY/images/docs/ai-gateway/alb-timeout-traces.png?fit=max&auto=format&n=yR_clVDeJDlQkXKY&q=85&s=56c0579b9e5bf6f68663f242242499ff" alt="Traces showing requests timing out at 60 seconds" width="3470" height="1090" data-path="images/docs/ai-gateway/alb-timeout-traces.png" />

    This commonly happens when an **Application Load Balancer (ALB)** is placed in front of the gateway to expose it. The default **Connection idle timeout** on AWS ALBs is 60 seconds, which is too short for long-running LLM inference requests (especially streaming responses or large prompts).

    **Solution:** Increase the idle timeout on your AWS ALB to a higher value (e.g., 300 seconds or more).

    You can find this setting in the AWS Console under **EC2 → Load Balancers → Select your ALB → Attributes tab → Connection idle timeout**.

    <img src="https://mintcdn.com/truefoundry/yR_clVDeJDlQkXKY/images/docs/ai-gateway/aws-alb-idle-timeout-setting.png?fit=max&auto=format&n=yR_clVDeJDlQkXKY&q=85&s=c1b0431e90ea8b9cb765a7f9cb21e18d" alt="AWS ALB Connection idle timeout setting" width="1024" height="447" data-path="images/docs/ai-gateway/aws-alb-idle-timeout-setting.png" />

    You can also update it via the AWS CLI:

    ```bash theme={"dark"}
    aws elbv2 modify-load-balancer-attributes \
      --load-balancer-arn <YOUR_ALB_ARN> \
      --attributes Key=idle_timeout.timeout_seconds,Value=300
    ```

    <Info>
      If you are using an ingress controller (e.g., NGINX Ingress) in addition to the ALB, also verify that the ingress controller's proxy timeout settings are configured appropriately.
    </Info>
  </Accordion>

  <Accordion title="Can I get TrueFoundry metrics in Victoria Metrics instead of Prometheus?">
    Yes. TrueFoundry supports exporting metrics to [Victoria Metrics](https://victoriametrics.com/) as an alternative to Prometheus. To enable this, add the following to your `truefoundry-values.yaml` file and upgrade the Helm release:

    <Note>
      This only installs the `VMServiceScrape` and related custom resources for
      scraping TrueFoundry metrics. It does **not** deploy Victoria Metrics itself —
      you are responsible for installing and managing your own Victoria Metrics
      instance.
    </Note>

    ```yaml truefoundry-values.yaml wrap lines theme={"dark"}
    victoriaMetricsMonitoring:
      enabled: true
    ```

    Then upgrade the Helm release to apply the changes:

    ```bash wrap lines theme={"dark"}
    helm upgrade --install truefoundry oci://tfy.jfrog.io/tfy-helm/truefoundry -n truefoundry --create-namespace -f truefoundry-values.yaml
    ```
  </Accordion>

  <Accordion title="How to enable SSL for PostgreSQL connections?">
    The TrueFoundry control plane supports SSL connections to PostgreSQL. You can configure SSL by setting the `DB_SSL_MODE` environment variable in your `truefoundry-values.yaml`.

    Supported `DB_SSL_MODE` values:

    | Mode          | Encryption | Certificate Validation | Use Case                                                        |
    | ------------- | ---------- | ---------------------- | --------------------------------------------------------------- |
    | `disable`     | No         | No                     | Local development or trusted networks                           |
    | `no-verify`   | Yes        | No                     | Managed databases with self-signed or unverified certs          |
    | `require`     | Yes        | Yes (system CA store)  | When you have a valid CA certificate and want full verification |
    | `verify-ca`   | Yes        | Yes (custom CA)        | Same as `require` but explicitly checks CA                      |
    | `verify-full` | Yes        | Yes (CA + hostname)    | Strictest mode, validates CA and hostname                       |

    SSL certificate environment variables:

    | Variable           | Purpose                                        | Required                                                     |
    | ------------------ | ---------------------------------------------- | ------------------------------------------------------------ |
    | `DB_SSL_CA_PATH`   | Path to the server CA certificate file         | For `require`, `verify-ca`, or `verify-full` modes           |
    | `DB_SSL_CERT_PATH` | Path to the client certificate file (for mTLS) | Only for mTLS (GCP Cloud SQL, Azure Database for PostgreSQL) |
    | `DB_SSL_KEY_PATH`  | Path to the client private key file (for mTLS) | Only for mTLS (GCP Cloud SQL, Azure Database for PostgreSQL) |

    <Note>
      The certificate requirements vary by cloud provider. AWS RDS only needs the server CA bundle (`DB_SSL_CA_PATH`), while GCP Cloud SQL and Azure Database for PostgreSQL may require all three certificate paths when client certificate authentication (mTLS) is enabled. Refer to the cloud-specific control plane documentation for detailed examples.
    </Note>
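
    The two tables above imply a few consistency rules: the mode must be one of the five listed values, the validating modes need a CA path, and the client certificate and key go together. The sketch below encodes those rules as a pre-flight check. `check_db_ssl_config` is a hypothetical helper and is not part of the chart.

    ```bash
    # Hypothetical pre-flight check for the DB_SSL_* values described above.
    # Usage: check_db_ssl_config MODE [CA_PATH] [CERT_PATH] [KEY_PATH]
    check_db_ssl_config() {
      local mode="$1" ca="${2:-}" cert="${3:-}" key="${4:-}"
      case "$mode" in
        disable|no-verify) ;;
        require|verify-ca|verify-full)
          # Per the table, validating modes need the server CA certificate
          if [ -z "$ca" ]; then
            echo "DB_SSL_CA_PATH is required for mode '$mode'" >&2
            return 1
          fi ;;
        *)
          echo "unsupported DB_SSL_MODE: '$mode'" >&2
          return 1 ;;
      esac
      # mTLS needs both the client certificate and its private key
      if { [ -n "$cert" ] && [ -z "$key" ]; } || { [ -z "$cert" ] && [ -n "$key" ]; }; then
        echo "DB_SSL_CERT_PATH and DB_SSL_KEY_PATH must be set together" >&2
        return 1
      fi
      echo "ok"
    }

    # Example: AWS RDS style configuration (CA bundle only)
    check_db_ssl_config require /etc/ssl/custom/ca-certificate.crt
    ```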

    **Scenario 1: Encrypted connection without certificate validation (`no-verify`)**

    This is the simplest option for managed databases. It encrypts the connection but skips server certificate validation.

    ```yaml truefoundry-values.yaml wrap lines theme={"dark"}
    servicefoundryServer:
      env:
        DB_SSL_MODE: "no-verify"
    mlfoundryServer:
      env:
        DB_SSL_MODE: "no-verify"
    ```

    **Scenario 2: Encrypted connection with certificate validation (`require`)**

    This mode encrypts the connection and validates the server certificate. You must provide the appropriate certificate files for your database provider. The example below shows the full configuration with all three certificate paths (for GCP/Azure mTLS). For AWS RDS, only `DB_SSL_CA_PATH` is needed.

    Create a Kubernetes Secret containing your certificate files:

    ```bash wrap lines theme={"dark"}
    # AWS RDS (CA bundle only)
    kubectl create secret generic db-ssl-certs \
      --from-file=ca-certificate.crt=/path/to/your/ca-certificate.crt \
      -n truefoundry

    # GCP Cloud SQL / Azure (full mTLS)
    kubectl create secret generic db-ssl-certs \
      --from-file=ca-certificate.crt=/path/to/server-ca.pem \
      --from-file=client-cert.pem=/path/to/client-cert.pem \
      --from-file=client-key.pem=/path/to/client-key.pem \
      -n truefoundry
    ```

    Then configure `truefoundry-values.yaml` to mount the certificates and set the SSL paths:

    ```yaml truefoundry-values.yaml wrap lines theme={"dark"}
    servicefoundryServer:
      env:
        DB_SSL_MODE: "require"
        DB_SSL_CA_PATH: "/etc/ssl/custom/ca-certificate.crt"
        # Only needed for mTLS (GCP Cloud SQL, Azure Database for PostgreSQL)
        DB_SSL_CERT_PATH: "/etc/ssl/custom/client-cert.pem"
        DB_SSL_KEY_PATH: "/etc/ssl/custom/client-key.pem"
      extraVolumes:
        - name: db-ssl-certs
          secret:
            secretName: db-ssl-certs
      extraVolumeMounts:
        - name: db-ssl-certs
          mountPath: /etc/ssl/custom
          readOnly: true
    mlfoundryServer:
      env:
        DB_SSL_MODE: "require"
        DB_SSL_CA_PATH: "/etc/ssl/custom/ca-certificate.crt"
        # Only needed for mTLS (GCP Cloud SQL, Azure Database for PostgreSQL)
        DB_SSL_CERT_PATH: "/etc/ssl/custom/client-cert.pem"
        DB_SSL_KEY_PATH: "/etc/ssl/custom/client-key.pem"
      extraVolumes:
        - name: db-ssl-certs
          secret:
            secretName: db-ssl-certs
      extraVolumeMounts:
        - name: db-ssl-certs
          mountPath: /etc/ssl/custom
          readOnly: true
    ```

    Upgrade the Helm release to apply the changes:

    ```bash wrap lines theme={"dark"}
    helm upgrade --install truefoundry oci://tfy.jfrog.io/tfy-helm/truefoundry -n truefoundry --create-namespace -f truefoundry-values.yaml
    ```
  </Accordion>

  <Accordion title="How to configure custom CA certificates?">
    If your TrueFoundry deployment needs to trust custom Certificate Authorities (e.g., for internal services, private registries, or corporate proxies), you can configure custom CA certificates in the Helm chart.

    There are two methods to provide custom CA certificates:

    ### Method 1: Pass customCA as a multiline string

    You can provide the CA certificate content directly as a multiline string in your `truefoundry-values.yaml`:

    ```yaml truefoundry-values.yaml wrap lines theme={"dark"}
    global:
      customCA:
        enabled: true
        certificate: |
          -----BEGIN CERTIFICATE-----
          MIIDXTCCAkWgAwIBAgIJAKZ7VqHEqvmKMA0GCSqGSIb3DQEBCwUAMEUxCzAJBgNV
          BAYTAkFVMRMwEQYDVQQIDApTb21lLVN0YXRlMSEwHwYDVQQKDBhJbnRlcm5ldCBX
          ... (rest of your certificate) ...
          -----END CERTIFICATE-----
    ```

    This method is suitable when you have one or a few CA certificates to add.

    ### Method 2: Use an existing ConfigMap containing CA certificate(s)

    If you already have your custom CA certificates in a Kubernetes ConfigMap, you can reference it directly. An initContainer will merge the custom CA with the system CAs.

    <Steps>
      <Step title="Create a ConfigMap with your custom CA certificate(s)">
        Create a Kubernetes ConfigMap containing your custom CA certificate(s):

        ```bash wrap lines theme={"dark"}
        kubectl create configmap custom-ca-certificates \
          --from-file=ca-certificates.crt=custom-ca.crt \
          -n truefoundry
        ```

        Alternatively, if you want to create it from a YAML file:

        ```yaml custom-ca-configmap.yaml wrap lines theme={"dark"}
        apiVersion: v1
        kind: ConfigMap
        metadata:
          name: custom-ca-certificates
          namespace: truefoundry
        data:
          ca-certificates.crt: |
            -----BEGIN CERTIFICATE-----
            ... (your custom CA certificate content) ...
            -----END CERTIFICATE-----
        ```

        Apply the ConfigMap:

        ```bash wrap lines theme={"dark"}
        kubectl apply -f custom-ca-configmap.yaml
        ```
      </Step>

      <Step title="Reference the ConfigMap in your Helm values">
        Update your `truefoundry-values.yaml` to reference the ConfigMap:

        ```yaml truefoundry-values.yaml wrap lines theme={"dark"}
        global:
          customCA:
            enabled: true
            existingConfigMap:
              name: custom-ca-certificates
        ```
      </Step>

      <Step title="Upgrade the Helm installation">
        Apply the changes by upgrading your Helm release:

        ```bash wrap lines theme={"dark"}
        helm upgrade --install truefoundry oci://tfy.jfrog.io/tfy-helm/truefoundry \
          -n truefoundry --create-namespace -f truefoundry-values.yaml
        ```
      </Step>
    </Steps>

    ### Method 2b: Use an existing ConfigMap with `overrideCAList`

    If you want the ConfigMap to **replace** the system CA bundle entirely instead of merging, set `overrideCAList` to `true`. In this mode, the ConfigMap is mounted directly at `/etc/ssl/certs/` (no initContainer is used), so the ConfigMap must contain the **full** CA bundle (system + custom CAs).

    <Steps>
      <Step title="Prepare your CA certificate file">
        Add your custom CA certificate(s) to your system's CA bundle. On a Linux system with the certificate file saved as `custom-ca.crt`:

        ```bash wrap lines theme={"dark"}
        # Copy the certificate to the CA directory
        sudo cp custom-ca.crt /usr/local/share/ca-certificates/

        # Update the CA certificates bundle
        sudo update-ca-certificates
        ```

        This will generate or update `/etc/ssl/certs/ca-certificates.crt` with your custom CA included (system CAs + your custom CA).
      </Step>

      <Step title="Create a ConfigMap from the complete ca-certificates.crt file">
        Create a Kubernetes ConfigMap containing the complete CA bundle:

        ```bash wrap lines theme={"dark"}
        kubectl create configmap custom-ca-certificates \
          --from-file=ca-certificates.crt=/etc/ssl/certs/ca-certificates.crt \
          -n truefoundry
        ```

        Alternatively, if you want to create it from a YAML file:

        ```yaml custom-ca-configmap.yaml wrap lines theme={"dark"}
        apiVersion: v1
        kind: ConfigMap
        metadata:
          name: custom-ca-certificates
          namespace: truefoundry
        data:
          ca-certificates.crt: |
            -----BEGIN CERTIFICATE-----
            ... (your complete ca-certificates.crt content including system + custom CAs) ...
            -----END CERTIFICATE-----
        ```

        Apply the ConfigMap:

        ```bash wrap lines theme={"dark"}
        kubectl apply -f custom-ca-configmap.yaml
        ```
      </Step>

      <Step title="Reference the ConfigMap in your Helm values with overrideCAList">
        Update your `truefoundry-values.yaml` to reference the ConfigMap with `overrideCAList` enabled:

        ```yaml truefoundry-values.yaml wrap lines theme={"dark"}
        global:
          customCA:
            enabled: true
            existingConfigMap:
              name: custom-ca-certificates
              overrideCAList: true
        ```
      </Step>

      <Step title="Upgrade the Helm installation">
        Apply the changes by upgrading your Helm release:

        ```bash wrap lines theme={"dark"}
        helm upgrade --install truefoundry oci://tfy.jfrog.io/tfy-helm/truefoundry \
          -n truefoundry --create-namespace -f truefoundry-values.yaml
        ```
      </Step>
    </Steps>

    <Warning>
      When `overrideCAList` is set to `true`, the ConfigMap is mounted directly at `/etc/ssl/certs/` and replaces the system CA bundle. Your ConfigMap must contain the complete CA bundle (system CAs + your custom CAs). If you only include your custom CAs, all standard public CA trust will be lost and outbound HTTPS connections to public services will fail.
    </Warning>
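
    Before creating the ConfigMap, a quick count of the PEM blocks in the bundle can catch the mistake described in the warning. The sketch below is one way to do that; the `count_certs` and `looks_like_full_bundle` helpers and the default threshold of 10 are illustrative assumptions, not part of the chart.

    ```bash
    # Hypothetical helper: count PEM certificate blocks in a bundle file.
    count_certs() {
      grep -c -- '-----BEGIN CERTIFICATE-----' "$1"
    }

    # Hypothetical check: a full system bundle normally holds far more than a
    # handful of certificates, so a very small count suggests that only the
    # custom CAs were included. The default threshold (10) is an arbitrary assumption.
    looks_like_full_bundle() {
      local count
      count=$(count_certs "$1" || true)
      [ "$count" -ge "${2:-10}" ]
    }
    ```

    For example, `looks_like_full_bundle /etc/ssl/certs/ca-certificates.crt` should succeed on a typical Linux host after `update-ca-certificates`, and fail for a file that holds only your custom CA.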

    <Info>
      The custom CA certificates will be mounted into all TrueFoundry pods and added to the system's trust store. This ensures that all outgoing HTTPS connections from TrueFoundry services will trust your custom CAs.
    </Info>

    <Note>
      After adding custom CA certificates, verify that your TrueFoundry pods have restarted and are running correctly. You may need to restart existing pods for the changes to take effect.
    </Note>
  </Accordion>

  <Accordion title="How to enable in-pod TLS termination on the proxy (control plane and gateway)?">
    By default, TLS is terminated at your ingress controller or load balancer, and traffic reaches the TrueFoundry proxy (Caddy) over plain HTTP inside the cluster.

    **In-pod TLS termination** moves that step into the proxy container: Caddy terminates HTTPS using a certificate you provide, then forwards to the application over loopback HTTP. This is useful when traffic must stay encrypted until it reaches the pod and the pod itself must present your certificate.

    | Plane         | Helm chart        | Values path        | Caddy listener                                              |
    | ------------- | ----------------- | ------------------ | ----------------------------------------------------------- |
    | Control plane | `truefoundry`     | `global.proxy.tls` | `:8080` on `tfy-proxy`                                      |
    | Gateway       | `tfy-llm-gateway` | `proxy.tls`        | `:8081` on the gateway proxy sidecar (app stays on `:8787`) |

    <Warning>
      Do **not** terminate TLS at the ingress **and** inside the pod for the same hostname. Pick one layer:

      * **In-pod termination (this guide):** ingress must **pass through** encrypted traffic (for example, NGINX `ssl-passthrough`). Do not attach a TLS certificate on the Ingress resource for that host.
      * **Ingress termination (default):** leave `global.proxy.tls.enabled` / `proxy.tls.enabled` as `false` and configure TLS on the Ingress or Gateway API parent instead.
    </Warning>

    ### Traffic flow (in-pod termination)

    ```text theme={"dark"}
    Client ──HTTPS──► Ingress (TLS passthrough) ──HTTPS──► Caddy in pod ──HTTP──► App (servicefoundry / llm-gateway)
    ```

    ### Prerequisites

    1. A Kubernetes TLS `Secret` in the release namespace with PEM certificate and private key (standard keys `tls.crt` and `tls.key`).
    2. An ingress controller that can forward TLS without terminating it when using Ingress (see below).
    3. For **self-signed or private CAs**: also configure [custom CA certificates](#how-to-configure-custom-ca-certificates) so Node.js services trust outbound HTTPS, **or** use in-cluster HTTP URLs for internal API calls (recommended).

    ***

    ### Control plane (`truefoundry` chart)

    <Steps>
      <Step title="Create the TLS Secret">
        Create a `kubernetes.io/tls` secret in the `truefoundry` namespace. Replace the paths with your certificate and key files:

        ```bash wrap lines theme={"dark"}
        kubectl create secret tls tfy-proxy-cp-tls \
          --cert=/path/to/tls.crt \
          --key=/path/to/tls.key \
          -n truefoundry
        ```

        If you use a wildcard certificate such as `*.primary.example.com`, make sure it covers your control-plane hostname (for example `cp.primary.example.com`).
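
        A wildcard certificate matches exactly one DNS label in the leftmost position, so `*.primary.example.com` covers `cp.primary.example.com` but not `primary.example.com` or `a.cp.primary.example.com`. A minimal sketch of that rule, using a hypothetical `wildcard_covers` helper that assumes lowercase input and ignores edge cases such as partial-label wildcards:

        ```bash
        # Hypothetical helper: does certificate name $1 (for example
        # *.primary.example.com) cover hostname $2?
        wildcard_covers() {
          local pattern=$1 host=$2 suffix label
          case $pattern in
            '*.'*) suffix=${pattern#\*} ;;          # ".primary.example.com"
            *) [ "$pattern" = "$host" ]; return ;;  # no wildcard: exact match only
          esac
          case $host in
            *"$suffix") label=${host%"$suffix"} ;;  # text to the left of the suffix
            *) return 1 ;;
          esac
          # The wildcard must stand for exactly one non-empty label (no dots).
          case $label in
            ''|*.*) return 1 ;;
            *) return 0 ;;
          esac
        }
        ```

        For example, `wildcard_covers '*.primary.example.com' cp.primary.example.com` returns success, while the same pattern with `primary.example.com` returns failure.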
      </Step>

      <Step title="Enable proxy TLS in Helm values">
        Add the following to `truefoundry-values.yaml`:

        ```yaml truefoundry-values.yaml wrap lines theme={"dark"}
        global:
          proxy:
            tls:
              enabled: true
              secretName: tfy-proxy-cp-tls
              # Optional: if your Secret uses non-standard keys
              # secretKeys:
              #   cert: tls.crt
              #   key: tls.key
        ```

        Upgrade the release:

        ```bash wrap lines theme={"dark"}
        helm upgrade --install truefoundry oci://tfy.jfrog.io/tfy-helm/truefoundry \
          -n truefoundry --create-namespace -f truefoundry-values.yaml
        ```
      </Step>

      <Step title="Configure ingress for TLS passthrough">
        When `global.proxy.tls.enabled` is `true`, Caddy expects HTTPS on the service port. Your ingress must forward the TLS connection without terminating it.

        **ingress-nginx** — enable passthrough on the controller (once per cluster) and annotate the control-plane Ingress:

        ```yaml truefoundry-values.yaml wrap lines theme={"dark"}
        global:
          ingress:
            enabled: true
            ingressClassName: nginx
            hosts:
              - cp.example.com
            # Do not set global.ingress.tls when using in-pod termination — TLS is handled inside tfy-proxy.
            annotations:
              nginx.ingress.kubernetes.io/ssl-passthrough: "true"
          proxy:
            tls:
              enabled: true
              secretName: tfy-proxy-cp-tls
        ```

        The ingress-nginx controller must be installed with `controller.extraArgs.enable-ssl-passthrough: "true"`.

        **Istio / Gateway API** — configure TLS mode `PASSTHROUGH` on the Gateway listener that fronts the control plane. TLS is not configured on the `HTTPRoute` itself.
      </Step>

      <Step title="Verify the control plane">
        ```bash wrap lines theme={"dark"}
        kubectl -n truefoundry rollout status deploy/truefoundry-tfy-proxy
        curl -vk https://cp.example.com/health
        ```

        You should get a successful response over HTTPS. Check that the certificate presented to the client is the one from your Secret, not the ingress controller's default certificate.
      </Step>

      <Step title="Update gateway `CONTROL_PLANE_URL` when `tags.llmGateway` is enabled and control-plane proxy TLS is on">
        When `global.proxy.tls.enabled` is `true`, `truefoundry-tfy-proxy` listens with **TLS on port 8080**. In-cluster HTTP calls such as `http://<release>-tfy-proxy:8080` will fail (for example `ECONNRESET` or certificate errors).

        If you deploy the gateway with the `truefoundry` chart (`tags.llmGateway: true`), override `tfy-llm-gateway.env.CONTROL_PLANE_URL` to your **HTTPS control-plane URL** (`global.controlPlaneURL`), not the internal `http://...-tfy-proxy:8080` address:

        ```yaml truefoundry-values.yaml wrap lines theme={"dark"}
        global:
          controlPlaneURL: https://cp.example.com
          proxy:
            tls:
              enabled: true
              secretName: tfy-proxy-cp-tls
          customCA:
            enabled: true
            existingConfigMap:
              name: custom-ca-certificates

        tfy-llm-gateway:
          env:
            CONTROL_PLANE_URL: "{{ .Values.global.controlPlaneURL }}"
            PUBLIC_CONTROL_PLANE_URL: "{{ .Values.global.controlPlaneURL }}"
        ```

        <Note>
          The standalone `tfy-llm-gateway` chart already sets `CONTROL_PLANE_URL` from `global.controlPlaneURL` by default. The override above is required when `tags.llmGateway` is `true` on the `truefoundry` chart, because the parent chart default uses `http://{{ .Release.Name }}-tfy-proxy:8080`.
        </Note>
      </Step>
    </Steps>

    ***

    ### Gateway plane (`tfy-llm-gateway` chart)

    Use this when deploying the gateway as its own Helm release (gateway plane only) or when overriding the `tfy-llm-gateway` subchart under the parent `truefoundry` chart.

    <Steps>
      <Step title="Create the TLS Secret">
        ```bash wrap lines theme={"dark"}
        kubectl create secret tls tfy-proxy-gateway-tls \
          --cert=/path/to/tls.crt \
          --key=/path/to/tls.key \
          -n truefoundry
        ```
      </Step>

      <Step title="Enable proxy TLS and ingress passthrough">
        **Standalone gateway release** (`truefoundry-values.yaml` for `tfy-llm-gateway`):

        ```yaml truefoundry-values.yaml wrap lines theme={"dark"}
        global:
          # Must be https:// when the control-plane tfy-proxy has global.proxy.tls.enabled
          controlPlaneURL: https://cp.example.com

        proxy:
          tls:
            enabled: true
            secretName: tfy-proxy-gateway-tls

        # env.CONTROL_PLANE_URL defaults to global.controlPlaneURL in this chart.
        # Override explicitly if a parent release set it to http://...-tfy-proxy:8080:
        # env:
        #   CONTROL_PLANE_URL: "{{ .Values.global.controlPlaneURL }}"

        ingress:
          enabled: true
          ingressClassName: nginx
          hosts:
            - gateway.example.com
          annotations:
            nginx.ingress.kubernetes.io/ssl-passthrough: "true"
          # Do not set ingress.tls — TLS terminates inside the pod.
        ```

        **Gateway bundled with `truefoundry`** (`tags.llmGateway: true`) — nest under `tfy-llm-gateway:`:

        ```yaml truefoundry-values.yaml wrap lines theme={"dark"}
        tfy-llm-gateway:
          proxy:
            tls:
              enabled: true
              secretName: tfy-proxy-gateway-tls
          ingress:
            enabled: true
            annotations:
              nginx.ingress.kubernetes.io/ssl-passthrough: "true"
        ```
      </Step>

      <Step title="Configure environment variables for startup">
        The gateway loads configuration at startup over HTTP(S). Set `env` based on whether the **control-plane** proxy has in-pod TLS enabled.

        **When `global.proxy.tls.enabled` is `true` on the control plane** (same cluster), set `CONTROL_PLANE_URL` to the public control-plane URL. Do **not** use `http://<release>-tfy-proxy:8080` — that port expects HTTPS:

        ```yaml truefoundry-values.yaml wrap lines theme={"dark"}
        global:
          controlPlaneURL: https://cp.example.com

        tfy-llm-gateway:
          env:
            CONTROL_PLANE_URL: "{{ .Values.global.controlPlaneURL }}"
            PUBLIC_CONTROL_PLANE_URL: "{{ .Values.global.controlPlaneURL }}"
        ```

        Add [custom CA certificates](#how-to-configure-custom-ca-certificates) if `controlPlaneURL` uses a private or mkcert-signed certificate.

        **When control-plane proxy TLS is disabled** (default), you can use the internal proxy URL for `CONTROL_PLANE_URL` if the gateway and control plane share a release:

        ```yaml truefoundry-values.yaml wrap lines theme={"dark"}
        tfy-llm-gateway:
          env:
            CONTROL_PLANE_URL: http://truefoundry-tfy-proxy:8080
            PUBLIC_CONTROL_PLANE_URL: https://cp.example.com
            SERVICEFOUNDRY_SERVER_URL: http://truefoundry-servicefoundry-server:3000
            CONTROL_PLANE_NATS_URL: http://truefoundry-tfy-nats:4222
        ```

        Replace `truefoundry` with your Helm release name if different. `SERVICEFOUNDRY_SERVER_URL` is used to fetch NATS credentials (`/v1/x/llm-gateway/nats-creds`); pointing it at `servicefoundry-server` avoids TLS issues on the proxy port.
      </Step>

      <Step title="Verify the gateway">
        ```bash wrap lines theme={"dark"}
        kubectl -n truefoundry rollout status deploy/tfy-llm-gateway
        kubectl -n truefoundry get pods -l app.kubernetes.io/name=tfy-llm-gateway
        # Expect 2/2 Ready when proxy.tls is enabled (gateway + proxy containers)
        curl -vk https://gateway.example.com/health
        ```

        If pods crash with `unable to verify the first certificate` when fetching NATS credentials, see the [custom CA](#how-to-configure-custom-ca-certificates) section or the internal HTTP `env` overrides above.
      </Step>
    </Steps>

    <Info>
      **East-west vs north-south TLS:** `proxy.tls` on the gateway sidecar secures traffic **into** the gateway pod from clients. On the control plane, `global.proxy.tls` makes **port 8080 HTTPS** on `tfy-proxy`. Gateway pods must use `CONTROL_PLANE_URL: "{{ .Values.global.controlPlaneURL }}"` (plus `global.customCA` for private CAs), or call `servicefoundry-server` / `tfy-nats` directly over HTTP — not `http://...-tfy-proxy:8080`.
    </Info>
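
    The rule above can be turned into a small pre-flight check before running `helm upgrade`. This is a sketch only: the `check_control_plane_url` helper is hypothetical, and it simply flags a plain-HTTP URL on port 8080 of a `-tfy-proxy` host when control-plane proxy TLS is enabled.

    ```bash
    # Hypothetical helper.
    #   $1: "true" if global.proxy.tls.enabled is set on the control plane
    #   $2: the value you plan to use for CONTROL_PLANE_URL
    # Returns non-zero when the URL would reach the TLS listener over plain HTTP.
    check_control_plane_url() {
      local tls_enabled=$1 url=$2
      if [ "$tls_enabled" = "true" ]; then
        case $url in
          http://*-tfy-proxy:8080*)
            echo "CONTROL_PLANE_URL uses plain HTTP, but tfy-proxy:8080 expects HTTPS" >&2
            return 1
            ;;
        esac
      fi
      return 0
    }
    ```

    With TLS enabled, `check_control_plane_url true http://truefoundry-tfy-proxy:8080` fails, while `check_control_plane_url true https://cp.example.com` passes.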
  </Accordion>

  <Accordion title="How to enable and access control plane monitoring (Grafana)?">
    TrueFoundry ships with a built-in monitoring stack that includes Grafana dashboards for the control plane. To enable it, add the following to your `truefoundry-values.yaml`:

    ```yaml truefoundry-values.yaml theme={"dark"}
    truefoundryMonitoring:
      enabled: true
      grafana:
        grafana.ini:
          auth.jwt:
            jwk_set_url: >-
              https://<your-truefoundry-control-plane-url>/api/svc/v1/keys/<tenant-name>/jwks
    ```

    Then upgrade the Helm release to apply the changes:

    ```bash theme={"dark"}
    helm upgrade --install truefoundry oci://tfy.jfrog.io/tfy-helm/truefoundry \
      -n truefoundry --create-namespace \
      -f truefoundry-values.yaml
    ```

    Once enabled, platform **admins** can access the Grafana dashboard at:

    ```text theme={"dark"}
    https://<your-truefoundry-control-plane-url>/admin/grafana/
    ```

    <Note>
      * Replace `<your-truefoundry-control-plane-url>` with your actual control plane domain (e.g., `app.example.com`) and `<tenant-name>` with your TrueFoundry tenant name provided during onboarding.
      * Only users with the **admin** role can access this endpoint.
      * Make sure to include the trailing `/` at the end of the URL.
      * If you already have Prometheus or VictoriaLogs in your cluster, you can point the monitoring stack to them using `externalServices` instead of installing new instances.
    </Note>
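
    Both URLs in this section are derived from the same two inputs. A small sketch that builds them, assuming the hypothetical helper names `jwks_url` and `grafana_url` and the example values `app.example.com` and `my-tenant`:

    ```bash
    # Hypothetical helpers; $1 is the control plane domain, $2 the tenant name.
    jwks_url() {
      printf 'https://%s/api/svc/v1/keys/%s/jwks\n' "$1" "$2"
    }

    # The trailing slash is part of the URL and must be kept.
    grafana_url() {
      printf 'https://%s/admin/grafana/\n' "$1"
    }

    jwks_url app.example.com my-tenant
    # https://app.example.com/api/svc/v1/keys/my-tenant/jwks
    grafana_url app.example.com
    # https://app.example.com/admin/grafana/
    ```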

    For the full configuration reference, see the [Control Plane Monitoring](/docs/platform/controlplane-monitoring) guide.
  </Accordion>

  <Accordion title="How do you add default metadata to all requests passing via the gateway?">
    You can attach default metadata to every request that passes through the AI Gateway by setting the `DEFAULT_GATEWAY_METADATA` environment variable on the gateway. The value must be a JSON object of key-value pairs, encoded as a string.

    Add the following to your gateway configuration in the values file of the gateway plane:

    ```yaml theme={"dark"}
    tfy-llm-gateway:
      env:
        DEFAULT_GATEWAY_METADATA: '{"org":"internal"}'
    ```

    <Note>
      The metadata key-value pairs will be automatically included in every request routed through the gateway. You can use this to tag requests with organizational identifiers, environment labels, or any other metadata your downstream systems need.
    </Note>
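
    When the metadata comes from shell variables or a CI pipeline, it can help to generate the JSON string rather than write it by hand. A minimal sketch, assuming a hypothetical `metadata_json` helper and keys and values that contain no quotes, backslashes, or control characters:

    ```bash
    # Hypothetical helper: build a flat JSON object from key=value arguments.
    # No escaping is performed, so inputs must be plain text.
    metadata_json() {
      local pair out="" sep=""
      for pair in "$@"; do
        out="$out$sep\"${pair%%=*}\":\"${pair#*=}\""
        sep=","
      done
      printf '{%s}\n' "$out"
    }

    metadata_json org=internal
    # {"org":"internal"}
    ```

    The output can be pasted, wrapped in single quotes, as the value of `DEFAULT_GATEWAY_METADATA` in the values file.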
  </Accordion>

  <Accordion title="How to expose additional metadata as Prometheus labels for gateway metrics?">
    By default, the AI Gateway exposes a fixed set of Prometheus labels on its metrics. If you want to slice and aggregate gateway metrics by your own metadata fields (e.g. `customer_id`, `request_type`, `environment`), set the `LLM_GATEWAY_METADATA_LOGGING_KEYS` environment variable on the gateway. The value is a JSON-encoded array of metadata keys.

    Each key listed here is exposed as a Prometheus label with the prefix `ai_gateway_metadata_`. For example, `customer_id` becomes the label `ai_gateway_metadata_customer_id`. You can then use these labels for granular filtering and aggregation in Grafana.

    Add the following to your gateway configuration in the values file of the gateway plane:

    ```yaml theme={"dark"}
    tfy-llm-gateway:
      env:
        LLM_GATEWAY_METADATA_LOGGING_KEYS: '["customer_id", "request_type"]'
    ```

    Once the gateway is restarted, requests that include these metadata keys (either via [default metadata](#how-do-you-add-default-metadata-to-all-requests-passing-via-the-gateway) or per-request metadata) will emit Prometheus metrics with the corresponding `ai_gateway_metadata_customer_id` and `ai_gateway_metadata_request_type` labels.

    <Warning>
      Only add metadata keys with bounded, low-cardinality values (e.g. customer tier, request type, environment). Adding high-cardinality keys like user IDs or trace IDs as labels can cause your Prometheus / VictoriaMetrics instance to consume excessive memory and storage.
    </Warning>
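
    To see which label names a given `LLM_GATEWAY_METADATA_LOGGING_KEYS` value should produce, the prefix rule can be applied by hand. A minimal sketch, with a hypothetical `metadata_labels` helper that takes the keys as plain arguments rather than parsing the JSON array:

    ```bash
    # Hypothetical helper: print the Prometheus label name for each metadata key.
    metadata_labels() {
      local key
      for key in "$@"; do
        printf 'ai_gateway_metadata_%s\n' "$key"
      done
    }

    metadata_labels customer_id request_type
    # ai_gateway_metadata_customer_id
    # ai_gateway_metadata_request_type
    ```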
  </Accordion>

  <Accordion title="How to use HTTPRoute to route traffic using Kubernetes Gateway API?">
    The TrueFoundry Helm charts support the [Kubernetes Gateway API](https://gateway-api.sigs.k8s.io/) as an alternative to standard `Ingress` resources. Use `HTTPRoute` when your cluster uses a Gateway API-compatible controller (e.g. Envoy Gateway, Istio, NGINX Gateway Fabric, GKE Gateway).

    **Control plane (truefoundry chart)**

    Add the following to your `truefoundry-values.yaml`, setting `parentRefs` to point to your existing `Gateway`:

    ```yaml truefoundry-values.yaml theme={"dark"}
    global:
      httpRoute:
        enabled: true
        parentRefs:
          - name: my-gateway        # Name of your Gateway resource
            namespace: gateway-system  # Namespace where the Gateway is deployed
            sectionName: https      # Listener section on the Gateway (e.g. http or https)
        hostnames:
          - "app.example.com"       # Hostname that this HTTPRoute should match
    ```

    Then apply:

    ```bash theme={"dark"}
    helm upgrade --install truefoundry oci://tfy.jfrog.io/tfy-helm/truefoundry \
      -n truefoundry --create-namespace \
      -f truefoundry-values.yaml
    ```

    <Note>
      * Only one routing method should be enabled at a time. Disable `global.ingress.enabled` and `global.virtualservice.enabled` when using `httpRoute`.
      * The `sectionName` must match a named listener on your `Gateway` resource. Omit it if your Gateway has a single unnamed listener.
      * TLS termination is handled by the parent `Gateway` — no TLS configuration is needed on the `HTTPRoute` itself.
    </Note>
  </Accordion>

  <Accordion title="How to restrict AWS S3 permissions to a minimal set?">
    By default, the installation instructions use `s3:*` for the S3 bucket IAM policy for simplicity. If your organization requires a least-privilege approach, you can replace `s3:*` with the following minimal set of permissions:

    ```json theme={"dark"}
    {
      "Statement": [
        {
          "Sid": "S3",
          "Effect": "Allow",
          "Action": [
            "s3:ListBucketMultipartUploads",
            "s3:GetBucketTagging",
            "s3:GetObjectVersionTagging",
            "s3:ReplicateTags",
            "s3:PutObjectVersionTagging",
            "s3:ListMultipartUploadParts",
            "s3:PutObject",
            "s3:GetObject",
            "s3:GetObjectAcl",
            "s3:AbortMultipartUpload",
            "s3:PutBucketTagging",
            "s3:GetObjectVersionAcl",
            "s3:GetObjectTagging",
            "s3:PutObjectTagging",
            "s3:GetObjectVersion",
            "s3:ListBucket",
            "s3:DeleteObject"
          ],
          "Resource": [
            "arn:aws:s3:::<YOUR_S3_BUCKET_NAME>",
            "arn:aws:s3:::<YOUR_S3_BUCKET_NAME>/*"
          ]
        }
      ],
      "Version": "2012-10-17"
    }
    ```
  </Accordion>

  <Accordion title="How to configure security context for TrueFoundry components?">
    By default, the TrueFoundry Helm chart configures container and pod security contexts for all components according to security best practices: pods run as a non-root user (`runAsNonRoot: true`), use a read-only root filesystem (`readOnlyRootFilesystem: true`), and drop all Linux capabilities (`capabilities.drop: [ALL]`).

    However, **NATS** (used internally for messaging) does not have these defaults applied automatically. If your cluster enforces [Pod Security Standards](https://kubernetes.io/docs/concepts/security/pod-security-standards/) (e.g. the `restricted` profile) or you want a consistent security posture across all components, set the security context for NATS explicitly by adding the following to your `truefoundry-values.yaml`:

    ```yaml truefoundry-values.yaml theme={"dark"}
    tfyNats:
      container:
        merge:
          securityContext:
            capabilities:
              drop:
                - ALL
            readOnlyRootFilesystem: true
            allowPrivilegeEscalation: false
      podTemplate:
        merge:
          spec:
            securityContext:
              fsGroup: 1000
              runAsUser: 1000
              runAsNonRoot: true
    ```

    <Note>
      The NATS subchart uses a different values structure (`container.merge` and `podTemplate.merge`) compared to other TrueFoundry components. This is because NATS uses its own Helm chart conventions for overriding pod and container specs.
    </Note>
  </Accordion>

  <Accordion title="How to enable Network Policies for Control Plane?">
    Network policies are **optional** and ship inside the `truefoundry` Helm chart. They apply only to the release namespace.

    <Warning>
      Network policies are **opt-in** (`networkPolicy.enabled: false` by default). Before enabling in production, add all required sources to `allowedIngressFrom` (monitoring and ingress at minimum). If `allowedIngressFrom` is empty, cross-namespace ingress is blocked — Prometheus scrapes and ingress traffic will fail until you allow those namespaces.
    </Warning>

    ### Prerequisites

    Your cluster CNI must **enforce** `NetworkPolicy` objects. Creating policies in the API is not enough.

    | Platform            | Requirement                                                                                                                                         |
    | ------------------- | --------------------------------------------------------------------------------------------------------------------------------------------------- |
    | **Amazon EKS**      | VPC CNI add-on with `enableNetworkPolicy: "true"` (v1.14.0+), or Calico/Cilium                                                                      |
    | **GKE**             | Cluster network policy enabled (Calico)                                                                                                             |
    | **AKS**             | Azure CNI with a policy-capable engine (Calico or Cilium)                                                                                           |
    | **OpenShift (OCP)** | **4.5+** with OpenShift SDN, or **4.8+** with OVN-Kubernetes (default CNI from **4.12+**); both enforce `NetworkPolicy` without an extra policy CNI |
    | **Generic**         | Any CNI that enforces `NetworkPolicy` (e.g. Calico, Cilium, Weave Net); vanilla clusters without a policy-capable CNI need one installed            |

    <Info>
      On **OpenShift**, confirm the cluster network plugin with `oc get network.config cluster -o jsonpath='{.spec.networkType}{"\n"}'`. Clusters on **OCP 4.5–4.11** typically report `OpenShiftSDN`; new installs on **OCP 4.12+** default to `OVNKubernetes` (available as an option from 4.8). See [About network policy (OpenShift)](https://docs.openshift.com/container-platform/latest/networking/network_policy/about-network-policy.html).
    </Info>

    <Info>
      On **EKS with the Amazon VPC CNI**, verify the add-on has network policy enforcement enabled before relying on these rules. See [Amazon EKS network policies](https://docs.aws.amazon.com/eks/latest/userguide/cni-network-policy.html).
    </Info>

    ### Policies created

    When `networkPolicy.enabled: true`, the chart creates up to **four** NetworkPolicy objects:

    | Policy                 | Purpose                                                                                         |
    | ---------------------- | ----------------------------------------------------------------------------------------------- |
    | `default-deny-ingress` | Block all ingress into control-plane pods                                                       |
    | `allow-all-egress`     | Allow all egress from control-plane pods                                                        |
    | `intra-instance`       | Allow ingress between pods with `app.kubernetes.io/instance: <release-name>`                    |
    | `ingress-external`     | Allow ingress from namespaces listed in `allowedIngressFrom` (only when that list is non-empty) |

    <Steps>
      <Step title="Add network policy settings to your Helm values">
        Add the following to `truefoundry-values.yaml`. Replace namespace names with the ones used in **your** cluster.

        **Minimal enable** (in-release traffic only — no Prometheus or ingress from other namespaces yet):

        ```yaml truefoundry-values.yaml wrap lines theme={"dark"}
        networkPolicy:
          enabled: true
          allowedIngressFrom: []
        ```

        **Typical production configuration** (monitoring + ingress controller):

        ```yaml truefoundry-values.yaml wrap lines theme={"dark"}
        networkPolicy:
          enabled: true
          allowedIngressFrom:
            # All pods in the monitoring namespace (Prometheus scrapes)
            - namespace: tfy-prometheus
            # Only ingress controller pods (recommended)
            - namespace: ingress-nginx
              podSelector:
                app.kubernetes.io/name: ingress-nginx
            # Optional: Istio ingress gateway
            # - namespace: istio-system
            #   podSelector:
            #     app: istio-ingressgateway
        ```

        Each `allowedIngressFrom` entry requires `namespace`. Omit `podSelector` to allow **all pods** in that namespace; set `podSelector` to allow **only matching pods** (recommended for ingress controllers).

        <Note>
          Cross-namespace sources are combined in a **single** `ingress-external` policy. In-namespace microservice traffic uses the label `app.kubernetes.io/instance: <helm-release-name>` (for example `app.kubernetes.io/instance: truefoundry`).
        </Note>
      </Step>

      <Step title="Upgrade the Helm release">
        ```bash wrap lines theme={"dark"}
        helm upgrade --install truefoundry oci://tfy.jfrog.io/tfy-helm/truefoundry \
          -n truefoundry --create-namespace \
          -f truefoundry-values.yaml
        ```
      </Step>

      <Step title="Verify NetworkPolicies are applied">
        ```bash wrap lines theme={"dark"}
        kubectl get networkpolicy -n truefoundry
        kubectl describe networkpolicy -n truefoundry
        ```

        You should see three policies when `allowedIngressFrom` is empty, or four when cross-namespace sources are configured.
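
        The expected count follows directly from the list of policies the chart creates. A sketch that makes the rule explicit, assuming a hypothetical `expected_policy_count` helper and that no other NetworkPolicy objects exist in the namespace:

        ```bash
        # Hypothetical helper: $1 is the number of entries in allowedIngressFrom.
        # default-deny-ingress, allow-all-egress and intra-instance are always
        # created; ingress-external is added only when the list is non-empty.
        expected_policy_count() {
          if [ "$1" -gt 0 ]; then
            echo 4
          else
            echo 3
          fi
        }
        ```

        Compare the result with the output of `kubectl get networkpolicy -n truefoundry --no-headers | wc -l`.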
      </Step>

      <Step title="Validate connectivity">
        After enabling, confirm:

        * Control plane UI loads via ingress
        * Prometheus scrape targets for control-plane services are **UP**
        * Pods in the same Helm release can reach each other
        * External dependencies still work (RDS, S3, cloud APIs) — egress is allow-all by default

        **In-release connectivity example** (adjust service name if needed):

        ```bash wrap lines theme={"dark"}
        kubectl exec -n truefoundry deploy/truefoundry-mlfoundry-server -- \
          curl -sf http://truefoundry-servicefoundry-server:3000/health
        ```

        **Negative test** (should fail — traffic from an unlisted namespace):

        ```bash wrap lines theme={"dark"}
        kubectl run np-test -n default --rm -it --image=curlimages/curl -- \
          curl -m 5 http://truefoundry-servicefoundry-server.truefoundry.svc.cluster.local:3000/
        ```
      </Step>
    </Steps>

    ### Troubleshooting

    | Symptom                      | Likely cause                                         | Fix                                                                        |
    | ---------------------------- | ---------------------------------------------------- | -------------------------------------------------------------------------- |
    | Prometheus targets down      | Monitoring namespace not in `allowedIngressFrom`     | Add your Prometheus namespace (e.g. `tfy-prometheus`)                      |
    | Ingress 502 / timeout        | Ingress namespace not allowed or wrong `podSelector` | Add ingress namespace with labels matching your controller                 |
    | Policies exist but no effect | CNI does not enforce NetworkPolicy                   | Enable enforcement on your cluster (EKS: `enableNetworkPolicy` on vpc-cni) |
    | Pods cannot reach each other | Wrong or missing `app.kubernetes.io/instance` label  | Verify pod labels match Helm release name                                  |

    <Info>
      Network policies complement — but do not replace — TLS, authentication, ingress WAF, and cloud security groups. They are scoped to the control-plane namespace only; other namespaces are not modified.
    </Info>
  </Accordion>

  <Accordion title="Why are global tolerations and affinity not applied to NATS pods?">
    The `global.tolerations` value in the `truefoundry` Helm chart is applied to all TrueFoundry components (servicefoundry server, mlfoundry server, tfy-proxy, etc.). However, **NATS** is deployed via the upstream [NATS Helm chart](https://github.com/nats-io/k8s) as a subchart, which does not read `global.tolerations`. So if you have set global tolerations to schedule the control plane on tainted nodes, the NATS pods will not get them and may remain in `Pending` state.

    The same applies to **`affinity`** and **`nodeSelector`** — `global.affinity` is not passed to the NATS subchart either, so NATS pods get an empty `affinity`/`nodeSelector` by default and will not follow the node-affinity rules used by the rest of the control plane.

    Set `tolerations`, `affinity`, and `nodeSelector` explicitly under `tfyNats` using the NATS chart's `podTemplate.patch` (a JSON Patch) in your `truefoundry-values.yaml`:

    ```yaml truefoundry-values.yaml wrap expandable lines theme={"dark"}
    global:
      tolerations:
        - key: class.truefoundry.io/control-plane
          effect: NoSchedule
          operator: Exists

    # global.tolerations / global.affinity do not flow to the NATS subchart —
    # set them explicitly on the NATS pod template via podTemplate.patch.
    tfyNats:
      podTemplate:
        patch:
          # Keep these default entries from the chart as-is — the patch list
          # replaces the chart default entirely, it is not merged.
          - op: add
            path: /spec/volumes/-
            value:
              name: resolver-volume
              secret:
                secretName: tfy-nats-accounts
                defaultMode: 420
          - op: replace
            path: /spec/volumes/1
            value:
              name: pid
              emptyDir:
                sizeLimit: "256Mi"
          - op: add
            path: /spec/imagePullSecrets
            value: []
          # Tolerations — match the taints on your control-plane nodepool
          - op: add
            path: /spec/tolerations
            value:
              - key: class.truefoundry.io/control-plane
                effect: NoSchedule
                operator: Exists
          # Affinity — match the labels on your control-plane nodes
          - op: add
            path: /spec/affinity
            value:
              nodeAffinity:
                requiredDuringSchedulingIgnoredDuringExecution:
                  nodeSelectorTerms:
                    - matchExpressions:
                        - key: <YOUR_NODE_LABEL_KEY>      # e.g. the label on your control-plane nodepool
                          operator: In
                          values:
                            - <YOUR_NODE_LABEL_VALUE>
          # nodeSelector — leave empty unless you specifically need one
          - op: add
            path: /spec/nodeSelector
            value: {}
    ```

    Then upgrade the Helm release to apply the changes:

    ```bash wrap lines theme={"dark"}
    helm upgrade --install truefoundry oci://tfy.jfrog.io/tfy-helm/truefoundry \
      -n truefoundry --create-namespace -f truefoundry-values.yaml
    ```
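
    To confirm that the patch reached the NATS pod, you can inspect its scheduling fields after the upgrade. The helper below is a minimal sketch: the function name `nats_scheduling` is hypothetical, and the default pod name `truefoundry-tfy-nats-0` assumes a release named `truefoundry`, so adjust it to match your install.

    ```bash
    # Print the node, tolerations, and affinity of a NATS pod.
    # Usage: nats_scheduling [namespace] [pod-name]
    nats_scheduling() {
      local ns="${1:-truefoundry}"
      local pod="${2:-truefoundry-tfy-nats-0}"   # assumed pod name; adjust to your release
      kubectl get pod "$pod" -n "$ns" \
        -o jsonpath='node: {.spec.nodeName}{"\n"}tolerations: {.spec.tolerations}{"\n"}affinity: {.spec.affinity}{"\n"}'
    }
    ```

    Run `nats_scheduling` and check that the toleration and node affinity from your values file appear in the output. Kubernetes adds its own default tolerations (for example for not-ready nodes), so expect extra entries alongside yours.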

    <Warning>
      Use `podTemplate.patch` (not `podTemplate.merge`) for `tolerations`, `affinity`, and `nodeSelector`. The NATS subchart applies `podTemplate.merge` **first** and then the JSON `podTemplate.patch`, and the chart's default patch sets `/spec/tolerations`, `/spec/affinity`, and `/spec/nodeSelector` to empty values — so anything set via `podTemplate.merge.spec.tolerations` / `.affinity` is overwritten and silently has no effect.

      Because Helm **replaces arrays wholesale**, the `patch` list above must include the chart's default entries (`resolver-volume`, `pid`, `imagePullSecrets`) in addition to your scheduling values. Omitting them will break NATS (for example the accounts resolver volume will be missing).
    </Warning>
  </Accordion>

  <Accordion title="Why are volumes (PVCs) stuck in Pending state?">
    The control plane creates PersistentVolumeClaims for **NATS JetStream** (and **PostgreSQL** when `devMode` is enabled) without specifying a storage class. These PVCs rely on the cluster having a **default StorageClass**. If no default is set, the PVCs stay in `Pending` state and the corresponding pods (e.g., `truefoundry-tfy-nats-0`) never start.

    **1. Check the PVC status and events:**

    ```bash wrap lines theme={"dark"}
    kubectl get pvc -n truefoundry
    kubectl describe pvc <pvc-name> -n truefoundry
    ```

    If the events show messages like `no storage class is set` or `waiting for a volume to be created`, the cluster is missing a default storage class or a provisioner.

    **2. Check if a default storage class exists:**

    ```bash wrap lines theme={"dark"}
    kubectl get storageclass
    ```

    One of the storage classes should be marked with `(default)`. If none is, mark one as default:

    ```bash wrap lines theme={"dark"}
    kubectl patch storageclass <storage-class-name> \
      -p '{"metadata": {"annotations":{"storageclass.kubernetes.io/is-default-class":"true"}}}'
    ```
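
    If you want to check for the `(default)` marker from a script rather than by eye, a small filter over the table output is usually enough. This is a sketch, not an official tool: `default_storage_classes` is a hypothetical helper name, and it relies on the marker appearing as the second whitespace-separated field of the default `kubectl get storageclass` output.

    ```bash
    # List the names of storage classes that kubectl marks as "(default)".
    # Usage: default_storage_classes
    default_storage_classes() {
      # In the default table output the marker follows the name, e.g. "gp2 (default)".
      kubectl get storageclass --no-headers | awk '$2 == "(default)" { print $1 }'
    }
    ```

    An empty result means no default storage class is set. Exactly one name is the expected healthy state.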

    <Note>
      * On **AWS EKS**, ensure the [EBS CSI driver addon](https://docs.aws.amazon.com/eks/latest/userguide/ebs-csi.html) is installed — without it, PVCs will not be provisioned even if a storage class exists.
      * On **GKE** and **AKS**, a default storage class is usually present out of the box.
      * Only block storage with `ReadWriteOnce` access is required — see [Do we need any NFS volumes?](#do-we-need-any-nfs-volumes-in-kubernetes-for-the-ai-gateway-or-control-plane)
    </Note>

    **3. Recreate the stuck PVCs:**

    Setting a default storage class does **not** retroactively apply to already-created PVCs — the default is injected only at PVC creation time. Delete the `Pending` PVCs and their pods so they get recreated with the default storage class:

    ```bash wrap lines theme={"dark"}
    kubectl delete pvc <pvc-name> -n truefoundry
    kubectl delete pod <pod-name> -n truefoundry
    ```
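
    To find which PVCs need to be recreated, you can filter the PVC list by its `STATUS` column. The helper name `pending_pvcs` is hypothetical, and the sketch assumes the default table layout where the status is the second column.

    ```bash
    # Print the names of PVCs in a namespace whose STATUS column is "Pending".
    # Usage: pending_pvcs [namespace]
    pending_pvcs() {
      local ns="${1:-truefoundry}"
      kubectl get pvc -n "$ns" --no-headers | awk '$2 == "Pending" { print $1 }'
    }
    ```

    Review the names it prints before deleting anything, then delete each one with the commands above.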

    Alternatively, you can specify the storage class explicitly instead of relying on the cluster default. For the NATS JetStream PVC:

    ```yaml truefoundry-values.yaml wrap lines theme={"dark"}
    tfyNats:
      config:
        jetstream:
          fileStore:
            pvc:
              storageClassName: <storage-class-name>
    ```
  </Accordion>

  <Accordion title="How to deploy on OpenShift with restricted Security Context Constraints (SCC)?">
    OpenShift clusters enforce [Security Context Constraints (SCCs)](https://docs.openshift.com/container-platform/latest/authentication/managing-security-context-constraints.html) that expect pods to have an **empty security context** so that OpenShift can inject arbitrary user and group IDs at runtime. By default, the TrueFoundry Helm chart sets explicit `podSecurityContext` and `securityContext` values (such as `runAsUser`, `runAsNonRoot`, `fsGroup`, etc.) on its components, which conflicts with the `restricted` or `restricted-v2` SCC.

    To resolve this, disable both pod-level and container-level security contexts for all components by adding the following overrides to your `truefoundry-values.yaml`:

    ```yaml truefoundry-values.yaml theme={"dark"}
    # Disable security contexts for OpenShift SCC compatibility
    truefoundryBootstrap:
      podSecurityContext:
        enabled: false

    mlfoundryServer:
      podSecurityContext:
        enabled: false

    servicefoundryServer:
      podSecurityContext:
        enabled: false

    tfyK8sController:
      podSecurityContext:
        enabled: false

    tfyProxy:
      podSecurityContext:
        enabled: false

    deltaFusionIngestor:
      podSecurityContext:
        enabled: false

    deltaFusionCompaction:
      podSecurityContext:
        enabled: false

    deltaFusionQueryServer:
      podSecurityContext:
        enabled: false

    tfy-llm-gateway:
      podSecurityContext:
        enabled: false

    tfy-otel-collector:
      podSecurityContext:
        enabled: false
    ```

    Setting `enabled: false` removes all explicit security context fields from the pod and container specs, allowing OpenShift's SCC admission controller to assign user and group IDs as needed.
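
    Once the pods are running, you can check which SCC admitted each one. OpenShift records this in the `openshift.io/scc` pod annotation. The helper below is a sketch with a hypothetical name (`pod_scc`), and it assumes the `truefoundry` namespace unless you pass another.

    ```bash
    # Print the SCC that admitted a pod, read from the openshift.io/scc annotation.
    # Usage: pod_scc <pod-name> [namespace]
    pod_scc() {
      local pod="$1"
      local ns="${2:-truefoundry}"
      oc get pod "$pod" -n "$ns" -o jsonpath='{.metadata.annotations.openshift\.io/scc}{"\n"}'
    }
    ```

    For a pod admitted under the restricted policy, the value should be `restricted` or `restricted-v2`, depending on your cluster version.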
  </Accordion>

  <Accordion title="How to install Vector for log collection on OpenShift (OCP)?">
    The `tfy-logs` chart deploys [Vector](https://vector.dev/) as a `DaemonSet` that tails container logs from each node's host filesystem and forwards them to VictoriaLogs. Vector writes its checkpoint/snapshot state (the record of how far it has read in each log file) to a `hostPath` data directory on the node. The chart default points to a host directory that is **not writable on RHCOS**, so Vector cannot persist its checkpoints: the pod either fails to start or restarts without retaining its read positions. You must point this at a writable location on the node.

    On RHCOS the writable, persistent location is under `/var/home/core` (other paths such as `/var/lib` are managed and read-only for containers). Set `persistence.hostPath.path` to a writable directory there, for example `/var/home/core/data/vector`. If you mirror images into a private registry (common in air-gapped OpenShift clusters), also override the registry for `victoria-logs-single` and Vector — see [Can I use my Artifactory as a mirror to pull images?](#can-i-use-my-artifactory-as-a-mirror-to-pull-images).

    Because Vector runs as a `DaemonSet` that mounts the node's host filesystem (`hostPath`) to read container logs, its service account must be granted the `privileged` SCC. Without this, OpenShift's SCC admission controller blocks the pods and the DaemonSet will not start. Grant the SCC to the `tfy-logs-vector` service account in the `tfy-logs` namespace:

    ```bash wrap lines theme={"dark"}
    oc adm policy add-scc-to-user privileged -z tfy-logs-vector -n tfy-logs
    ```

    On SELinux-enforcing nodes (RHCOS), the default container SELinux context (`container_t`) cannot read the host log files under `/var/log`. Set the pod's SELinux type to `spc_t` (super-privileged container) so Vector is allowed to read them — otherwise the pod runs but collects no logs (permission denied).

    ```yaml tfy-logs-values.yaml wrap lines theme={"dark"}
    victoria-logs-single:
      enabled: true
      # Optional: only needed if you pull images from a private registry / mirror
      global:
        image:
          registry: <YOUR_REGISTRY>
      server:
        image:
          registry: <YOUR_REGISTRY>
      vector:
        enabled: true
        # Required on SELinux-enforcing nodes (RHCOS) so Vector can read host log files
        podSecurityContext:
          seLinuxOptions:
            type: spc_t
        # Optional: only needed if you pull images from a private registry / mirror
        image:
          repository: <YOUR_REGISTRY>/timberio/vector
        persistence:
          hostPath:
            enabled: true
            # Must be a writable location on the node. On RHCOS use a path under /var/home/core.
            path: /var/home/core/data/vector

    # Vector for Windows nodes is not applicable on OpenShift
    windowsVector:
      enabled: false
    ```

    <Tip>
      The directory is created automatically on each node by the DaemonSet. If your cluster uses a different writable mount (for example a dedicated data partition), set `path` to a writable directory on that mount instead — the value only needs to be writable by the Vector pod on every node.
    </Tip>

    After applying the values, verify the Vector DaemonSet is running on every node:

    ```bash wrap lines theme={"dark"}
    kubectl -n tfy-logs rollout status daemonset/<release-name>-vector
    kubectl -n tfy-logs logs -l app.kubernetes.io/name=vector --tail=50
    ```
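
    If the DaemonSet is running but no logs arrive, a quick scan of the Vector logs for permission errors can tell you whether the SELinux or `hostPath` settings are the cause. This is a rough sketch: `vector_permission_errors` is a hypothetical helper, and matching on the phrase "permission denied" is an assumption about how the failure is worded in Vector's output.

    ```bash
    # Count "permission denied" lines in recent Vector logs.
    # Usage: vector_permission_errors [namespace]
    vector_permission_errors() {
      local ns="${1:-tfy-logs}"
      # grep -c prints the count; it exits non-zero on zero matches, hence "|| true".
      kubectl -n "$ns" logs -l app.kubernetes.io/name=vector --tail=200 \
        | grep -ci 'permission denied' || true
    }
    ```

    A non-zero count suggests rechecking the `spc_t` SELinux type and the writable `hostPath` location described above.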
  </Accordion>

  <Accordion title="How to deploy the control plane on AWS EKS with CloudWatch Observability addon installed?">
    When the EKS CloudWatch Observability addon is enabled, its ADOT auto-instrumentation injects bundled Python libraries via `PYTHONPATH` that conflict with `truefoundry-mlfoundry-server` dependencies, causing the pod to enter `CrashLoopBackOff`. You may see errors like:

    ```wrap theme={"dark"}
    ImportError: cannot import name 'DEFAULT_CIPHERS' from 'urllib3.util.ssl_'
      (/otel-auto-instrumentation-python/urllib3/util/ssl_.py)
    ```

    ```wrap theme={"dark"}
    ImportError: cannot import name 'LogData' from 'opentelemetry.sdk._logs'
      (/otel-auto-instrumentation-python/opentelemetry/sdk/_logs/__init__.py)
    ```

    To fix this, exclude the `truefoundry` namespace from the addon's auto-instrumentation by updating the addon configuration:

    ```json theme={"dark"}
    {
      "manager": {
        "applicationSignals": {
          "autoMonitor": {
            "exclude": {
              "python": { "namespaces": ["truefoundry"] },
              "java": { "namespaces": ["truefoundry"] },
              "nodejs": { "namespaces": ["truefoundry"] },
              "dotnet": { "namespaces": ["truefoundry"] }
            }
          }
        },
        "autoAnnotateAutoInstrumentation": {
          "python": { "namespaces": [] },
          "java": { "namespaces": [] },
          "nodejs": { "namespaces": [] },
          "dotnet": { "namespaces": [] }
        }
      }
    }
    ```
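
    One way to apply this configuration is through the AWS CLI. The sketch below assumes the JSON above is saved to a local file and that the addon is installed under its standard name, `amazon-cloudwatch-observability`. The function name `apply_cloudwatch_addon_config` is hypothetical. If you manage the addon through Terraform, eksctl, or the console, set the same configuration values there instead.

    ```bash
    # Apply a JSON configuration file to the CloudWatch Observability addon.
    # Usage: apply_cloudwatch_addon_config <cluster-name> <config-file>
    apply_cloudwatch_addon_config() {
      local cluster="$1"
      local config_file="$2"
      aws eks update-addon \
        --cluster-name "$cluster" \
        --addon-name amazon-cloudwatch-observability \
        --configuration-values "file://$config_file"
    }
    ```

    For example, `apply_cloudwatch_addon_config <your-cluster> cloudwatch-addon-config.json`, where the file name is only a placeholder.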

    <Note>
      After updating the addon config, restart the deployment and verify the pods are running:

      ```bash theme={"dark"}
      kubectl rollout restart deployment truefoundry-mlfoundry-server -n truefoundry
      kubectl rollout status deployment truefoundry-mlfoundry-server -n truefoundry
      ```
    </Note>
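
    To check whether any pod in the namespace is still being instrumented, you can look for a `PYTHONPATH` environment variable that points at the auto-instrumentation directory seen in the errors above. This is a sketch under assumptions: the helper name `instrumented_pods` is hypothetical, and it assumes the injected value appears as a container `env` entry containing `otel-auto-instrumentation`.

    ```bash
    # List pods in a namespace whose containers have a PYTHONPATH that points at
    # the auto-instrumentation directory seen in the errors above.
    # Usage: instrumented_pods [namespace]
    instrumented_pods() {
      local ns="${1:-truefoundry}"
      kubectl get pods -n "$ns" \
        -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.spec.containers[*].env[?(@.name=="PYTHONPATH")].value}{"\n"}{end}' \
        | awk -F '\t' '$2 ~ /otel-auto-instrumentation/ { print $1 }'
    }
    ```

    After the addon change and restart, an empty result suggests the namespace is no longer being auto-instrumented.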
  </Accordion>

  <Accordion title="How to enable the stdio MCP proxy?">
    The stdio MCP proxy enables TrueFoundry to run MCP servers that use the stdio transport protocol. It is disabled by default.

    To enable it, add the following to your `truefoundry-values.yaml`:

    ```yaml truefoundry-values.yaml theme={"dark"}
    stdioMcpProxy:
      enabled: true
    ```

    After upgrading the Helm release with the updated values, restart the `truefoundry-tfy-proxy` deployment as well:

    ```bash theme={"dark"}
    kubectl rollout restart deployment truefoundry-tfy-proxy -n truefoundry
    ```
  </Accordion>
</AccordionGroup>
