# Migrating AWS Terraform Provider from v5 to v6

> Step-by-step guide to upgrade TrueFoundry AWS Terraform modules from AWS provider v5 to v6 with zero downtime.

This guide walks you through migrating TrueFoundry AWS infrastructure modules from AWS Terraform provider v5 to v6. The main module has submodules with dependencies that must be updated in a specific order to avoid downtime.

The migration is performed in phases. Karpenter requires special handling through an intermediate version to ensure a seamless transition from IRSA to Pod Identity.

## Prerequisites

Before starting the migration, ensure the following:

* <Icon icon="square-check" iconType="regular" /> **OpenTofu 1.10+** is installed (recommended). Terraform also works if OpenTofu is not available.
* <Icon icon="square-check" iconType="regular" /> **AWS CLI** is installed and configured with appropriate credentials.
* <Icon icon="square-check" iconType="regular" /> Your modules are currently on the versions listed in the **From** column of the [Module Version Reference](#module-version-reference) table below.
* <Icon icon="square-check" iconType="regular" /> You have sufficient IAM permissions to manage EKS, IAM roles, SQS, and related resources.
* <Icon icon="square-check" iconType="regular" /> Your current infrastructure has a **clean plan** -- run `tofu plan` (or `terraform plan`) and confirm it shows no pending changes before starting. If there is drift between your state and your infrastructure, resolve it first to avoid mixing unrelated changes into the migration.
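
The clean-plan check can be scripted with the `-detailed-exitcode` flag, which both `tofu plan` and `terraform plan` support: exit code 0 means no changes, 1 means an error, and 2 means pending changes. The sketch below is one possible wrapper; the `TF_BIN` variable and the `check_clean_plan` function are names chosen for this example, not part of the TrueFoundry modules.

```bash
#!/usr/bin/env bash
# TF_BIN is a name used only in this sketch; export TF_BIN=terraform if you do not use OpenTofu.
TF_BIN="${TF_BIN:-tofu}"

# Prints the plan status and returns 0 only when there are no pending changes.
check_clean_plan() {
  local rc=0
  "$TF_BIN" plan -detailed-exitcode -input=false >/dev/null || rc=$?
  case "$rc" in
    0) echo "clean: no pending changes" ;;
    2) echo "pending changes: resolve drift before migrating"; return 2 ;;
    *) echo "error: plan failed (exit code $rc)"; return 1 ;;
  esac
}

# Usage (run from the directory that holds your root module):
#   check_clean_plan && echo "safe to start the migration"
```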

<Warning>
  Back up your OpenTofu/Terraform state before starting this migration. If your state is stored in S3, ensure you can recover a previous version if needed.
</Warning>

<Note>
  We recommend confirming that [S3 bucket versioning](https://docs.aws.amazon.com/AmazonS3/latest/userguide/Versioning.html) is enabled on your OpenTofu/Terraform state bucket as a best practice before beginning. This allows you to recover any prior state file version if something goes wrong.
</Note>
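
As a rough sketch of the backup and versioning checks, the helpers below pull the current state to a timestamped local file with `state pull` and query the bucket's versioning status with `aws s3api get-bucket-versioning`. The function names, the `TF_BIN` variable, and the backup file naming are assumptions made for this example; pass your own state bucket name.

```bash
#!/usr/bin/env bash
# TF_BIN is a name used only in this sketch; export TF_BIN=terraform if you do not use OpenTofu.
TF_BIN="${TF_BIN:-tofu}"

# Pull the current state into a timestamped file under the given directory
# (default: current directory) and print the path of the backup.
backup_state() {
  local dir="${1:-.}"
  local out
  out="$dir/state-backup-$(date +%Y%m%dT%H%M%S).tfstate"
  "$TF_BIN" state pull > "$out" || return 1
  [ -s "$out" ] || { echo "empty state backup: $out" >&2; return 1; }
  echo "$out"
}

# Print the versioning status of the state bucket: "Enabled", "Suspended",
# or "None" when versioning has never been configured.
state_bucket_versioning() {
  aws s3api get-bucket-versioning --bucket "$1" --query Status --output text
}

# Usage:
#   backup_state ./backups
#   state_bucket_versioning <your-state-bucket>
```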

## Module Version Reference

The following table summarizes the version changes for each module. Modules marked with **\*** have an intermediate version that must be applied before the final upgrade.

| Module                        | From (starting) | Intermediate | To (v6 compatible) |
| ----------------------------- | --------------- | ------------ | ------------------ |
| network                       | v0.3.10         | --           | v0.4.1             |
| eks **\***                    | v0.7.20         | v0.7.21      | v0.8.3             |
| efs                           | v0.4.5          | --           | v0.5.2             |
| aws-load-balancer-controller  | v0.1.5          | --           | v0.2.1             |
| karpenter **\***              | v0.3.12         | v0.3.13      | v0.4.3             |
| tfy-platform-features         | v0.4.13         | --           | v0.5.0             |
| control-plane (if applicable) | v0.4.24         | --           | v0.5.1             |

<Warning>
  Before you begin the migration, verify that your configuration is at a <b>clean plan</b> state with all modules pinned to the starting versions listed above. Run <code>tofu plan</code> (or <code>terraform plan</code>) and confirm that there are no pending changes before proceeding.
</Warning>

## Karpenter Upgrade Strategy

Upgrading Karpenter requires applying an intermediate version (v0.3.13) before the final release (v0.4.3) to ensure zero downtime. The migration transitions Karpenter from IRSA to Pod Identity.

**How the phased upgrade works:**

1. Version **v0.3.13** is deployed first. It creates new resources (SQS queue, IAM role with Pod Identity) that run alongside the older resources.
2. The Karpenter Helm chart values are updated to point to the newly created resources.
3. A `disable_old_changes` flag controls the cleanup of old resources. When set to `true`, the older IRSA-based resources are removed.
4. Version **v0.4.3** is the final release that is fully AWS provider v6 compatible.

<Accordion title="Resource transition details">
  The following table shows how each Karpenter-managed resource transitions during the migration:

  | Resource            | Old (`disable_old_changes = false`) | New (`disable_old_changes = true`)    |
  | ------------------- | ----------------------------------- | ------------------------------------- |
  | SQS queue           | `<cluster_name>-karpenter`          | `<cluster_name>-karpenter-queue`      |
  | Controller IAM role | `<cluster_name>-karpenter`          | `<cluster_name>-karpenter-controller` |
  | Role trust          | IRSA only                           | Pod Identity                          |
  | Instance profile    | `<cluster_name>-karpenter-initial`  | Same (unchanged, in-place update)     |
  | CloudWatch rules    | Managed individually                | Managed by sub-module                 |
  | IRSA module         | `module.karpenter_irsa_role[0]`     | Removed                               |
</Accordion>
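
The naming scheme in the resource transition table can be expressed as a small helper, which is handy when scripting the checks later in this guide. This is only an illustration of the queue and role names listed in the table; the `karpenter_resource_names` function is a name invented for this example.

```bash
#!/usr/bin/env bash
# Print the Karpenter SQS queue and controller role names for a cluster.
# phase "old" = IRSA-based resources, phase "new" = Pod Identity resources.
karpenter_resource_names() {
  local cluster="$1" phase="$2"
  case "$phase" in
    old) printf 'queue=%s-karpenter\nrole=%s-karpenter\n' "$cluster" "$cluster" ;;
    new) printf 'queue=%s-karpenter-queue\nrole=%s-karpenter-controller\n' "$cluster" "$cluster" ;;
    *)   echo "phase must be 'old' or 'new'" >&2; return 1 ;;
  esac
}

# Usage:
#   karpenter_resource_names my-cluster new
```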

## Migration Steps

<Warning>
  Before applying any step, always run `tofu plan` (or `terraform plan`) and carefully review the output. If you see unexpected resource deletions, stop and investigate before proceeding.
</Warning>
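
To make the review for unexpected deletions less error-prone, you can save the plan output as text and extract the destroy count and the affected addresses. The sketch below assumes the standard human-readable plan format (`Plan: X to add, Y to change, Z to destroy.` and `# <address> will be destroyed`) produced with `-no-color`; the function names are illustrative.

```bash
#!/usr/bin/env bash
# Read plan text on stdin and print how many resources it would destroy.
# Prints 0 when the output has no "Plan:" summary line (for example "No changes.").
plan_destroy_count() {
  local n
  n=$(sed -n 's/^Plan: .* \([0-9][0-9]*\) to destroy\..*$/\1/p' | tail -n 1)
  echo "${n:-0}"
}

# Read plan text on stdin and list the addresses marked "will be destroyed".
plan_destroyed_addresses() {
  sed -n 's/^ *# \(.*\) will be destroyed$/\1/p'
}

# Usage:
#   tofu plan -no-color > plan.txt
#   plan_destroy_count < plan.txt
#   plan_destroyed_addresses < plan.txt
```

Resources that are replaced rather than removed are not listed by `plan_destroyed_addresses`, so the saved plan text is still worth reading in full.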

<Steps>
  <Step title="Pin modules to intermediate versions">
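    The table lists the versions every module should be on by the end of the intermediate phase. Modules without an intermediate version stay at their starting version; if any of them is on an older version, update it to the version shown here and apply first. The `eks` and `karpenter` bumps to their intermediate versions need the configuration changes described in the next step.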

    | Module                        | Version |
    | ----------------------------- | ------- |
    | network                       | v0.3.10 |
    | eks                           | v0.7.21 |
    | efs                           | v0.4.5  |
    | aws-load-balancer-controller  | v0.1.5  |
    | karpenter                     | v0.3.13 |
    | tfy-platform-features         | v0.4.13 |
    | control-plane (if applicable) | v0.4.24 |

    <Note>
      Run `tofu plan` (or `terraform plan`) and review the output before applying. Verify no unexpected changes are shown.
    </Note>
  </Step>

  <Step title="Prepare EKS and Karpenter modules">
    This step prepares the EKS cluster for Pod Identity and sets up the intermediate Karpenter version.

    **1. Update the cluster module**

    Remove the `node_security_group_additional_rules` block from the cluster module and bump it to **v0.7.21** to install the EKS Pod Identity Agent:

    ```hcl theme={"dark"}
    # Remove this entire block from your cluster module configuration
    node_security_group_additional_rules = {
      "ingress_control_plane_all" = {
        "description" = "Control plane to node all ports/protocols"
        "protocol"    = "-1"
        "from_port"   = 0
        "to_port"     = 0
        "type"        = "ingress"
        "cidr_blocks" = "${module.network.private_subnets_cidrs}"
      }
    }
    ```

    **2. Update the Karpenter module**

    Move the Karpenter module to **v0.3.13** with the following settings:

    ```hcl theme={"dark"}
    module "karpenter" {
      source  = "truefoundry/truefoundry-karpenter/aws"
      version = "0.3.13"

      # Keep disable_old_changes as false to create new resources
      # alongside old ones during the transition
      disable_old_changes                              = false
      karpenter_iam_role_policy_prefix_enable_override = false

      # ... your existing configuration ...
    }
    ```

    <Note>
      Run `tofu plan` (or `terraform plan`) and review the output. You should see new resources being created (new SQS queue, new IAM role with Pod Identity) while existing resources remain unchanged.
    </Note>

    Apply the changes after reviewing the plan.
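
    After the apply, a quick check along these lines can confirm that the new resources exist before you touch the Helm values. It is a sketch that assumes the resource names from the transition table (`<cluster_name>-karpenter-queue`, `<cluster_name>-karpenter-controller`) and that the Pod Identity Agent is installed as the `eks-pod-identity-agent` EKS addon; `verify_intermediate_resources` is a name made up for this example.

    ```bash
    #!/usr/bin/env bash
    # Check that the new SQS queue and controller role exist, then print the
    # status of the EKS Pod Identity Agent addon (for example "ACTIVE").
    verify_intermediate_resources() {
      local cluster="$1"
      aws sqs get-queue-url --queue-name "${cluster}-karpenter-queue" >/dev/null \
        || { echo "missing new SQS queue"; return 1; }
      aws iam get-role --role-name "${cluster}-karpenter-controller" >/dev/null \
        || { echo "missing new controller role"; return 1; }
      aws eks describe-addon --cluster-name "$cluster" \
        --addon-name eks-pod-identity-agent --query 'addon.status' --output text
    }

    # Usage:
    #   verify_intermediate_resources <cluster_name>
    ```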
  </Step>

  <Step title="Update Karpenter Helm chart values">
    <Note>
      These changes apply to the **Karpenter** Helm chart values, **not** the Karpenter Config Helm chart values.
    </Note>

    **1. Remove the `serviceAccount` annotations**

    Find and remove the following lines from your Karpenter Helm chart values:

    ```yaml theme={"dark"}
    # Remove these lines
    serviceAccount:
      annotations:
        eks.amazonaws.com/role-arn: arn:aws:iam::<ACCOUNT_ID>:role/<KARPENTER_ROLE>
    ```

    This annotation is no longer needed because Karpenter will now use Pod Identity instead of IRSA for authentication.

    **2. Update the `interruptionQueue` name**

    Append `-queue` to the end of the `interruptionQueue` value in your Karpenter Helm chart values:

    ```yaml theme={"dark"}
    # Before
    settings:
      interruptionQueue: <cluster_name>-karpenter

    # After
    settings:
      interruptionQueue: <cluster_name>-karpenter-queue
    ```

    **3. Submit the Helm chart changes**

    <Warning>
      You must submit and apply these Helm chart changes before proceeding to the next step. The Karpenter pods will restart and begin using the new SQS queue and Pod Identity. Verify that Karpenter pods are running and healthy before continuing.
    </Warning>
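
    One way to confirm the restart has finished is to wait on the rollout. The sketch below assumes the Karpenter Deployment is named `karpenter` and runs in `kube-system`, as in the troubleshooting commands later in this guide; the `wait_for_karpenter` function and its default timeout are choices made for this example.

    ```bash
    #!/usr/bin/env bash
    # Block until the Karpenter Deployment has rolled out, or fail after the
    # given timeout (default: 5 minutes).
    wait_for_karpenter() {
      kubectl rollout status deployment/karpenter -n kube-system --timeout="${1:-5m}"
    }

    # Usage:
    #   wait_for_karpenter        # wait up to 5 minutes
    #   wait_for_karpenter 10m    # wait up to 10 minutes
    ```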
  </Step>

  <Step title="Clean up old Karpenter resources">
    Run the Karpenter OpenTofu/Terraform module with `disable_old_changes = true` to remove the old IRSA-based resources:

    ```hcl theme={"dark"}
    module "karpenter" {
      source  = "truefoundry/truefoundry-karpenter/aws"
      version = "0.3.13"

      disable_old_changes                              = true
      karpenter_iam_role_policy_prefix_enable_override = false

      # ... your existing configuration ...
    }
    ```

    <Note>
      Run `tofu plan` (or `terraform plan`) and review the output. You should see the old IRSA module, old SQS queue, and old CloudWatch rules being destroyed. The new resources created in the previous step should remain untouched.
    </Note>

    Apply the changes after confirming the plan only removes old resources.
  </Step>

  <Step title="Upgrade modules to final v6-compatible versions">
    Now upgrade all modules to their final AWS provider v6-compatible versions.

    **1. Version-only bumps**

    Update the following modules to their new versions. These require only a version change with no other configuration modifications:

    ```hcl theme={"dark"}
    # Network
    version = "0.4.1"

    # Cluster (EKS)
    version = "0.8.3"

    # Platform Features
    version = "0.5.0"

    # Control Plane (if applicable)
    version = "0.5.1"
    ```

    <Warning>
      The cluster module's default `cluster_version` is **1.35** as of v0.8.3 (it was 1.34 in earlier releases). If you do **not** pin `cluster_version` explicitly in your configuration, bumping the module will trigger an in-place Kubernetes control-plane upgrade alongside the provider migration. Pin `cluster_version` to your current version to keep this a version-only bump, and upgrade Kubernetes separately afterward.
    </Warning>
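
    If you are unsure which Kubernetes version the cluster currently runs, you can read it from the EKS API and turn it into the line to add to the cluster module. This is a minimal sketch; both function names are illustrative.

    ```bash
    #!/usr/bin/env bash
    # Print the Kubernetes version the EKS control plane is currently running.
    current_cluster_version() {
      aws eks describe-cluster --name "$1" --query 'cluster.version' --output text
    }

    # Print the HCL line that pins the cluster module to that version.
    cluster_version_pin() {
      printf 'cluster_version = "%s"\n' "$(current_cluster_version "$1")"
    }

    # Usage:
    #   cluster_version_pin <cluster_name>
    ```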

    **2. EFS module**

    The EFS module now requires the `cluster_oidc_issuer_arn` input:

    ```hcl theme={"dark"}
    module "efs" {
      # ... existing configuration ...
      version                 = "0.5.2"
      cluster_oidc_issuer_arn = module.eks.oidc_provider_arn  # add this line
      # ...
    }
    ```

    **3. EBS module**

    <Note>
      The example below uses the `terraform-aws-modules/iam/aws//modules/iam-role-for-service-accounts` module source. If you are switching to this source, the required input variables may differ from previous versions, so update your configuration to match the example and adjust variable references as needed.

      Set `use_name_prefix = false` to prevent the IAM role from being recreated, and provide a `policy_name`.
    </Note>

    ```hcl theme={"dark"}
    module "ebs" {
      source                = "terraform-aws-modules/iam/aws//modules/iam-role-for-service-accounts"
      version               = "6.4.0"
      create                = true
      use_name_prefix       = false  # add this to prevent role recreation
      name                  = "${var.cluster_name}-csi-ebs"
      policy_name           = "${var.cluster_name}-csi-ebs-policy"  # add this line
      attach_ebs_csi_policy = true
      oidc_providers = {
        ebs = {
          provider_arn               = module.eks.oidc_provider_arn
          namespace_service_accounts = ["aws-ebs-csi-driver:ebs-csi-controller-sa"]
        }
      }
      tags = var.tags
    }
    ```

    **4. Karpenter module (final version)**

    Upgrade Karpenter to the final v0.4.3 release. The module configuration is simplified since the migration is now complete:

    <Note>
      Remove the `oidc_provider_arn` input (for example, `oidc_provider_arn = module.eks.oidc_provider_arn`) from the `karpenter` module definition.
    </Note>

    <Note>
      As of v0.4.3, the Karpenter controller uses an **inline** IAM policy by default (`karpenter_enable_inline_policy = true`), which avoids the `PolicySize: 6144` quota error. If you are coming from a configuration that used a managed policy, `tofu plan` will show the controller's IAM policy changing form (managed policy removed, inline policy added on the role). The permissions are equivalent -- this diff is expected and safe to apply.
    </Note>

    ```hcl theme={"dark"}
    module "karpenter" {
      source                       = "truefoundry/truefoundry-karpenter/aws"
      version                      = "0.4.3"
      cluster_name                 = var.cluster_name
      controller_node_iam_role_arn = var.use_existing_cluster ? var.existing_cluster_node_role_arn : module.eks.eks_managed_node_groups.initial.iam_role_arn
      controller_nodegroup_name    = "initial"
      tags                         = var.tags
    }
    ```

    **5. AWS Load Balancer Controller module**

    Upgrade the AWS Load Balancer Controller module to the final v0.2.1 release and set `elb_controller_use_name_prefix = false`:

    ```hcl theme={"dark"}
    module "aws-load-balancer-controller" {
      # ... existing configuration ...
      version                        = "0.2.1"
      elb_controller_use_name_prefix = false  # add this line
      # ...
    }
    ```

    **6. TrueFoundry module**

    Update the TrueFoundry module to reference the new EBS IAM role ARN:

    ```hcl theme={"dark"}
    "awsEbsCsiDriver" = {
      "enabled" = true
      "roleArn" = "${module.ebs.arn}"
    }
    ```

    <Note>
      Run `tofu plan` (or `terraform plan`) and review the output carefully. If you see any unexpected resource deletions, investigate before applying. The `use_name_prefix = false` addition in the EBS module is specifically to avoid an unnecessary role recreation.
    </Note>

    Apply all changes after reviewing the plan.
  </Step>

  <Step title="Update Karpenter Helm chart version">
    **After all OpenTofu/Terraform changes have been applied**, update the <code>tfy-karpenter</code> Helm chart to version <b>0.5.11</b>.

    <Note>
      This step ensures that the Karpenter deployment is using the compatible chart release after the infrastructure migration is complete. Make sure to upgrade only after applying all previous changes.
    </Note>
  </Step>

  <Step title="Post-migration validation">
    After completing all upgrade steps, verify that everything is working correctly.

    **1. Verify Karpenter pods are healthy**

    ```bash theme={"dark"}
    kubectl get pods -n kube-system -l app.kubernetes.io/name=karpenter
    ```

    All Karpenter pods should be in `Running` status with all containers ready.

    **2. Verify nodes can be provisioned**

    ```bash theme={"dark"}
    kubectl get nodeclaims
    ```

    Check that existing NodeClaims are in a healthy state. If you have pending pods that require new nodes, verify that Karpenter provisions them.

    **3. Verify Pod Identity associations**

    ```bash theme={"dark"}
    aws eks list-pod-identity-associations --cluster-name <cluster_name>
    ```

    Confirm that a Pod Identity association exists for the Karpenter service account.
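
    To narrow the listing to Karpenter, the command accepts `--namespace` and `--service-account` filters. The sketch below assumes the service account is named `karpenter` in `kube-system`; adjust both if your Helm values differ. The `karpenter_pod_identity_count` function is a name used only here.

    ```bash
    #!/usr/bin/env bash
    # Print how many Pod Identity associations exist for the Karpenter service
    # account. The second argument overrides the assumed service account name.
    karpenter_pod_identity_count() {
      aws eks list-pod-identity-associations --cluster-name "$1" \
        --namespace kube-system --service-account "${2:-karpenter}" \
        --query 'length(associations)' --output text
    }

    # Usage: a result of 0 means no association was found.
    #   karpenter_pod_identity_count <cluster_name>
    ```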

    **4. Verify all module resources**

    ```bash theme={"dark"}
    tofu plan
    ```

    Run a final plan to confirm no further changes are pending. The output should show `No changes. Your infrastructure matches the configuration.`
  </Step>
</Steps>

## Rollback

If you encounter issues during the migration, you can revert to the previous state.

**Before Step 4, "Clean up old Karpenter resources" (old resources still exist):**

1. Revert the Karpenter Helm chart values to restore the `serviceAccount.annotations` and original `interruptionQueue` name (without the `-queue` suffix).
2. Revert the Karpenter module version to **v0.3.12** in your `.tf` files.
3. Revert the cluster module to restore the `node_security_group_additional_rules` block and set it back to **v0.7.20** if needed.
4. Run `tofu plan` (or `terraform plan`) to confirm the rollback scope, then apply.

**After Step 4, "Clean up old Karpenter resources" (old resources have been removed):**

1. Set `disable_old_changes = false` on the Karpenter module (still at v0.3.13) and apply to recreate the old resources.
2. Revert the Karpenter Helm chart values to restore the `serviceAccount.annotations` and original `interruptionQueue` name.
3. Once Karpenter is healthy with the old resources, revert the module version to **v0.3.12** and apply.

<Note>
  The phased Karpenter upgrade is designed so that you can revert the Helm chart values at any point before Step 4 (cleaning up the old resources) to fall back to the old IRSA-based resources without disruption.
</Note>
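
If you are not sure which rollback path applies, checking whether the old SQS queue still exists is a reasonable proxy for whether the cleanup has already run. The sketch below assumes the old queue name `<cluster_name>-karpenter` from the resource transition table; `rollback_path` is a name invented for this example.

```bash
#!/usr/bin/env bash
# Print which rollback path applies, based on whether the old
# IRSA-era SQS queue still exists for the given cluster.
rollback_path() {
  local cluster="$1"
  if aws sqs get-queue-url --queue-name "${cluster}-karpenter" >/dev/null 2>&1; then
    echo "old resources present: revert Helm values, then module versions"
  else
    echo "old resources removed: apply disable_old_changes = false first"
  fi
}

# Usage:
#   rollback_path <cluster_name>
```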

## Troubleshooting

<AccordionGroup>
  <Accordion title="Karpenter pods crashlooping after cleaning up old resources">
    If Karpenter pods are crashlooping after Step 4 (Clean up old Karpenter resources), verify that:

    1. The Karpenter Helm chart values were updated **before** running Step 4. The `serviceAccount.annotations` should be removed and `interruptionQueue` should have the `-queue` suffix.
    2. The Pod Identity association was created successfully. Check with:
       ```bash theme={"dark"}
       aws eks list-pod-identity-associations --cluster-name <cluster_name>
       ```
    3. The new SQS queue exists:
       ```bash theme={"dark"}
       aws sqs get-queue-url --queue-name <cluster_name>-karpenter-queue
       ```

    If the Helm values were not updated before Step 4, revert `disable_old_changes` to `false` and apply to recreate the old resources, then follow the steps in order.
  </Accordion>

  <Accordion title="OpenTofu/Terraform plan shows unexpected resource deletions">
    If `tofu plan` (or `terraform plan`) shows resources being destroyed that you do not expect, do **not** apply. Common causes include:

    * **IAM role recreation:** Ensure the EBS module has `use_name_prefix = false` set. Without this, the role name gets a random suffix and OpenTofu/Terraform sees it as a new resource.
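    * **State drift:** If resources were modified outside of OpenTofu/Terraform, the plan may show unexpected changes. Run `tofu plan -refresh-only` (or `terraform plan -refresh-only`) to review the detected drift, then `tofu apply -refresh-only` (or `terraform apply -refresh-only`) to sync state before re-running the plan.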
    * **Module source changes:** Verify all module `source` and `version` fields match the values in this guide exactly.
  </Accordion>

  <Accordion title="SQS queue name mismatch or interruption handler not working">
    If Karpenter is not processing spot interruption events:

    1. Confirm the `interruptionQueue` value in the Karpenter Helm chart matches the actual SQS queue name. After migration, it should be `<cluster_name>-karpenter-queue`.
    2. Verify the queue exists and has the correct permissions:
       ```bash theme={"dark"}
       aws sqs get-queue-attributes --queue-url <queue_url> --attribute-names All
       ```
    3. Check Karpenter logs for queue-related errors:
       ```bash theme={"dark"}
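       # With a label selector, kubectl logs shows only the last 10 lines per pod unless --tail is set
       kubectl logs -n kube-system -l app.kubernetes.io/name=karpenter --tail=500 | grep -iE "sqs|queue|interruption"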
       ```
  </Accordion>

  <Accordion title="Pod Identity not taking effect">
    If Karpenter is unable to assume its IAM role after the migration:

    1. Verify the EKS Pod Identity Agent addon is installed and running:
       ```bash theme={"dark"}
       kubectl get pods -n kube-system -l app.kubernetes.io/name=eks-pod-identity-agent
       ```
    2. Confirm the Pod Identity association exists:
       ```bash theme={"dark"}
       aws eks list-pod-identity-associations --cluster-name <cluster_name>
       ```
    3. Restart the Karpenter pods to pick up the Pod Identity credentials:
       ```bash theme={"dark"}
       kubectl rollout restart deployment -n kube-system karpenter
       ```
    4. If the EKS Pod Identity Agent is missing, verify that the cluster module was upgraded to **v0.7.21** or later in Step 2 (Prepare EKS and Karpenter modules), which installs this addon.
  </Accordion>
</AccordionGroup>
