This guide walks you through migrating the TrueFoundry AWS infrastructure modules from the AWS Terraform provider v5 to v6. The modules depend on one another and must be updated in a specific order, in phases, to avoid downtime. Karpenter requires special handling through an intermediate version to ensure a seamless transition from IRSA to Pod Identity.

Prerequisites

Before starting the migration, ensure the following:
  • OpenTofu 1.10+ is installed (recommended). Terraform also works if OpenTofu is not available.
  • AWS CLI is installed and configured with appropriate credentials.
  • Your modules are currently on the versions listed in the From column of the Module Version Reference table below.
  • You have sufficient IAM permissions to manage EKS, IAM roles, SQS, and related resources.
  • Your current infrastructure has a clean plan — run tofu plan (or terraform plan) and confirm it shows no pending changes before starting. If there is drift between your state and your infrastructure, resolve it first to avoid mixing unrelated changes into the migration.
Back up your OpenTofu/Terraform state before starting this migration. If your state is stored in S3, confirm that bucket versioning is enabled on the state bucket so you can recover a prior state file version if something goes wrong.
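As a sketch of that versioning check, you can script it around the AWS CLI; the bucket name and file paths below are placeholders, and the sample JSON stands in for the live `aws s3api get-bucket-versioning` response:

```shell
# Sketch only: verify state-bucket versioning before migrating.
# Capture the real response with (bucket name is a placeholder):
#   aws s3api get-bucket-versioning --bucket my-tf-state-bucket > /tmp/versioning.json
# Here a sample response of the same shape stands in for the live call.
cat > /tmp/versioning.json <<'EOF'
{
    "Status": "Enabled"
}
EOF
if grep -q '"Status": "Enabled"' /tmp/versioning.json; then
    echo "state bucket versioning enabled"
else
    echo "versioning NOT enabled - enable it before migrating" >&2
fi
```

Note that `get-bucket-versioning` returns an empty body when versioning has never been configured, so an absent `Status` field also means you should enable it first.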

Module Version Reference

The following table summarizes the version changes for each module. Modules marked with * have an intermediate version that must be applied before the final upgrade.
| Module                        | From    | Intermediate | To (v6 compatible) |
|-------------------------------|---------|--------------|--------------------|
| network                       | v0.3.10 |              | v0.4.0             |
| eks *                         | v0.7.20 | v0.7.21      | v0.8.1             |
| efs                           | v0.4.5  |              | v0.5.0             |
| aws-load-balancer-controller  | v0.1.5  |              | v0.2.0             |
| karpenter *                   | v0.3.12 | v0.3.13      | v0.4.0             |
| tfy-platform-features         | v0.4.13 |              | v0.5.0             |
| control-plane (if applicable) | v0.4.24 |              | v0.5.0             |

Karpenter Upgrade Strategy

Upgrading Karpenter requires applying an intermediate version before the final release to ensure zero downtime. The migration transitions Karpenter from IRSA to Pod Identity. The phased upgrade works as follows:
  1. Version v0.3.13 is deployed first. It creates new resources (SQS queue, IAM role with Pod Identity) that run simultaneously alongside the older resources.
  2. The Karpenter Helm chart values are updated to point to the newly created resources.
  3. A disable_old_changes flag controls the cleanup of old resources. When set to true, the older IRSA-based resources are removed.
  4. Version v0.4.0 is the final release that is fully AWS provider v6 compatible.
The following table shows how each Karpenter-managed resource transitions during the migration:
| Resource            | Old (disable_old_changes = false) | New (disable_old_changes = true)    |
|---------------------|-----------------------------------|-------------------------------------|
| SQS queue           | <cluster_name>-karpenter          | <cluster_name>-karpenter-queue      |
| Controller IAM role | <cluster_name>-karpenter          | <cluster_name>-karpenter-controller |
| Role trust          | IRSA only                         | Pod Identity                        |
| Instance profile    | <cluster_name>-karpenter-initial  | Same (unchanged, in-place update)   |
| CloudWatch rules    | Managed individually              | Managed by sub-module               |
| IRSA module         | module.karpenter_irsa_role[0]     | Removed                             |

Migration Steps

Before applying any step, always run tofu plan (or terraform plan) and carefully review the output. If you see unexpected resource deletions, stop and investigate before proceeding.
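One way to make that review habitual is a small guard on the plan summary line. This is a generic sketch over saved plan output, not a TrueFoundry-provided tool; the file path is a placeholder:

```shell
# Sketch only: guard against unexpected destroys in a saved plan summary.
# Produce the file with: tofu plan -no-color > /tmp/plan.txt
cat > /tmp/plan.txt <<'EOF'
Plan: 3 to add, 1 to change, 0 to destroy.
EOF
if grep -qE '[1-9][0-9]* to destroy' /tmp/plan.txt; then
    echo "plan contains destroys - stop and investigate"
else
    echo "no destroys detected"
fi
```

A nonzero destroy count is not always wrong during this migration (Step 4 intentionally removes old resources), so treat a hit as a prompt to read the full plan, not as a hard failure.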

Step 1: Pin modules to intermediate versions

Ensure all modules are at the following versions before proceeding. If any module is on an older version, update it to the version shown here and apply first.
| Module                        | Version |
|-------------------------------|---------|
| network                       | v0.3.10 |
| eks                           | v0.7.21 |
| efs                           | v0.4.5  |
| aws-load-balancer-controller  | v0.1.5  |
| karpenter                     | v0.3.12 |
| tfy-platform-features         | v0.4.13 |
| control-plane (if applicable) | v0.4.24 |
Run tofu plan (or terraform plan) and review the output before applying. Verify no unexpected changes are shown.
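If the pins live across many files, the version bump can be scripted. The following is a generic sed sketch over a throwaway main.tf; the path and the currently pinned version are hypothetical:

```shell
# Sketch only: pin a module block to the baseline version from the table above.
cat > /tmp/main.tf <<'EOF'
module "karpenter" {
  source  = "truefoundry/truefoundry-karpenter/aws"
  version = "0.3.10"
}
EOF
# Rewrite whatever version is currently pinned to the required baseline.
sed -i 's/^\(  version = \)"[0-9.]*"/\1"0.3.12"/' /tmp/main.tf
grep 'version' /tmp/main.tf
```

After any scripted edit, rely on `tofu plan` (or `terraform plan`) to confirm the result rather than on the edit itself.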

Step 2: Prepare EKS and Karpenter modules

This step prepares the EKS cluster for Pod Identity and sets up the intermediate Karpenter version.

1. Update the cluster module

Remove the node_security_group_additional_rules block from the cluster module and bump it to v0.7.21 to install the EKS Pod Identity Agent:
# Remove this entire block from your cluster module configuration
node_security_group_additional_rules = {
  "ingress_control_plane_all" = {
    "description" = "Control plane to node all ports/protocols"
    "protocol"    = "-1"
    "from_port"   = 0
    "to_port"     = 0
    "type"        = "ingress"
    "cidr_blocks" = "${module.network.private_subnets_cidrs}"
  }
}
2. Update the Karpenter module

Move the Karpenter module to v0.3.13 with the following settings:
module "karpenter" {
  source  = "truefoundry/truefoundry-karpenter/aws"
  version = "0.3.13"

  # Keep disable_old_changes as false to create new resources
  # alongside old ones during the transition
  disable_old_changes                              = false
  karpenter_iam_role_policy_prefix_enable_override = false

  # ... your existing configuration ...
}
Run tofu plan (or terraform plan) and review the output. You should see new resources being created (new SQS queue, new IAM role with Pod Identity) while existing resources remain unchanged.
Apply the changes after reviewing the plan.

Step 3: Update Karpenter Helm chart values

These changes apply to the Karpenter Helm chart values, not the Karpenter Config Helm chart values.
1. Remove the serviceAccount annotations

Find and remove the following lines from your Karpenter Helm chart values:
# Remove these lines
serviceAccount:
  annotations:
    eks.amazonaws.com/role-arn: arn:aws:iam::<ACCOUNT_ID>:role/<KARPENTER_ROLE>
This annotation is no longer needed because Karpenter will now use Pod Identity instead of IRSA for authentication.

2. Update the interruptionQueue name

Append -queue to the end of the interruptionQueue value in your Karpenter Helm chart values:
# Before
settings:
  interruptionQueue: <cluster_name>-karpenter

# After
settings:
  interruptionQueue: <cluster_name>-karpenter-queue
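The rename can also be applied mechanically. This is a sed sketch over a sample values file; the file name and cluster name are placeholders for your actual Helm values:

```shell
# Sketch only: append -queue to the interruptionQueue value in the Helm values.
# The file name and cluster name are placeholders.
cat > /tmp/karpenter-values.yaml <<'EOF'
settings:
  interruptionQueue: my-cluster-karpenter
EOF
sed -i '/interruptionQueue:/ s/-karpenter$/-karpenter-queue/' /tmp/karpenter-values.yaml
cat /tmp/karpenter-values.yaml
```

Review the diff before committing; the real file lives wherever your Helm values are managed.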
3. Submit the Helm chart changes
You must submit and apply these Helm chart changes before proceeding to the next step. The Karpenter pods will restart and begin using the new SQS queue and Pod Identity. Verify that Karpenter pods are running and healthy before continuing.

Step 4: Clean up old Karpenter resources

Run the Karpenter OpenTofu/Terraform module with disable_old_changes = true to remove the old IRSA-based resources:
module "karpenter" {
  source  = "truefoundry/truefoundry-karpenter/aws"
  version = "0.3.13"

  disable_old_changes                              = true
  karpenter_iam_role_policy_prefix_enable_override = false

  # ... your existing configuration ...
}
Run tofu plan (or terraform plan) and review the output. You should see the old IRSA module, old SQS queue, and old CloudWatch rules being destroyed. The new resources created in the previous step should remain untouched.
Apply the changes after confirming the plan only removes old resources.

Step 5: Upgrade modules to final v6-compatible versions

Now upgrade all modules to their final AWS provider v6-compatible versions.

1. Version-only bumps

Update the following modules to their new versions. These require only a version change with no other configuration modifications:
# Network
version = "0.4.0"

# Cluster (EKS)
version = "0.8.1"

# Platform Features
version = "0.5.0"

# Control Plane (if applicable)
version = "0.5.0"
2. EFS module

The EFS module now requires the cluster_oidc_issuer_arn input:
module "efs" {
  # ... existing configuration ...
  version                 = "0.5.0"
  cluster_oidc_issuer_arn = module.eks.oidc_provider_arn  # add this line
  # ...
}
3. EBS module

The EBS module requires use_name_prefix = false to prevent the IAM role from being recreated, and a policy_name parameter:
module "ebs" {
  source                = "terraform-aws-modules/iam/aws//modules/iam-role-for-service-accounts"
  version               = "6.4.0"
  create                = true
  use_name_prefix       = false  # add this to prevent role recreation
  name                  = "${var.cluster_name}-csi-ebs"
  policy_name           = "${var.cluster_name}-csi-ebs-policy"  # add this line
  attach_ebs_csi_policy = true
  oidc_providers = {
    ebs = {
      provider_arn               = module.eks.oidc_provider_arn
      namespace_service_accounts = ["aws-ebs-csi-driver:ebs-csi-controller-sa"]
    }
  }
  tags = var.tags
}
4. Karpenter module (final version)

Upgrade Karpenter to the final v0.4.0 release. The module configuration is simplified since the migration is now complete:
module "karpenter" {
  source                       = "truefoundry/truefoundry-karpenter/aws"
  version                      = "0.4.0"
  cluster_name                 = var.cluster_name
  controller_node_iam_role_arn = var.use_existing_cluster ? var.existing_cluster_node_role_arn : module.eks.eks_managed_node_groups.initial.iam_role_arn
  controller_nodegroup_name    = "initial"
  tags                         = var.tags
}
5. Update tfy-karpenter Helm chart

When upgrading the Karpenter Terraform module to v0.4.0, also update the tfy-karpenter Helm chart to version 0.5.11.

6. AWS Load Balancer Controller module

Upgrade the AWS Load Balancer Controller module to the final v0.2.0 release. The module configuration is simplified since the migration is now complete:
module "aws-load-balancer-controller" {
  # ... existing configuration ...
  version = "0.2.0"
  elb_controller_use_name_prefix = false
  # ...
}
7. TrueFoundry module

Update the TrueFoundry module to reference the new EBS IAM role ARN:
"awsEbsCsiDriver" = {
  "enabled" = true
  "roleArn" = "${module.ebs.arn}"
}
Run tofu plan (or terraform plan) and review the output carefully. If you see any unexpected resource deletions, investigate before applying. The use_name_prefix = false addition in the EBS module is specifically to avoid an unnecessary role recreation.
Apply all changes after reviewing the plan.

Step 6: Post-migration validation

After completing all upgrade steps, verify that everything is working correctly.

1. Verify Karpenter pods are healthy
kubectl get pods -n kube-system -l app.kubernetes.io/name=karpenter
All Karpenter pods should be in Running status with all containers ready.

2. Verify nodes can be provisioned
kubectl get nodeclaims
Check that existing NodeClaims are in a healthy state. If you have pending pods that require new nodes, verify that Karpenter provisions them.

3. Verify Pod Identity associations
aws eks list-pod-identity-associations --cluster-name <cluster_name>
Confirm that a Pod Identity association exists for the Karpenter service account.

4. Verify all module resources
tofu plan
Run a final plan to confirm no further changes are pending. The output should show "No changes. Your infrastructure matches the configuration."
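This final check can be scripted from saved plan output. A sketch, with a placeholder path:

```shell
# Sketch only: confirm the final plan is clean from saved output.
# Produce the file with: tofu plan -no-color > /tmp/final-plan.txt
# (In CI, tofu plan -detailed-exitcode exits 0 when there are no changes.)
cat > /tmp/final-plan.txt <<'EOF'
No changes. Your infrastructure matches the configuration.
EOF
if grep -q '^No changes\.' /tmp/final-plan.txt; then
    echo "migration complete - no pending changes"
else
    echo "plan is not clean - review remaining diffs" >&2
fi
```

The -detailed-exitcode flag is usually the more robust choice for automation, since it does not depend on the exact wording of the summary line.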

Rollback

If you encounter issues during the migration, you can revert to the previous state.

Before Step 4 (old resources still exist):
  1. Revert the Karpenter Helm chart values to restore the serviceAccount.annotations and original interruptionQueue name (without the -queue suffix).
  2. Revert the Karpenter module version to v0.3.12 in your .tf files.
  3. Revert the cluster module to restore the node_security_group_additional_rules block and set it back to v0.7.20 if needed.
  4. Run tofu plan (or terraform plan) to confirm the rollback scope, then apply.
After Step 4 (old resources have been removed):
  1. Set disable_old_changes = false on the Karpenter module (still at v0.3.13) and apply to recreate the old resources.
  2. Revert the Karpenter Helm chart values to restore the serviceAccount.annotations and original interruptionQueue name.
  3. Once Karpenter is healthy with the old resources, revert the module version to v0.3.12 and apply.
The phased Karpenter upgrade is designed so that you can revert the Helm chart values at any point before Step 4 and fall back to the old IRSA-based resources without disruption.

Troubleshooting

If Karpenter pods are crashlooping after Step 4, verify that:
  1. The Karpenter Helm chart values were updated before running Step 4. The serviceAccount.annotations should be removed and interruptionQueue should have the -queue suffix.
  2. The Pod Identity association was created successfully. Check with:
    aws eks list-pod-identity-associations --cluster-name <cluster_name>
    
  3. The new SQS queue exists:
    aws sqs get-queue-url --queue-name <cluster_name>-karpenter-queue
    
If the Helm values were not updated before Step 4, revert disable_old_changes to false and apply to recreate the old resources, then follow the steps in order.
If tofu plan (or terraform plan) shows resources being destroyed that you do not expect, do not apply. Common causes include:
  • IAM role recreation: Ensure the EBS module has use_name_prefix = false set. Without this, the role name gets a random suffix and OpenTofu/Terraform sees it as a new resource.
  • State drift: If resources were modified outside of OpenTofu/Terraform, the plan may show unexpected changes. Run tofu apply -refresh-only (or terraform apply -refresh-only) to reconcile state before re-running the plan.
  • Module source changes: Verify all module source and version fields match the values in this guide exactly.
If Karpenter is not processing spot interruption events:
  1. Confirm the interruptionQueue value in the Karpenter Helm chart matches the actual SQS queue name. After migration, it should be <cluster_name>-karpenter-queue.
  2. Verify the queue exists and has the correct permissions:
    aws sqs get-queue-attributes --queue-url <queue_url> --attribute-names All
    
  3. Check Karpenter logs for queue-related errors:
    kubectl logs -n kube-system -l app.kubernetes.io/name=karpenter | grep -i "sqs\|queue\|interruption"
    
If Karpenter is unable to assume its IAM role after the migration:
  1. Verify the EKS Pod Identity Agent addon is installed and running:
    kubectl get pods -n kube-system -l app.kubernetes.io/name=eks-pod-identity-agent
    
  2. Confirm the Pod Identity association exists:
    aws eks list-pod-identity-associations --cluster-name <cluster_name>
    
  3. Restart the Karpenter pods to pick up the Pod Identity credentials:
    kubectl rollout restart deployment -n kube-system karpenter
    
  4. If the EKS Pod Identity Agent is missing, verify that the cluster module was upgraded to v0.7.21 or later in Step 1, which installs this addon.
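The association check in item 2 can be wrapped in a small guard. The sample JSON below mirrors the shape of the list-pod-identity-associations response; the cluster name is a placeholder:

```shell
# Sketch only: check for the karpenter Pod Identity association in saved output.
# Capture the real output with:
#   aws eks list-pod-identity-associations --cluster-name <cluster_name> > /tmp/assoc.json
# The sample below mirrors the response shape; names are placeholders.
cat > /tmp/assoc.json <<'EOF'
{
    "associations": [
        {
            "clusterName": "my-cluster",
            "namespace": "kube-system",
            "serviceAccount": "karpenter"
        }
    ]
}
EOF
if grep -q '"serviceAccount": "karpenter"' /tmp/assoc.json; then
    echo "karpenter pod identity association found"
else
    echo "association missing - re-check the karpenter module apply" >&2
fi
```

If the association is missing, re-run the Karpenter module apply from Step 2 and confirm the plan showed the Pod Identity association being created.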