Velero migrates Kubernetes API objects and (depending on configuration) volume
data. It does not replace a full disaster-recovery design. Always run a
dry run or non-production rehearsal before production cutovers.
Why stop TFY Agent before backup and restore
TFY Agent and TFY Agent Proxy connect the compute plane to the TrueFoundry control plane. The proxy can create, update, and delete resources in the cluster based on control-plane instructions.
During migration, the target cluster may be empty or only partially provisioned for a period of time. If TFY Agent Proxy is running there before the namespaces and resources restored from Velero exist, the control plane can reconcile against a cluster whose state does not yet match what you expect. That can lead to unintended deletions or conflicting writes (for example, pruning or recreating resources in namespaces that are not ready yet).
Recommended practice: scale both tfy-agent and tfy-agent-proxy to zero on whichever cluster should not be actively reconciling during the migration window, then bring them up only on the target cluster after the Velero restore (and your TrueFoundry integration for the new cluster) is in place.
Prerequisites
- Velero on the source cluster with a BackupStorageLocation (BSL) to object storage—complete bucket creation, IAM, and install using the first step below and the linked Velero docs.
- Network and IAM on the target so Velero there can read the same backup bucket. For migration, Velero’s documentation recommends configuring the target cluster’s BSL as read-only so restores do not accidentally delete shared backup data (Velero cluster migration).
- Kubernetes version: Velero does not support restoring into a cluster whose Kubernetes version is lower than the cluster where the backup was taken. Prefer the same or newer minor version on the target.
- Persistent volumes: Snapshot-based migration across different cloud providers is not natively supported the way same-provider same-region flows are. For cross-provider moves or when snapshots are not portable, consider Velero File System Backup or the snapshot data mover—see the Velero cluster migration “Before migrating your cluster” section.
Migration workflow
Configure object storage, IAM, and Velero on the source cluster
Before you can back up to a shared location, create and secure the backup bucket (or equivalent) in your cloud and grant Velero the IAM / workload identity permissions it needs to read and write backup metadata and objects. You also need a Velero install on the source cluster with a BackupStorageLocation pointed at that bucket. Velero and each provider plugin document the exact policies, roles, service accounts, and install flags; follow those rather than copying partial snippets here:
- Velero basic install and supported providers
- Cluster migration (shared bucket across clusters)
- Provider plugins: AWS, GCP, Azure
Scale down TFY Agent on the source cluster
Stop the components in the tfy-agent namespace (adjust names if your release uses different Deployment names), then verify they are scaled down.
If you use GitOps (for example Argo CD) to manage these Deployments, use your usual workflow to set replicas to 0 so they are not immediately scaled back up by a sync.
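A minimal sketch of the scale-down and verification, assuming the default Deployment names tfy-agent and tfy-agent-proxy in the tfy-agent namespace:

```shell
# Scale both agent Deployments to zero so nothing reconciles during migration.
kubectl -n tfy-agent scale deployment tfy-agent tfy-agent-proxy --replicas=0

# Verify: both Deployments should report 0/0 ready replicas.
kubectl -n tfy-agent get deployments tfy-agent tfy-agent-proxy
```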
Create a Velero backup on the source cluster
Create a backup that includes the namespaces and cluster-scoped resources you need. Replace <BACKUP_NAME> and adjust the namespace list or use resource filters as required. To include volume data, follow Velero's documentation for your storage class (CSI snapshots, File System Backup, and so on). Wait until the backup phase is Completed.
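For example, a hedged sketch (the namespace list here is illustrative; include whichever namespaces your workloads actually use):

```shell
# Back up selected namespaces plus the cluster-scoped resources they reference.
# The namespaces below are examples only.
velero backup create <BACKUP_NAME> \
  --include-namespaces tfy-agent,istio-system,argocd \
  --include-cluster-resources=true

# Check the phase; repeat until it reports Completed.
velero backup describe <BACKUP_NAME>
```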
Install Velero on the target cluster and point it at the same backup location
On the target cluster, install Velero and configure a BackupStorageLocation that references the same bucket (or prefix) as the source. For migration, Velero recommends a read-only BSL on the cluster where you only restore, to avoid accidental deletion of backup objects in object storage; see Cluster migration.
Use the velero install or velero backup-location create flow that matches your cloud; the official plugin repositories above contain the correct flags for AWS, GCP, and Azure. After install, confirm the backup appears on the target cluster (Velero syncs Backup metadata from object storage; the default sync interval is on the order of one minute).
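As an illustration for AWS only, assuming Velero was installed on the target without a default BSL (bucket name and region are placeholders; other providers use their own plugin flags):

```shell
# Register the shared bucket as a read-only BackupStorageLocation on the target.
velero backup-location create default \
  --provider aws \
  --bucket <BUCKET_NAME> \
  --config region=<REGION> \
  --access-mode ReadOnly

# After the sync interval (~1 minute), the source backup should be listed here.
velero backup get
```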
Restore on the target cluster
Create a restore from the backup, then inspect the restore for warnings or partial failures.
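A sketch of the restore and inspection steps (the restore name is arbitrary):

```shell
# Create a restore from the synced backup.
velero restore create <RESTORE_NAME> --from-backup <BACKUP_NAME>

# Inspect phase, warnings, and per-resource errors.
velero restore describe <RESTORE_NAME> --details
velero restore logs <RESTORE_NAME>
```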
Validate the target cluster
Confirm namespaces, workloads, ConfigMaps, Secrets, ingress, and data volumes behave as expected. Reconcile any cloud-specific resources (DNS, load balancers, IAM, node pools) that are not carried by Velero.
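A few illustrative spot checks (not exhaustive; adapt to your workloads):

```shell
# Restored namespaces and workloads
kubectl get namespaces
kubectl get pods --all-namespaces

# PVCs should be Bound if volume data was included in the backup.
kubectl get pvc --all-namespaces

# Ingress and Service endpoints
kubectl get ingress,services --all-namespaces
```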
Attach the new cluster in TrueFoundry and configure TFY Agent
- Complete compute plane setup on the target cluster if it is not already installed (Istio, Argo CD, TrueFoundry components, etc.), or rely on what Velero restored—depending on what you included in the backup.
- Ensure the cluster token / secret used by TFY Agent matches the new cluster registration in the TrueFoundry control plane. If Velero restored an old tfy-agent Secret, update it to the credentials issued for the new cluster before scaling the agent back up.
- Scale TFY Agent back up on the target cluster only.
- Confirm in the control plane that the new cluster is healthy and that applications and workspaces appear as expected.
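The scale-up step above could look like the following, assuming the default Deployment names and a single replica each:

```shell
# On the TARGET cluster only: bring the agents back up.
kubectl -n tfy-agent scale deployment tfy-agent tfy-agent-proxy --replicas=1

# Watch until both report 1/1 ready.
kubectl -n tfy-agent get deployments tfy-agent tfy-agent-proxy -w
```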
Summary
- Configure the backup bucket and IAM (per Velero and your cloud provider’s plugin docs), then install Velero on the source cluster with a BackupStorageLocation.
- Scale down tfy-agent and tfy-agent-proxy on the source cluster (and prevent GitOps from immediately reverting that).
- Back up with Velero to shared object storage.
- Restore on the target cluster using the same backup location (prefer read-only BSL on the target).
- Validate data and APIs; watch for skipped resources when names already exist on the target.
- Connect the target cluster to TrueFoundry with the correct integration credentials, then scale up TFY Agent on the target only.