Skip to main content

Adding Models

This section explains the steps to add Google Vertex AI models and configure the required access controls.
1

Navigate to Google Vertex Models in AI Gateway

From the TrueFoundry dashboard, navigate to AI Gateway > Models and select Google Vertex.
Navigating to Google Vertex Provider Account in AI Gateway
2

Add Google Vertex Account and Authentication

Give a unique name to your Google Vertex account. This will be used to refer to the models later. Add collaborators to your account, this will give access to the account to other users/teams. Learn more about access control here.
Required IAM RoleThe Google Cloud identity used by the gateway (a service account, whether referenced by a key, by GKE Workload Identity, or by Workload Identity Federation) must have the Vertex AI User role (roles/aiplatform.user), which includes the aiplatform.endpoints.predict permission required by the gateway.The gateway supports three authentication methods. Pick the one that matches your deployment.1. Using Service Account JSON KeyThis method works for all deployment types (GKE, EKS, AKS, on-premises, or the SaaS Gateway).
  • Generate a Service Account JSON key by following the official Google Cloud documentation here.
  • The service account must have the Vertex AI User role.
  • When adding the provider account in TrueFoundry, select Service account key file as the authentication type and paste the JSON key into the Service account key JSON field (or store it as a secret and reference it).
2. Using Workload Identity Federation (Keyless, Cross-Cloud)Workload Identity Federation (WIF) lets the gateway authenticate to Google Cloud without service account keys, even when running outside of GKE — for example, on Amazon EKS, Azure AKS, or on-premises Kubernetes clusters. It works by exchanging a short-lived Kubernetes service account token for a Google Cloud access token through Google’s Security Token Service.
Workload Identity Federation is the recommended approach for production deployments running outside of GKE. It eliminates long-lived service account keys while supporting any Kubernetes environment, and it also works on the SaaS version of the AI Gateway.
Prerequisites
  1. A Google Cloud project with Vertex AI enabled.
  2. A Workload Identity Pool and Provider configured in Google Cloud IAM. Follow the official guide: Configure Workload Identity Federation with Kubernetes.
  3. A Google Cloud IAM service account that the federated identity can impersonate, with the Vertex AI User role (roles/aiplatform.user) granted on the project.
  4. The Kubernetes service account used by the gateway must have permission to issue TokenRequest resources for itself. The TrueFoundry-provided Helm chart configures this RBAC automatically.
Generate the credential configuration JSONUse the gcloud CLI to generate the credential configuration file:
gcloud iam workload-identity-pools create-cred-config \
  projects/<PROJECT_NUMBER>/locations/global/workloadIdentityPools/<POOL_ID>/providers/<PROVIDER_ID> \
  --service-account=<GSA_EMAIL> \
  --credential-source-type=programmatic \
  --output-file=credential-config.json
This produces a JSON file with "type": "external_account" describing the identity pool, audience, and STS token-exchange endpoints. It is not a private key.Configure in TrueFoundryWhen adding or editing the Vertex AI provider account:
  1. Select Workload Identity Federation file as the authentication type.
  2. Paste the contents of the generated credential-config.json into the Key file content field, or store it as a secret and reference it.
Resumable file uploads (used for some batch and fine-tuning workflows that upload files to Google Cloud Storage via signed URLs) are not yet supported with Workload Identity Federation. If you rely on those flows, use a Service Account JSON key instead.
3. Using GCP Workload Identity on GKE (Self-Hosted Gateway only)When running the gateway inside Google Kubernetes Engine (GKE), you can rely on GKE’s built-in Workload Identity, which lets a Kubernetes service account (KSA) act as a Google Cloud IAM service account (GSA) automatically through the GKE metadata server.
GKE Workload Identity is GKE-specific. Pods using the configured KSA authenticate as the associated GSA when accessing Google Cloud APIs, with no extra configuration on the gateway side.
To set up GKE Workload Identity, follow the official Google Cloud documentation: Configure Workload Identity on GKE.When adding the Vertex AI provider account in TrueFoundry, leave the authentication section empty — the gateway will automatically pick up GKE Workload Identity credentials via Application Default Credentials (ADC).
GCP Workload Identity (GKE ADC) does not work on the SaaS version of the Gateway, and it only works when the gateway runs inside a GKE cluster. For all other environments, use Workload Identity Federation or a Service Account JSON key.
Google Vertex account configuration form with fields for name, project ID, service account JSON, and region
3

Configure Project ID and Region

Provide your Google Cloud Project ID and a default Region for all models under this account. You can override the region for individual models later.Project ID
  • You can find your Project ID in the top-right corner of your Google Cloud Console.
Google Cloud Console header showing project ID location in the dropdown menu
Region
  • Specify a default region for all models under this account. You can override this region for individual models later.
4

Add Models

You can either select available models from the list or add them manually by clicking + Add Model. When adding a model manually, the Model ID format depends on the provider.
Select a Gemini model from the list or add it manually.
  • Model ID Format: google/<vertex-model-id>
  • Example: google/gemini-1.5-pro
You can find the Model ID in the Google Cloud Console.
Google Cloud Console showing Gemini model details with model ID highlighted
Select a Claude model from the list or add it manually.
  • Model ID Format: anthropic/<vertex-model-id>
  • Example: anthropic/claude-3-5-sonnet-v2@20241022
Google Cloud Console showing Anthropic Claude model details with model ID highlighted
Select a Mistral model from the list or add it manually.
  • Model ID Format: mistralai/<vertex-model-id>
  • Example: mistralai/mistral-large-2411@001
Google Cloud Console showing Mistral AI model details with model ID highlighted
When adding any model manually, you can specify a Region to override the default one set at the account level.

Inference

After adding the models, you can perform inference using an OpenAI-compatible API via the Playground or by integrating it with your own application.
Code Snippet and Try in Playgroud Buttons for each Google Vertex model

FAQs

No. You can set a default region at the account level and override it for each individual model if needed. This allows you to use models from different regions with a single provider account.
  • Service Account JSON Key — Works everywhere (any cloud, on-prem, SaaS Gateway). Simplest to set up, but requires you to manage and rotate a long-lived secret.
  • Workload Identity Federation — Recommended for production. Keyless, works on any Kubernetes cluster (EKS, AKS, GKE, on-prem) and on the SaaS Gateway. Requires a one-time setup of a Workload Identity Pool in Google Cloud.
  • GCP Workload Identity (GKE) — Only available when the self-hosted gateway runs inside a GKE cluster. Keyless and zero-config on the gateway side, but does not work on the SaaS Gateway or outside of GKE.
Service Account KeyWorkload Identity FederationGCP Workload Identity (GKE)
Works on GKEYesYesYes
Works on EKS / AKS / on-premYesYesNo
Works on SaaS GatewayYesYesNo
Key management requiredYesNoNo
Requires credential JSON in TrueFoundryYes (service account key)Yes (external_account config)No (leave empty)
Both are keyless authentication mechanisms, but they target different environments.GCP Workload Identity is a GKE-only feature. The GKE metadata server automatically maps a Kubernetes service account to a Google Cloud IAM service account. The gateway picks this up through Application Default Credentials (ADC) when no auth data is configured. It does not work on the SaaS Gateway or outside of GKE.Workload Identity Federation is a broader Google Cloud feature that works across any Kubernetes cluster (EKS, AKS, on-prem, and GKE) and on the SaaS Gateway. It requires you to provide an external_account credential configuration JSON (generated via gcloud iam workload-identity-pools create-cred-config). The gateway exchanges a short-lived Kubernetes service account token for a Google Cloud access token through Google’s Security Token Service.
Gemini is generally recommended for individual developers and prototyping use cases, while Vertex AI is recommended for production and enterprise use cases.Vertex AI offers everything available in the Gemini API and more, including:
  • More secure auth using service accounts instead of API keys
  • A Model Garden that includes multiple third-party models
  • Access to provisioned throughput
You can read more about this here: