Skip to main content
This page walks through creating a virtual model in the TrueFoundry dashboard and using it from your application. For an overview of routing strategies, health detection, and advanced features like sticky routing, see the Virtual Models overview.

Create a virtual model

1

Open Virtual Models in AI Gateway

From the TrueFoundry dashboard, go to AI GatewayModelsVirtual Model.
Navigate to Virtual Models in AI Gateway
Virtual models live inside Virtual Model Provider Groups. You can add models to an existing group or create a new group when you start.
2

Create or select a provider group and set access controls

Give the group a unique name (3–64 characters, alphanumeric and hyphens, cannot start with a number). Configure collaborators:
  • User — May call the virtual models in this group for inference.
  • Manager — May change virtual model configuration.
Create Virtual Model Provider Group and configure access controls
See Gateway access control for details.
3

Define the virtual model, strategy, and targets

For each virtual model in the group, set:
  • Name — Identifier used in the full path group-name/virtual-model-name (for example gpt-4-production).
  • Model types — Operation kinds this virtual model supports (chat, completion, embedding, and so on). All targets must support the operation you invoke.
  • Routing strategy — Choose one of three strategies:
    StrategyWhen to use
    Weight-basedCanaries, fixed capacity splits, A/B allocation. Assign weights that sum to 100.
    Latency-basedAutomatic performance chasing. No weights needed — the gateway picks the fastest.
    Priority-basedPrimary + backup topologies. Assign priority numbers (0 = highest).
    For how each strategy works, see the overview.
  • Target models — For each target, configure:
    FieldDescription
    TargetA real model from the catalog (not another virtual model).
    WeightTraffic share (weight-based only). Weights across targets should sum to 100.
    PriorityPriority level (priority-based only). Lower number = higher priority.
    Retry configAttempts, delay (ms), and status codes that trigger retries. Defaults: 2 attempts, 100 ms delay, retry on 429, 500, 502, 503.
    Fallback status codesHTTP codes that cause fallback to another target. Defaults: 401, 403, 404, 429, 500, 502, 503.
    Fallback candidateWhether this target may receive traffic when another target fails. Default: true.
    Override parametersPer-target request parameters like temperature, max_tokens, or prompt_version_fqn for model-specific prompts.
    Header overridesInject or remove HTTP headers for this target only. Use set to add/overwrite headers and remove to strip them. See header overrides.
    prompt_version_fqn override does not apply when using agents with MCP/tools; it is supported for standard chat completion requests.
  • Slug (optional) — Short global alias for this virtual model. See Slugs.
Configure virtual model details, routing strategy, and target models
Configure the slug in the Virtual Model Provider Group settings. Slugs must be unique across all virtual models in the tenant.

Common patterns

The following YAML sketches show the routing_config shape used inside a virtual model. In the dashboard, the same fields are set in the UI.
routing_config:
  type: priority-based-routing
  load_balance_targets:
    - target: azure/gpt4
      priority: 0
      fallback_status_codes: ["429"]
    - target: openai/gpt4
      priority: 1
      fallback_status_codes: ["429"]
    - target: anthropic/claude-3-opus
      priority: 2
routing_config:
  type: weight-based-routing
  load_balance_targets:
    - target: azure/gpt4-v1
      weight: 90
    - target: azure/gpt4-v2
      weight: 10
routing_config:
  type: priority-based-routing
  load_balance_targets:
    - target: onprem/llama
      priority: 0
      fallback_status_codes: ["429", "500", "502", "503"]
    - target: bedrock/llama
      priority: 1
      retry_config:
        attempts: 2
        delay: 100
routing_config:
  type: latency-based-routing
  load_balance_targets:
    - target: azure/gpt4
      retry_config:
        attempts: 1
    - target: openai/gpt4
      retry_config:
        attempts: 1
routing_config:
  type: weight-based-routing
  load_balance_targets:
    - target: openai/gpt4
      weight: 70
      override_params:
        prompt_version_fqn: chat_prompt:internal/my-app/gpt4-optimized-prompt:1
    - target: anthropic/claude-3-opus
      weight: 30
      override_params:
        prompt_version_fqn: chat_prompt:internal/my-app/claude-optimized-prompt:1
routing_config:
  type: weight-based-routing
  sticky_routing:
    ttl_seconds: 3600
    session_identifiers:
      - key: x-user-id
        source: headers
      - key: x-conversation-id
        source: headers
  load_balance_targets:
    - target: provider-a/model-a
      weight: 70
      fallback_candidate: true
    - target: provider-b/model-b
      weight: 30
      fallback_candidate: true

Environment- or segment-specific routing

Use different virtual model names per environment or segment (for example booking-app/gpt-prod vs booking-app/gpt-dev) and have your client pass the appropriate model. You can still send metadata and headers for observability, rate limits, and other gateway features; routing for a given virtual model name is always defined on that virtual model.

Use a virtual model from your application

Once created, use the full path virtual-model-group-name/virtual-model-name as the model value in API requests — it works like any other model in the gateway.

Try in the Playground

  • Click Try in playground on the virtual model row after creation, or
  • Open the Playground and pick the virtual model from the model dropdown.
Try in playground button next to virtual model
Select virtual model from model dropdown in playground

Virtual model slugs

Slugs are optional short names that refer to a single virtual model. They are unique across the tenant. You can use either the slug or the full group/model path in requests.
Configure slug in Virtual Model Provider Group settings
If the virtual model is my-first-virtual-account/model-1 with slug virtual-model-1, both bodies are valid:
{
  "model": "my-first-virtual-account/model-1",
  "messages": []
}
{
  "model": "virtual-model-1",
  "messages": []
}