Self-Hosted Models - TrueFoundry Docs

If you run LLMs on your own infrastructure — on-premises GPUs, private cloud instances, or any model server not deployed through TrueFoundry — you can connect them to the AI Gateway by providing the endpoint URL and authentication details. Once registered, these models appear in the gateway model catalog alongside cloud providers and benefit from all gateway features: routing, guardrails, rate limiting, cost tracking, and observability.

Prerequisites

Your model server must expose an HTTP endpoint. The gateway works best with OpenAI-compatible APIs, which is the default output format for popular inference frameworks:

vLLM — OpenAI-compatible by default
Ollama — OpenAI-compatible API
SGLang — OpenAI-compatible by default
Text Generation Inference (TGI) — supports OpenAI-compatible mode

Adding Self-Hosted Models to the Gateway

Follow these steps to add a self-hosted or external model (deployed on TrueFoundry or elsewhere) to the AI Gateway.

Navigate to Self Hosted Models in AI Gateway

From the TrueFoundry dashboard, navigate to AI Gateway > Models and select Self Hosted Models.

Navigating to Self Hosted Provider Account in AI Gateway

Add Collaborators

Add collaborators to your account, this will give access to the account to other users/teams. Learn more about access control here.

Self Hosted account configuration form with fields for API key and collaborators

Configure Model Details

Give a name to your hosted model and add ModelId, add the URL to the hosted model. Choose the correct Model Server for your model and optionally add Auth Data which will be used for Authentication of request to the model.

Configuration form showing fields for model ID, URL, and authentication

Inference

After adding the models, you can perform inference using an OpenAI-compatible API via the Playground or by integrating with your own application.

Code Snippet and Try in Playground Buttons for each
model

Snowflake Cortex Custom Endpoints

⌘I

​Prerequisites

​Adding Self-Hosted Models to the Gateway

​Inference

Prerequisites

Adding Self-Hosted Models to the Gateway

Inference