The API supports several media types: images, audio, video, and PDF documents.
Images
Supported Providers: OpenAI, Bedrock, Anthropic, Google Vertex, Google Gemini
Send images as part of your chat completion requests using either URLs or base64 encoding:
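import base64
from openai import OpenAI

# Gateway credentials; replace the placeholders with your own values
client = OpenAI(
    api_key="your_truefoundry_api_key",
    base_url="{GATEWAY_BASE_URL}",
)

def encode_image(image_path):
    """Read an image file and return its contents as a base64 string."""
    with open(image_path, "rb") as image_file:
        return base64.b64encode(image_file.read()).decode("utf-8")

response = client.chat.completions.create(
    model="openai-main/gpt-4o",
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "What's in this image?"},
                {
                    "type": "image_url",
                    "image_url": {
                        # Inline the image as a base64 data URL
                        "url": f"data:image/jpeg;base64,{encode_image('image.jpeg')}"
                    },
                },
            ],
        }
    ],
)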
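The text above mentions both URLs and base64 encoding, while the example covers only the base64 case. The following is a minimal sketch of a helper that handles either form. The name `image_part` and its `mime_type` default are hypothetical and not part of the gateway or the OpenAI SDK; the sketch only builds the `image_url` content part shown above.

```python
import base64


def image_part(source: str, mime_type: str = "image/jpeg") -> dict:
    """Build an image_url content part from a URL or a local file path.

    Hypothetical helper for illustration: an http(s) source is passed
    through unchanged, anything else is read from disk and inlined as a
    base64 data URL.
    """
    if source.startswith(("http://", "https://")):
        url = source
    else:
        with open(source, "rb") as image_file:
            encoded = base64.b64encode(image_file.read()).decode("utf-8")
        url = f"data:{mime_type};base64,{encoded}"
    return {"type": "image_url", "image_url": {"url": url}}
```

The returned dictionary can be placed in the `content` list next to the text part, for example `image_part("https://example.com/cat.jpg")` for a remote image (a placeholder URL) or `image_part("image.jpeg")` for a local file.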
Media Resolution
Supported Providers: OpenAI, Azure OpenAI, Google Gemini, Google Vertex AI, xAI
The detail parameter in the image_url object controls the resolution at which images are processed. This helps balance response quality, latency, and cost.
Supported Values: low, high, auto
import base64
from openai import OpenAI

API_KEY = "your_truefoundry_api_key"
BASE_URL = "{GATEWAY_BASE_URL}"

# Read and encode the image as base64
with open("test-img.png", "rb") as image_file:
    base64_image = base64.b64encode(image_file.read()).decode("utf-8")

client = OpenAI(
    api_key=API_KEY,
    base_url=BASE_URL,
)

response = client.chat.completions.create(
    model="test-123/gemini-3-pro-preview",
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "What is in this image?"},
                {
                    "type": "image_url",
                    "image_url": {
                        "url": f"data:image/png;base64,{base64_image}",
                        "detail": "low",  # Options: "low", "high", "auto"
                    },
                },
            ],
        }
    ],
)

print(response.choices[0].message)
For Google Gemini and Vertex AI providers, the detail parameter is automatically translated to the mediaResolution parameter:
"low" → MEDIA_RESOLUTION_LOW (64 tokens)
"high" → MEDIA_RESOLUTION_HIGH (256+ tokens with scaling)
"auto" or omitted → No explicit media resolution (model decides)
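The mapping listed above can be sketched as a small lookup. This only illustrates the documented translation and is not the gateway's actual implementation. The names `DETAIL_TO_MEDIA_RESOLUTION` and `to_media_resolution` are hypothetical, and raising an error on unknown values is an assumption, since the source does not say how such values are handled.

```python
from typing import Optional

# Documented translation for Google Gemini and Vertex AI providers.
# "auto" has no entry because no explicit media resolution is sent.
DETAIL_TO_MEDIA_RESOLUTION = {
    "low": "MEDIA_RESOLUTION_LOW",
    "high": "MEDIA_RESOLUTION_HIGH",
}


def to_media_resolution(detail: Optional[str] = None) -> Optional[str]:
    """Return the mediaResolution value for a detail setting, or None.

    None means "send no explicit media resolution and let the model decide".
    """
    if detail is None or detail == "auto":
        return None
    if detail not in DETAIL_TO_MEDIA_RESOLUTION:
        # Assumption: reject values outside low/high/auto.
        raise ValueError(f"unsupported detail value: {detail!r}")
    return DETAIL_TO_MEDIA_RESOLUTION[detail]
```

Under this sketch, `to_media_resolution("low")` gives `"MEDIA_RESOLUTION_LOW"`, while `to_media_resolution("auto")` and `to_media_resolution()` both give `None`.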
Audio
Supported Models: Google Gemini models (Gemini 2.0 Flash, etc.)
Send audio files in supported formats such as MP3 and WAV. Audio input is currently supported for Google Gemini models.
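The example below sends an MP4 video as a base64 data URL; the mime_type field is required for Gemini models.
import base64

# Assumes `client` is an OpenAI client configured with your gateway base URL and API key

def encode_video(video_path):
    """Read a video file and return its contents as a base64 string."""
    with open(video_path, "rb") as video_file:
        return base64.b64encode(video_file.read()).decode("utf-8")

response = client.chat.completions.create(
    model="internal-google/gemini-2-0-flash",
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "Describe what's happening in this video"},
                {
                    "type": "image_url",
                    "image_url": {
                        "url": f"data:video/mp4;base64,{encode_video('video.mp4')}",
                        "mime_type": "video/mp4",  # required for gemini models
                    },
                },
            ],
        }
    ],
)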
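Because the data URL and the `mime_type` field must agree with the file being sent, it can help to derive both from the file name. The following is a rough sketch using Python's standard `mimetypes` module. The helper name `media_part` is hypothetical, and reusing the `image_url` part for audio files is an assumption that mirrors the video example in this section rather than something the source confirms.

```python
import base64
import mimetypes


def media_part(path: str) -> dict:
    """Build an image_url content part with a matching mime_type.

    Hypothetical helper: the MIME type is guessed from the file
    extension, so files with unrecognized extensions are rejected.
    """
    mime_type, _ = mimetypes.guess_type(path)
    if mime_type is None:
        raise ValueError(f"cannot determine MIME type for {path!r}")
    with open(path, "rb") as media_file:
        encoded = base64.b64encode(media_file.read()).decode("utf-8")
    return {
        "type": "image_url",
        "image_url": {
            "url": f"data:{mime_type};base64,{encoded}",
            "mime_type": mime_type,  # kept in sync with the data URL
        },
    }
```

With this sketch, `media_part("video.mp4")` would produce a part equivalent to the one written by hand in the example above.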
PDF Documents
Supported Providers: OpenAI, Bedrock, Anthropic, Google Vertex, Google Gemini
PDF document processing allows models to analyze and extract information from PDF files.
TrueFoundry supports vision models from all integrated providers as they become available. These models can analyze and interpret images alongside text, enabling multimodal AI applications.